CN115713535A - Image segmentation model determination method and image segmentation method


Info

Publication number
CN115713535A
Authority
CN
China
Prior art keywords
image
sample
feature
image sample
decoding
Prior art date
Legal status
Pending
Application number
CN202211386108.XA
Other languages
Chinese (zh)
Inventor
许敏丰
郭恒
张剑锋
Current Assignee
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba China Co Ltd filed Critical Alibaba China Co Ltd
Priority to CN202211386108.XA
Publication of CN115713535A

Abstract

Embodiments of this specification provide an image segmentation model determination method and an image segmentation method. The image segmentation model determination method includes: determining a first image sample set and a second image sample set containing a target object; determining a feature extraction model according to a first image sample in the first image sample set; inputting the second image sample into the feature extraction model to obtain a second feature image of the second image sample; and determining an image segmentation model according to the second feature image and the sample label. The method trains, on an unlabeled first image sample set, a feature extraction model capable of multi-scale image feature extraction, so that the model learns rich, high-level semantic information from the first image samples; the feature extraction model is then used as a feature extractor and, combined with a small labeled second image sample set, an image segmentation model is trained that can subsequently perform accurate image segmentation.

Description

Image segmentation model determination method and image segmentation method
Technical Field
Embodiments of this specification relate to the field of computer technology, and in particular to an image segmentation model determination method.
Background
Organ segmentation based on medical images plays an important role in preoperative planning, radiotherapy planning, early disease screening, and the like. However, owing to performance differences between medical devices, differences in imaging parameters, differences between patients, differences in lesion morphology, and other factors, medical image data exhibit complex diversity; labeling such diverse data is time-consuming and labor-intensive, must be done by professional medical staff, and is therefore costly.
Consequently, when faced with a large amount of relatively complex unlabeled medical image data, how to use that large amount of unlabeled data together with a small amount of labeled medical image data to accurately segment human tissue in medical images is a technical problem that urgently needs to be solved.
Disclosure of Invention
In view of this, the embodiments of the present specification provide two image segmentation model determination methods. One or more embodiments of the present specification relate to two image segmentation model determination apparatuses, an image segmentation method, an image segmentation apparatus, a computing device, a computer-readable storage medium, and a computer program, so as to solve technical drawbacks of the related art.
According to a first aspect of embodiments of the present specification, there is provided an image segmentation model determination method, including:
determining a first image sample set and a second image sample set containing a target object, wherein the second image sample set comprises a second image sample and a sample label corresponding to the second image sample;
determining a feature extraction model according to a first image sample in the first image sample set, wherein the feature extraction model comprises an encoder, the encoder comprises at least two encoding layers for extracting image features with different scales, and a dictionary learning-based vector quantization module arranged behind at least one encoding layer;
inputting the second image sample into the feature extraction model to obtain a second feature image of the second image sample;
and determining an image segmentation model according to the second feature image and the sample label.
According to a second aspect of embodiments of the present specification, there is provided an image segmentation model determination apparatus including:
a sample determination module configured to determine a first set of image samples containing a target object and a second set of image samples, wherein the second set of image samples includes a second image sample and a sample label corresponding to the second image sample;
a first model determining module configured to determine a feature extraction model according to a first image sample in the first image sample set, wherein the feature extraction model comprises an encoder including at least two encoding layers for performing image feature extraction of different scales, and a dictionary learning-based vector quantization module disposed after at least one encoding layer;
a first feature image obtaining module configured to input the second image sample into the feature extraction model, and obtain a second feature image of the second image sample;
a second model determination module configured to determine an image segmentation model from the second feature image and the sample label.
According to a third aspect of embodiments herein, there is provided an image segmentation method including:
receiving a CT image of a human target part input by a user, and inputting the CT image into a feature extraction model to obtain a feature image of the CT image;
inputting the feature image into an image segmentation model to obtain an image segmentation result of the CT image, and displaying the image segmentation result to the user, wherein the feature extraction model and the image segmentation model are the feature extraction model and the image segmentation model of the image segmentation model determination method above.
According to a fourth aspect of embodiments herein, there is provided an image segmentation apparatus including:
an image receiving module configured to receive a CT image of a human target part input by a user, and input the CT image into a feature extraction model to obtain a feature image of the CT image;
an image classification module configured to input the feature image into an image segmentation model to obtain an image segmentation result of the CT image, and display the image segmentation result to the user, wherein the feature extraction model and the image segmentation model are the feature extraction model and the image segmentation model of the image segmentation model determination method above.
According to a fifth aspect of embodiments of the present specification, there is provided an image segmentation model determination method including:
responding to an image segmentation model processing request sent by a user, and displaying an image input interface for the user;
receiving a first image sample set and a second image sample set which are input by the user through the image input interface and contain target objects, wherein the second image sample set comprises second image samples and sample labels corresponding to the second image samples;
determining a feature extraction model according to a first image sample in the first image sample set, wherein the feature extraction model comprises an encoder, the encoder comprises at least two encoding layers for extracting image features with different scales, and a dictionary learning-based vector quantization module arranged behind at least one encoding layer;
inputting the second image sample into the feature extraction model to obtain a second feature image of the second image sample;
and determining an image segmentation model according to the second feature image and the sample label, and returning the image segmentation model to the user.
According to a sixth aspect of embodiments herein, there is provided an image segmentation model determination apparatus including:
the interface display module is configured to respond to an image segmentation model processing request sent by a user and display an image input interface for the user;
the sample receiving module is configured to receive a first image sample set and a second image sample set which are input by the user through the image input interface and contain a target object, wherein the second image sample set comprises a second image sample and a sample label corresponding to the second image sample;
a third model determination module configured to determine a feature extraction model from a first image sample in the first image sample set, wherein the feature extraction model comprises an encoder including at least two encoding layers for performing image feature extraction of different scales, and a dictionary-learning-based vector quantization module disposed after at least one encoding layer;
a second feature image obtaining module configured to input the second image sample into the feature extraction model, and obtain a second feature image of the second image sample;
a fourth model determination module configured to determine an image segmentation model from the second feature image and the sample label, and return the image segmentation model to the user.
According to a seventh aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions, which, when executed by the processor, implement the steps of the image segmentation model determination method or the image segmentation method described above.
According to an eighth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the image segmentation model determination method or the image segmentation method described above.
According to a ninth aspect of embodiments herein, there is provided a computer program, wherein the computer program, when executed in a computer, causes the computer to execute the steps of the image segmentation model determination method or the image segmentation method described above.
One embodiment of the present specification implements an image segmentation model determination method, including determining a first image sample set and a second image sample set containing a target object, where the second image sample set includes a second image sample and a sample label corresponding to the second image sample; determining a feature extraction model according to a first image sample in the first image sample set, wherein the feature extraction model comprises an encoder, the encoder comprising at least two encoding layers for extracting image features at different scales and a dictionary-learning-based vector quantization module arranged after at least one encoding layer; inputting the second image sample into the feature extraction model to obtain a second feature image of the second image sample; and determining an image segmentation model according to the second feature image and the sample label.
Specifically, the method trains, on an unlabeled first image sample set, a feature extraction model that extracts image features at different scales based on dictionary learning. By combining dictionary learning with the encoder, the feature extraction model learns rich, high-level semantic information from the first image samples and can subsequently perform multi-scale image feature extraction. The feature extraction model is then used as a feature extractor and, combined with a small labeled second image sample set, an image segmentation model is trained so that it can subsequently perform accurate image segmentation on images containing the target object.
Drawings
Fig. 1 is a schematic view of a specific scene of an image segmentation method applied to CT image segmentation of a heart of a human body according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a method for determining an image segmentation model according to an embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a processing procedure of a method for determining an image segmentation model according to an embodiment of the present disclosure;
FIG. 4 is a schematic network structure diagram of an unsupervised multi-level sparse vector quantization variational automatic encoder SVQ-VAE in an image segmentation model determination method provided in an embodiment of the present specification;
fig. 5 is a schematic diagram of semantic information visualization results learned in the process of reconstructing an original image by different decoding layers in an image segmentation model determination method provided in an embodiment of the present specification;
fig. 6 is a schematic structural diagram of an image segmentation model determination apparatus according to an embodiment of the present disclosure;
FIG. 7 is a flowchart of an image segmentation method provided in one embodiment of the present description;
FIG. 8 is a flow chart of another method for determining an image segmentation model provided in an embodiment of the present description;
fig. 9 is a block diagram of a computing device according to an embodiment of the present disclosure.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present specification. The specification can, however, be implemented in many other ways than those described herein, and those skilled in the art can make similar generalizations without departing from its substance; the specification is therefore not limited by the specific embodiments disclosed below.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first could also be termed a second and, similarly, a second could be termed a first without departing from the scope of one or more embodiments of the present description. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
First, the noun terms referred to in one or more embodiments of the present specification are explained.
CT: computed Tomography, which is an electronic Computed Tomography, uses a precisely collimated X-ray beam, gamma rays, ultrasound, etc. to scan a section of a human body one after another around a certain part of the body together with a highly sensitive detector.
SVQ-VAE: sparse Vector Quantization variation Auto-Encoder.
MLP: multi-Layer perceivron, a simple forward architecture neural network.
To solve the above technical problem, a pre-training-based method for segmenting human target parts can be adopted, namely MIM (Masked Image Modeling); for example, MAE (Masked AutoEncoder), a pre-train-then-fine-tune scheme used mainly for human target part segmentation tasks. Its drawbacks are that it is unclear what the pre-training has actually learned, and that the decoder used for human target part segmentation is a heavyweight model, so the achieved effect is poor.
Based on this, in the present specification, two image segmentation model determination methods are provided. One or more embodiments of the present specification relate to two kinds of image segmentation model determination apparatuses, an image segmentation method, an image segmentation apparatus, a computing device, a computer-readable storage medium, and a computer program, which are described in detail one by one in the following embodiments.
Referring to fig. 1, fig. 1 is a schematic view illustrating a specific scene of an image segmentation method applied to CT image segmentation of a human heart according to an embodiment of the present disclosure.
Fig. 1 includes a CT scanner 102, a terminal 104, and a server 106.
In specific implementation, the CT scanner 102 performs a CT scan of a user whose cardiac CT image is to be segmented, obtaining a cardiac CT image of that user. The terminal 104 acquires the cardiac CT image from the CT scanner 102 and sends it to the server 106. The server 106 inputs the cardiac CT image into a pre-trained feature extraction model for feature extraction, inputs the extracted feature image into an image processing model, outputs through the image processing model a heart segmentation image (i.e., a mask image) corresponding to the cardiac CT image, and returns the heart segmentation image to the terminal 104. An operating user (e.g., a doctor) of the terminal 104 can then make further judgments about the user's heart condition from the segmentation image, such as heart disease screening or cardiac preoperative planning. Here, the feature extraction model can be understood as an unsupervised representation learning model, namely a sparse multi-level vector quantized variational autoencoder whose encoder performs dictionary learning over at least two encoding layers, trained on unlabeled historical cardiac CT images; the image segmentation model can be understood as a deep learning model trained on a small number of labeled historical cardiac CT images, with the unsupervised representation learning model serving as the feature extractor.
In addition to the heart segmentation image, the image processing model may also output, based on the label of each segmented region, part labels for the cardiac CT image, such as left atrium, right atrium, left ventricle, and right ventricle.
Applied to this specific scenario of cardiac CT image segmentation, the image segmentation method provided by the embodiments of this specification proposes a sparsity-based multi-level vector quantized variational autoencoder: the encoder learns overcomplete dictionaries at different scales from a large amount of unlabeled data and obtains rich semantic representations through those dictionaries; finally, a simple MLP model is trained on a small amount of labeled data to achieve accurate multi-organ image segmentation, improving the segmentation accuracy of the image processing model.
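For illustration only, the following is a minimal sketch of the inference flow just described, assuming PyTorch-style modules; the function name, tensor shapes, and the five-class cardiac labeling are illustrative assumptions and not part of the patent disclosure.

```python
import torch

def segment_cardiac_ct(ct_slice: torch.Tensor,
                       feature_extractor: torch.nn.Module,
                       mlp_head: torch.nn.Module) -> torch.Tensor:
    """ct_slice: (1, 1, H, W) CT image; returns an (H, W) per-pixel label mask."""
    feature_extractor.eval()
    mlp_head.eval()
    with torch.no_grad():
        # the feature extraction model produces a multi-scale feature image
        features = feature_extractor(ct_slice)                # (1, C, H, W)
        _, c, h, w = features.shape
        pixels = features.permute(0, 2, 3, 1).reshape(-1, c)  # one C-dim vector per pixel
        logits = mlp_head(pixels)                             # (H*W, num_classes)
        # assumed labels: 0=background, 1=left atrium, 2=right atrium,
        # 3=left ventricle, 4=right ventricle
        mask = logits.argmax(dim=1).reshape(h, w)
    return mask
```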
Referring to fig. 2, fig. 2 is a flowchart illustrating an image segmentation model determination method according to an embodiment of the present disclosure, which specifically includes the following steps.
Step 202: a first set of image samples containing the target object and a second set of image samples are determined.
Wherein the second image sample set includes a second image sample and a sample label corresponding to the second image sample.
Specifically, the image segmentation model determination method provided in the embodiments of this specification can be applied not only in the medical field, to segment CT images of various human or animal target parts (e.g., organs such as the heart and liver), but also in other fields whose data exhibit complex diversity and high labeling cost, such as satellite or planetary imagery in the aerospace field.
For convenience of understanding, the embodiments of the present disclosure will be described in detail by taking the application of the image segmentation model in the medical field as an example.
Then, in the case where the image segmentation model is applied in the medical field, the target object may be understood as a human target portion, such as a human organ like a heart, a liver, etc.; the first image sample set containing the target object may be understood as a first image sample set containing a target portion of a human body, for example, a first image sample set containing a heart of the human body, that is, the first image sample set includes a plurality of first image samples containing the heart of the human body, and each first image sample contains different hearts of the human body; similarly, the second image sample set including the target object may be understood as a second image sample set including a target portion of a human body, for example, a second image sample set including a heart of the human body, that is, the second image sample set includes a plurality of second image samples including a heart of the human body, and each of the second image samples includes a different heart of the human body; namely, the first image sample set and the second image sample set containing the target object are the first CT image sample set and the second CT image sample set containing the target portion of the human body.
The sample label corresponding to the second image sample may be understood as a segmented image of the second image sample, for example, a segmented image including a human heart, for example, in the second image sample including the human heart, the left atrium, the right atrium, the left ventricle, the right ventricle, and the like of the human heart in the second image sample are segmented and labeled, and then the segmented and labeled image may be understood as the sample label corresponding to the second image sample.
In practical applications, the first image samples in the first image sample set and the second image samples in the second image sample set may partially overlap, or the second image samples may be taken from the first image sample set. The first image samples are later used for unsupervised training of the feature extraction model, which requires no labeled sample data, whereas the second image samples are later used for supervised training of the image segmentation model, which requires data labeling to generate sample labels. Hence, to improve the model training effect, a large number of unlabeled first image samples can be used for the unsupervised feature extraction model training, while only a small number of labeled second image samples are used for the supervised image segmentation model training; accordingly, the number of first image samples in the first image sample set is larger than the number of second image samples in the second image sample set.
Taking the target object as a human heart as an example, determining a first image sample set and a second image sample set containing the target object may be understood as acquiring or receiving the first image sample set and the second image sample set containing different human hearts, where the first image sample set includes unlabeled first image samples, and the second image sample set includes labeled second image samples.
Step 204: determining a feature extraction model from a first image sample of the first set of image samples.
The feature extraction model comprises an encoder, wherein the encoder comprises at least two encoding layers for extracting image features of different scales, and a vector quantization module which is arranged behind at least one encoding layer and is based on dictionary learning.
The feature extraction model may be understood as a feature extraction model with an encoder-decoder structure, for example a Unet-structured feature extraction model. The Unet structure is a relatively simple image segmentation architecture: target features are extracted through four down-sampling steps and then restored to the original size through four up-sampling steps. In other words, the Unet structure is an algorithm based on the encoder-decoder idea: the encoder comprises four encoding layers, and the corresponding decoder comprises four symmetric decoding layers.
For convenience of understanding, the following embodiments will be described in detail by taking a feature extraction model as an example of a feature extraction model with a Unet structure.
Specifically, a feature extraction model is determined according to the first image samples in the first image sample set; that is, a Unet-structured feature extraction model is obtained by training on a plurality of first image samples in the first image sample set. The feature extraction model includes an encoder and a decoder, wherein the encoder includes at least two encoding layers for extracting image features at different scales, and a dictionary-learning-based vector quantization module is arranged after at least one encoding layer. The goal of dictionary learning is to extract the most essential features of things, analogous to the words in a dictionary.
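To make the structure concrete, the following PyTorch sketch (an exposition aid under assumed layer widths, not the patent's reference implementation) shows a Unet-style encoder with four coding layers, each optionally followed by a dictionary-learning-based vector quantization module; the quantization step itself is sketched further below, where the OMP decomposition is described.

```python
import torch.nn as nn

class CodingLayer(nn.Module):
    """One coding layer: two 3x3 convolutions at a fixed scale."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.block(x)

class SVQEncoder(nn.Module):
    """Four coding layers; an optional vector quantization module after each."""
    def __init__(self, channels=(32, 64, 128, 256), quantizers=None):
        super().__init__()
        ins = (1,) + channels[:-1]  # single-channel CT input assumed
        self.layers = nn.ModuleList(
            [CodingLayer(i, o) for i, o in zip(ins, channels)])
        self.quantizers = quantizers or [None] * len(channels)
        self.down = nn.MaxPool2d(2)  # e.g. 256 -> 128 -> 64 -> 32 pixels

    def forward(self, x):
        targets = []  # target feature image at each coding layer
        for layer, vq in zip(self.layers, self.quantizers):
            initial = layer(x)  # initial feature image at this layer
            targets.append(vq(initial) if vq is not None else initial)
            x = self.down(initial)  # downsample before the next coding layer
        return targets
```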
In specific implementation, the feature extraction model is a feature extraction model based on an encoder-decoder structure, and then the feature extraction model training is performed according to the first image sample in the first image sample set, which is specifically implemented as follows:
determining a feature extraction model from a first image sample of the first set of image samples, comprising:
inputting a first image sample in the first image sample set into a current coding layer of an encoder of the feature extraction model for encoding, to obtain an initial feature image of the first image sample at the current coding layer;
under the condition that the current coding layer is determined to be provided with a dictionary learning-based vector quantization module, determining a target characteristic image of the first image sample on the current coding layer according to the vector quantization module;
after downsampling the initial feature image of the first image sample at the current coding layer, inputting the result into the next coding layer after the current coding layer, to obtain a target feature image of the first image sample at that next coding layer;
inputting the target feature images of the first image sample at all coding layers into the decoding layers of a decoder of the feature extraction model for decoding, and training the feature extraction model according to the decoding results, wherein the coding layers in the encoder and the decoding layers in the decoder are arranged symmetrically.
To make the feature extraction effect of the feature extraction model better, the embodiments of this specification use a multi-level sparse vector quantized autoencoder (SVQ-VAE) to perform unsupervised representation learning on the diverse data. In practical applications, a dictionary-learning-based vector quantization module need not be arranged after every coding layer of the SVQ-VAE; for concreteness, the embodiments of this specification take as an example the case where such a module is arranged after each coding layer, and train the feature extraction model so that it can learn overcomplete dictionaries at the different scales of the different coding layers of the SVQ-VAE, thereby enabling extraction of multi-scale image features.
Taking one first image sample in the first image sample set as an example, the training of the feature extraction model is explained in detail.
Specifically, a first image sample in the first image sample set is input into the current coding layer (for example, the first coding layer) of the feature extraction model for encoding, yielding the initial feature image of the first image sample at the current coding layer, i.e., the encoded initial feature image. If the current coding layer is provided with a dictionary-learning-based vector quantization module, the target feature image of the first image sample at the current coding layer is determined according to that module. The initial feature image at the current coding layer is then downsampled, for example from 256 pixels to 128 pixels, and, if the current coding layer has a next coding layer (for example, a second coding layer), the downsampled feature image is input into that next coding layer and processed there to obtain the target feature image of the first image sample at the next coding layer. This repeats until a target feature image of the first image sample is obtained at every coding layer; finally, decoding is performed by the decoding layers of the decoder of the feature extraction model according to the target feature images at all coding layers, and the feature extraction model is trained according to the decoding results.
In the image segmentation model determination method provided in the embodiments of this specification, a sparse vector quantized autoencoder (SVQ-VAE) performs unsupervised representation learning on the diverse first image samples; the SVQ-VAE can learn overcomplete dictionaries at the different scales of the different coding layers, realizing the training of the feature extraction model so that it subsequently performs multi-scale feature extraction on images, improving feature richness.
Specifically, if a dictionary-learning-based vector quantization module is also arranged after the next coding layer following the current coding layer, then after the initial feature image of the first image sample at the current coding layer is downsampled and input into that next coding layer, the target feature image there is obtained in the same way as at the current coding layer, completing the encoder's feature learning for the first image sample at that coding layer's scale. The specific implementation steps are as follows:
the step of, after downsampling the initial feature image of the first image sample at the current coding layer, inputting the result into the next coding layer to obtain a target feature image of the first image sample at the next coding layer includes:
after downsampling the initial feature image of the first image sample at the current coding layer, inputting the downsampled feature image into the next coding layer after the current coding layer;
coding the next coding layer to obtain an initial characteristic image of the first image sample on the next coding layer;
in a case that it is determined that the next coding layer is provided with a dictionary learning-based vector quantization module, determining a target feature image of the first image sample in the next coding layer according to the vector quantization module.
The dictionary-learning-based vector quantization module arranged after the next coding layer is the same as the one arranged after the current coding layer described in the above embodiments, and is not described again here.
The detailed description will be given by taking an example in which a current coding layer is a first coding layer and a next coding layer of the current coding layer is a second coding layer.
Specifically, after the initial feature image of a first image sample in a first coding layer is downsampled, the downsampled initial feature image is input into a second coding layer; after the second coding layer is coded, obtaining an initial characteristic image of the first image sample in the second coding layer; and then determining the target characteristic image of the first image sample on the second coding layer according to the vector quantization module under the condition that the vector quantization module based on dictionary learning is arranged on the second coding layer.
When the second coding layer also has a next coding layer (such as a third coding layer) and the third coding layer is also provided with a vector quantization module based on dictionary learning, the specific implementation manner of obtaining the target feature image of the first image sample on the third coding layer is the same as the step of obtaining the target feature image of the first image sample on the second coding layer; similarly, by analogy, through the implementation manner of obtaining the target feature image of the first image sample on the second coding layer, the target feature images of the first image sample on all coding layers of the feature extraction model can be obtained.
In practical applications, if the feature extraction model is a Unet-structured feature extraction model, a dictionary-learning-based vector quantization module may be arranged after each of the four coding layers. The advantage is that, for example, at the first coding layer the first image sample is observed more completely, so the overcomplete dictionary learned there attends more to large-scale features, whereas at the fourth coding layer more detail of the first image sample is observed, so the dictionary learned there attends more to detail features. As a whole, this amounts to learning four overcomplete dictionaries at the four coding layers, each attending to image features of the first image sample at a different scale, which improves the richness of the extracted image features.
In specific implementation, the steps for determining the target feature image of the first image sample at the first coding layer via the vector quantization module arranged after the first coding layer, and at the second coding layer via the vector quantization module arranged after the second coding layer, are the same. The specific implementation is as follows:
the determining, according to the vector quantization module, a target feature image of the first image sample at the current coding layer includes:
decomposing the initial feature image into an over-complete dictionary and sparse codes according to the vector quantization module;
determining a target feature image of the first image sample on the current coding layer according to the overcomplete dictionary and the sparse coding;
accordingly, the determining a target feature image of the first image sample at the next coding layer according to the vector quantization module comprises:
decomposing the initial feature image into an over-complete dictionary and sparse codes according to the vector quantization module;
and determining a target feature image of the first image sample in the next coding layer according to the overcomplete dictionary and the sparse coding.
The detailed explanation will be given by taking the current coding layer as the first coding layer and the next coding layer as the second coding layer.
Specifically, according to the vector quantization module after the first coding layer, the initial feature image of the first image sample at the first coding layer is decomposed, via a preset algorithm (such as the Orthogonal Matching Pursuit (OMP) algorithm), into a corresponding overcomplete dictionary and a corresponding sparse code; then the target feature image of the first image sample at the first coding layer is computed from the overcomplete dictionary and the sparse code, for example by multiplying the overcomplete dictionary by the sparse code, the resulting product being the target feature image of the first image sample at the first coding layer.
Similarly, at the second coding layer, according to the vector quantization module after the second coding layer, the initial feature image of the first image sample at the second coding layer is decomposed, via the preset algorithm (such as OMP), into a corresponding overcomplete dictionary and a corresponding sparse code; then the target feature image of the first image sample at the second coding layer is computed from the overcomplete dictionary and the sparse code, for example as their product.
In this embodiment of the present specification, when a dictionary-learning-based vector quantization module is arranged after a coding layer of the feature extraction model, the vector quantization module can decompose the initial feature image of the first image sample at that coding layer into an overcomplete dictionary and a sparse code, and compute from them the target feature image of the first image sample at that coding layer's scale; the decoding layers of the subsequent decoder, while reconstructing the first image sample from the target feature images at the different scales of the coding layers, thereby learn very stable and rich semantic information about the first image sample.
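The decompose-and-reconstruct step can be illustrated with scikit-learn's OMP-based sparse coder. This is a sketch under assumed dictionary size and sparsity level; in the patent's method the overcomplete dictionary is a learned quantity adjusted during training, whereas here a fixed random dictionary stands in for it.

```python
import numpy as np
from sklearn.decomposition import SparseCoder

def vector_quantize(initial: np.ndarray, dictionary: np.ndarray,
                    n_nonzero: int = 5) -> np.ndarray:
    """initial: (C, H, W) initial feature image; dictionary: (n_atoms, C) with
    n_atoms > C (overcomplete). Returns the (C, H, W) target feature image."""
    c, h, w = initial.shape
    signals = initial.reshape(c, -1).T          # one C-dim signal per spatial position
    coder = SparseCoder(dictionary=dictionary,
                        transform_algorithm='omp',
                        transform_n_nonzero_coefs=n_nonzero)
    sparse_code = coder.transform(signals)      # (H*W, n_atoms), at most n_nonzero per row
    target = sparse_code @ dictionary           # product of sparse code and dictionary
    return target.T.reshape(c, h, w)

# usage: an overcomplete dictionary has more atoms than feature channels
rng = np.random.default_rng(0)
D = rng.normal(size=(512, 64))                  # 512 atoms for 64 channels (assumed sizes)
D /= np.linalg.norm(D, axis=1, keepdims=True)   # SparseCoder expects unit-norm atoms
feature = rng.normal(size=(64, 32, 32))
quantized = vector_quantize(feature, D)
```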
When neither the current coding layer nor the next coding layer is provided with a dictionary-learning-based vector quantization module, the target feature images of the first image sample at the current coding layer and at the next coding layer are obtained as follows:
after obtaining the initial feature image of the first image sample at the current coding layer, the method further comprises:
under the condition that the current coding layer is determined not to be provided with a vector quantization module based on dictionary learning, determining the initial characteristic image of the first image sample in the current coding layer as a target characteristic image of the first image sample in the current coding layer;
correspondingly, after obtaining the initial feature image of the first image sample at the next coding layer, the method further comprises:
and in the case that the next coding layer is determined not to be provided with a vector quantization module based on dictionary learning, determining the initial characteristic image of the first image sample in the next coding layer as the target characteristic image of the first image sample in the next coding layer.
The detailed explanation will be given by taking the current coding layer as the first coding layer and the next coding layer as the second coding layer.
Specifically, in the case that it is determined that the first coding layer is not provided with the dictionary learning-based vector quantization module, the initial feature image of the first image sample in the first coding layer is determined as the target feature image of the first image sample in the first coding layer.
Similarly, in the case that the second coding layer is determined not to be provided with the dictionary learning-based vector quantization module, the initial feature image of the first image sample in the second coding layer is determined as the target feature image of the first image sample in the second coding layer.
In the embodiment of the present specification, under the condition that no vector quantization module based on dictionary learning is set in any coding layer of the feature extraction model, the initial feature image of the first image sample in each coding layer may be directly used as a target feature image to participate in subsequent decoder training, so as to improve the training efficiency of the feature extraction model.
After the target feature image of the first image sample on each coding layer of the feature extraction model is obtained, training of the feature extraction model can be achieved according to the target feature image of the first image sample on each coding layer of the feature extraction model in combination with a decoder of the feature extraction model; the specific implementation mode is as follows:
inputting the target feature images of the first image sample in all coding layers into a decoding layer of a decoder of the feature extraction model for decoding, and training according to a decoding result to obtain the feature extraction model, wherein the method comprises the following steps:
inputting the target feature image of the first image sample at the last coding layer into the current decoding layer of the decoder of the feature extraction model that corresponds to the last coding layer, for decoding, to obtain an initial decoding feature image of the first image sample at the current decoding layer;
inputting the target feature image of the first image sample at the coding layer preceding the last coding layer, together with the initial decoding feature image of the first image sample at the current decoding layer, into the next decoding layer after the current decoding layer, to obtain an initial decoding feature image of the first image sample at that next decoding layer;
and training to obtain the feature extraction model according to the initial decoding feature images of the first image sample in all decoding layers.
Specifically, the target feature image of the first image sample at the last coding layer (e.g., the fourth coding layer) is input into the current decoding layer (e.g., the first decoding layer) corresponding to that last coding layer for decoding, obtaining an initial decoding feature image of the first image sample at the current decoding layer. The target feature image of the first image sample at the preceding coding layer (e.g., the third coding layer) and the initial decoding feature image of the first image sample at the current decoding layer are then input into the next decoding layer (e.g., the second decoding layer) to obtain the initial decoding feature image of the first image sample at that decoding layer; and so on for the remaining layers. Finally, the feature extraction model is trained according to the initial decoding feature images of the first image sample at each decoding layer.
In the image segmentation model determination method provided in the embodiments of this specification, the target feature image of each coding layer is combined with the output of the preceding decoding layer and input into the current decoding layer for decoding, so that, in the process of reconstructing the first image sample, rich semantic information of the first image sample at the different scales of the different coding layers is learned.
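A minimal PyTorch sketch of this symmetric decoding scheme follows (channel widths and the bilinear upsampling are assumptions), pairing with the SVQEncoder sketched earlier: the first decoding layer consumes the deepest target feature image, and each later decoding layer fuses the upsampled output of the preceding decoding layer with the skip-connected target feature image of the matching coding layer.

```python
import torch
import torch.nn as nn

class DecodingLayer(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.block(x)

class SVQDecoder(nn.Module):
    def __init__(self, channels=(256, 128, 64, 32)):  # deep -> shallow widths
        super().__init__()
        self.first = DecodingLayer(channels[0], channels[0])
        # each later layer takes [upsampled previous decode, skip target feature]
        self.rest = nn.ModuleList(
            [DecodingLayer(prev + skip, skip)
             for prev, skip in zip(channels[:-1], channels[1:])])
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)

    def forward(self, targets):
        """targets: target feature images per coding layer, shallow -> deep."""
        decoded = [self.first(targets[-1])]  # deepest target decodes first
        for layer, skip in zip(self.rest, reversed(targets[:-1])):
            fused = torch.cat([self.up(decoded[-1]), skip], dim=1)
            decoded.append(layer(fused))
        return decoded  # initial decoding feature images at every decoding layer
```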
Specifically, the training to obtain the feature extraction model according to the initial decoding feature images of the first image sample at all decoding layers includes:
and determining a target decoding characteristic image corresponding to the first image sample according to the initial decoding characteristic images of the first image sample in all decoding layers, and training to obtain the characteristic extraction model according to the target decoding characteristic image.
In practical application, after the initial decoding feature images of the first image sample in all decoding layers are obtained, the target decoding feature images corresponding to the first image sample can be obtained according to the initial decoding feature images, for example, the initial decoding feature images of the first image sample in all decoding layers are spliced to obtain the target decoding feature images corresponding to the first image sample; and finally, quickly and accurately training the feature extraction model according to the target decoding feature image. The specific implementation mode is as follows:
the training to obtain the feature extraction model according to the target decoding feature image comprises:
adjusting network parameters of the feature extraction model and the overcomplete dictionary according to the target decoding feature image;
and obtaining the trained feature extraction model under the condition that the preset training ending condition is met.
The preset training ending condition may be set according to actual application, for example, the preset training ending condition may be understood as that the iteration number exceeds a preset threshold (for example, 20000 times), and the like; or dividing a part of training samples in the initial training samples as a verification set, and finishing the training of the feature extraction model under the condition that the loss function of the verification set meets the preset condition.
Specifically, the network parameters of the feature extraction model and the overcomplete dictionaries of all coding layers are adjusted according to the target decoding feature image until the training of the feature extraction model meets the preset training end condition, yielding a trained feature extraction model capable of obtaining rich semantic information at different scales.
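For concreteness, a minimal training-loop sketch under stated assumptions: an L2 reconstruction loss against the input sample, the per-layer overcomplete dictionaries registered as learnable parameters of the model, and the 20000-iteration threshold mentioned above serving as the preset training end condition.

```python
import torch
import torch.nn.functional as F

def train_feature_extractor(model, loader, max_iters: int = 20000, lr: float = 1e-4):
    # assumes the overcomplete dictionaries are nn.Parameters inside `model`,
    # so one optimizer adjusts both network weights and dictionaries
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    it = 0
    while it < max_iters:  # preset training end condition: iteration threshold
        for sample in loader:  # unlabeled first image samples
            recon = model(sample)  # reconstruction from the target decoding feature image
            loss = F.mse_loss(recon, sample)
            opt.zero_grad()
            loss.backward()  # adjusts network parameters and dictionaries together
            opt.step()
            it += 1
            if it >= max_iters:
                break
    return model
```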
Step 206: and inputting the second image sample into the feature extraction model to obtain a second feature image of the second image sample.
After the feature extraction model has been obtained through the training steps of the above embodiments, training the image segmentation model proceeds by first extracting the second feature image of each second image sample with the feature extraction model, and then training the image segmentation model from the second feature images and the corresponding sample labels. The specific implementation is as follows:
the inputting the second image sample into the feature extraction model to obtain a second feature image of the second image sample includes:
and inputting the second image sample into the feature extraction model, obtaining a target decoding feature image of the second image sample, and determining the target decoding feature image as a second feature image of the second image sample.
A second image sample is taken as an example for illustration.
Specifically, the second image sample is input into the feature extraction model, a target decoding feature image of the second image sample is obtained, and the target decoding feature image of the second image sample is determined as the second feature image of the second image sample.
Step 208: and determining an image segmentation model according to the second characteristic image and the sample label.
Wherein the sample label is an image segmentation result of the second image sample;
correspondingly, the determining an image segmentation model according to the second feature image and the sample label includes:
and training to obtain an image segmentation model according to the second characteristic image and the image segmentation result of the second image sample, wherein the image segmentation model comprises a multilayer perceptron image segmentation model.
Specifically, the image segmentation result of the second image sample may be understood as a segmentation realized by color labeling or the like on the second image sample; for example, different target parts in the second image sample are labeled with different colors or distinguished with different symbols.
The image segmentation model includes, but is not limited to, a multi-layered perceptron image segmentation model (i.e., MLP), and other lightweight image segmentation models are possible.
In specific implementation, the image segmentation model is trained from the second feature image corresponding to each second image sample in the second image sample set and the sample label, i.e., the image segmentation result, corresponding to each second image sample. Because the image segmentation model is built on the feature extraction model, it can be trained with only a small number of second image samples and still achieve a good image segmentation effect.
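The sketch below illustrates such a lightweight head: a small MLP classifying each pixel of the frozen feature extractor's second feature image against the sample label. The hidden width, class count, and per-pixel formulation are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def build_mlp_head(feat_dim: int, num_classes: int = 5) -> nn.Module:
    return nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(inplace=True),
                         nn.Linear(256, num_classes))

def train_step(mlp: nn.Module, features: torch.Tensor,
               label_mask: torch.Tensor, optimizer) -> float:
    """features: (C, H, W) second feature image from the frozen feature extractor;
    label_mask: (H, W) integer sample label (the image segmentation result)."""
    c = features.shape[0]
    pixels = features.reshape(c, -1).T        # one feature vector per pixel
    logits = mlp(pixels)                      # (H*W, num_classes)
    loss = F.cross_entropy(logits, label_mask.reshape(-1).long())
    optimizer.zero_grad()
    loss.backward()                           # only the MLP head is updated
    optimizer.step()
    return loss.item()
```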
In addition, in order to ensure the training effect of the feature extraction model, semantic visualization may be performed on the output of the decoding layer of the feature extraction model, and the specific implementation manner is as follows:
after obtaining the initial decoding feature image of the last decoding layer of the first image sample, the method further comprises:
and displaying the initial decoding characteristic images of the first image sample at all decoding layers through an image display interface.
In the image segmentation model determination method provided by the embodiments of this specification, a feature extraction model that extracts image features at different scales based on dictionary learning is trained on an unlabeled first image sample set; by combining dictionary learning with the encoder, the feature extraction model learns rich, high-level semantic information from the first image samples and can subsequently perform multi-scale image feature extraction. The feature extraction model is then used as a feature extractor and, combined with a small labeled second image sample set, an image segmentation model is trained so that it can subsequently perform accurate image segmentation on images containing the target object.
The following describes the image segmentation model determination method further by taking an application of the image segmentation model determination method provided in this specification in the medical field as an example, with reference to fig. 3. Fig. 3 shows a processing flow chart of an image segmentation model determination method provided in an embodiment of the present specification, and specifically includes the following steps.
Specifically, the image segmentation model determination method comprises two parts. The first part is feature extraction model training, i.e., unsupervised multi-level sparse vector quantized variational autoencoder learning (Unsupervised iterative sparse VQVAE learning); the second part is image segmentation model training, i.e., semantic segmentation.
The feature extraction model in this image segmentation model determination method is a Unet-structured feature extraction model comprising an encoder (i.e., the SVQ-VAE) and a decoder; the encoder comprises four encoding layers, each followed by a dictionary-learning-based vector quantization module, and the decoder comprises four decoding layers. The image segmentation model is introduced in detail taking an MLP as an example.
The method comprises the following steps. Step one: taking an unlabeled cardiac CT image as an example, the first part, the training of the feature extraction model, is described in detail.
Specifically, the cardiac CT image is input into a first coding layer of an encoder to be encoded, so as to obtain a first encoded image (corresponding to the initial feature image in the foregoing embodiment), the first encoded image is decomposed into a first overcomplete dictionary and a first sparse code by a vector quantization module based on dictionary learning, and then a first vector quantization code (corresponding to the target feature image in the foregoing embodiment) of the first encoded image at a feature extraction scale of the first coding layer is calculated according to the first overcomplete dictionary and the first sparse code; then inputting the first vector quantization code into a corresponding fourth decoding layer, down-sampling the first coded image, inputting the first coded image into a second coding layer for coding to obtain a second coded image, decomposing the second coded image into a second overcomplete dictionary and a second sparse code through a dictionary learning-based vector quantization module, and calculating a second vector quantization code of the second coded image under the feature extraction scale of the second coding layer according to the second overcomplete dictionary and the second sparse code; the second vector quantization code is then input to a corresponding third decoding layer.
Meanwhile, after downsampling the second coded image, it is input into the third coding layer for coding to obtain a third coded image; the third coded image is decomposed by a dictionary-learning-based vector quantization module into a third overcomplete dictionary and a third sparse code, and a third vector quantization code of the third coded image at the feature extraction scale of the third coding layer is computed from the third overcomplete dictionary and the third sparse code; the third vector quantization code is then input into the corresponding second decoding layer. Likewise, after downsampling the third coded image, it is input into the fourth coding layer for coding to obtain a fourth coded image; the fourth coded image is decomposed by a dictionary-learning-based vector quantization module into a fourth overcomplete dictionary and a fourth sparse code, and a fourth vector quantization code of the fourth coded image at the feature extraction scale of the fourth coding layer is computed from the fourth overcomplete dictionary and the fourth sparse code; the fourth vector quantization code is then input into the corresponding first decoding layer for decoding, obtaining a first decoded image of the cardiac CT image at the first decoding layer.
The first decoding layer up-samples the first decoded image and inputs it into the second decoding layer, and the second decoding layer fuses the third vector quantization code with the up-sampled first decoded image to obtain a second decoded image of the cardiac CT image at the second decoding layer.

The second decoding layer up-samples the second decoded image and inputs it into the third decoding layer, and the third decoding layer fuses the second vector quantization code with the up-sampled second decoded image to obtain a third decoded image of the cardiac CT image at the third decoding layer.

The third decoding layer up-samples the third decoded image and inputs it into the fourth decoding layer, and the fourth decoding layer fuses the first vector quantization code with the up-sampled third decoded image to obtain a fourth decoded image of the cardiac CT image at the fourth decoding layer.
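The fusion pattern just described mirrors a U-Net skip connection. A rough PyTorch sketch of one such decoding layer is given below; the class name, channel arguments and the choice of bilinear up-sampling are illustrative assumptions rather than the patent's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecodingLayer(nn.Module):
    """One decoding layer: up-sample the previous decoded image and fuse it
    with the vector quantization code from the symmetric coding layer."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + skip_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, prev_decoded, vq_code):
        up = F.interpolate(prev_decoded, scale_factor=2,
                           mode='bilinear', align_corners=False)
        fused = torch.cat([up, vq_code], dim=1)   # channel-wise fusion with the skip path
        return F.relu(self.conv(fused))
```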
Finally, the first decoded image, the second decoded image, the third decoded image and the fourth decoded image are spliced to obtain a target decoding feature image corresponding to the cardiac CT image. The network parameters of the feature extraction model are then adjusted through the target decoding feature image corresponding to the cardiac CT image, the overcomplete dictionary of each coding layer is further adjusted based on subsequent cardiac CT images, and the trained feature extraction model is obtained when the training meets a preset ending condition.
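The splicing and parameter-adjustment step could be sketched as follows; the reconstruction head, the optimizer interface and the use of a mean-squared reconstruction loss are assumptions for illustration, since the embodiment does not fix a particular loss.

```python
import torch
import torch.nn.functional as F

def splice_decoded(decoded_images, out_hw):
    """Up-sample each decoded image to the input size and concatenate them
    channel-wise into the target decoding feature image."""
    ups = [F.interpolate(d, size=out_hw, mode='bilinear', align_corners=False)
           for d in decoded_images]
    return torch.cat(ups, dim=1)

def train_step(model, recon_head, optimizer, x):
    """x: batch of unlabeled cardiac CT slices, shape (B, 1, H, W).
    `model` is assumed to return the list of decoded images, one per
    decoding layer (hypothetical interface)."""
    decoded = model(x)
    target = splice_decoded(decoded, x.shape[-2:])  # target decoding feature image
    loss = F.mse_loss(recon_head(target), x)        # reconstruction loss (illustrative)
    optimizer.zero_grad()
    loss.backward()                                 # adjusts the network parameters; the
    optimizer.step()                                # per-layer dictionaries are refreshed
    return loss.item()                              # across subsequent samples, as described
```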
Referring to fig. 4, fig. 4 shows a schematic network structure diagram of the unsupervised multi-level sparse vector quantization variational autoencoder (SVQ-VAE) in the image segmentation model determination method provided in an embodiment of the present specification.
In fig. 4, X may represent a cardiac CT image. The original cardiac CT image X is decomposed into an overcomplete dictionary and a sparse code; a vector quantization code of X is obtained according to the overcomplete dictionary and the sparse code corresponding to X; the corresponding decoding layer then reconstructs X1 from the vector quantization code of X; and the network parameters of the network structure are adjusted by back-propagation according to X1.
Step two: a part of the labeled cardiac CT images is selected from the cardiac CT images used to train the feature extraction model, and the target feature image spliced from the first decoded image, the second decoded image, the third decoded image and the fourth decoded image is obtained through the feature extraction model; an image classification model is then trained according to the target feature image corresponding to each cardiac CT image and the sample label corresponding to that cardiac CT image.
Of course, in practical applications, the training of the image classification model may also be implemented by using other labeled cardiac CT images, which is not limited herein.
According to the introduction of the above embodiments, the image segmentation model determination method is implemented in two stages. In the first stage, a large number of unlabeled CT images of a human target part (such as the heart) are used for representation learning: a multi-level sparse vector quantization variational autoencoder is provided in the feature extraction model, so that the feature extraction model can learn overcomplete dictionaries at the different scales of the different coding layers while ensuring sparsity. The vector quantization codes (i.e., coded images) of the CT images of the human target part at each coding layer are obtained through the learned sparse codes and overcomplete dictionaries, and these codes are finally reconstructed through the decoders of the corresponding decoding layers, so that the decoders learn rich and stable semantic information during reconstruction. The feature extraction model is thereby obtained through training.
Meanwhile, in the reconstruction process, the outputs of the decoding layers can be simply clustered to visualize the semantic information learned by the different decoding layers during reconstruction.
Referring to fig. 5, fig. 5 is a schematic diagram illustrating a semantic information visualization result learned in a process of reconstructing an original image by different decoding layers in an image segmentation model determination method according to an embodiment of the present specification.
Fig. 5 includes the original image, the visualization of the decoder of the second decoding layer, and the visualization of the decoder of the third decoding layer. As can be seen from fig. 5, the decoder of the third decoding layer can learn semantic information of more details, such as fat at the edge of the heart, relevant information of blood vessels, and the like; the decoder of the second decoding layer can learn large-scale semantic information, such as relevant information of the heart as a whole.
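As a sketch of the simple clustering mentioned above, per-pixel decoder outputs can be grouped with k-means and the cluster map rendered as the visualization; the use of k-means and the cluster count k are assumptions for illustration.

```python
from sklearn.cluster import KMeans

def visualize_decoder_output(decoded, k=6):
    """decoded: (C, H, W) array output by one decoding layer for one image.
    Returns an (H, W) cluster-index map; pseudo-coloring it yields a
    semantic visualization like the panels of fig. 5."""
    C, H, W = decoded.shape
    pixels = decoded.reshape(C, -1).T                      # (H*W, C) per-pixel features
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(pixels)
    return labels.reshape(H, W)
```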
In the second stage, after the feature extraction model is trained, a small number of labeled CT images of the human target part (such as the heart) are acquired to train a pixel-level MLP classifier (namely, the image segmentation model), so that CT images of the human target part can subsequently be segmented according to the image segmentation model.
Specifically, a labeled cardiac CT image is input into the feature extraction model of the first part to obtain the output results of the different decoding layers; the output results are up-sampled to the image sizes specified for the different coding layers, flattened, and spliced together to form an N x M training set X, where N is the number of image elements (e.g., H x W x D voxels) and M is the total number of feature channels output by the decoders of the different decoding layers. The mask corresponding to the cardiac CT image is flattened to obtain an N x 1 label vector Y (the sample labels), and an MLP classifier is then trained using (X, Y) to obtain the final image segmentation model.
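A minimal sketch of this second stage is given below, with scikit-learn's MLPClassifier standing in for the pixel-level MLP; the shapes follow the description above, while the helper name and hyper-parameters are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def build_training_set(decoder_outputs, mask):
    """decoder_outputs: list of (Ci, H, W, D) volumes, each already
    up-sampled to the input size; mask: (H, W, D) voxel-wise labels
    for the same CT image."""
    feats = np.concatenate(decoder_outputs, axis=0)   # (M, H, W, D), M = sum of Ci
    X = feats.reshape(feats.shape[0], -1).T           # (N, M), N = H*W*D voxels
    Y = mask.reshape(-1)                              # (N,) flattened sample labels
    return X, Y

# X, Y = build_training_set(decoder_outputs, mask)
# clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=200).fit(X, Y)
# `clf` is then the pixel-level image segmentation model.
```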
The image segmentation model determination method provided by the embodiments of the present specification uses the idea of dictionary learning to combine dictionary learning and the VQ-VAE: unlike a typical VQ-VAE, in which a codebook is learned and vector quantization codes are selected by distance to its entries, the present method generates an overcomplete dictionary by learning and obtains vector quantization codes through sparse reconstruction.
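For contrast, a typical VQ-VAE quantizes each encoder output by the distance to its nearest codebook entry, roughly as in the following illustrative sketch:

```python
import torch

def vqvae_quantize(z, codebook):
    """z: (N, C) encoder output vectors; codebook: (K, C) embeddings.
    Standard VQ-VAE: each vector is replaced by its nearest codebook entry."""
    distances = torch.cdist(z, codebook)   # (N, K) pairwise Euclidean distances
    indices = distances.argmin(dim=1)      # nearest-embedding index per vector
    return codebook[indices]               # quantized codes
```

The method above replaces this single nearest entry with a sparse combination of learned dictionary atoms, which is what allows the dictionary to be overcomplete while sparsity is preserved.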
In the image segmentation model determination method provided by the embodiments of the present specification, the feature extraction model is trained through the designed unsupervised multi-level sparse vector quantization variational autoencoder, so that the segmentation task based on the feature extraction model can be realized with only a very simple MLP model; and because the MLP has a simple structure, a good training effect can be obtained with only a small amount of labeled data. Specifically, the embodiments of the present specification realize multi-scale representation learning of the feature extraction model through the design of the multi-level SVQ-VAE, and solve the difficulty of iterating the VQ-VAE overcomplete dictionary by introducing dictionary learning, so that a stable and rich semantic representation learning result can be obtained.
Corresponding to the above method embodiment, the present specification further provides an image segmentation model determination apparatus embodiment, and fig. 6 shows a schematic structural diagram of an image segmentation model determination apparatus provided in an embodiment of the present specification. As shown in fig. 6, the apparatus includes:
a sample determination module 602 configured to determine a first set of image samples containing a target object and a second set of image samples, wherein the second set of image samples includes a second image sample and a sample label corresponding to the second image sample;
a first model determining module 604 configured to determine a feature extraction model according to a first image sample in the first image sample set, wherein the feature extraction model includes an encoder including at least two encoding layers for performing image feature extraction of different scales, and a dictionary learning-based vector quantization module disposed after at least one encoding layer;
a first feature image obtaining module 606 configured to input the second image sample into the feature extraction model, and obtain a second feature image of the second image sample;
a second model determination module 608 configured to determine an image segmentation model from the second feature image and the sample label.
Optionally, the first model determining module 604 is further configured to:
inputting a first image sample in the first image sample set into a current coding layer of a coder of a feature extraction model for coding to obtain an initial feature image of the first image sample on the current coding layer;
under the condition that the current coding layer is determined to be provided with a dictionary learning-based vector quantization module, determining a target characteristic image of the first image sample on the current coding layer according to the vector quantization module;
down-sampling the initial feature image of the first image sample at the current coding layer and inputting the down-sampled image into a next coding layer of the current coding layer, to obtain a target feature image of the first image sample at the next coding layer;

inputting the target feature images of the first image sample at all the coding layers into decoding layers of a decoder of the feature extraction model for decoding, and training according to the decoding result to obtain the feature extraction model, wherein the coding layers in the encoder and the decoding layers in the decoder are symmetrically arranged.
Optionally, the first model determining module 604 is further configured to:
down-sampling the initial feature image of the first image sample at the current coding layer and inputting the down-sampled image into a next coding layer of the current coding layer;

coding the down-sampled image at the next coding layer to obtain an initial feature image of the first image sample at the next coding layer;

in a case that it is determined that the next coding layer is provided with a dictionary-learning-based vector quantization module, determining, according to the vector quantization module, a target feature image of the first image sample at the next coding layer.
Optionally, the first model determining module 604 is further configured to:
decomposing the initial feature image into an over-complete dictionary and sparse codes according to the vector quantization module;
determining a target feature image of the first image sample in the current coding layer according to the overcomplete dictionary and the sparse coding;
accordingly, the determining a target feature image of the first image sample at the next coding layer according to the vector quantization module comprises:
decomposing the initial feature image into an over-complete dictionary and sparse codes according to the vector quantization module;
and determining a target feature image of the first image sample in the next coding layer according to the overcomplete dictionary and the sparse coding.
Optionally, the apparatus further comprises:
a first image determining module configured to determine an initial feature image of the first image sample at the current coding layer as a target feature image of the first image sample at the current coding layer in a case that it is determined that the dictionary learning-based vector quantization module is not provided at the current coding layer;
accordingly, the apparatus further comprises:
a second image determining module configured to determine the initial feature image of the first image sample at the next coding layer as a target feature image of the first image sample at the next coding layer in case that it is determined that the dictionary learning based vector quantization module is not provided at the next coding layer.
Optionally, the first model determining module 604 is further configured to:
inputting the target feature image of the first image sample on the last coding layer into a current decoding layer of a decoder of the feature extraction model, which corresponds to the last coding layer, for decoding to obtain an initial decoding feature image of the first image sample on the current decoding layer;
inputting a target feature image of the first image sample at the coding layer preceding the last coding layer, together with the initial decoding feature image of the first image sample at the current decoding layer, into a next decoding layer of the current decoding layer, to obtain an initial decoding feature image of the first image sample at the next decoding layer;
and training to obtain the feature extraction model according to the initial decoding feature images of the first image sample in all decoding layers.
Optionally, the first model determining module 604 is further configured to:

determining a target decoding feature image corresponding to the first image sample according to the initial decoding feature images of the first image sample at all decoding layers, and training to obtain the feature extraction model according to the target decoding feature image.
Optionally, the first model determining module 604 is further configured to:
adjusting network parameters of the feature extraction model and the overcomplete dictionary according to the target decoding feature image;
and obtaining the trained feature extraction model under the condition that the preset training ending condition is met.
Optionally, the first feature image obtaining module 606 is further configured to:
and inputting the second image sample into the feature extraction model, obtaining a target decoding feature image of the second image sample, and determining the target decoding feature image as a second feature image of the second image sample.
Optionally, the sample label is an image segmentation result of the second image sample;
accordingly, the second model determination module 608 is further configured to:
and training and obtaining an image segmentation model according to the second characteristic image and the image segmentation result of the second image sample, wherein the image segmentation model comprises a multilayer perceptron image segmentation model.
Optionally, the first image sample set and the second image sample set containing the target object are a first CT image sample set and a second CT image sample set containing a human target region.
Optionally, the apparatus further comprises:
and the visualization module is configured to display the initial decoding characteristic images of the first image sample at all decoding layers through an image display interface.
In the image segmentation model determination apparatus provided in the embodiments of the present specification, a feature extraction model comprising at least two coding layers is trained through an unlabeled first image sample set, with a dictionary-learning-based vector quantization module arranged behind each coding layer; overcomplete dictionaries of the first image samples can be learned at the different scales of the coding layers, realizing extraction of multi-scale features of the first image samples, so that the feature extraction model can learn rich and high-level semantic information of the first image samples. The feature extraction model is then used as a feature extractor and, combined with a small number of labeled second image samples, an image segmentation model is obtained through training, so that the image segmentation model can subsequently perform accurate image segmentation on images containing the target object.
The above is a schematic configuration of an image segmentation model determination apparatus of the present embodiment. It should be noted that the technical solution of the image segmentation model determination apparatus and the technical solution of the image segmentation model determination method described above belong to the same concept, and details that are not described in detail in the technical solution of the image segmentation model determination apparatus can be referred to the description of the technical solution of the image segmentation model determination method described above.
Referring to fig. 7, fig. 7 is a flowchart illustrating an image segmentation method according to an embodiment of the present disclosure, which specifically includes the following steps.
Step 702: receiving a CT image of a human target part input by a user, inputting the CT image into a feature extraction model, and obtaining a feature image of the CT image;
step 704: and inputting the characteristic image into an image segmentation model, obtaining an image segmentation result of the CT image, and displaying the image segmentation result to the user.
The feature extraction model and the image segmentation model are the feature extraction model and the image segmentation model in the image segmentation model determination method.
Specifically, the human target part can be understood as a human organ, such as a heart, a liver, and the like; the CT image of the target region of the human body may be understood as a CT image of an organ of the human body, such as a CT image of the heart, a CT image of the liver, etc.
In the image segmentation method provided in the embodiments of the present specification, after a CT image of a human target part input by a user is received, the CT image is input into the feature extraction model trained by the image segmentation model determination method of the above embodiments to obtain a stable and rich feature image of the CT image; the feature image is then input into the image segmentation model trained by the same method, so that a segmentation result for the CT image of the target part can be obtained quickly and accurately.
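Putting the two steps together, inference might look like the following sketch; the function names and the predict interface are placeholders rather than the patent's API.

```python
def segment_ct(ct_volume, feature_model, segmentation_model):
    """ct_volume: (H, W, D) CT image of the human target part (numpy array).
    feature_model returns the spliced target decoding feature image;
    segmentation_model is the trained pixel-level MLP classifier."""
    feats = feature_model(ct_volume)            # (M, H, W, D) spliced decoder features
    X = feats.reshape(feats.shape[0], -1).T     # (N, M): one feature vector per voxel
    pred = segmentation_model.predict(X)        # pixel-level classification
    return pred.reshape(ct_volume.shape)        # segmentation result for display
```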
Further, an embodiment of the present specification also provides an image segmentation apparatus including:
the image receiving module is configured to receive a CT image of a human target part input by a user, input the CT image into a feature extraction model and obtain a feature image of the CT image;
and the image classification module is configured to input the feature image into an image segmentation model, obtain an image segmentation result of the CT image, and display the image segmentation result to the user, wherein the feature extraction model and the image segmentation model are the feature extraction model and the image segmentation model in the image segmentation model determination method.
The foregoing is a schematic configuration of an image segmentation apparatus of the present embodiment. It should be noted that the technical solution of the image segmentation apparatus belongs to the same concept as the technical solution of the image segmentation method described above, and for details that are not described in detail in the technical solution of the image segmentation apparatus, reference may be made to the description of the technical solution of the image segmentation method described above.
Referring to fig. 8, fig. 8 shows a flowchart of another image segmentation model determination method provided in an embodiment of the present specification, which specifically includes the following steps.
Step 802: responding to an image segmentation model processing request sent by a user, and displaying an image input interface for the user;
step 804: receiving a first image sample set and a second image sample set which are input by the user through the image input interface and contain a target object, wherein the second image sample set comprises a second image sample and a sample label corresponding to the second image sample;
step 806: determining a feature extraction model from a first image sample of the first set of image samples.
The feature extraction model comprises an encoder, wherein the encoder comprises at least two encoding layers for extracting image features of different scales and a vector quantization module which is arranged behind at least one encoding layer and is based on dictionary learning;
step 808: inputting the second image sample into the feature extraction model to obtain a second feature image of the second image sample;
step 810: and determining an image segmentation model according to the second characteristic image and the sample label, and returning the image segmentation model to the user.
In the another image segmentation model determination method provided in the embodiments of the present specification, the feature extraction model is trained through the designed unsupervised multi-level sparse vector quantization variational autoencoder, so that the segmentation task based on the feature extraction model can be implemented with only a very simple MLP image segmentation model; and because the image segmentation model has a simple structure, a good training effect can be obtained with only a small amount of labeled data, greatly improving training efficiency.
In addition, an embodiment of the present specification further provides an image segmentation model determination apparatus, including:
the interface display module is configured to respond to an image segmentation model processing request sent by a user and display an image input interface for the user;
the sample receiving module is configured to receive a first image sample set and a second image sample set which are input by the user through the image input interface and contain target objects, wherein the second image sample set comprises second image samples and sample labels corresponding to the second image samples;
a third model determining module configured to determine a feature extraction model according to a first image sample in the first image sample set, wherein the feature extraction model includes an encoder including at least two encoding layers for performing image feature extraction of different scales, and a dictionary-learning-based vector quantization module disposed after at least one encoding layer;
a second feature image obtaining module configured to input the second image sample into the feature extraction model, and obtain a second feature image of the second image sample;
a fourth model determination module configured to determine an image segmentation model from the second feature image and the sample label, and return the image segmentation model to the user.
The above is a schematic configuration of another image segmentation model determination apparatus of the present embodiment. It should be noted that the technical solution of the image segmentation model determination apparatus and the technical solution of the another image segmentation model determination method belong to the same concept, and details that are not described in detail in the technical solution of the another image segmentation model determination apparatus can be referred to the description of the technical solution of the another image segmentation model determination method.
Referring to fig. 9, fig. 9 is a block diagram illustrating a computing device 900, according to one embodiment of the present disclosure. Components of the computing device 900 include, but are not limited to, a memory 910 and a processor 920. The processor 920 is coupled to the memory 910 via a bus 930, and a database 950 is used to store data.
Computing device 900 also includes an access device 940 that enables computing device 900 to communicate via one or more networks 960. Examples of such networks include a public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 940 may include one or more of any type of network interface (e.g., a network interface controller), wired or wireless, such as an IEEE 802.11 wireless local area network (WLAN) interface, a worldwide interoperability for microwave access (WiMAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, a near field communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 900, as well as other components not shown in FIG. 9, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 9 is for purposes of example only and is not limiting as to the scope of the description. Other components may be added or replaced as desired by those skilled in the art.
Computing device 900 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., a tablet computer, a personal digital assistant, a laptop computer, a notebook computer, a netbook, etc.), a mobile phone (e.g., a smartphone), a wearable computing device (e.g., a smartwatch, smart glasses, etc.), or another type of mobile device, or a stationary computing device such as a desktop computer or a personal computer (PC). Computing device 900 may also be a mobile or stationary server.
The processor 920 is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the image segmentation model determination method or the image segmentation method described above.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device belongs to the same concept as the technical solution of the image segmentation model determination method or the image segmentation method, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the image segmentation model determination method or the image segmentation method.
An embodiment of the present specification further provides a computer-readable storage medium storing computer-executable instructions, which when executed by a processor, implement the image segmentation model determination method or the steps of the image segmentation method described above.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the image segmentation model determination method or the image segmentation method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the image segmentation model determination method or the image segmentation method.
An embodiment of the present specification further provides a computer program, wherein when the computer program is executed in a computer, the computer is caused to execute the steps of the image segmentation model determination method or the image segmentation method described above.
The above is a schematic scheme of a computer program of the present embodiment. It should be noted that the technical solution of the computer program is the same as the technical solution of the image segmentation model determination method or the image segmentation method, and details that are not described in detail in the technical solution of the computer program can be referred to the description of the technical solution of the image segmentation model determination method or the image segmentation method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be added to or subtracted from as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals or telecommunications signals.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of combinations of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the embodiments. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for an embodiment of the specification.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, and to thereby enable others skilled in the art to best understand the specification and utilize the specification. The specification is limited only by the claims and their full scope and equivalents.

Claims (14)

1. An image segmentation model determination method, comprising:
determining a first image sample set and a second image sample set containing a target object, wherein the second image sample set comprises a second image sample and a sample label corresponding to the second image sample;
determining a feature extraction model according to a first image sample in the first image sample set, wherein the feature extraction model comprises an encoder and a vector quantization module, the encoder comprises at least two encoding layers for extracting image features with different scales, and the vector quantization module is arranged behind at least one encoding layer and is based on dictionary learning;
inputting the second image sample into the feature extraction model to obtain a second feature image of the second image sample;
and determining an image segmentation model according to the second characteristic image and the sample label.
2. The image segmentation model determination method according to claim 1, wherein determining a feature extraction model according to a first image sample in the first image sample set comprises:
inputting a first image sample in the first image sample set into a current coding layer of a coder of a feature extraction model for coding to obtain an initial feature image of the first image sample on the current coding layer;
under the condition that the current coding layer is determined to be provided with a dictionary learning-based vector quantization module, determining a target characteristic image of the first image sample on the current coding layer according to the vector quantization module;
down-sampling the initial feature image of the first image sample at the current coding layer and inputting the down-sampled image into a next coding layer of the current coding layer, to obtain a target feature image of the first image sample at the next coding layer;

inputting the target feature images of the first image sample at all the coding layers into decoding layers of a decoder of the feature extraction model for decoding, and training according to the decoding result to obtain the feature extraction model, wherein the coding layers in the encoder and the decoding layers in the decoder are symmetrically arranged.
3. The image segmentation model determination method according to claim 2, wherein the down-sampling the initial feature image of the first image sample at the current coding layer and inputting the down-sampled image into a next coding layer of the current coding layer to obtain the target feature image of the first image sample at the next coding layer comprises:

down-sampling the initial feature image of the first image sample at the current coding layer and inputting the down-sampled image into a next coding layer of the current coding layer;

coding the down-sampled image at the next coding layer to obtain an initial feature image of the first image sample at the next coding layer;

in a case that it is determined that the next coding layer is provided with a dictionary-learning-based vector quantization module, determining, according to the vector quantization module, a target feature image of the first image sample at the next coding layer.
4. The image segmentation model determination method according to claim 3, wherein the determining, according to the vector quantization module, a target feature image of the first image sample at the current coding layer comprises:
decomposing the initial feature image into an over-complete dictionary and sparse codes according to the vector quantization module;
determining a target feature image of the first image sample on the current coding layer according to the overcomplete dictionary and the sparse coding;
accordingly, the determining a target feature image of the first image sample at the next encoding layer according to the vector quantization module comprises:
decomposing the initial feature image into an over-complete dictionary and sparse codes according to the vector quantization module;
and determining a target feature image of the first image sample in the next coding layer according to the overcomplete dictionary and the sparse coding.
5. The image segmentation model determination method according to claim 3, wherein after the initial feature image of the first image sample at the current coding layer is obtained, the method further comprises:
under the condition that the current coding layer is determined not to be provided with a vector quantization module based on dictionary learning, determining the initial characteristic image of the first image sample in the current coding layer as a target characteristic image of the first image sample in the current coding layer;
correspondingly, after the initial feature image of the first image sample at the next coding layer is obtained, the method further includes:
and in the case that the next coding layer is determined not to be provided with a vector quantization module based on dictionary learning, determining the initial characteristic image of the first image sample in the next coding layer as the target characteristic image of the first image sample in the next coding layer.
6. The image segmentation model determination method according to claim 2, wherein the step of inputting the target feature image of the first image sample in all the coding layers into a decoding layer of a decoder of the feature extraction model for decoding, and training the feature extraction model according to a decoding result comprises:
inputting the target feature image of the first image sample on the last coding layer into a current decoding layer of a decoder of the feature extraction model, which corresponds to the last coding layer, for decoding to obtain an initial decoding feature image of the first image sample on the current decoding layer;
inputting a target feature image of the first image sample at the encoding layer preceding the last encoding layer, together with the initial decoding feature image of the first image sample at the current decoding layer, into a next decoding layer of the current decoding layer, to obtain an initial decoding feature image of the first image sample at the next decoding layer;
and training to obtain the feature extraction model according to the initial decoding feature images of the first image sample in all decoding layers.
7. The image segmentation model determination method according to claim 6, wherein the training to obtain the feature extraction model according to the initial decoded feature images of the first image sample at all decoding layers comprises:
and determining a target decoding characteristic image corresponding to the first image sample according to the initial decoding characteristic images of the first image sample in all decoding layers, and training to obtain the characteristic extraction model according to the target decoding characteristic image.
8. The image segmentation model determination method according to claim 1 or 6, wherein the inputting the second image sample into the feature extraction model to obtain a second feature image of the second image sample comprises:
and inputting the second image sample into the feature extraction model, obtaining a target decoding feature image of the second image sample, and determining the target decoding feature image as a second feature image of the second image sample.
9. The image segmentation model determination method according to claim 1, wherein the sample label is an image segmentation result of the second image sample;
correspondingly, the determining an image segmentation model according to the second feature image and the sample label includes:
and training to obtain an image segmentation model according to the second characteristic image and the image segmentation result of the second image sample, wherein the image segmentation model comprises a multilayer perceptron image segmentation model.
10. The image segmentation model determination method according to claim 1, wherein the first image sample set and the second image sample set including the target object are a first CT image sample set and a second CT image sample set including a target portion of a human body.
11. The image segmentation model determination method according to claim 6, wherein after the initial decoding feature image of the first image sample at the last decoding layer is obtained, the method further comprises:
and displaying the initial decoding characteristic images of the first image sample at all decoding layers through an image display interface.
12. An image segmentation method, comprising:
receiving a CT image of a human target part input by a user, inputting the CT image into a feature extraction model, and obtaining a feature image of the CT image;
inputting the feature image into an image segmentation model, obtaining an image segmentation result of the CT image, and displaying the image segmentation result to the user, wherein the feature extraction model and the image segmentation model are the feature extraction model and the image segmentation model in the image segmentation model determination method according to any one of claims 1 to 11.
13. An image segmentation model determination method, comprising:
responding to an image segmentation model processing request sent by a user, and displaying an image input interface for the user;
receiving a first image sample set and a second image sample set which are input by the user through the image input interface and contain target objects, wherein the second image sample set comprises second image samples and sample labels corresponding to the second image samples;
determining a feature extraction model according to a first image sample in the first image sample set, wherein the feature extraction model comprises an encoder, the encoder comprises at least two encoding layers for extracting image features with different scales, and a dictionary learning-based vector quantization module arranged behind at least one encoding layer;
inputting the second image sample into the feature extraction model to obtain a second feature image of the second image sample;
and determining an image segmentation model according to the second characteristic image and the sample label, and returning the image segmentation model to the user.
14. A computer readable storage medium storing computer executable instructions which, when executed by a processor, implement the image segmentation model determination method according to any one of claims 1 to 11, or the steps of the image segmentation method according to claim 12.
CN202211386108.XA 2022-11-07 2022-11-07 Image segmentation model determination method and image segmentation method Pending CN115713535A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211386108.XA CN115713535A (en) 2022-11-07 2022-11-07 Image segmentation model determination method and image segmentation method

Publications (1)

Publication Number: CN115713535A; Publication Date: 2023-02-24

Family ID: 85232569

Family Applications (1)

Application Number: CN202211386108.XA; Title: Image segmentation model determination method and image segmentation method; Priority Date: 2022-11-07; Filing Date: 2022-11-07; Status: Pending

Country Status (1): CN: CN115713535A (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126481A (en) * 2019-12-20 2020-05-08 湖南千视通信息科技有限公司 Training method and device of neural network model
CN112102321A (en) * 2020-08-07 2020-12-18 深圳大学 Focal image segmentation method and system based on deep convolutional neural network
CN113205524A (en) * 2021-05-17 2021-08-03 广州大学 Blood vessel image segmentation method, device and equipment based on U-Net
CN113222888A (en) * 2021-03-19 2021-08-06 复旦大学 Textile yarn weaving size detection method based on depth texture characteristics
CN113283433A (en) * 2021-04-13 2021-08-20 北京工业大学 Image semantic segmentation method, system, electronic device and storage medium
WO2021179205A1 (en) * 2020-03-11 2021-09-16 深圳先进技术研究院 Medical image segmentation method, medical image segmentation apparatus and terminal device
US20220036124A1 (en) * 2020-07-31 2022-02-03 Sensetime Group Limited Image processing method and device, and computer-readable storage medium
US20220122351A1 (en) * 2021-12-20 2022-04-21 Sensetime International Pte. Ltd. Sequence recognition method and apparatus, electronic device, and storage medium
CN114445632A (en) * 2022-02-08 2022-05-06 支付宝(杭州)信息技术有限公司 Picture processing method and device
CN114495129A (en) * 2022-04-18 2022-05-13 阿里巴巴(中国)有限公司 Character detection model pre-training method and device
CN114648787A (en) * 2022-02-11 2022-06-21 华为技术有限公司 Face image processing method and related equipment
CN114820633A (en) * 2022-04-11 2022-07-29 北京三快在线科技有限公司 Semantic segmentation method, training device and training equipment of semantic segmentation model
CN115131289A (en) * 2022-05-24 2022-09-30 阿里巴巴(中国)有限公司 Training method of image processing model
CN115222746A (en) * 2022-08-16 2022-10-21 浙江柏视医疗科技有限公司 Multi-task heart substructure segmentation method based on space-time fusion

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116342884A (en) * 2023-03-28 2023-06-27 阿里云计算有限公司 Image segmentation and model training method and server
CN116342884B (en) * 2023-03-28 2024-02-06 阿里云计算有限公司 Image segmentation and model training method and server

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination