CN112102251B - Method and device for segmenting an image, electronic device and storage medium - Google Patents


Info

Publication number
CN112102251B
CN112102251B (application CN202010844261.7A)
Authority
CN
China
Prior art keywords
sequence
projection
segmented image
data
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010844261.7A
Other languages
Chinese (zh)
Other versions
CN112102251A (en)
Inventor
刘立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Bi Ren Technology Co ltd
Original Assignee
Shanghai Biren Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Biren Intelligent Technology Co Ltd
Priority to CN202010844261.7A
Publication of CN112102251A
Application granted
Publication of CN112102251B
Active legal status
Anticipated expiration legal status


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/49Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a method and a device for segmenting an image, an electronic device and a storage medium. The method for segmenting an image comprises the following steps: acquiring at least one frame of projection data according to a spatial and temporal sequence relationship, and generating a corresponding projection sequence based on the at least one frame of projection data; and inputting the projection sequence into a segmented image model for processing to generate a segmented image sequence. The method can generate the segmented image sequence directly from the projection sequence, which reduces the workload of subsequent processing and reduces conversion errors.

Description

Method and device for segmenting an image, electronic device and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and apparatus for segmenting an image, an electronic device, and a storage medium.
Background
Image segmentation is an important technical branch of image processing. In plain terms, it is the technique and process of dividing an image into a number of specific regions with distinctive properties and extracting the objects of interest. Image segmentation has important applications in many fields, such as medical radiology, light microscopy, holographic imaging, CT, and the like.
Taking the field of computed tomography (Computed Tomography, CT) as an example, CT technology is a process of solving for the pixels of an image matrix from acquired projection data and then reconstructing an image.
Machine-assisted detection on CT images can improve the diagnostic efficiency of physicians. An important step in improving the accuracy of machine-assisted detection is to segment and locate the different tissues and organs in the generated CT image and then perform detection in the different segmented regions. For example, for a chest CT image, the lung region may be segmented from the chest CT image.
In the prior art, the conventional approach is to first generate an image from the CT projections, perform content segmentation on the image, and then carry out the subsequent analysis. This approach takes a long time to run, and the content segmentation causes loss of image information and increases errors, which in turn affects the accuracy of the final result.
Disclosure of Invention
In view of the technical problems in the prior art, the embodiments of the present invention provide a method and a device for segmenting an image, an electronic device and a storage medium, so as to overcome the technical defects in the prior art.
The embodiment of the invention provides a method for segmenting an image, which comprises the following steps:
Acquiring at least one frame of projection data according to a sequence relation of space and time, and generating a corresponding projection sequence based on the at least one frame of projection data;
inputting the projection sequence into a segmented image model for processing to generate a segmented image sequence;
the segmented image model is obtained after training based on a projection sample sequence, an image sample sequence corresponding to the projection sample sequence and a segmented image sample sequence.
Optionally, the segmented image model includes: a projection feature extraction model, a feature transformation model and a segmentation image generation model;
inputting the projection sequence into a segmented image model for processing, and generating a segmented image sequence, wherein the method comprises the following steps:
inputting the projection sequence to a projection feature extraction model to generate a projection feature sequence;
inputting the projection characteristic sequence into a characteristic transformation model to generate a segmented image characteristic sequence;
and inputting the segmented image feature sequence into a segmented image generation model to generate a segmented image sequence.
Optionally, the projection feature extraction model includes an input layer, an intermediate layer, and an output layer;
inputting the projection sequence to a projection feature extraction model to generate a projection feature sequence, comprising:
And inputting at least one frame of projection data of the projection sequence into an input layer of the projection characteristic extraction model, and taking at least one frame of projection characteristic data output by the middle layer as a corresponding projection characteristic sequence.
Optionally, the projection feature sequence includes at least one frame of projection feature data;
the feature transformation model includes an encoder and a decoder;
inputting the projection feature sequence to a feature transformation model to generate a segmented image feature sequence, comprising:
dividing the at least one frame of projection characteristic data into a plurality of input sequences, wherein each input sequence comprises at least one frame of projection characteristic data;
inputting each input sequence to an encoder for encoding to generate a corresponding encoding vector;
and inputting the coding vector and the decoding reference vector to a decoder for decoding, generating a decoding vector, and generating a segmented image characteristic sequence corresponding to each input sequence based on the decoding vector, wherein the segmented image characteristic sequence comprises at least one frame of segmented image characteristic data.
Optionally, the encoder includes a plurality of coding layers connected in sequence;
inputting each input sequence to an encoder for encoding, generating a corresponding encoding vector, comprising:
Embedding at least one frame of projection characteristic data corresponding to each input sequence to obtain a first embedded vector;
inputting the first embedded vector to a 1 st coding layer to generate a coding vector output by the 1 st coding layer;
and carrying out iterative processing by taking the coding vector output by the ith coding layer as the input vector of the (i+1) th coding layer until the coding vector output by the last coding layer is taken as the coding vector corresponding to the input sequence, wherein i is an integer greater than 1.
Optionally, the decoder includes a plurality of decoding layers connected in sequence;
inputting the encoded vector and the decoded reference vector to a decoder for decoding, generating a decoded vector, comprising:
inputting the decoding reference vector and the coding vector to a 1 st decoding layer to generate a decoding vector output by the 1 st decoding layer;
and carrying out iterative processing by taking the decoding vector output by the j decoding layer and the encoding vector as input vectors of the j+1th decoding layer until the decoding vector output by the last decoding layer is obtained, wherein j is an integer greater than 1.
Optionally, the segmented image feature sequence includes at least one frame of segmented image feature data, and the segmented image generation model includes a convolution layer and a deconvolution layer;
Inputting the segmented image feature sequence to a segmented image generation model to generate a segmented image sequence, comprising:
and inputting at least one frame of segmented image characteristic data into the deconvolution layer, generating at least one frame of corresponding segmented image data, and taking the at least one frame of segmented image data as a segmented image sequence.
The embodiment of the invention provides a device for segmenting an image, which comprises:
the acquisition module is used for acquiring at least one frame of projection data according to the sequence relation of space and time and generating a corresponding projection sequence based on the at least one frame of projection data;
the processing module is used for inputting the projection sequence into the segmented image model for processing to generate a segmented image sequence;
the segmented image model is obtained after training based on a projection sample sequence, an image sample sequence corresponding to the projection sample sequence and a segmented image sample sequence.
The embodiment of the invention provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of the method for dividing images when executing the program.
Embodiments of the present invention provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of a method of segmenting an image as described above.
According to the method and the device for segmenting an image provided by the embodiments of the present invention, at least one frame of projection data is acquired according to a spatial and temporal sequence relationship, so that the projection data retains the spatial and temporal correlation and order of its generation process. A corresponding projection sequence is generated based on the at least one frame of projection data, and the projection sequence is then input into a segmented image model for processing to generate a segmented image sequence. The segmented image sequence is thus generated directly from the projection sequence, which reduces the workload of subsequent processing and reduces conversion errors.
Drawings
FIG. 1 is a schematic diagram of a segmented image model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a projection feature extraction model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a Transformer model according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a method for segmenting an image according to another embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating a process of a method for segmenting an image according to another embodiment of the present invention;
FIG. 6 is a flowchart illustrating a method for segmenting an image according to another embodiment of the present invention;
FIG. 7 is a schematic diagram of an apparatus for dividing an image according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terminology used in the one or more embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the invention. As used in one or more embodiments of the invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present invention refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of the invention to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of one or more embodiments of the invention, "first" may also be referred to as "second", and similarly, "second" may also be referred to as "first". The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
First, terms related to one or more embodiments of the present invention will be explained.
Projection sequence: comprises multiple frames of projection data, each frame of projection data comprising a plurality of projection data units. In CT imaging, the probe moves linearly along the longitudinal axis of the object through a plurality of positions, and projection data of a plurality of angles are acquired at each position; the projection data of the plurality of angles at one position form one frame of projection data, and the 512-dimensional projection data generated at each angle serves as one projection data unit.
Image sequence: comprises multiple frames of image data, each frame of image data comprising a plurality of image data units. An image can be generated by stitching the image sequence.
Transformer model: essentially an encoder-decoder structure, in which the encoder (Encoder) consists of at least one encoding layer connected in sequence and the decoder (Decoder) consists of at least one decoding layer connected in sequence. As with other generation models, the encoder receives the original input data and outputs an encoded vector to the decoder, and the decoder generates a decoded vector and obtains the final output data.
Encoder (encoder): the multi-frame projection data is converted into encoded vectors.
Decoder (decoder): generates a decoding vector from the encoding vector and converts the decoding vector into a multi-frame segmented image feature sequence.
Gray scale: a gray-scale image represents an object using black tones, i.e., with black as the reference color, displaying the image with different saturations of black. Each gray-scale object has a luminance value ranging from 0% (white) to 100% (black).
Projection feature extraction model: a convolutional network comprising an input layer, an intermediate layer and an output layer. It is trained with training data consisting of mutually corresponding projection sample sequences and segmented image sample sequences. Specifically, a target projection sample sequence is obtained by re-projecting the segmented image sample sequence, and the projection feature extraction model is then trained based on the projection sample sequence and the target projection sample sequence. In the use stage, the data output by the output layer is not needed; instead, the projection feature data output by the intermediate layer is used as the corresponding projection feature sequence. The projection feature extraction model can be of various types, such as a ResNet-50 model.
Segmented image generation model: comprises a convolution layer and a deconvolution layer. In the training stage it is trained with training data consisting of mutually corresponding image sample sequences and segmented image sample sequences. In the use stage, only the deconvolution layer is used to convert the input segmented image feature sequence into a segmented image sequence. The segmented image generation model can be of various types, such as a DeepLabV3+ model.
In the embodiments of the present invention, a training method and apparatus for dividing an image model, a method and apparatus for dividing an image, an electronic device, and a non-transitory computer readable storage medium are provided, and are described in detail in the following embodiments.
The embodiment of the invention discloses a training method for a segmented image model, wherein training data comprises a projection sample sequence, an image sample sequence corresponding to the projection sample sequence and a segmented image sample sequence, so as to perform supervised training on the segmented image model.
Wherein the projection sample sequence comprises multiple frames of projection sample data A_1, A_2, ..., A_m; the image sample sequence comprises multiple frames of image sample data B_1, B_2, ..., B_m; and the segmented image sample sequence comprises multiple frames of segmented image sample data C_1, C_2, ..., C_m.
Each frame of projection sample data comprises a plurality of projection sample data units, each frame of image sample data comprises a plurality of image sample data units, and each frame of segmented image sample data comprises a plurality of segmented image sample data units. The dimensions of the projection sample data units and the image sample data units may be the same or different.
It should be noted that for each frame, the size of the image sample data and the size of the segmented image sample data are identical; the difference is that the segmented image sample data only contains region data of blocks marked with discrete values, whereas the image data consists of continuous gray values.
The embodiment of the invention exemplarily shows the correspondence between a set of projection sample data, image sample data and segmented image sample data, as shown in Table 1 below:
TABLE 1
As can be seen from Table 1, the dimension of each projection sample data unit is 512, the dimension of each image sample data unit is 256, and the dimension of each segmented image sample data unit is 256.
Specifically, the segmented image model in the embodiments of the present invention is not a single end-to-end model but a combination of three models. Referring to FIG. 1, the segmented image model includes: a projection feature extraction model, a feature transformation model and a segmented image generation model.
The projection feature extraction model is used for generating a corresponding projection feature sequence according to input projection data.
The feature transformation model is used for generating a corresponding segmented image feature sequence based on the projection feature sequence.
The function of the segmented image generation model is to generate a corresponding segmented image sequence based on the segmented image feature data.
In the training stage, the three models are trained independently, and the trained models are then combined into an end-to-end model. The training steps of the three models are described in detail below.
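For orientation, the following is a minimal sketch of how the three trained sub-models could be chained into a single end-to-end module, assuming a PyTorch-style implementation; the class and attribute names (SegmentedImageModel, feature_extractor, etc.) are hypothetical and are not prescribed by the patent.

```python
# Illustrative sketch only: one possible way to chain the three trained sub-models.
# All names are hypothetical; the patent does not prescribe this API.
import torch.nn as nn

class SegmentedImageModel(nn.Module):
    def __init__(self, feature_extractor, feature_transformer, image_generator):
        super().__init__()
        self.feature_extractor = feature_extractor      # projection feature extraction model
        self.feature_transformer = feature_transformer  # encoder-decoder feature transformation model
        self.image_generator = image_generator          # segmented image generation model

    def forward(self, projection_sequence):
        # projection sequence -> projection feature sequence (intermediate-layer output)
        projection_features = self.feature_extractor(projection_sequence)
        # projection feature sequence -> segmented image feature sequence
        segmented_features = self.feature_transformer(projection_features)
        # segmented image feature sequence -> segmented image sequence (deconvolution only)
        return self.image_generator(segmented_features)
```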
First, a training procedure of the projection feature extraction model will be described.
Training data: a projection sample sequence S and a divided image sample sequence G which correspond to each other.
Wherein the projection sample sequence S comprises m frames of projection sample data S_1 to S_m, and the dimension of each frame of projection sample data is 512. The segmented image sample sequence G comprises m frames of segmented image sample data G_1 to G_m, and the dimension of each frame of segmented image sample data is 256. As shown in Table 2 below:
TABLE 2
Training:
Re-projection is performed based on the segmented image sample sequence G to generate a target projection sample feature sequence S_G, and the projection feature extraction model is then trained based on the projection sample sequence S and the target projection sample feature sequence S_G. The target projection sample feature sequence S_G includes at least one frame of target projection sample features, as shown in Table 2.
The projection feature extraction model is a convolutional network, such as a CNN or a ResNet-50 network. In general, the projection feature extraction model includes: an input layer, an intermediate layer and an output layer.
Taking the ResNet-50 network as an example, its structure is relatively complex; see FIG. 2. The intermediate layer includes 1 convolution layer and 16 building-block structures, connected in series; the output layer is 1 fully connected layer. Each building-block structure has 3 layers, so there are 48 layers in total. In the training stage, the 1 convolution layer and the 16 building-block structures automatically extract target projection sample features S' from the input projection sample sequence S; the fully connected layer combines the target projection sample features S' to generate an initial target projection sample feature sequence S'_G. The error between the initial target projection sample feature sequence S'_G and the target projection sample feature sequence S_G is calculated, and the parameters of the ResNet-50 network are adjusted according to this error, thereby training the ResNet-50 network.
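The following is a minimal sketch of this training step, assuming a torchvision ResNet-50 backbone, an MSE loss and an Adam optimizer; the loss choice, optimizer, tensor shapes and the re-projection routine that produces the target features are illustrative assumptions, not the patent's specification.

```python
# Illustrative training sketch for the projection feature extraction model.
# Assumes torchvision's ResNet-50 as the convolutional backbone; input channels/shapes,
# loss and optimizer are assumptions for illustration only.
import torch
import torch.nn as nn
from torchvision.models import resnet50

backbone = resnet50(weights=None)          # input layer + intermediate layer + fc output layer
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)
criterion = nn.MSELoss()

def train_step(projection_sample, target_projection_feature):
    # projection_sample: one frame S_i arranged as an image-like tensor;
    # target_projection_feature: the corresponding frame of S_G obtained by
    # re-projecting the segmented image sample sequence G.
    optimizer.zero_grad()
    initial_target = backbone(projection_sample)           # S'_G produced via the fc output layer
    loss = criterion(initial_target, target_projection_feature)
    loss.backward()                                         # adjust ResNet-50 parameters by the error
    optimizer.step()
    return loss.item()
```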
Next, a training procedure of the feature transformation model will be described.
Training data: a projection sample feature sequence S'_G and a segmented image sample sequence G that correspond to each other. The projection sample feature sequence S'_G includes at least one frame of projection sample feature data S'_Gm, and the segmented image sample sequence G includes at least one frame of segmented image sample data G_m.
The training step includes the encoding and decoding processes:
1) Inputting at least one frame of projection sample characteristic data to an encoder to generate a sample coding vector;
2) Inputting the sample coding vector and at least one frame of divided image sample data to a decoder to generate a sample decoding vector;
3) Calculating the error between the sample decoding vector and the vector corresponding to the segmented image sample data, and adjusting the parameters of the encoder and the decoder by back-propagation based on this error so that the error becomes smaller than a set threshold.
Specifically, taking the Transformer model as an example, this embodiment describes a Transformer model including 6 encoding layers and 6 decoding layers, as shown in FIG. 3.
Wherein, at least one frame of projection sample characteristic data is input to an encoder to generate a sample coding vector, comprising the following steps S12 to S16:
S12, embedding the at least one frame of projection sample feature data to obtain a first sample embedded vector.
As shown in fig. 3, a first embedding layer (embedding) is further connected before the 1 st encoding layer to perform embedding processing on the projection sample feature data input to the encoder.
S14, inputting the first sample embedded vector to the 1 st coding layer to generate a sample coded vector output by the 1 st coding layer.
S16, performing iterative processing by taking the sample encoding vector output by the i-th encoding layer as the input vector of the (i+1)-th encoding layer, until the sample encoding vector output by the last encoding layer is taken as the sample encoding vector corresponding to the projection sample feature sequence S'_G, where i is an integer greater than 1.
As shown in FIG. 3, the 1st encoding layer receives the first sample embedded vector and performs encoding processing to obtain the sample encoding vector output by the 1st encoding layer; the sample encoding vector output by the 1st encoding layer is input to the 2nd encoding layer to obtain the sample encoding vector output by the 2nd encoding layer; the sample encoding vector output by the 2nd encoding layer is input to the 3rd encoding layer, and so on, until the sample encoding vector output by the 6th encoding layer is taken as the sample encoding vector corresponding to the projection sample feature sequence S'_G.
The method includes the steps of inputting a sample encoding vector and at least one frame of divided image sample data to a decoder to generate a sample decoding vector, and includes the following steps S22 to S26:
S22, embedding the current-frame segmented image sample data corresponding to the current-frame projection sample feature data to obtain a second embedded vector.
As shown in fig. 3, a second embedding layer (embedding) is further connected before the 1 st decoding layer to perform embedding processing on the divided image sample data input to the decoder.
S24, inputting the second embedded vector and the sample coding vector to the 1 st decoding layer to generate a sample decoding vector output by the 1 st decoding layer.
S26, carrying out iterative processing by taking the sample decoding vector output by the j decoding layer and the sample coding vector as the input vector of the j+1th decoding layer until the decoding vector output by the last decoding layer is taken as the sample decoding vector corresponding to the projection sample characteristic data of the current frame, wherein j is an integer greater than 1.
It should be noted that the sample encoding vector input to each decoding layer corresponds to the projection sample feature data of all frames, S'_G1 to S'_Gm, rather than the encoding vector corresponding to a single frame of projection sample feature data; each decoding pass yields the sample decoding vector corresponding to the current frame of projection sample feature data.
As shown in FIG. 3, the sample decoding vector and the sample encoding vector output by the 1st decoding layer are input to the 2nd decoding layer to obtain the sample decoding vector output by the 2nd decoding layer; the sample decoding vector and the sample encoding vector output by the 2nd decoding layer are input to the 3rd decoding layer to obtain the sample decoding vector output by the 3rd decoding layer, and so on, until the sample decoding vector output by the 6th decoding layer is taken as the sample decoding vector corresponding to the current-frame projection sample feature data S'_Gm.
Then, the error between the vector corresponding to each frame of projection sample feature data S'_Gm and the vector corresponding to each frame of segmented image sample data G_m is calculated, and the parameters of each encoding layer and decoding layer are adjusted according to this error.
The training stop condition of the feature transformation model may take various forms. For example, it may be: the error between the vector corresponding to each frame of projection sample feature data S'_Gm and the vector corresponding to each frame of segmented image sample data G_m is smaller than a threshold.
Specifically, the threshold value may be set according to actual requirements, for example, 0.77, 0.88, and the like.
Alternatively, it may be: the rate of change of the error between the vector corresponding to each frame of projection sample feature data S'_Gm and the vector corresponding to each frame of segmented image sample data G_m is smaller than a threshold.
The rate of change characterizes how the error varies, which is different from the error value itself; a small rate of change means that the error has become relatively stable, and the model training can be considered complete.
The threshold of the rate of change can be set according to the actual situation; for example, the rate-of-change threshold of the error is set to 0.34, 0.26, or the like.
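As an illustration of the 6-encoding-layer / 6-decoding-layer structure described above, the sketch below assembles a feature transformation model from standard PyTorch Transformer layers; the embedding dimensions, number of attention heads, and the use of nn.TransformerEncoder / nn.TransformerDecoder are illustrative assumptions rather than the patent's exact architecture.

```python
# Illustrative sketch of a 6-encoding-layer / 6-decoding-layer feature transformation model.
# Dimensions and layer hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn

class FeatureTransformationModel(nn.Module):
    def __init__(self, feature_dim=256, target_dim=256, num_layers=6, num_heads=8):
        super().__init__()
        self.input_embedding = nn.Linear(feature_dim, feature_dim)   # first embedding layer
        self.target_embedding = nn.Linear(target_dim, feature_dim)   # second embedding layer
        encoder_layer = nn.TransformerEncoderLayer(d_model=feature_dim, nhead=num_heads)
        decoder_layer = nn.TransformerDecoderLayer(d_model=feature_dim, nhead=num_heads)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)  # 6 encoding layers
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=num_layers)  # 6 decoding layers

    def forward(self, projection_features, decoding_reference):
        # projection_features: (frames, batch, feature_dim); decoding_reference: (frames, batch, target_dim)
        encoded = self.encoder(self.input_embedding(projection_features))           # sample encoding vectors
        decoded = self.decoder(self.target_embedding(decoding_reference), encoded)  # sample decoding vectors
        return decoded
```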
Finally, the training procedure of the segmented image generation model will be described.
Training data: in order to increase the robustness of the model, the model is trained by taking an unsegmented image sample sequence and a noise-added segmented image sample sequence as input data and taking the segmented image sample sequence as output data.
The image sample sequence includes at least one frame of image sample data, the noise-added segmented image sample sequence includes at least one frame of noise-added segmented image sample data, and the segmented image sample sequence includes at least one frame of segmented image sample data, as shown in Table 1. Specifically, the training data includes a plurality of pairs of training data, each pair being (1 frame of unsegmented image sample data, 1 frame of segmented image sample data) or (1 frame of noise-added segmented image sample data, 1 frame of segmented image sample data).
The segmented image generation model comprises a convolution layer and a deconvolution layer, such as a DeepLabV3+ network. The training process is supervised training, and the specific steps include S32 to S36 (an illustrative sketch follows the steps):
S32, inputting each frame of unsegmented image sample data or noise-added segmented image sample data into the convolution layer to generate convolution data;
S34, inputting the convolution data into the deconvolution layer to generate each frame of pre-segmented image sample data;
S36, calculating the error between the pre-segmented image sample data and the segmented image sample data, and adjusting the parameters of the convolution layer and the deconvolution layer based on this error so that the error becomes smaller than a set threshold.
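As referenced above, here is a compact sketch of steps S32 to S36, assuming a single convolution/deconvolution pair and an MSE loss in place of a full DeepLabV3+ network; channel counts, kernel sizes and the optimizer are illustrative assumptions.

```python
# Illustrative sketch of the segmented image generation model training (steps S32-S36).
# Channel counts, kernel sizes, loss and optimizer are assumptions for illustration.
import torch
import torch.nn as nn

class SegmentedImageGenerationModel(nn.Module):
    def __init__(self, in_channels=1, feature_channels=64):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, feature_channels, kernel_size=3, padding=1)              # convolution layer
        self.deconv = nn.ConvTranspose2d(feature_channels, in_channels, kernel_size=3, padding=1)   # deconvolution layer

    def forward(self, image_sample):
        convolution_data = self.conv(image_sample)   # S32: convolution data
        return self.deconv(convolution_data)         # S34: pre-segmented image sample data

model = SegmentedImageGenerationModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()

def train_step(noisy_or_unsegmented_frame, segmented_frame):
    optimizer.zero_grad()
    pre_segmented = model(noisy_or_unsegmented_frame)
    loss = criterion(pre_segmented, segmented_frame)  # S36: error vs. segmented image sample data
    loss.backward()
    optimizer.step()
    return loss.item()
```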
According to the training method for the segmented image model provided by the embodiment of the invention, the projection feature extraction model, the feature transformation model and the segmented image generation model are trained separately with sample data comprising projection sample data, image sample data and segmented image sample data; the three trained models can then be spliced together in the subsequent use stage to obtain a model that generates a segmented image sequence directly from a projection sequence.
The embodiment of the invention discloses a method for segmenting an image, referring to fig. 4, the method comprises the following steps of 402-404:
402. At least one frame of projection data is acquired according to a spatial and temporal sequence relationship, and a corresponding projection sequence is generated based on the at least one frame of projection data.
It should be noted that, since projection data is generated in a spatial and temporal order in actual use, in this embodiment the generated projection sequence should follow the spatial and temporal sequence relationship in which the projection data was generated, so that the projection data in the projection sequence retains its original inherent correlation.
Taking CT imaging as an example, the probe rotates around the object through a plurality of angles, collecting projection data once at each angle to form a projection data unit; the sequence of projection data units over the plurality of angles forms one frame of projection data. Further, the probe moves linearly along the longitudinal axis of the object, acquiring one frame of projection data at each position, finally obtaining multiple frames of projection data and thus a 3D projection sequence.
Thus, the projection data can be seen as a sequence of projection columns. Similar to word sequences in natural language processing tasks, the projection sequence is also ordered, and its order is determined by the positional relationship of the projection angles.
For example, at each position the probe rotates 180 degrees around the object, and projection data acquired every 1 degree forms a 512-dimensional projection data unit. The resulting frame of projection data then comprises a 180 x 512-dimensional vector. If the probe moves linearly through 100 positions along the longitudinal axis of the object, the final projection sequence is a 100 x 180 x 512-dimensional vector.
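The sketch below illustrates how such a 100 x 180 x 512 projection sequence could be assembled from per-angle projection data units while preserving acquisition order; the acquire_projection function is a hypothetical stand-in for the scanner interface and is not part of the patent.

```python
# Illustrative sketch: assembling a projection sequence with spatial/temporal order preserved.
# acquire_projection() is a hypothetical placeholder for the CT acquisition interface.
import numpy as np

num_positions, num_angles, unit_dim = 100, 180, 512

def acquire_projection(position, angle):
    # Placeholder: would return the 512-dimensional projection data unit measured
    # at this probe position and angle.
    return np.zeros(unit_dim, dtype=np.float32)

projection_sequence = np.stack([
    np.stack([acquire_projection(p, a) for a in range(num_angles)])   # one frame: 180 x 512
    for p in range(num_positions)                                     # 100 frames in acquisition order
])
# projection_sequence.shape == (100, 180, 512)
```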
404. Inputting the projection sequence into the segmented image model for processing to generate a segmented image sequence.
The segmented image model is obtained after training based on a projection sample sequence, and an image sample sequence and a segmented image sample sequence corresponding to the projection sample sequence, and is specifically described in detail in the foregoing embodiments.
Specifically, step 404 includes:
S42, inputting the projection sequence to the projection feature extraction model to generate a projection feature sequence.
Specifically, the projection feature extraction model includes an input layer, an intermediate layer and an output layer. In the image segmentation application stage, the data output by the output layer is not needed; instead, the data output by the intermediate layer is the projection feature data, and the projection feature sequence is generated based on this projection feature data.
Specifically, the projection sequence includes at least one frame of projection data and the projection feature sequence includes at least one frame of projection feature data; each frame of projection data is input to the projection feature extraction model, which outputs the projection feature data of the corresponding frame.
Step S42 includes: and inputting at least one frame of projection data of the projection sequence into an input layer of the projection characteristic extraction model, and taking at least one frame of projection characteristic data output by the middle layer as a corresponding projection characteristic sequence.
Referring to fig. 2, the middle layer includes 1 convolution layer and 16 building block structures connected in series, and data output by the last building block structure can be used as projection characteristic data.
In the embodiment of the invention, the projection feature extraction model can extract the features of the input projection sequence to generate the projection feature sequence.
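One common way to take the intermediate-layer output of a ResNet-50 backbone is to drop its pooling and fully connected layers; the sketch below shows this as an assumption about a possible implementation, not the patent's prescription.

```python
# Illustrative sketch: using the intermediate layer of ResNet-50 as the projection
# feature extractor at inference time (the fc output layer is not used).
import torch.nn as nn
from torchvision.models import resnet50

backbone = resnet50(weights=None)
# Keep everything up to the last building-block stage; drop avgpool and fc.
intermediate = nn.Sequential(*list(backbone.children())[:-2])

def extract_projection_features(projection_frame):
    # projection_frame: a frame of projection data arranged as an image-like tensor
    # (batch, channels, H, W); the output feature map is the projection feature data.
    return intermediate(projection_frame)
```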
S44, inputting the projection characteristic sequence into a characteristic transformation model to generate a segmented image characteristic sequence.
Specifically, the feature transformation model includes an encoder and a decoder, and step S44 includes the following steps S442 to S446:
S442, dividing the at least one frame of projection feature data into a plurality of input sequences, wherein each input sequence comprises at least one frame of projection feature data.
Of course, the number of frames of projection characteristic data included in each input sequence may be the same as each other or may be different from each other, and may be selected according to actual requirements.
S444, each input sequence is input to an encoder to be encoded, and a corresponding encoding vector is generated.
Specifically, referring to fig. 3, the encoder includes a plurality of encoding layers connected in sequence, and step S444 includes the following steps S4442 to S4446:
S4442, embedding at least one frame of projection feature data corresponding to each input sequence to obtain a first embedded vector.
S4444, inputting the first embedded vector to the 1 st coding layer to generate a coding vector output by the 1 st coding layer.
S4446, performing iterative processing by taking the coded vector output by the ith coded layer as the input vector of the (i+1) th coded layer until the coded vector output by the last coded layer is taken as the coded vector corresponding to the input sequence, wherein i is an integer greater than 1.
As shown in FIG. 3, the 1st encoding layer receives the first embedded vector and performs encoding processing to obtain the encoding vector output by the 1st encoding layer; the encoding vector output by the 1st encoding layer is input to the 2nd encoding layer to obtain the encoding vector output by the 2nd encoding layer; the encoding vector output by the 2nd encoding layer is input to the 3rd encoding layer, and so on, until the encoding vector output by the 6th encoding layer is taken as the encoding vector corresponding to the input sequence.
S446, inputting the coding vector and the decoding reference vector to a decoder for decoding, generating a decoding vector, and generating a segmented image feature sequence corresponding to each input sequence based on the decoding vector, wherein the segmented image feature sequence comprises at least one frame of segmented image feature data.
In particular, referring to fig. 3, the decoder includes a plurality of decoding layers connected in sequence; in step S446, the encoded vector and the decoded reference vector are input to a decoder to be decoded, and a decoded vector is generated, which includes the following steps S4462 to S4464:
S4462, inputting the decoding reference vector and the encoding vector to the 1st decoding layer to generate the decoding vector output by the 1st decoding layer.
In this embodiment, the decoded reference vector is an initial reference vector with a predetermined value.
S4464, carrying out iterative processing by taking the decoding vector output by the j decoding layer and the encoding vector as the input vector of the j+1th decoding layer until the decoding vector output by the last decoding layer is obtained, wherein j is an integer greater than 1.
As shown in FIG. 3, the decoding vector output by the 1st decoding layer and the encoding vector are input to the 2nd decoding layer to obtain the decoding vector output by the 2nd decoding layer; the decoding vector output by the 2nd decoding layer and the encoding vector are input to the 3rd decoding layer to obtain the decoding vector output by the 3rd decoding layer, and so on, until the decoding vector output by the 6th decoding layer is taken as the decoding vector corresponding to the input sequence.
As shown in table 3, table 3 shows a correspondence relationship between projection feature data and divided image feature data in the use scene in the present embodiment.
TABLE 3
Finally, the obtained segmented image feature sequence is {T_1, T_2, T_3, ..., T_m}.
In the embodiment of the invention, the corresponding segmented image characteristic data can be obtained based on the projection characteristic data through the characteristic transformation model, so as to obtain the segmented image characteristic sequence.
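The sketch below illustrates this use-stage transformation, splitting the projection feature frames into input sequences and running them through the encoder-decoder feature transformation model sketched earlier; the split size and the zero-valued decoding reference vector are illustrative assumptions.

```python
# Illustrative sketch: use stage of the feature transformation model (steps S442-S446).
# The split size and zero-valued decoding reference vector are assumptions for illustration.
import torch

def transform_projection_features(model, projection_feature_frames, frames_per_input=10):
    # projection_feature_frames: (total_frames, batch, feature_dim)
    segmented_feature_frames = []
    for start in range(0, projection_feature_frames.size(0), frames_per_input):
        input_sequence = projection_feature_frames[start:start + frames_per_input]  # S442: one input sequence
        decoding_reference = torch.zeros_like(input_sequence)                        # preset initial reference vector
        with torch.no_grad():
            decoded = model(input_sequence, decoding_reference)                      # S444-S446: encode, then decode
        segmented_feature_frames.append(decoded)
    return torch.cat(segmented_feature_frames)   # segmented image feature sequence T_1 ... T_m
```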
After the segmented image feature data is obtained, it further needs to be deconvolved to obtain the corresponding segmented image data.
S46, inputting the segmented image feature sequence into a segmented image generation model to generate a segmented image sequence.
Specifically, the segmented image generation model includes a convolution layer and a deconvolution layer, and in the segmented image application stage, only the deconvolution layer is used for processing.
Wherein the segmented image feature sequence includes at least one frame of segmented image feature data, step S46 includes: and inputting at least one frame of segmented image characteristic data into the deconvolution layer, generating at least one frame of corresponding segmented image data, and taking the at least one frame of segmented image data as a segmented image sequence.
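Continuing the same illustrative assumptions, step S46 could use only the deconvolution layer of the segmented image generation model sketched in the training section; applying it to each feature frame independently is an assumption.

```python
# Illustrative sketch of step S46: only the deconvolution layer of the trained
# segmented image generation model is used at inference time.
import torch

def generate_segmented_images(generation_model, segmented_feature_frames):
    segmented_images = []
    for feature_frame in segmented_feature_frames:        # at least one frame of segmented image feature data
        with torch.no_grad():
            segmented_images.append(generation_model.deconv(feature_frame))  # deconvolution layer only
    return segmented_images                               # the segmented image sequence
```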
According to the method for segmenting an image provided by the embodiment of the present invention, at least one frame of projection data is acquired according to a spatial and temporal sequence relationship, so that the projection data retains the spatial and temporal correlation and order of its generation process. A corresponding projection sequence is generated based on the at least one frame of projection data, and the projection sequence is then input into a segmented image model for processing to generate a segmented image sequence. The segmented image sequence is thus generated directly from the projection sequence, which reduces the workload of subsequent processing and reduces conversion errors.
In order to further explain the technical solution of the present invention, the embodiment of the present invention also discloses a method of segmenting an image for CT imaging. Referring to FIG. 5, FIG. 5 is a schematic diagram of a method of segmenting an image for CT imaging according to an embodiment of the present invention. The method of this embodiment, referring to FIG. 6, includes the following steps 602 to 608:
602. Generating a projection sequence from at least one frame of projection data acquired in CT imaging according to the spatial and temporal sequence relationship.
For the explanation of the projection data and the projection sequence, reference is made to the foregoing embodiments, and no further description is given here.
Taking X-rays as an example, the X-rays of a conventional CT machine are fan-shaped and the detectors are linear arrays; the projection data of each projection is only the result of the absorption of the X-rays by all the material on the cross section, which is equivalent to summing the image along that direction. In order to acquire the internal cross sections of the human body, the CT machine rotates around the body, that is, it projects in all directions, and the acquired projection data is used to reconstruct the image of the cross section, thereby achieving tomographic scanning. To obtain data along the body axis, the scan also moves laterally from head to foot. Therefore, during CT imaging the person lies still and slowly advances through the machine while the machine rotates around the person, achieving a spiral advance.
Typically, projection data is represented by CT values. The CT value is a unit of measure of the density of a local tissue or organ of the human body, commonly known as the Hounsfield Unit (HU). The CT values of common substances are shown in Table 4.
TABLE 4
Referring to FIG. 5, during CT imaging the probe rotates around the object through a plurality of angles, collecting projection data once at each angle as a projection data unit; the sequence of projection data units over the plurality of angles forms one frame of two-dimensional projection data. For example, if the projection data collected at each angle is a 512-dimensional vector and the rotation angle of the probe at each cross section is 180 degrees, then the projection data corresponding to that cross section is a 180 x 512-dimensional vector.
Further, the probe moves linearly along the longitudinal axis of the object, acquiring one frame of two-dimensional projection data at each cross section, finally forming a 3D projection sequence. Thus, the projection data of CT can be seen as a sequence of projection matrices over multiple angles. The number of cross sections can be set according to actual requirements. For example, if the CT probe moves linearly through 100 positions along the longitudinal axis of the object, the corresponding number of cross sections is 100, and the finally generated projection sequence is a 100 x 180 x 512-dimensional vector matrix.
FIG. 5 exemplarily shows 3 frames of projection data F_1, F_2 and F_3 at 3 positions, where each frame of projection data comprises a 180 x 512-dimensional vector.
604. And inputting the projection sequence to a projection feature extraction model to generate a projection feature sequence.
Specifically, step 604 includes: and inputting the 100 frames of projection data of the projection sequence to an input layer of the projection feature extraction model, and taking the 100 frames of projection feature data output by the middle layer as a corresponding projection feature sequence.
FIG. 5 shows 3 frames of projection feature data F'_1, F'_2 and F'_3, where each frame of projection feature data comprises a 180 x 256-dimensional vector.
For a specific explanation of step 604, refer to the description of the foregoing embodiment, and will not be repeated here.
606. And inputting the projection characteristic sequence into a characteristic transformation model to generate a segmented image characteristic sequence.
And sequentially inputting the 100 frames of projection characteristic data to an encoder and a decoder of the characteristic transformation model, generating one frame of segmented image characteristic data corresponding to each frame of projection characteristic data, and finally obtaining 100 frames of segmented image characteristic data.
FIG. 5 shows the 3 frames of segmented image feature data G'_1, G'_2 and G'_3 corresponding to the 3 frames of projection feature data F'_1, F'_2 and F'_3, where each frame of segmented image feature data comprises a 100 x 128-dimensional vector.
For a specific explanation of step 606, refer to the description of the foregoing embodiments, and will not be repeated here.
608. And inputting the characteristic sequence of the segmented image into a segmented image generation model to generate a segmented image sequence.
Specifically, step 608 includes: inputting at least one frame of segmented image characteristic data into a deconvolution layer of a segmented image generation model, generating at least one frame of corresponding segmented image data, and taking the at least one frame of segmented image data as a segmented image sequence.
FIG. 5 shows the 3 frames of segmented image data G_1, G_2 and G_3 corresponding to the 3 frames of segmented image feature data G'_1, G'_2 and G'_3, where each frame of segmented image data comprises a 64 x 128-dimensional vector.
In addition, in this embodiment, after the output layer of the projection feature extraction model and the convolution layer of the segmented image generation model are removed, the end-to-end segmented image model of this embodiment can be further adapted by transfer learning with a small amount of sample data, so as to achieve optimal reconstruction of specific CT images, for example CT images of chest organs such as the heart and lungs, or of abdominal organs such as the stomach and liver.
In this transfer learning process, the three models do not need to be trained separately; instead, supervised training is performed on the segmented image model as a whole. Since the three models have already been trained with the training data, a large amount of sample data is not required for transfer learning; a small amount of sample data is sufficient.
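A brief sketch of such end-to-end transfer learning with a small labeled set, assuming the combined SegmentedImageModel from the earlier sketch; the loss, learning rate and epoch count are illustrative assumptions.

```python
# Illustrative sketch: transfer learning of the assembled end-to-end segmented image
# model with a small amount of organ-specific sample data. Hyperparameters are assumptions.
import torch
import torch.nn as nn

def fine_tune(segmented_image_model, small_dataset, epochs=5, lr=1e-5):
    # small_dataset yields (projection_sequence, segmented_image_sequence) pairs.
    optimizer = torch.optim.Adam(segmented_image_model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    for _ in range(epochs):
        for projection_sequence, segmented_image_sequence in small_dataset:
            optimizer.zero_grad()
            prediction = segmented_image_model(projection_sequence)   # whole model trained end to end
            loss = criterion(prediction, segmented_image_sequence)
            loss.backward()
            optimizer.step()
    return segmented_image_model
```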
According to the method for segmenting an image provided by the embodiment of the present invention, at least one frame of projection data is acquired according to a spatial and temporal sequence relationship, so that the projection data retains the spatial and temporal correlation and order of its generation process. A corresponding projection sequence is generated based on the at least one frame of projection data, and the projection sequence is then input into a segmented image model for processing to generate a segmented image sequence. The segmented image sequence is thus generated directly from the projection sequence in CT imaging, which reduces the workload of subsequent processing and reduces conversion errors.
The image segmentation apparatus provided by the embodiment of the present invention is described below; the image segmentation apparatus described below and the image segmentation method described above may be referred to in correspondence with each other.
The embodiment of the invention discloses a device for segmenting an image. Referring to FIG. 7, the device comprises:
the acquisition module 702 is configured to acquire at least one frame of projection data according to a spatial and temporal sequential relationship, and generate a corresponding projection sequence based on the at least one frame of projection data;
the processing module 704 is configured to input the projection sequence into a segmented image model for processing, so as to generate a segmented image sequence;
The segmented image model is obtained after training based on a projection sample sequence, an image sample sequence corresponding to the projection sample sequence and a segmented image sample sequence.
Optionally, the segmented image model includes: a projection feature extraction model, a feature transformation model and a segmentation image generation model;
the processing module 704 includes:
the first processing unit is used for inputting the projection sequence into a projection feature extraction model to generate a projection feature sequence;
the second processing unit is used for inputting the projection characteristic sequence into a characteristic transformation model to generate a segmented image characteristic sequence;
and the third processing unit is used for inputting the segmented image characteristic sequence into a segmented image generation model to generate a segmented image sequence.
Optionally, the projection feature extraction model includes an input layer, an intermediate layer, and an output layer;
the first processing unit is specifically configured to: and inputting at least one frame of projection data of the projection sequence into an input layer of the projection characteristic extraction model, and taking at least one frame of projection characteristic data output by the middle layer as a corresponding projection characteristic sequence.
Optionally, the projection feature sequence includes at least one frame of projection feature data, and the feature transformation model includes an encoder and a decoder;
A second processing unit comprising:
a dividing subunit configured to divide the at least one frame of projection feature data into a plurality of input sequences, wherein each input sequence includes at least one frame of projection feature data;
the coding subunit is used for inputting each input sequence to the coder for coding and generating a corresponding coding vector;
and the decoding subunit is used for inputting the coding vector and the decoding reference vector into a decoder for decoding, generating a decoding vector, and generating a segmented image characteristic sequence corresponding to each input sequence based on the decoding vector, wherein the segmented image characteristic sequence comprises at least one frame of segmented image characteristic data.
Optionally, the encoder includes a plurality of coding layers connected in sequence, and a coding subunit is specifically configured to:
embedding at least one frame of projection characteristic data corresponding to each input sequence to obtain a first embedded vector;
inputting the first embedded vector to a 1 st coding layer to generate a coding vector output by the 1 st coding layer;
and carrying out iterative processing by taking the coding vector output by the ith coding layer as the input vector of the (i+1) th coding layer until the coding vector output by the last coding layer is taken as the coding vector corresponding to the input sequence, wherein i is an integer greater than 1.
Optionally, the decoder includes a plurality of decoding layers connected in sequence, and a decoding subunit, specifically configured to:
inputting the decoding reference vector and the coding vector to a 1 st decoding layer to generate a decoding vector output by the 1 st decoding layer;
and carrying out iterative processing by taking the decoding vector output by the j decoding layer and the encoding vector as input vectors of the j+1th decoding layer until the decoding vector output by the last decoding layer is obtained, wherein j is an integer greater than 1.
Optionally, the segmented image feature sequence includes at least one frame of segmented image feature data, and the segmented image generation model includes a convolution layer and a deconvolution layer;
a third processing unit for: and inputting at least one frame of segmented image characteristic data into the deconvolution layer, generating at least one frame of corresponding segmented image data, and taking the at least one frame of segmented image data as a segmented image sequence.
The device for segmenting an image acquires at least one frame of projection data according to a spatial and temporal sequence relationship, so that the projection data retains the spatial and temporal correlation and order of its generation process; it generates a corresponding projection sequence based on the at least one frame of projection data, and inputs the projection sequence into a segmented image model for processing to generate a segmented image sequence. The segmented image sequence is thus generated directly from the projection sequence, which reduces the workload of subsequent processing and reduces conversion errors.
Fig. 8 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Fig. 8, the electronic device may include: a processor 810, a communication interface (Communications Interface) 820, a memory 830, and a communication bus 840, where the processor 810, the communication interface 820, and the memory 830 communicate with one another via the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform a method of segmenting an image that includes the following steps (a minimal end-to-end sketch follows the steps):
acquiring at least one frame of projection data according to a sequence relation of space and time, and generating a corresponding projection sequence based on the at least one frame of projection data;
inputting the projection sequence into a segmented image model for processing to generate a segmented image sequence;
the segmented image model is obtained after training based on a projection sample sequence, an image sample sequence corresponding to the projection sample sequence and a segmented image sample sequence.
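A minimal sketch of how such a method could chain the hypothetical components sketched above; segment_image_method and the reshaping helper to_feature_frames are illustrative assumptions, and the wiring of tensor shapes between stages is left to the caller.

```python
import torch

@torch.no_grad()
def segment_image_method(projection_sequence, extractor, encoder, decoder,
                         generator, decode_reference, to_feature_frames):
    """Chain the sketched components: projection sequence -> segmented image sequence."""
    # Step 1: projection sequence -> projection feature sequence
    features = extractor.extract_features(projection_sequence)
    tokens = features.flatten(start_dim=1).unsqueeze(0)  # (1, frames, frame_dim)
    # Step 2: projection feature sequence -> segmented image feature sequence
    coding_vector = encoder(tokens)
    decoding_vector = decoder(decode_reference, coding_vector)
    # Step 3: segmented image feature sequence -> segmented image sequence
    return generator(to_feature_frames(decoding_vector))
```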
Further, the logic instructions in the memory 830 may be implemented as software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, embodiments of the present invention further provide a computer program product, including a computer program stored on a non-transitory computer readable storage medium, the computer program including program instructions which, when executed by a computer, enable the computer to perform the method for segmenting an image provided by the above method embodiments, including:
acquiring at least one frame of projection data according to a sequence relation of space and time, and generating a corresponding projection sequence based on the at least one frame of projection data;
inputting the projection sequence into a segmented image model for processing to generate a segmented image sequence;
the segmented image model is obtained after training based on a projection sample sequence, an image sample sequence corresponding to the projection sample sequence and a segmented image sample sequence.
In still another aspect, an embodiment of the present invention further provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for segmenting an image provided in the above embodiments, including:
acquiring at least one frame of projection data according to a sequence relation of space and time, and generating a corresponding projection sequence based on the at least one frame of projection data;
inputting the projection sequence into a segmented image model for processing to generate a segmented image sequence;
the segmented image model is obtained after training based on a projection sample sequence, an image sample sequence corresponding to the projection sample sequence and a segmented image sample sequence.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or, of course, by means of hardware. Based on this understanding, the foregoing technical solution, in essence, or the part contributing to the prior art, may be embodied in the form of a software product. The software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or in parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention and are not limiting. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
While the invention has been described in detail with reference to a general description and to specific embodiments, it will be apparent to those skilled in the art that modifications and improvements can be made. Accordingly, such modifications or improvements made without departing from the spirit of the invention are intended to fall within the scope of the claimed invention.

Claims (9)

1. A method of segmenting an image, the method comprising:
acquiring at least one frame of projection data according to a sequence relation of space and time, and generating a corresponding projection sequence based on the at least one frame of projection data;
inputting the projection sequence into a segmented image model for processing to generate a segmented image sequence;
the segmented image model is obtained after training based on a projection sample sequence, an image sample sequence corresponding to the projection sample sequence and a segmented image sample sequence;
wherein the segmented image model comprises: a projection feature extraction model, a feature transformation model and a segmented image generation model; in a training stage, the three models are trained independently and then combined into an end-to-end model;
the feature transformation model includes an encoder and a decoder;
the segmented image generation model comprises a convolution layer and a deconvolution layer, and in a use stage, only the deconvolution layer is used to convert an input segmented image feature sequence into a segmented image sequence;
wherein inputting the projection sequence into the segmented image model for processing to generate the segmented image sequence comprises the following steps:
inputting the projection sequence to a projection feature extraction model to generate a projection feature sequence;
inputting the projection feature sequence into the feature transformation model to generate a segmented image feature sequence;
and inputting the segmented image feature sequence into a segmented image generation model to generate a segmented image sequence.
2. The method of segmenting an image according to claim 1, wherein the projection feature extraction model comprises an input layer, an intermediate layer and an output layer;
inputting the projection sequence to the projection feature extraction model to generate the projection feature sequence, comprising:
inputting at least one frame of projection data of the projection sequence into the input layer of the projection feature extraction model, and taking at least one frame of projection feature data output by the intermediate layer as the corresponding projection feature sequence.
3. The method of claim 1, wherein the projection feature sequence includes at least one frame of projection feature data;
inputting the projection feature sequence to a feature transformation model to generate a segmented image feature sequence, comprising:
dividing the at least one frame of projection characteristic data into a plurality of input sequences, wherein each input sequence comprises at least one frame of projection characteristic data;
inputting each input sequence to the encoder for encoding to generate a corresponding coding vector;
and inputting the coding vector and the decoding reference vector to the decoder for decoding, generating a decoding vector, and generating a segmented image feature sequence corresponding to each input sequence based on the decoding vector, wherein the segmented image feature sequence comprises at least one frame of segmented image feature data.
4. A method of segmenting an image according to claim 3, wherein the encoder comprises a plurality of coding layers connected in sequence;
inputting each input sequence to the encoder for encoding to generate a corresponding coding vector, comprising:
embedding at least one frame of projection feature data corresponding to each input sequence to obtain a first embedded vector;
inputting the first embedded vector to the 1st coding layer to generate the coding vector output by the 1st coding layer;
and carrying out iterative processing by taking the coding vector output by the i-th coding layer as the input vector of the (i+1)-th coding layer, until the coding vector output by the last coding layer is obtained and taken as the coding vector corresponding to the input sequence, wherein i is an integer greater than 1.
5. A method of segmenting an image according to claim 3, wherein the decoder comprises a plurality of sequentially connected decoding layers;
inputting the coding vector and the decoding reference vector to the decoder for decoding to generate a decoding vector, comprising:
inputting the decoding reference vector and the coding vector to the 1st decoding layer to generate the decoding vector output by the 1st decoding layer;
and carrying out iterative processing by taking the decoding vector output by the j-th decoding layer, together with the coding vector, as the input vectors of the (j+1)-th decoding layer, until the decoding vector output by the last decoding layer is obtained, wherein j is an integer greater than 1.
6. The method of claim 1, wherein the segmented image feature sequence comprises at least one frame of segmented image feature data;
inputting the segmented image feature sequence to a segmented image generation model to generate a segmented image sequence, comprising:
inputting at least one frame of segmented image feature data into the deconvolution layer, generating at least one corresponding frame of segmented image data, and taking the at least one frame of segmented image data as a segmented image sequence.
7. An apparatus for segmenting an image, comprising:
the acquisition module is used for acquiring at least one frame of projection data according to the sequence relation of space and time and generating a corresponding projection sequence based on the at least one frame of projection data;
the processing module is used for inputting the projection sequence into the segmented image model for processing to generate a segmented image sequence;
the segmented image model is obtained after training based on a projection sample sequence, an image sample sequence corresponding to the projection sample sequence and a segmented image sample sequence;
wherein the segmented image model comprises: a projection feature extraction model, a feature transformation model and a segmented image generation model; in a training stage, the three models are trained independently and then combined into an end-to-end model;
the feature transformation model includes an encoder and a decoder;
the segmented image generation model comprises a convolution layer and a deconvolution layer, and in a use stage, only the deconvolution layer is used to convert an input segmented image feature sequence into a segmented image sequence;
the processing module comprises:
the first processing unit is used for inputting the projection sequence into a projection feature extraction model to generate a projection feature sequence;
the second processing unit is used for inputting the projection feature sequence into a feature transformation model to generate a segmented image feature sequence;
and the third processing unit is used for inputting the segmented image feature sequence into a segmented image generation model to generate a segmented image sequence.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor performs the steps of the method of segmenting an image according to any one of claims 1 to 6 when the program is executed.
9. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor performs the steps of the method of segmenting an image according to any of claims 1 to 6.
CN202010844261.7A 2020-08-20 2020-08-20 Method and device for dividing image, electronic equipment and storage medium Active CN112102251B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010844261.7A CN112102251B (en) 2020-08-20 2020-08-20 Method and device for dividing image, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010844261.7A CN112102251B (en) 2020-08-20 2020-08-20 Method and device for dividing image, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112102251A CN112102251A (en) 2020-12-18
CN112102251B true CN112102251B (en) 2023-10-31

Family

ID=73753223

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010844261.7A Active CN112102251B (en) 2020-08-20 2020-08-20 Method and device for dividing image, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112102251B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733616B (en) * 2020-12-22 2022-04-01 北京达佳互联信息技术有限公司 Dynamic image generation method and device, electronic equipment and storage medium
CN112862727B (en) * 2021-03-16 2023-06-23 上海壁仞智能科技有限公司 Cross-modal image conversion method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11495327B2 (en) * 2017-07-07 2022-11-08 University Of Louisville Research Foundation, Inc. Computer-aided diagnostic system for early diagnosis of prostate cancer

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102016211766A1 (en) * 2016-06-29 2018-01-18 Siemens Healthcare Gmbh Generation of a picture sequence
WO2019136623A1 (en) * 2018-01-10 2019-07-18 Nokia Technologies Oy Apparatus and method for semantic segmentation with convolutional neural network
CN109493350A (en) * 2018-11-09 2019-03-19 重庆中科云丛科技有限公司 Portrait dividing method and device
KR20200087297A (en) * 2018-12-28 2020-07-21 이화여자대학교 산학협력단 Defect inspection method and apparatus using image segmentation based on artificial neural network
CN109816011A (en) * 2019-01-21 2019-05-28 厦门美图之家科技有限公司 Generate the method and video key frame extracting method of portrait parted pattern
CN109871946A (en) * 2019-03-15 2019-06-11 北京金山数字娱乐科技有限公司 A kind of application method and device, training method and device of neural network model
CN110599492A (en) * 2019-09-19 2019-12-20 腾讯科技(深圳)有限公司 Training method and device for image segmentation model, electronic equipment and storage medium
CN110689547A (en) * 2019-09-25 2020-01-14 重庆大学 Pulmonary nodule segmentation method based on three-dimensional CT image
CN110992338A (en) * 2019-11-28 2020-04-10 华中科技大学 Auxiliary diagnosis system for primary lesion metastasis
CN111028246A (en) * 2019-12-09 2020-04-17 北京推想科技有限公司 Medical image segmentation method and device, storage medium and electronic equipment
CN111310764A (en) * 2020-01-20 2020-06-19 上海商汤智能科技有限公司 Network training method and device, image processing method and device, electronic equipment and storage medium
CN111507215A (en) * 2020-04-08 2020-08-07 常熟理工学院 Video target segmentation method based on space-time convolution cyclic neural network and cavity convolution
CN111222347A (en) * 2020-04-15 2020-06-02 北京金山数字娱乐科技有限公司 Sentence translation model training method and device and sentence translation method and device
CN111429460A (en) * 2020-06-12 2020-07-17 腾讯科技(深圳)有限公司 Image segmentation method, image segmentation model training method, device and storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Attention Is All You Need; Vaswani A et al.; arXiv; full text *
Multi-level context gating of embedded collective knowledge for medical image segmentation; Asadi-Aghbolaghi M, et al.; arXiv; full text *
Segmentation transformer: Object-contextual representations for semantic segmentation; Yuan Y, et al.; arXiv; full text *
Research on brain tumor segmentation algorithms based on an efficient cascade model; Zhou Chenhong; China Master's Theses Full-text Database - Information Science and Technology; full text *
Research and application of multi-label text classification algorithms; Ji Xianpeng; China Master's Theses Full-text Database - Information Science and Technology; full text *

Also Published As

Publication number Publication date
CN112102251A (en) 2020-12-18

Similar Documents

Publication Publication Date Title
EP3919002B1 (en) Ct image generation method and apparatus, computer device and computer-readable storage medium
CN110009669B (en) 3D/2D medical image registration method based on deep reinforcement learning
CN110443867B (en) CT image super-resolution reconstruction method based on generation countermeasure network
CN109598722B (en) Image analysis method based on recurrent neural network
CN112102251B (en) Method and device for dividing image, electronic equipment and storage medium
CN110559009B (en) Method for converting multi-modal low-dose CT into high-dose CT based on GAN
CN116309650B (en) Medical image segmentation method and system based on double-branch embedded attention mechanism
CN112396672B (en) Sparse angle cone-beam CT image reconstruction method based on deep learning
CN111340903B (en) Method and system for generating synthetic PET-CT image based on non-attenuation correction PET image
CN112819914A (en) PET image processing method
CN114332287B (en) Method, device, equipment and medium for reconstructing PET (positron emission tomography) image based on transformer feature sharing
CN108038840B (en) Image processing method and device, image processing equipment and storage medium
CN115908449A (en) 2.5D medical CT image segmentation method and device based on improved UNet model
CN113592745B (en) Unsupervised MRI image restoration method based on antagonism domain self-adaption
CN113476064B (en) BCD-ED-based single-scanning double-tracer PET signal separation method
CN112037299B (en) Image reconstruction method and device, electronic equipment and storage medium
CN111798535B (en) CT image enhancement display method and computer readable storage medium
CN116503506B (en) Image reconstruction method, system, device and storage medium
CN112419175A (en) Weight-sharing dual-region generation countermeasure network and image generation method thereof
Czihó et al. Medical image compression using region-of-interest vector quantization
Parracho et al. Cross-modality lossless compression of PET-CT images
CN110335327A (en) A kind of medical image method for reconstructing directly solving inverse problem
US12016717B2 (en) CT image generation method and apparatus, computer device, and computer-readable storage medium
CN116977471B (en) Method and apparatus for generating synthetic computed tomography images
WO2023131061A1 (en) Systems and methods for positron emission computed tomography image reconstruction

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address
CP03 Change of name, title or address

Address after: 201114 room 1302, 13 / F, building 16, 2388 Chenhang Road, Minhang District, Shanghai

Patentee after: Shanghai Bi Ren Technology Co.,Ltd.

Country or region after: China

Address before: 201114 room 1302, 13 / F, building 16, 2388 Chenhang Road, Minhang District, Shanghai

Patentee before: Shanghai Bilin Intelligent Technology Co.,Ltd.

Country or region before: China