CN116402779A

CN116402779A - Cervical vertebra image segmentation method and device based on deep learning attention mechanism

Info

Publication number: CN116402779A
Application number: CN202310339496.4A
Authority: CN
Inventors: 张逸凌; 刘星宇
Original assignee: Longwood Valley Medtech Co Ltd
Current assignee: Longwood Valley Medtech Co Ltd
Priority date: 2023-03-31
Filing date: 2023-03-31
Publication date: 2023-07-07
Anticipated expiration: 2043-03-31
Also published as: CN116402779B

Abstract

The invention provides a cervical vertebra image segmentation method and device based on a deep learning attention mechanism, wherein the method comprises the following steps: acquiring a medical image dataset to be processed; inputting the medical image data set to be processed into a deep learning network model, and carrying out attention mechanism feature fusion extraction on the coded image and the decoded image corresponding to each medical image based on a plurality of DSA attention mechanism modules in the deep learning network model to obtain a plurality of fused feature fusion images; and based on the SMSA attention mechanism module, performing attention mechanism feature fusion extraction on the target feature fusion image obtained after the splicing treatment of the plurality of feature fusion images, and obtaining a cervical vertebra image segmentation result after convolution operation. According to the cervical vertebra image segmentation method and device based on the deep learning attention mechanism, provided by the invention, the obtained cervical vertebra image segmentation result is relatively accurate, and the segmentation result of the cervical vertebra image is improved.

Description

Cervical vertebra image segmentation method and device based on deep learning attention mechanism

Technical Field

The invention relates to the field of artificial intelligence, in particular to a cervical vertebra image segmentation method and device based on a deep learning attention mechanism.

Background

With the continuous development of technology, artificial intelligence technology is increasingly applied to the medical field. Taking cervical vertebra as an example, the introduction of artificial intelligence technology saves a lot of time for doctors and improves the operation efficiency. However, due to the diversity of cervical spondylosis, the accuracy of cervical vertebra segmentation is not high easily, and especially in cervical vertebra detail characteristics, segmentation deviation is large, so that poor experience is caused.

Therefore, how to solve the above-mentioned problems is considered.

Disclosure of Invention

The invention provides a cervical vertebra image segmentation method and device based on a deep learning attention mechanism, which are used for solving the problems.

In a first aspect, the present invention provides a cervical vertebra image segmentation method based on a deep learning attention mechanism, including: acquiring a medical image data set to be processed, wherein each medical image in the medical image data set comprises a cervical vertebra image area;

inputting the medical image data set to be processed into a deep learning network model, and carrying out attention mechanism feature fusion extraction on the coded image and the decoded image corresponding to each medical image based on a plurality of double-space multiple self-attention DSA attention mechanism modules in the deep learning network model to obtain a plurality of fused feature fusion images;

And based on a scale multiple self-attention SMSA attention mechanism module in the deep learning network model, performing attention mechanism feature fusion extraction on the target feature fusion image obtained after the splicing processing of the feature fusion images, and obtaining a cervical vertebra image segmentation result after convolution operation.

Optionally, the deep learning network model includes five layers of network structures, and DSA attention mechanism modules are arranged on the second layer of network structure to the fourth layer of network structure;

the fifth layer network structure is used for decoding and convoluting the coded medical image of the fourth layer network structure subjected to the downsampling process to obtain a fifth characteristic image;

the fourth layer network structure is used for carrying out attention mechanism feature fusion extraction on the fifth feature image and the coded medical image subjected to downsampling treatment of the third layer network structure to obtain a fourth feature fusion image;

the third layer network structure is used for carrying out attention mechanism feature fusion extraction on the fourth feature fusion image and the coded medical image subjected to downsampling treatment of the second layer network structure to obtain a third feature fusion image;

the second-layer network structure is used for carrying out attention mechanism feature fusion extraction on the third feature fusion image and the coded medical image subjected to downsampling treatment by the first-layer network structure to obtain a second feature fusion image;

The first layer network structure is used for respectively carrying out coding processing and decoding processing on each medical image in the input medical image data set to be processed, and carrying out splicing processing on the characteristic image obtained by the decoding processing and the characteristic image obtained by the up-sampling of the second characteristic fusion image to obtain a first characteristic fusion image.

Optionally, the DSA attention mechanism module includes: a large-scale self-attention feature extraction unit and a small-scale self-attention feature extraction unit;

the large-scale self-attention feature extraction unit and the small-scale self-attention feature extraction unit are respectively used for carrying out large-scale self-attention feature extraction and small-scale self-attention feature extraction on the coded medical images from the second layer network structure to the fourth layer network structure to obtain a first feature coefficient and a second feature coefficient;

and obtaining the second feature fusion image to the fourth feature fusion image based on the first feature coefficient, the second feature coefficient and the fifth feature image.

Optionally, the downsampled encoded images in the second-layer network structure to the fifth-layer network structure further include, before performing the attention mechanism feature fusion extraction:

And performing 3*3 convolution operation, batch normalization operation and activation function processing on the coded images respectively so that the size of each coded image is the same as the size of the corresponding feature fusion image or the feature image.

Optionally, the SMSA attention mechanism module includes: a width multilayer sensor and a height multilayer sensor;

the width convolution multi-layer perceptron and the height convolution multi-layer perceptron are respectively used for performing width processing and height processing on the target feature fusion image to obtain a third feature coefficient and a fourth feature coefficient, and performing attention mechanism feature fusion extraction on the target feature fusion image based on the third feature coefficient and the fourth feature coefficient.

Optionally, the third characteristic coefficient is calculated based on the following manner:

the fourth characteristic coefficient is calculated based on the following mode:

the feature map obtained after the target feature fusion image is subjected to attention mechanism feature fusion extraction is obtained by calculation based on the following mode:

wherein W is _q 、W _k 、W _v Weights of Q (Query), K (Key) and V (Value), respectively, D is a target feature fusion image, and MLP ₁ To perform the MLP operation with D in width, MLP ₂ In order to perform the MLP operation on D at a high level, the MLP is to perform the MLP operation on D directly,

is a dot product operation.

Optionally, the segmentation loss function adopted by the deep learning network model includes at least one of the following:

CELoss loss function; diceLoss loss function.

In a second aspect, the present invention provides a cervical vertebra image segmentation apparatus based on a deep learning attention mechanism, comprising:

the acquisition module is used for acquiring a medical image data set to be processed, wherein each medical image in the medical image data set comprises a cervical vertebra image area;

the processing module is used for inputting the medical image data set to be processed into a deep learning network model, and carrying out attention mechanism feature fusion extraction on the coded image and the decoded image corresponding to each medical image based on a plurality of double-space multiple self-attention DSA attention mechanism modules in the deep learning network model to obtain a plurality of fused feature fusion images;

the processing module is also used for carrying out attention mechanism feature fusion extraction on the target feature fusion image obtained after the splicing processing of the feature fusion images based on the scale multiple self-attention SMSA attention mechanism module in the deep learning network model, and obtaining a cervical vertebra image segmentation result after convolution operation.

In a third aspect, the present invention provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a cervical spine image segmentation method based on a deep learning attention mechanism as described above when executing the program.

In a fourth aspect, the present invention provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a cervical spine image segmentation method based on a deep learning attention mechanism as described above.

The technical scheme of the invention has at least the following beneficial effects:

according to the cervical vertebra image segmentation method and device based on the deep learning attention mechanism, attention mechanism feature fusion extraction is carried out on the coded images with different sizes and the corresponding decoded images through the plurality of DSA attention mechanism modules, feature information under different sizes corresponding to the medical images to be processed can be obtained, and further the feature information reserved by each feature fusion image in the obtained feature fusion images is different. Since the feature information retained in each feature fusion image is also different, by extracting the feature information in different sizes, as much feature information as possible can be retained. And then, the SMSA attention mechanism module performs attention mechanism feature fusion extraction on the target feature fusion image obtained after the splicing treatment of the plurality of feature fusion images, and after convolution operation, the segmentation result of the cervical vertebra image is more accurate, and the segmentation result of the cervical vertebra image is improved.

Drawings

In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow chart of a cervical vertebra image segmentation method based on a deep learning attention mechanism;

FIG. 2 is a schematic diagram of a deep learning model according to the present invention;

FIG. 3 is a schematic diagram of a DSA attention mechanism module according to the present invention;

fig. 4 is a schematic structural diagram of an SMSA attention mechanism module according to the present invention;

FIG. 5 is a schematic view of a cervical vertebra segmentation result provided by the invention;

FIG. 6 is a schematic block diagram of a cervical vertebra image segmentation apparatus based on a deep learning attention mechanism according to the present invention;

fig. 7 is a schematic structural diagram of an electronic device according to the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein.

It should be understood that, in various embodiments of the present invention, the sequence number of each process does not mean that the execution sequence of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.

It should be understood that in the present invention, "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements that are expressly listed or inherent to such process, method, article, or apparatus.

It should be understood that in the present invention, "plurality" means two or more. "and/or" is merely an association relationship describing an association object, and means that three relationships may exist, for example, and/or B may mean: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship. "comprising A, B and C", "comprising A, B, C" means that all three of A, B, C comprise, "comprising A, B or C" means that one of the three comprises A, B, C, and "comprising A, B and/or C" means that any 1 or any 2 or 3 of the three comprises A, B, C.

It should be understood that in the present invention, "B corresponding to a", "a corresponding to B", or "B corresponding to a" means that B is associated with a, from which B can be determined. Determining B from a does not mean determining B from a alone, but may also determine B from a and/or other information. The matching of A and B is that the similarity of A and B is larger than or equal to a preset threshold value.

As used herein, "if" may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to detection" depending on the context.

The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.

Referring to fig. 1, the invention provides a cervical vertebra image segmentation method based on a deep learning attention mechanism, which comprises the following steps:

s11: a medical image dataset to be processed is acquired, each medical image in the medical image dataset comprising a cervical image region.

The image data in the medical image data set is in the DICOM format, and is widely used in the field of radiology.

S12: inputting the medical image data set to be processed into a deep learning network model, and carrying out attention mechanism feature fusion extraction on the coded image and the decoded image corresponding to each medical image based on a plurality of double-space multiple self-attention (Dual Spatial Multi Self Attention, DSA) attention mechanism modules in the deep learning network model to obtain a plurality of fused feature fusion images.

In an alternative embodiment, the deep learning network model employs a V-type structure. The DSA attention mechanism module is provided with a plurality of, and the sizes of the coded images processed by different DSA attention mechanism modules are not the same. Therefore, the attention mechanism feature fusion extraction is carried out on the coded images with different sizes and the corresponding decoded images through different DSA attention mechanism modules, so that the feature information under different sizes corresponding to the medical images to be processed can be obtained. The feature information reserved by the feature fusion images under different sizes is also different through the extraction of the feature information under various sizes.

S13: and based on a scale multiple self-attentiveness (Scale Multi Self Attention, SMSA) attentiveness mechanism module in the deep learning network model, performing attentiveness mechanism feature fusion extraction on the target feature fusion image obtained after the splicing treatment of the feature fusion images, and obtaining a cervical vertebra image segmentation result after convolution operation.

Before the stitching process is performed on the plurality of feature fusion images, an upsampling process is performed on a portion of the plurality of feature fusion images so that the size of each feature image after the upsampling process is the same as the size of the original image (image to be processed).

According to the cervical vertebra image segmentation method based on the deep learning attention mechanism, attention mechanism feature fusion extraction is carried out on the coded images with different sizes and the corresponding decoded images through the plurality of DSA attention mechanism modules, feature information under different sizes corresponding to the medical images to be processed can be obtained, and further the feature information reserved by each feature fusion image in the obtained feature fusion images is different. Since the feature information retained in each feature fusion image is also different, by extracting the feature information in different sizes, as much feature information as possible can be retained. And then, the SMSA attention mechanism module performs attention mechanism feature fusion extraction on the target feature fusion image obtained after the splicing treatment of the plurality of feature fusion images, and after convolution operation, the segmentation result of the cervical vertebra image is more accurate, and the segmentation result of the cervical vertebra image is improved.

As an example, as shown in fig. 2, a schematic structural diagram of a deep learning network model is provided, in fig. 2, an arrow numbered 1 represents a conv3x3+bn+relu operation, an arrow numbered 2 represents a Max Pooling operation, an arrow numbered 3 represents a conv1x1+bn+relu operation, an arrow numbered 4 represents a splice Concat operation, and an arrow numbered 5 represents an upsampling Upsample operation.

The deep learning network model comprises five layers of network structures (marked in fig. 2), and DSA attention mechanism modules are arranged on the second layer of network structure to the fourth layer of network structure. Wherein one layer of the input medical image data set is a first layer network structure, and the second layer network structure to the fifth layer network structure are arranged downwards in sequence. The size of the image of the first layer network structure is the largest, and the coded images of the second layer network structure to the fifth layer network structure are respectively obtained by downsampling the coded image of the upper layer adjacent to the coded image, namely the sizes of the images in the network structures of different layers are not the same.

The fifth layer network structure is used for decoding and convolving the coded medical image of the fourth layer network structure subjected to the downsampling process to obtain a fifth characteristic image.

Specifically, conv3x3+bn+relu processing, decoding processing and conv1x1+bn+relu processing are sequentially performed on the encoded medical image, and a fifth feature image is obtained.

The fourth layer network structure is used for carrying out attention mechanism feature fusion extraction on the fifth feature image and the coded medical image subjected to downsampling processing by the third layer network structure to obtain a fourth feature fusion image.

The DSA attention mechanism module in the fourth layer network structure performs upsampling processing on the fifth feature image to make the size of the fifth feature image consistent with the size of the encoded medical image of the third layer network structure after the downsampling processing.

And the third layer network structure is used for carrying out attention mechanism feature fusion extraction on the fourth feature fusion image and the coded medical image subjected to downsampling processing by the second layer network structure to obtain a third feature fusion image.

The DSA attention mechanism module in the third layer network structure performs upsampling processing on the fourth feature fusion image, so that the size of the coded medical image of the fourth feature fusion image and the size of the coded medical image of the second layer network structure which are subjected to downsampling processing are kept consistent.

And the second-layer network structure is used for carrying out attention mechanism feature fusion extraction on the third feature fusion image and the coded medical image subjected to downsampling processing by the first-layer network structure to obtain a second feature fusion image.

The DSA attention mechanism module in the second-layer network structure performs upsampling processing on the third feature fusion image, so that the sizes of the third feature fusion image and the coded medical image of the first-layer network structure subjected to downsampling processing are kept consistent.

It should be noted that, by setting DSA attention mechanism modules in the second-layer network structure to the fourth-layer network structure, attention mechanism fusion can be performed on images with different sizes in the second-layer network structure to the fourth-layer network structure, and more characteristic information is reserved, so that the obtained cervical vertebra image segmentation result is more accurate, and the accuracy of the cervical vertebra image segmentation result is improved.

As illustrated in fig. 3, the DSA attention mechanism module includes: a large-scale self-attention feature extraction unit and a small-scale self-attention feature extraction unit;

It should be noted that the first feature coefficient may also be referred to as a large-scale self-attention coefficient, and the second feature coefficient may also be referred to as a small-scale self-attention coefficient. The DSA attention mechanism module will be described below with reference to fig. 3, taking a fourth layer network structure and a fifth layer network structure as examples.

In the fourth-layer network structure and the fifth-layer network structure, the inputs of the DSA attention mechanism module are the encoded medical image processed by downsampling and the fifth feature image processed by decoding of the third-layer network structure, respectively, and for convenience of description, the two are simply referred to as an encoding part and a decoding part, respectively.

The coding part is subjected to two-scale feature extraction, namely large-scale self-attention feature extraction and small-scale self-attention feature extraction, and is subjected to further feature fusion extraction with the decoding part after the feature extraction of the two scales of features. Through the feature extraction of two scales, the capability of carrying out multi-scale deep information extraction on the feature map is realized.

Specifically, when small-scale self-attention characteristic feature extraction is performed on the coding part, Q (Query) and K (Key) operations are performed, and then dot product operations are performed to obtain small-scale self-attention coefficients, wherein the dot product operations are performed by

And (3) representing. Next, up-sampling (indicated by an arrow with a number of 1 in the figure) is performed on the decoding portion, then V (Value) is performed, and matrix multiplication is performed on the V (Value) and the small-scale self-attention coefficient, so as to obtain a first feature map extracted by the small-scale self-attention feature extraction unit.

Next, after Max Pooling (indicated by an arrow with a number of 2 in the figure) operation is performed on the coding portion, Q (Query) and K (Key) operations are performed respectively, and then dot product operation is performed, so as to obtain a large-scale self-attention coefficient. And then V (Value) operation is carried out on the decoding part, matrix multiplication operation is carried out on the decoding part and the large-scale self-attention coefficient, and the second characteristic diagram is obtained after up-sampling operation. And (3) performing Concat (indicated by an arrow with the number of 3) operation on the first feature map and the second feature map, and performing Conv1x1+BN+Relu (indicated by an arrow with the number of 4) to obtain a fourth feature fusion image.

Further, the small-scale self-attention feature extraction part performs Q (Query), K (Key) and V (Value) operations on the decoding layer and the encoding layer after upsampling, and the specific calculation is as follows:

Q＝W _q ·E _i ；K＝W _k ·E _i ；V＝W _v ·(UpSample(D _i-1 ))。

wherein W is _q 、W _k 、W _v Weights of Q (Query), K (Key), V (Value), E _i Coding layer feature map for ith layer, D _i-1 For the i-1 layer decoding layer feature map, upsample (·) is an upsampling operation.

The large-scale self-attention characteristic extraction part is to perform Q, K, V operation on the coding layer and the decoding layer after downsampling, and perform upsampling operation again after finishing characteristic calculation to obtain a new characteristic diagram. The specific calculation is as follows:

Q＝W _q ·(Maxpooling(E _i ))；K＝W _k ·(Maxpooling(E _i ))；

V＝W _v ·D _i-1

wherein W is _q 、W _k 、W _v Weights of Q (Query), K (Key), V (Value), E _i Coding layer feature map for ith layer, D _i-1 For the i-1 layer decoding layer feature map, maxpooling (·) is a downsampling operation, i.e., a max pooling operation.

And then, carrying out Concat operation of the features on the feature map extracted in the size scale, and carrying out Conv1x1+BN+Relu operation on the new feature map to obtain the final decoding features.

For example, the downsampled encoded images in the second-layer network structure to the fifth-layer network structure further include, before performing the attention mechanism feature fusion extraction:

And when the size of each coded image is the same as the corresponding feature fusion image or the size of the feature image, feature fusion processing is conveniently carried out on the feature fusion image with the same size.

As shown in fig. 4, the SMSA attention mechanism module includes: a width multilayer sensor and a height multilayer sensor;

Alternatively, the third characteristic coefficient may also be referred to as a width self-attention coefficient C1, and the fourth characteristic coefficient may also be referred to as a height self-attention coefficient C2. And when the width processing is carried out on the target feature fusion image, carrying out Q (Query) operation, K (Key) operation and dot product operation to obtain a width self-attention coefficient C1. And when the target feature fusion image is subjected to the height processing, performing Q (Query) operation, K (Key) operation and dot product operation to obtain a high self-attention coefficient C2. And performing V (Value) operation on the target feature fusion image, performing dot product operation on the target feature fusion image and the width self-attention coefficient C1 and the height self-attention coefficient C2, and further performing Conv3x3+BN+Relu (indicated by an arrow with the number of 1 in FIG. 4) and Conv1x1+BN+Relu (indicated by an arrow with the number of 2 in FIG. 4) on the obtained feature fusion image to finally obtain a cervical vertebra image segmentation result (as shown in FIG. 5, the region encircled by a line frame in FIG. 5 is the segmented cervical vertebra part).

Further, the third characteristic coefficient is calculated based on the following manner:

is a dot product operation.

Illustratively, the segmentation loss function employed by the deep-learning network model includes at least one of:

CELoss loss function; diceLoss loss function.

Wherein, the expression of the CELoss loss function is:

CELoss＝-[y log y′+(1-y)log(1-y′)]；

the expression of the DiceLoss loss function is:

if the deep learning network model adopts the two loss functions, the expression of the loss functions is:

Loss＝α·CELoss+(1-α)·DiceLoss

wherein y is a tag value, y' is a predicted value, and α is a loss weight coefficient.

Next, see fig. 6. Based on the same technical conception as the method, another aspect of the invention provides a cervical vertebra image segmentation device based on a deep learning attention mechanism. The functions of the cervical vertebra image segmentation device are the same as those of the method, and are not repeated here.

The cervical vertebra image segmentation device based on the deep learning attention mechanism comprises:

an acquisition module 61, configured to acquire a medical image dataset to be processed, where each medical image in the medical image dataset includes a cervical vertebra image region;

the processing module 62 is configured to input the medical image dataset to be processed into a deep learning network model, and perform attention mechanism feature fusion extraction on the encoded image and the decoded image corresponding to each medical image based on a plurality of dual-space multiple self-attention DSA attention mechanism modules in the deep learning network model, so as to obtain a plurality of fused feature fusion images;

The deep learning network model comprises five layers of network structures, wherein DSA attention mechanism modules are arranged on the second layer of network structure to the fourth layer of network structure;

Illustratively, the DSA attention mechanism module includes: a large-scale self-attention feature extraction unit and a small-scale self-attention feature extraction unit;

Illustratively, the processing module 62 is further configured to, before performing the attention mechanism feature fusion extraction, perform the following processing on the downsampled encoded images in the second-layer network structure to the fifth-layer network structure:

Illustratively, the SMSA attention mechanism module includes: a width multilayer sensor and a height multilayer sensor;

For example, the third characteristic coefficient is calculated based on the following manner:

is a dot product operation.

CELoss loss function; diceLoss loss function.

Referring to fig. 7, still another embodiment of the present invention provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a cervical vertebrae image segmentation method based on a deep learning attention mechanism as described above when executing the program.

Optionally, the electronic device may include: processor 710, communication interface (Communications Interface) 720, memory 730, and communication bus 740, wherein processor 710, communication interface 720, memory 730 communicate with each other via communication bus 740. Processor 710 may invoke logic instructions in memory 730 to perform the cervical spine image segmentation method based on the deep learning attention mechanism provided by the methods described above.

Further, the logic instructions in the memory 730 described above may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand alone product. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.

Yet another embodiment of the present invention provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a cervical spine image segmentation method based on a deep learning attention mechanism as described above.

The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: portable computer disks, hard disks, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static Random Access Memory (SRAM), portable compact disk read-only memory (CD-ROM), digital Versatile Disks (DVD), memory sticks, floppy disks, mechanical coding devices, punch cards or in-groove structures such as punch cards or grooves having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.

The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.

Computer program instructions for carrying out operations of the present invention may be assembly instructions, instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, c++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), with state information for computer readable program instructions, which can execute the computer readable program instructions.

Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Note that all features disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic set of equivalent or similar features. Where used, further, preferably, still further and preferably, the brief description of the other embodiment is provided on the basis of the foregoing embodiment, and further, preferably, further or more preferably, the combination of the contents of the rear band with the foregoing embodiment is provided as a complete construct of the other embodiment. A further embodiment is composed of several further, preferably, still further or preferably arrangements of the strips after the same embodiment, which may be combined arbitrarily.

It will be appreciated by persons skilled in the art that the embodiments of the invention described above and shown in the drawings are by way of example only and are not limiting. The objects of the present invention have been fully and effectively achieved. The functional and structural principles of the present invention have been shown and described in the examples and embodiments of the invention may be modified or practiced without departing from the principles described.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present disclosure, and not for limiting the same; although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions from the scope of the technical solutions of the embodiments of the present disclosure.

Claims

1. The cervical vertebra image segmentation method based on the deep learning attention mechanism is characterized by comprising the following steps of:

acquiring a medical image data set to be processed, wherein each medical image in the medical image data set comprises a cervical vertebra image area;

2. The cervical spine image segmentation method according to claim 1, wherein the deep learning network model comprises five layers of network structures, and DSA attention mechanism modules are arranged on the second layer of network structure to the fourth layer of network structure;

3. The method of claim 2, wherein the DSA attention mechanism module comprises: a large-scale self-attention feature extraction unit and a small-scale self-attention feature extraction unit;

4. The method according to claim 2, wherein the downsampled encoded image in the second-layer network structure to the fifth-layer network structure, before performing the attention mechanism feature fusion extraction, further comprises:

5. The method of claim 2, wherein the SMSA attention mechanism module comprises: a width multilayer sensor and a height multilayer sensor;

6. The method of claim 5, wherein the step of determining the position of the probe is performed,

the third characteristic coefficient is calculated based on the following mode:

is a dot product operation.

7. The method of any of claims 1 to 6, wherein the segmentation loss function employed by the deep-learning network model comprises at least one of:

CELoss loss function; diceLoss loss function.

8. A cervical spine image segmentation apparatus based on a deep learning attention mechanism, comprising:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the deep learning attention mechanism based cervical spine image segmentation method of any one of claims 1 to 7 when the program is executed.

10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the cervical spine image segmentation method based on the deep learning attention mechanism of any one of claims 1 to 7.