Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 is a block diagram of an example computing device 100. In a basic configuration 102, computing device 100 typically includes system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processor 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a Digital Signal Processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level one cache 110 and a level two cache 112, a processor core 114, and registers 116. The example processor core 114 may include an Arithmetic Logic Unit (ALU), a Floating Point Unit (FPU), a digital signal processing core (DSP core), or any combination thereof. The example memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, system memory 106 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. System memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some embodiments, application 122 may be arranged to operate with program data 124 on an operating system. In some embodiments, where computing device 100 is configured to perform method 200 of processing a sagittal image of the spine, program data 124 includes instructions for performing method 200. Additionally, in some embodiments, processing the spine sagittal image is implemented by disposing the apparatus 1200 for processing the spine sagittal image in a memory, the apparatus 1200 being disposed as one of the applications 122 in the computing device 100.
Computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output devices 142, peripheral interfaces 144, and communication devices 146) to the basic configuration 102 via the bus/interface controller 130. The example output device 142 includes a graphics processing unit 148 and an audio processing unit 150. They may be configured to facilitate communication with various external devices, such as a display or speakers, via one or more A/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, image input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 158. An example communication device 146 may include a network controller 160, which may be arranged to facilitate communications with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
A network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, or program modules in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or private wired network, and various wireless media such as acoustic, Radio Frequency (RF), microwave, Infrared (IR), or other wireless media. The term computer readable media as used herein may include both storage media and communication media. In some embodiments, one or more programs are stored in a computer readable medium, the one or more programs including instructions for performing certain methods.
Computing device 100 may be implemented as part of a small-form factor portable (or mobile) electronic device such as a cellular telephone, a digital camera, a Personal Digital Assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application specific device, or a hybrid device that includes any of the above functions. Of course, the computing device 100 may also be implemented as a personal computer, including both desktop and notebook computer configurations, or as a server having the above-described configuration. The embodiments of the present invention are not limited thereto.
FIG. 2 shows a flow diagram of a method 200 of processing a sagittal image of the spine, according to one embodiment of the invention. It should be noted that, in embodiments according to the present invention, the method 200 indicates whether an intervertebral disc in the sagittal image of the spine is healthy by determining whether the image contains an abnormality. When the spine sagittal image is judged to contain no abnormality, the intervertebral disc in the image is taken to be healthy; when the spine sagittal image is judged to contain an abnormality, the intervertebral disc in the image is taken to be unhealthy (for simplicity, the description below is given directly in terms of whether the intervertebral disc is healthy). In addition, the positions of unhealthy intervertebral disc regions can be further predicted by combining this result with the positioning of the vertebrae in the spine sagittal image, thereby providing a reference for professional doctors.
The positioning of each vertebra in the sagittal image of the spine can be performed visually by a professional doctor, or by other traditional image processing algorithms, which is not limited by the embodiment of the invention. In an implementation according to the invention, a method for locating individual bones in a sagittal image of the spine through deep learning is provided. The method for locating bones can be divided into two parts. In the first part, a positioning model is generated by training; the positioning model can calculate how many bones are contained in a spine sagittal image and the position coordinates of each bone, and can confirm which bone is the sacrum. In the second part, the bones to which the cervical, thoracic and lumbar vertebrae respectively correspond are further calculated from the positioning result of the positioning model.
The first part is a method 300 of generating a positioning model according to an embodiment of the present invention (as with method 200, the method 300 is performed by computing device 100 by storing instructions for performing the method 300 in program data 124 of computing device 100). Fig. 3 illustrates a flow diagram of the method 300 of generating a positioning model for locating individual bones in a sagittal image of a spine and identifying which of the bones is the sacrum, according to one embodiment of the invention. FIG. 4A shows a partial schematic view of a sagittal image of the spine according to one embodiment of the present invention. As shown in FIG. 4A, the vertebral bodies are arranged from top to bottom, and the lowest bone, which is triangular and inclined at 45 degrees, is the sacrum (designated S in FIG. 4A). It can be seen that the characteristics of the sacrum are more pronounced than those of the other vertebrae, which are very similar to one another. In view of this, in an embodiment according to the invention, each bone is located in a sagittal image of the spine using the positioning model and the sacrum is identified among the located bones, after which the other vertebrae are identified based on the sacrum and the located positions.
As shown in fig. 3, the method 300 begins at step S310, in which a labeled spine sagittal image is acquired as a training image. The training image has corresponding labeling data, which comprise the position of each bone of the spine in the training image and a label indicating whether each bone is the sacrum. According to one implementation, the bones in the spine sagittal image are labeled with labeling software under the guidance of a professional physician, and each bone is labeled with a rectangular box (the size of the rectangular box is determined by the size of the bone). The coordinates of the four vertices of the rectangular box thus represent the position of the bone. As described above, the characteristics of the sacrum are more pronounced than those of the other vertebrae. Therefore, in the labeling process according to the present invention, it is not necessary to label the name of each bone explicitly; instead, the sacrum is treated as one class and the other vertebrae (lumbar, thoracic, and cervical) as another class, and each bone is labeled with the class to which it belongs (see fig. 4A, in which the sacrum is identified by the letter S and the other vertebrae by the letter M). Further, when labeling the sacrum, attention is paid to the black discs inclined above and below the sacrum, and when labeling other vertebrae, only a small part of the square bone itself and the upper and lower discs may be labeled. That is, the labeled features of the two classes of bones are made clearly distinct during the labeling process. This has the advantage that the positioning model can unambiguously learn the difference between the sacrum and the other vertebrae, which improves the positioning accuracy. By way of example, fig. 4B and 4C show images of the labeled sacrum and labeled other vertebrae, respectively.
According to another embodiment of the present invention, after the spine sagittal image is labeled, the labeled spine sagittal image is preprocessed, and the preprocessed spine sagittal image is used as a training image. The preprocessing includes randomly adjusting the image brightness, vertically flipping the image, rotating the image by a small angle, and the like, in order to augment the sample data of the training images so that the generated positioning model generalizes better. Of course, the preprocessing may also include scaling the spine sagittal image to a predetermined size, which in one embodiment according to the invention is 512 × 512 for the training image.
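The augmentation steps described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names, the ±20% brightness range, and the nearest-neighbour resampling are hypothetical, images are plain nested lists of 8-bit grey values, and the small-angle rotation is omitted for brevity. In practice the labeled rectangular boxes would have to be transformed together with the image.

```python
import random

def adjust_brightness(img, factor):
    """Scale every grey value by `factor`, clamping to the 8-bit range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]

def flip_vertical(img):
    """Flip the image top-to-bottom by reversing the row order."""
    return img[::-1]

def resize_nearest(img, size=512):
    """Nearest-neighbour rescale to size x size (512 x 512 in the text)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

def augment(img, rng, max_delta=0.2):
    """Randomly perturb brightness, randomly flip, then rescale.

    `max_delta` (the brightness range) is an assumed value.
    """
    out = adjust_brightness(img, 1.0 + rng.uniform(-max_delta, max_delta))
    if rng.random() < 0.5:
        out = flip_vertical(out)
    return resize_nearest(out)

sample = [[10, 20], [30, 40]]            # toy 2 x 2 "image"
augmented = augment(sample, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which can help when comparing training runs.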
Subsequently, in step S320, the training image is input to the pre-trained positioning model for processing. Fig. 5 is a schematic diagram illustrating a positioning model 500 according to an embodiment of the present invention. The localization model 500 is based on a convolutional neural network, including a convolutional processing layer 510, a classification processing layer 520, and a regression processing layer 530.
The training image is input to convolution processing layer 510 (in embodiments in accordance with the invention, convolution processing layer 510 typically contains multiple convolution layers), and convolution processing layer 510 convolves, activates, and pools the input image to output at least one located bone. The classification processing layer 520 and the regression processing layer 530 are coupled to the convolution processing layer 510 and perform classification processing and regression processing, respectively, on each located bone, to output the predicted probability that the bone belongs to the sacrum and the predicted position of the bone, respectively.
According to one implementation, the convolution processing layer 510 includes at least 12 convolutional layers, 8 pooling layers, and 3 fully-connected layers, as well as a transition layer between the 12th convolutional layer and the 1st fully-connected layer (i.e., a Flatten layer that flattens the multidimensional input to one dimension). Fig. 5 shows a structure of a convolution processing layer 510 according to an embodiment of the present invention, in which, for simplicity of description, "convolution layer x 3" represents 3 convolution layers connected in series and "pooling layer x 3" represents 3 pooling layers connected in series. Of course, the convolution processing layer 510 may further include an activation function (e.g., ReLU), which may be implemented by providing a separate activation layer or by passing an activation parameter when constructing a layer object; this is not limited in the embodiment of the present invention. The basic structure of a convolutional neural network is known to those skilled in the art and will not be described herein. In one embodiment according to the invention, the pooling layers use max pooling. Optionally, the convolution kernel size in each convolutional layer is 3 × 3, and the pooling window size in each pooling layer is 2 × 2.
The classification processing layer 520 and the regression processing layer 530 are connected in parallel. The classification processing layer 520 performs classification processing on each located bone to output the predicted probability that the bone belongs to the sacrum. According to one embodiment of the invention, the classification processing layer uses a softmax network to output two classes based on the input data. That is, for each located bone, a probability vector, score vector = (x, y), is output, where x represents the probability that the bone belongs to the sacrum and y represents the probability that it belongs to the other vertebrae. In an embodiment according to the invention, the bone is identified as the sacrum when the predicted probability that it belongs to the sacrum is greater than a threshold, which according to an embodiment of the invention is typically taken to be between 0.8 and 0.9; for example, the bone is identified as the sacrum when x > 0.8.
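The two-class softmax output and the threshold test can be illustrated with a short sketch. The raw scores fed to the softmax and the helper names are hypothetical; only the (x, y) score vector and the 0.8 threshold come from the description above.

```python
import math

SACRUM_THRESHOLD = 0.8  # the text suggests a value between 0.8 and 0.9

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_bone(logits, threshold=SACRUM_THRESHOLD):
    """Return (x, y, is_sacrum) for one located bone.

    x is the probability of "sacrum", y the probability of "other vertebra".
    """
    x, y = softmax(logits)
    return x, y, x > threshold

x, y, is_sacrum = classify_bone([3.0, 0.5])   # hypothetical raw scores
```

Because the two probabilities sum to 1, a threshold above 0.5 on x also guarantees that x exceeds y.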
The regression processing layer 530 performs regression processing on the predicted position of each located bone block according to the labeling data, to output the vertex coordinates of the bounding box containing the bone block as its predicted position. In one embodiment according to the invention, the regression processing layer 530 uses bounding box regression to return a rectangular box (i.e., bounding box) representing the located bone. For each located bone, a position matrix, box matrix = (p1, p2, p3, p4), is output, whose four elements represent the diagonal vertex coordinates of the rectangular box.
According to the embodiment of the invention, when a plurality of bones are located in the sagittal image of the spine, the score vectors and the box matrices are combined, through Reshape and matrix transformation, into one N × 6 matrix that is output, where N represents the number of located bone blocks. Assuming N = 3, the output is a 3 × 6 matrix in which each row holds the result for one bone block: its score vector (x, y) followed by its box matrix (p1, p2, p3, p4).
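As an illustration of this layout, the sketch below assembles an N × 6 matrix from per-bone score vectors and box matrices. All numeric values and the function name are hypothetical; the only thing taken from the text is the row layout of a score vector (x, y) followed by a box matrix of four values.

```python
def build_output_matrix(scores, boxes):
    """Concatenate each (x, y) score vector with its 4-element box matrix."""
    if len(scores) != len(boxes):
        raise ValueError("one score vector is required per box")
    return [list(s) + list(b) for s, b in zip(scores, boxes)]

# Hypothetical values for N = 3 located bones.
scores = [(0.05, 0.95), (0.10, 0.90), (0.93, 0.07)]
boxes = [(120, 40, 180, 90), (118, 100, 182, 150), (110, 160, 200, 240)]
matrix = build_output_matrix(scores, boxes)   # 3 rows, 6 columns
```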
Subsequently, in step S330, model training is performed on the pre-trained positioning model according to the labeling data, so as to obtain a trained positioning model as the generated positioning model.
The positioning model 500 is trained in a joint training mode. A training image is input into the convolution processing layer 510 to obtain located bone blocks (as candidate regions), and the candidate regions are input into the classification processing layer 520 and the regression processing layer 530. The network parameters of the convolution processing layer 510 are fine-tuned according to the output matrix and the labeling data, and the network parameters of the classification processing layer 520 and the regression processing layer 530 are fine-tuned using the result of the convolution processing layer 510. These steps are repeated until the loss between the output of the positioning model 500 and the labeling data meets a predetermined condition (the embodiment of the invention does not limit the choice of loss function), that is, until the output of the positioning model is close to the labeling data, at which point training ends.
According to another embodiment of the present invention, the training process for the positioning model 500 can refer to the training process for the fast-RCNN network, which is not described herein in detail since the fast-RCNN network is known to those skilled in the art.
Table 1 shows a partial network structure of the convolution processing layer 510 in the trained positioning model 500 according to an embodiment of the present invention. For simplicity, a "number of layer repetitions" of 1 means that there is only one such layer, and a "number of layer repetitions" of 2 means that there are two such layers connected in series.
Table 1: Partial network architecture of convolution processing layer 510 in localization model 500

| Layer type | Number of convolution kernels | Convolution kernel size | Convolution step size | Size of pooling window | Number of layer repetitions |
|---|---|---|---|---|---|
| Convolutional layer | 64 | [3,3] | 2 | -- | 1 |
| Pooling layer | -- | -- | -- | [2,2] | 1 |
| Convolutional layer | 128 | [3,3] | 2 | -- | 2 |
| Pooling layer | -- | -- | -- | [2,2] | 1 |
| Convolutional layer | 256 | [3,3] | 2 | -- | 3 |
| Pooling layer | -- | -- | -- | [2,2] | 3 |
| Convolutional layer | 512 | [3,3] | 2 | -- | 3 |
| Pooling layer | -- | -- | -- | [2,2] | 3 |
| Convolutional layer | 512 | [3,3] | 2 | -- | 3 |
In addition, 1 transition layer and two fully-connected layers (not shown in Table 1) are also included in the convolution processing layer 510, where the dropout rate of each fully-connected layer is 0.5.
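As a quick consistency check, the repetition counts listed in Table 1 can be summed and compared with the 12 convolutional layers and 8 pooling layers stated for the convolution processing layer 510. The list below simply transcribes the "layer type" and "number of layer repetitions" columns; the variable and function names are illustrative.

```python
# (layer type, number of repetitions), row by row as listed in Table 1
TABLE_1 = [
    ("conv", 1), ("pool", 1),
    ("conv", 2), ("pool", 1),
    ("conv", 3), ("pool", 3),
    ("conv", 3), ("pool", 3),
    ("conv", 3),
]

def count_layers(rows, kind):
    """Total number of layers of the given kind, counting repetitions."""
    return sum(reps for layer, reps in rows if layer == kind)

conv_layers = count_layers(TABLE_1, "conv")   # 1 + 2 + 3 + 3 + 3
pool_layers = count_layers(TABLE_1, "pool")   # 1 + 1 + 3 + 3
```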
The positioning model 500 according to the embodiment of the present invention is thus trained. From the positioning model 500, the number of pieces of bone contained in the sagittal image of the spine, the probability that each piece of bone belongs to the sacrum, and the position of each piece of bone can be obtained.
The second section is described next: how to position other vertebrae, such as cervical vertebrae, thoracic vertebrae, lumbar vertebrae, etc., in the sagittal image of the spine, except for the sacrum. FIG. 6 illustrates a method 600 of bone positioning of a spinal sagittal image according to one embodiment of the invention (in one embodiment according to the invention, the computing device 100 is caused to perform the method 600 by storing instructions for performing the method 600 in the program data 124 of the computing device 100).
As shown in fig. 6, the method 600 starts in step S610, where the sagittal image of the spine is input into the positioning model, and the number of bones contained in the sagittal image of the spine, the probability that each bone belongs to the sacrum, and the position of each bone are output after the positioning process.
According to one implementation of the present invention, the spine sagittal image is input into the positioning model 500, and a feature map of each bone is obtained after convolution processing (that is, the bones contained in the image are located and their number is obtained); generally, the number of located bones is greater than 1. The feature map of each bone is then classified to output the predicted probability that the bone belongs to the sacrum; at the same time, regression processing is performed on the feature map of the at least one bone to output the vertex coordinates of the bounding box containing each bone. For the positioning model 500, reference may be made to the description of the method 300, which is not repeated here.
According to another embodiment of the present invention, before the spine sagittal image is input into the positioning model 500, histogram equalization may be performed on the spine sagittal image to ensure that the image has sufficient contrast, and the processed spine sagittal image is then input into the positioning model 500. Of course, as mentioned above, the spine sagittal image may also be scaled to a predetermined size, for example 512 × 512, before being input into the positioning model 500. The embodiments of the present invention are not limited thereto.
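Histogram equalization itself is a standard technique; a minimal sketch for an 8-bit greyscale image stored as nested lists is given below. The function name and the toy input are illustrative, and an imaging library would normally be used instead of this plain-Python version.

```python
def equalize_histogram(img, levels=256):
    """Classic histogram equalization for an 8-bit greyscale image."""
    hist = [0] * levels
    for row in img:
        for p in row:
            hist[p] += 1

    # Cumulative distribution of grey levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:              # constant image: nothing to stretch
        return [row[:] for row in img]

    # Map each grey level so that the cumulative histogram becomes linear.
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[p] for p in row] for row in img]

low_contrast = [[100, 101], [102, 103]]
equalized = equalize_histogram(low_contrast)
```

In this toy case the four nearly identical grey values are spread across the full 0 to 255 range, which is the contrast-stretching effect the preprocessing step relies on.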
Subsequently, in step S620, the sacrum in the sagittal image of the spine is determined from the probability that each bone belongs to the sacrum. According to one embodiment of the invention, the probability that each bone belongs to the sacrum is traversed, and when the probability is greater than a threshold, the bone is identified as the sacrum in the sagittal image of the spine. Optionally, the threshold range is set to [0.8, 0.9]. Taking again the example of the 3 × 6 matrix output by the positioning model 500 above, the x values of the rows of the matrix (x1, x2, x3) are traversed, and when an x value is greater than the threshold, the corresponding bone is determined to be the sacrum.
Subsequently, in step S630, the names and positions of the other vertebrae in the sagittal image of the spine are sequentially confirmed from the sacrum.
Because the characteristics of the sacrum S are very distinctive, the sacrum S is used as the anatomical landmark, and the names of the vertebrae are determined in order starting from the sacrum S according to the ordering of the vertebrae. According to an embodiment, the vertebral bodies of the spine are, from bottom to top: 1 sacrum, 5 lumbar vertebrae, 12 thoracic vertebrae and 7 cervical vertebrae. Taking the sacrum S as the starting point, the bones in the sagittal image of the spine are therefore identified from bottom to top as lumbar, thoracic and cervical vertebrae. According to the position coordinates of each bone, the vertebra adjacent to the sacrum S is the fifth lumbar vertebral body L5, the vertebra above L5 is the fourth lumbar vertebral body L4, and so on upward: the third lumbar vertebra L3, the second lumbar vertebra L2, the first lumbar vertebra L1, the twelfth thoracic vertebra T12, the eleventh thoracic vertebra T11, and so forth. That is, as long as the sacrum S can be determined, the names of the other vertebrae can be determined upward in sequence. Then, the corresponding positions of the sacrum, lumbar vertebrae, thoracic vertebrae and cervical vertebrae are obtained from the position coordinates of each bone obtained in step S610.
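Steps S620 and S630 can be sketched together as follows. The sketch is hedged in several ways: the function name is hypothetical, each row is assumed to be laid out as [x, y, p1, p2, p3, p4], (p1, p2) and (p3, p4) are assumed to be diagonal corners given as (horizontal, vertical) pairs, and the vertical coordinate is assumed to grow downward in the image. None of these conventions is fixed by the description.

```python
# Names from the sacrum upward: S, L5..L1, T12..T1, C7..C1.
NAMES_BOTTOM_UP = (
    ["S"]
    + [f"L{i}" for i in range(5, 0, -1)]
    + [f"T{i}" for i in range(12, 0, -1)]
    + [f"C{i}" for i in range(7, 0, -1)]
)

def name_vertebrae(rows, threshold=0.8):
    """Map row index -> vertebra name, starting from the sacrum."""
    sacrum = [i for i, r in enumerate(rows) if r[0] > threshold]
    if len(sacrum) != 1:
        raise ValueError("expected exactly one sacrum candidate")

    def centre_y(r):
        # Vertical centre of the box (assumed corner layout, see above).
        return (r[3] + r[5]) / 2

    s_y = centre_y(rows[sacrum[0]])
    # Bones above the sacrum, ordered from nearest to farthest.
    above = sorted((i for i, r in enumerate(rows) if centre_y(r) < s_y),
                   key=lambda i: centre_y(rows[i]), reverse=True)
    order = [sacrum[0]] + above
    return dict(zip(order, NAMES_BOTTOM_UP))

rows = [                                   # hypothetical model output, N = 3
    [0.05, 0.95, 120, 40, 180, 90],
    [0.10, 0.90, 118, 100, 182, 150],
    [0.93, 0.07, 110, 160, 200, 240],
]
names = name_vertebrae(rows)
```

For the hypothetical rows above, the third row is the sacrum, the second row (directly above it) becomes L5, and the first row becomes L4.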
The flow of method 600 ends. Compared with the traditional approach of locating bone landmark points visually, the method 600 of the invention adopts deep learning, which can avoid errors caused by subjective factors, saves labor and time, and can quickly and accurately calculate the number of bone blocks contained in any spine sagittal image as well as the position and name of each bone block. The method thus provides useful assistance for subsequent disease diagnosis by professional doctors.
Based on the above positioning results, in step S210 the method 200 first cuts out, for each vertebra in the sagittal image of the spine, the corresponding disc-spinal cord trigone from the sagittal image of the spine to generate a disc region image.
According to one embodiment of the present invention, the disc region image is generated as follows: for each vertebra in a sagittal image of the spine, a square is generated with the line connecting the central points of adjacent vertebrae as one side, such that the square just contains the disc-spinal cord trigone (namely, the intervertebral disc); the square is then cut out from the spine sagittal image as a disc region image. According to the embodiment of the present invention, the disc region can be accurately cut out from the spine sagittal image by conventional image processing algorithms, such as image rotation and contour detection, which is not limited by the embodiment of the invention. Fig. 7 is a schematic diagram showing a sagittal image of the spine according to another embodiment of the present invention, in which S, L5, L4, L3, L2, L1, T12 and T11 denote the central points of the located vertebrae. The central points of adjacent vertebrae are connected to form one side of a square, so as to construct square boxes. As shown in fig. 7, the area marked by each box is a disc region of the spine sagittal image (adjacent boxes may overlap). For ease of observation, the square box between L1 and L2 is shown in bold; cutting out the area enclosed by this square yields one disc region image.
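The geometric construction of the square can be sketched as below. The centre coordinates and function names are hypothetical. In particular, which side of the connecting line the square lies on is not stated in the description, so it is exposed here as the `side` parameter, and the final crop is simplified to an axis-aligned bounding box rather than a rotated crop.

```python
import math

def disc_square(c_upper, c_lower, side=+1):
    """Corners of a square having the segment c_upper -> c_lower as one edge.

    `side` (+1 or -1) selects on which side of the segment the square is
    built; the correct choice depends on image orientation (assumption).
    """
    (x1, y1), (x2, y2) = c_upper, c_lower
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        raise ValueError("centre points must differ")
    # Vector perpendicular to the edge, with the same length as the edge.
    nx, ny = -dy * side, dx * side
    return [(x1, y1), (x2, y2), (x2 + nx, y2 + ny), (x1 + nx, y1 + ny)]

def bounding_crop(corners):
    """Axis-aligned (left, top, right, bottom) box enclosing the square."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (math.floor(min(xs)), math.floor(min(ys)),
            math.ceil(max(xs)), math.ceil(max(ys)))

corners = disc_square((100, 50), (100, 110))   # hypothetical L1 and L2 centres
crop = bounding_crop(corners)
```

When the connecting line is not vertical, the square is tilted, which is presumably why the description mentions image rotation as one of the conventional processing steps.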
Subsequently, in step S220, the disc region image is input to the first prediction model, which, after convolution processing, outputs a first probability predicting whether the disc contained in the disc region image is healthy. In other words, processing by the first prediction model outputs a first probability predicting whether the input disc region image is abnormal. If the prediction result indicates that the disc region image is normal, the disc contained in the disc region image can be confirmed as healthy; similarly, if the prediction result indicates that the disc region image is abnormal, the disc contained in the disc region image can be determined to be unhealthy. The prediction result can serve as a reference to assist a professional doctor in completing the diagnosis of the spine sagittal image.
According to one embodiment, the method 200 further comprises the step of training with training images to generate the first prediction model. The training image set used for training the first prediction model is generated as follows:
First, a plurality of disc region images are collected according to the method described in step S210, and each disc region image is labeled as healthy or not (i.e., whether the disc contained in the disc region image is healthy) under the guidance of a professional doctor. Optionally, whether a disc region image is healthy may be indicated by a suffix added to its image name. FIG. 8 is a schematic diagram of a training image set according to an embodiment of the present invention. For each small square image cut out in FIG. 8 (i.e., the area enclosed by each small square in FIG. 7), an image name suffix of h (e.g., 29_2_7_6_h.png) indicates that the image is healthy, and an image name suffix of d (e.g., 29_2_7_1_d.png) indicates that the image is unhealthy.
According to another embodiment of the invention, the labeling results can be synchronously marked at the cloud and verified by a plurality of professional doctors, so that the labeling accuracy and efficiency are improved.
Second, healthy disc region images and unhealthy disc region images are selected according to a predetermined ratio. In one embodiment according to the present invention, the predetermined ratio of healthy disc region images to unhealthy disc region images is set to 1:4.
Third, the selected disc region images are scaled to a predetermined size to form the training image set. Optionally, the predetermined size is set to 60 × 60.
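The first two steps of this procedure (reading the label from the file-name suffix and selecting images in a 1:4 ratio) can be sketched as follows. The function names, the random sampling strategy and the toy file names are assumptions; the scaling to 60 × 60 is omitted because it depends on the imaging library used.

```python
import random

def label_from_name(filename):
    """'..._h.png' -> 'healthy', '..._d.png' -> 'unhealthy'."""
    stem = filename.rsplit(".", 1)[0]
    suffix = stem.rsplit("_", 1)[-1]
    return {"h": "healthy", "d": "unhealthy"}[suffix]

def select_by_ratio(filenames, healthy_parts=1, unhealthy_parts=4, seed=0):
    """Pick as many images as possible in the healthy:unhealthy ratio."""
    healthy = [f for f in filenames if label_from_name(f) == "healthy"]
    unhealthy = [f for f in filenames if label_from_name(f) == "unhealthy"]
    units = min(len(healthy) // healthy_parts,
                len(unhealthy) // unhealthy_parts)
    rng = random.Random(seed)
    return (rng.sample(healthy, units * healthy_parts)
            + rng.sample(unhealthy, units * unhealthy_parts))

# Toy file names: 3 healthy and 9 unhealthy images.
file_names = ["a_h.png", "b_h.png", "c_h.png"] + [f"{i}_d.png" for i in range(9)]
chosen = select_by_ratio(file_names)
```

With 3 healthy and 9 unhealthy images available, the 1:4 ratio is limited by the unhealthy images, so 2 healthy and 8 unhealthy images are selected.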
In the embodiment according to the present invention, the training process for the first prediction model may refer to the training process for the VGG16 convolutional network, which is known to those skilled in the art and will not be described herein.
Fig. 9 shows a schematic structure diagram of a first prediction model 900 according to an embodiment of the present invention. As shown in fig. 9, the first prediction model 900 at least includes a first convolution processing layer 910, a normalization layer 920, a second convolution processing layer 930, and a first classification processing layer 940.
Portions of the first predictive model 900 are described in detail below.
The first convolution processing layer 910 performs convolution, activation, and pooling on the input disc region image, and then inputs the generated feature map into the normalization layer 920. In an embodiment in accordance with the invention, the first convolution processing layer 910 includes at least 13 convolutional layers, 5 pooling layers, and 1 fully-connected layer, which are connected in sequence in the order "convolutional layer → pooling layer → convolutional layer → fully-connected layer". Of course, the first convolution processing layer 910 may further include an activation function (e.g., ReLU), which may be implemented by providing a separate activation layer or by passing an activation parameter when constructing a layer object; this is not limited in the embodiment of the present invention.
The normalization layer (BatchNormalization) 920 re-normalizes the activation values of the first convolution processing layer 910 so that the mean of its output data is close to 0 and its standard deviation is close to 1, in order to speed up the convergence of the network and control overfitting.
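The effect of this normalization on one channel of activations can be illustrated with a short sketch. It shows only the core shift-and-scale step; the learnable scale and offset and the running statistics that a batch normalization layer normally maintains are left out, and the sample values and the `eps` constant are assumptions.

```python
import statistics

def normalize(values, eps=1e-5):
    """Shift and scale activations to roughly zero mean and unit variance."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    # eps avoids division by zero when all activations are equal.
    return [(v - mean) / (var + eps) ** 0.5 for v in values]

activations = [2.0, 4.0, 6.0, 8.0]      # hypothetical activations of one channel
normalized = normalize(activations)
```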
The second convolution processing layer 930 includes 2 convolution layers, 1 Dropout layer, and 1 pooling layer connected in this order. Both convolution layers use convolution kernels of size 3 × 3 with a convolution step size of 2, and the pooling layer uses max pooling. With this structure, the second convolution processing layer 930 performs further convolution processing on the data output from the normalization layer 920, and, when the parameters of the model 900 are updated, the Dropout layer randomly disconnects input neurons with a predetermined probability so as to prevent overfitting. In one embodiment according to the present invention, the predetermined probability is 0.8.
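The behaviour of the Dropout layer can be sketched as follows, reading the predetermined probability of 0.8 as the probability of disconnecting each input. The sketch uses the common "inverted dropout" formulation, in which the surviving inputs are rescaled; that rescaling, the function name and the test data are assumptions rather than details given in the description.

```python
import random

def dropout(values, drop_prob, rng, training=True):
    """Inverted dropout: zero each input with probability `drop_prob`."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training:
        return list(values)       # dropout is inactive at inference time
    keep = 1.0 - drop_prob
    # Survivors are rescaled so the expected activation is unchanged.
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)
out = dropout([1.0] * 1000, drop_prob=0.8, rng=rng)
kept = sum(1 for v in out if v != 0.0)   # roughly 20% of the inputs survive
```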
Finally, the first classification processing layer 940 performs classification processing on the data output from the second convolution processing layer 930, and outputs a first probability of predicting the health of the intervertebral disc included in the input image of the intervertebral disc region. In one embodiment according to the invention, the first classification processing layer 940 employs a softmax network.
In yet another embodiment according to the present invention, a full connection layer 950 is further disposed between the second convolution processing layer 930 and the first classification processing layer 940. A transition layer (Flatten)960 is also disposed between the second convolution processing layer 930 and the fully-connected layer 950, as shown in fig. 9.
Meanwhile, in step S230, the disc region image is input to the second prediction model, which, after convolution processing, outputs a second probability predicting whether the disc contained in the disc region image is healthy. In other words, processing by the second prediction model outputs a second probability predicting whether the input disc region image is abnormal. If the prediction result indicates that the disc region image is normal, the disc contained in the disc region image can be confirmed as healthy; similarly, if the prediction result indicates that the disc region image is abnormal, the disc contained in the disc region image can be determined to be unhealthy. The prediction result can serve as a reference to assist a professional doctor in completing the diagnosis of the spine sagittal image.
According to one embodiment, the method 200 further comprises the step of generating a second predictive model using training image training. The training image set during training the second prediction model is consistent with the training image set during training the first prediction model, and the specific manner of generating the training image set may refer to the foregoing description and is not expanded here.
In the embodiment according to the present invention, a ResNet50 convolutional network is used as the pre-trained second prediction model. Its training process follows that of the ResNet50 convolutional network, which is known to those skilled in the art and is not described again here.
Fig. 10 shows a schematic structure diagram of a second prediction model 1000 according to an embodiment of the present invention, and as shown in fig. 10, the second prediction model 1000 at least includes a third convolution processing layer 1010, a first pooling layer 1020, a first number of fourth convolution processing layers 1030, a second pooling layer 1040, and a second classification processing layer 1050.
Portions of the second predictive model 1000 are described in detail below.
The third convolution processing layer 1010 performs convolution processing on the input disc region image. Optionally, a 7 × 7 convolution kernel with a convolution stride of 2 is employed in the third convolution processing layer 1010. The convolved data is input to the first pooling layer 1020.
The first pooling layer 1020 performs max pooling on the output of the third convolution processing layer 1010. Optionally, the pooling window is 3 × 3 and the pooling stride is 2.
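To make the effect of these two layers on image size concrete, the following sketch applies the standard output-size formula for a sliding window. The 60 × 60 input is the optional training image size given elsewhere in this description; the padding values (3 for the convolution, 1 for the pooling) are assumptions, since the description does not specify padding.

```python
def output_size(n, kernel, stride, padding):
    """Spatial size after sliding a kernel x kernel window over n pixels."""
    return (n + 2 * padding - kernel) // stride + 1

# Third convolution processing layer 1010: 7 x 7 kernel, stride 2 (padding assumed 3).
size_after_conv = output_size(60, kernel=7, stride=2, padding=3)

# First pooling layer 1020: 3 x 3 window, stride 2 (padding assumed 1).
size_after_pool = output_size(size_after_conv, kernel=3, stride=2, padding=1)

print(size_after_conv, size_after_pool)  # 30 15
```

Under these assumptions, each of the two layers halves the side length of the feature map.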
The data processed by the first pooling layer 1020 is input to the fourth convolution processing layers 1030. There are a first number of fourth convolution processing layers 1030, each of which contains a different number of convolution units that perform convolution processing on the data output by the previous layer. In one embodiment according to the present invention, the first number is 4, and as shown in fig. 10, the fourth convolution processing layers 1032, 1034, 1036, and 1038 are arranged in this order from top to bottom. The fourth convolution processing layer 1032 includes 3 convolution units 1100, the fourth convolution processing layer 1034 includes 4 convolution units 1100, the fourth convolution processing layer 1036 includes 6 convolution units 1100, and the fourth convolution processing layer 1038 includes 3 convolution units 1100.
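As a quick consistency check, the unit counts above can be tallied against the "50" in ResNet50. The sketch assumes the usual counting convention (one weighted layer per convolution module, three modules per convolution unit 1100, plus the third convolution processing layer 1010 and the optional fully-connected layer 1060); that convention is not spelled out in this description.

```python
# Convolution units 1100 per fourth convolution processing layer, as listed above.
units_per_layer = {"1032": 3, "1034": 4, "1036": 6, "1038": 3}

CONV_MODULES_PER_UNIT = 3  # assumed: 1x1, 3x3 and 1x1 modules in each unit

conv_in_units = sum(units_per_layer.values()) * CONV_MODULES_PER_UNIT

# Layer 1010 + convolutions inside the units + fully-connected layer 1060.
total_weighted_layers = 1 + conv_in_units + 1

print(conv_in_units, total_weighted_layers)  # 48 50
```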
Fig. 11 shows a schematic diagram of a convolution unit 1100 according to an embodiment of the present invention. The convolution unit 1100 includes a first convolution module 1102, a second convolution module 1104 and a third convolution module 1106 connected in sequence. The convolution kernel size of the first convolution module 1102 is 1 × 1, that of the second convolution module 1104 is 3 × 3, and that of the third convolution module 1106 is 1 × 1. The output of the third convolution module 1106 is combined with the input of the first convolution module 1102 to serve as the output of the convolution unit 1100. According to embodiments of the present invention, an activation function (e.g., ReLU) may be added after the first convolution module 1102, after the second convolution module 1104, and before the final output. By providing the convolution units 1100 in the fourth convolution processing layers 1030, the second prediction model 1000 according to the present invention simplifies the deep convolutional network structure and thereby reduces computational complexity.
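The data flow through one convolution unit can be sketched as follows. The three modules are passed in as plain functions (toy stand-ins, not real convolutions), and "combined" is assumed to mean an element-wise sum, as in a standard residual block; the description itself does not name the combining operation. The sketch also assumes the third module's output has the same shape as the unit's input.

```python
def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]

def convolution_unit(x, module_1, module_2, module_3):
    """Apply three modules in sequence, then add the unit's input back in."""
    y = relu(module_1(x))      # activation after the first module
    y = relu(module_2(y))      # activation after the second module
    y = module_3(y)
    combined = [a + b for a, b in zip(y, x)]  # assumed element-wise sum
    return relu(combined)      # activation before the final output

# Toy stand-ins for the 1x1, 3x3 and 1x1 convolution modules (hypothetical).
halve = lambda v: [0.5 * e for e in v]
identity = lambda v: list(v)
negate = lambda v: [-e for e in v]

out = convolution_unit([2.0, -4.0], halve, identity, negate)
print(out)  # [1.0, 0.0]
```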
The second pooling layer 1040 performs an average pooling process on the data output from the last fourth convolution processing layer (i.e., the fourth convolution processing layer 1038).
The second classification processing layer 1050 classifies the data output from the second pooling layer 1040 and outputs a second probability that the intervertebral disc contained in the input disc region image is healthy. In one embodiment according to the invention, the second classification processing layer 1050 employs a softmax network.
According to another embodiment of the present invention, a fully-connected layer 1060 is further included between the second pooling layer 1040 and the second classification processing layer 1050, as shown in fig. 10.
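For readers unfamiliar with the softmax classification used by the second classification processing layer 1050, the sketch below shows how two raw scores become probabilities that sum to 1. The score values and the class order (healthy first, unhealthy second) are hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores from the layer before the classifier: [healthy, unhealthy].
probs = softmax([1.2, -0.3])
second_probability = probs[0]  # probability that the disc is healthy
```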
Subsequently, in step S240, the first probability calculated in step S220 and the second probability calculated in step S230 are combined to calculate the probability that the intervertebral disc included in the intervertebral disc region image is healthy.
According to one embodiment, the first prediction model 900 and the second prediction model 1000 each give a prediction of whether the input disc region image is normal, that is, whether the intervertebral disc contained in the disc region image is healthy. In some embodiments according to the present invention, when the first probability is not less than 0.3, the intervertebral disc contained in the input disc region image is predicted to be healthy; likewise, when the second probability is not less than 0.5, the intervertebral disc contained in the input disc region image is predicted to be healthy. The prediction result serves as a reference to assist a professional doctor in diagnosing the spine sagittal image.
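The per-model thresholds in this embodiment can be written down as a small sketch. The function and key names are hypothetical; only the threshold values 0.3 and 0.5 come from the text.

```python
FIRST_THRESHOLD = 0.3   # example preset value for the first probability
SECOND_THRESHOLD = 0.5  # example preset value for the second probability

def predict_separately(first_probability, second_probability):
    """Report each model's own healthy/unhealthy prediction."""
    return {
        "first_model_healthy": first_probability >= FIRST_THRESHOLD,
        "second_model_healthy": second_probability >= SECOND_THRESHOLD,
    }

result = predict_separately(0.35, 0.42)
print(result)  # {'first_model_healthy': True, 'second_model_healthy': False}
```

As the example shows, the two models can disagree on the same image.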
In another embodiment according to the present invention, the prediction results of the first prediction model 900 and the second prediction model 1000 are weighted, and the weighted result is used as the final prediction result for reference by the professional doctor. Weighting the prediction results of the two prediction models can prevent overfitting to some extent. Specifically, a first weight factor and a second weight factor are set for the first probability and the second probability, respectively, and the probability c that the intervertebral disc contained in the disc region image is healthy is then calculated by a weighting algorithm. The final predicted probability c can be expressed by the following formula:
$$c = w_1 \times c_1 + w_2 \times c_2$$
where $c_1$ and $c_2$ represent the first probability and the second probability, respectively, $w_1$ and $w_2$ represent the first weight factor and the second weight factor, respectively, and $w_1 + w_2 = 1$.
Whether the intervertebral disc contained in the disc region image is healthy is then judged according to the probability c. As in the above embodiment, when c is not less than a preset value, the intervertebral disc contained in the disc region image is predicted to be healthy; when c is smaller than the preset value, the intervertebral disc contained in the disc region image is predicted to be unhealthy. In one embodiment according to the present invention, $w_1$ and $w_2$ both take 0.5, and when c is not less than 0.4, the intervertebral disc contained in the input disc region image is predicted to be healthy. In another embodiment according to the present invention, $w_1$ takes 0.4 and $w_2$ takes 0.6, and when c is not less than 0.45, the intervertebral disc contained in the input disc region image is predicted to be healthy. The latter weighting is preferred. Of course, different weight factors and preset values may also be set; the embodiments disclosed herein are only examples, and the present invention is not limited thereto.
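The weighted combination and the threshold test can be sketched in a few lines. The defaults follow the preferred embodiment as read here (weights 0.4 and 0.6, preset value 0.45); the function names `fuse` and `is_healthy` and the sample probabilities are hypothetical.

```python
def fuse(c1, c2, w1=0.4, w2=0.6):
    """Weighted combination c = w1 * c1 + w2 * c2, with w1 + w2 = 1."""
    if abs(w1 + w2 - 1.0) > 1e-9:
        raise ValueError("w1 + w2 must equal 1")
    return w1 * c1 + w2 * c2

def is_healthy(c, preset=0.45):
    """Predict healthy when c is not less than the preset value."""
    return c >= preset

c = fuse(0.30, 0.70)   # 0.4 * 0.30 + 0.6 * 0.70, approximately 0.54
healthy = is_healthy(c)
```

With equal weights of 0.5 and a preset value of 0.4, `is_healthy(fuse(c1, c2, 0.5, 0.5), preset=0.4)` gives the other embodiment.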
According to the embodiment of the present invention, the method 200 is performed for each vertebra located in the sagittal image of the spine, so as to obtain, for each disc region image in the sagittal image, the probability that the intervertebral disc it contains is healthy, and thereby determine whether that intervertebral disc is healthy. Further, when the probability is smaller than a preset value (0.45 in a preferred embodiment according to the present invention), the position of the unhealthy region is obtained from the vertebrae indicated by the disc region image. A professional doctor can use this as a reference when diagnosing spinal diseases.
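The position lookup described above amounts to filtering the per-disc probabilities by the preset value. In this sketch each disc region is keyed by the pair of adjacent vertebrae that bound it; that keying scheme and the probability values are hypothetical.

```python
def unhealthy_positions(disc_probabilities, preset=0.45):
    """Return the vertebra pairs whose disc probability falls below preset."""
    return [pair for pair, c in disc_probabilities.items() if c < preset]

# Hypothetical combined probabilities, keyed by the adjacent vertebrae.
probabilities = {("L1", "L2"): 0.81, ("L4", "L5"): 0.30, ("L5", "S"): 0.45}
flagged = unhealthy_positions(probabilities)
print(flagged)  # [('L4', 'L5')]
```

A probability exactly equal to the preset value is not flagged, matching the "not less than" rule for a healthy prediction.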
Accordingly, embodiments of the present invention also provide an apparatus 1200 for processing a sagittal spine image corresponding to the method 200. FIG. 12A shows a schematic diagram of an apparatus 1200 for processing a spinal sagittal image according to one embodiment of the invention.
As shown in fig. 12A, the apparatus 1200 includes at least: a pre-processing module 1210, a first processing module 1220, a second processing module 1230, and a calculation module 1240.
According to an embodiment of the present invention, for each vertebra in the sagittal image of the spine, the pre-processing module 1210 extracts the disc-spinal cord trigone corresponding to that vertebra from the sagittal image of the spine to generate a disc region image. The individual vertebrae in the sagittal image of the spine can be located by the method 600.
According to yet another embodiment of the present invention, the pre-processing module 1210 generates the disc region image as follows: a square is generated with the line connecting the center points of adjacent vertebrae as one side, such that the square just contains the disc-spinal cord trigone, and the square is then cropped from the sagittal image of the spine as the disc region image. According to the embodiment of the present invention, the disc region can be accurately cropped from the spine sagittal image by conventional image processing algorithms such as image rotation and contour detection, which is not limited by the embodiment of the present invention. Fig. 7 is a schematic diagram of a sagittal image of the spine according to another embodiment of the present invention, in which S, L5, L4, L3, L2, L1, T12 and T11 denote the center points of the located vertebrae. The center points of adjacent vertebrae are connected to form one side of a square, thereby constructing the square boxes shown in fig. 7. The area marked by each box is a disc region of the sagittal image of the spine (adjacent boxes may overlap). For ease of observation, the square box between L1 and L2 is shown in bold. Cropping the area enclosed by a square yields a disc region image.
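The square construction can be sketched with elementary geometry: rotating the segment between two adjacent vertebra centers by 90 degrees gives the remaining two corners. Which side of the segment contains the disc-spinal cord trigone depends on image orientation, so the `side` parameter is an assumption, as are the function names and the pixel coordinates. The axis-aligned bounding box is only a simplification; for a tilted segment the image would first need rotating, as the text notes.

```python
def square_from_centers(p, q, side=1):
    """Corners of the square that has segment p-q as one of its sides.

    side is +1 or -1 and selects which side of p-q the square lies on
    (an assumption; the correct choice depends on image orientation).
    """
    (px, py), (qx, qy) = p, q
    dx, dy = qx - px, qy - py
    nx, ny = -dy * side, dx * side  # p-q rotated by 90 degrees, same length
    return [(px, py), (qx, qy), (qx + nx, qy + ny), (px + nx, py + ny)]

def bounding_box(corners):
    """Axis-aligned box (left, top, right, bottom) enclosing the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)

# Hypothetical center points of two adjacent vertebrae, in pixels.
corners = square_from_centers((100, 200), (100, 260), side=-1)
box = bounding_box(corners)
print(box)  # (100, 200, 160, 260)
```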
The pre-processing module 1210 transfers the disc region image to the first processing module 1220 and the second processing module 1230 coupled thereto. The first processing module 1220 performs convolution processing on the disc region image through the first prediction model 900 to output a first probability that the intervertebral disc contained in the disc region image is healthy. Meanwhile, the second processing module 1230 performs convolution processing on the disc region image through the second prediction model 1000 to output a second probability that the intervertebral disc contained in the disc region image is healthy.
According to yet another embodiment of the present invention, the apparatus 1200 may further include a training module 1250 for generating the first prediction model 900 and the second prediction model 1000 using training images in addition to the pre-processing module 1210, the first processing module 1220, the second processing module 1230 and the calculation module 1240, as shown in fig. 12B.
The training module 1250 further includes an image acquisition sub-module 1252, which acquires a plurality of disc region images and marks whether each disc region image is healthy (i.e., whether the intervertebral disc contained in the disc region image is healthy). Optionally, whether a disc region image is healthy may be indicated by adding a suffix to its image name. Fig. 8 shows a schematic diagram of a training image set according to one embodiment of the present invention: for each small square image (i.e., disc region image) cropped in fig. 8, an image name suffix of h indicates that the image is healthy, and an image name suffix of d indicates that the image is not healthy. Meanwhile, the image acquisition sub-module 1252 selects healthy disc region images and unhealthy disc region images according to a predetermined ratio (for example, the predetermined ratio of healthy to unhealthy disc region images is set to 1:4) and scales the selected disc region images to a predetermined size (optionally, 60 × 60) to form the training image set.
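The labelling and ratio-based selection performed by the image acquisition sub-module can be sketched as follows. The file-name pattern (`disc_0_h.png`), the function names, and the use of random sampling are assumptions; the description specifies only the h/d suffix and the 1:4 example ratio. Scaling to 60 × 60 is omitted because it needs an image library.

```python
import random

def label_from_name(filename):
    """Return True for a healthy image ('h' suffix), False for 'd'."""
    stem = filename.rsplit(".", 1)[0]  # drop the file extension, if any
    if stem.endswith("h"):
        return True
    if stem.endswith("d"):
        return False
    raise ValueError(f"unlabelled image name: {filename}")

def select_by_ratio(filenames, healthy_parts=1, unhealthy_parts=4, rng=None):
    """Pick as many images as possible while keeping healthy:unhealthy fixed."""
    rng = rng or random.Random()
    healthy = [f for f in filenames if label_from_name(f)]
    unhealthy = [f for f in filenames if not label_from_name(f)]
    groups = min(len(healthy) // healthy_parts, len(unhealthy) // unhealthy_parts)
    return (rng.sample(healthy, groups * healthy_parts)
            + rng.sample(unhealthy, groups * unhealthy_parts))

# Hypothetical file names: 5 healthy and 9 unhealthy disc region images.
names = [f"disc_{i}_h.png" for i in range(5)] + [f"disc_{i}_d.png" for i in range(9)]
selected = select_by_ratio(names, rng=random.Random(0))
```

With 5 healthy and 9 unhealthy images, only two complete 1:4 groups fit, so 2 healthy and 8 unhealthy images are selected.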
The network structure and the algorithm execution process of the first prediction model 900 and the second prediction model 1000 have been described in detail in the foregoing description with reference to fig. 9 and 10, and will not be repeated herein for the sake of brevity.
The first processing module 1220 and the second processing module 1230 transmit their results to the calculation module 1240 coupled thereto, and the calculation module 1240 combines the first probability and the second probability to calculate the probability that the intervertebral disc contained in the disc region image is healthy.
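The coupling between the modules described so far can be pictured as a simple pipeline. In the sketch below each module is represented by a callable; the class name `Apparatus`, the field names, and the placeholder functions are hypothetical and merely stand in for the modules 1210 to 1240.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Apparatus:
    preprocess: Callable[[Any], Any]          # stands in for module 1210
    first_model: Callable[[Any], float]       # stands in for module 1220
    second_model: Callable[[Any], float]      # stands in for module 1230
    combine: Callable[[float, float], float]  # stands in for module 1240

    def run(self, sagittal_image):
        disc_image = self.preprocess(sagittal_image)
        c1 = self.first_model(disc_image)     # first probability
        c2 = self.second_model(disc_image)    # second probability
        return self.combine(c1, c2)

apparatus = Apparatus(
    preprocess=lambda image: image,           # placeholder crop
    first_model=lambda image: 0.25,           # placeholder probabilities
    second_model=lambda image: 0.75,
    combine=lambda c1, c2: 0.5 * c1 + 0.5 * c2,
)
probability = apparatus.run("sagittal-image")
print(probability)  # 0.5
```

Passing the modules in as callables keeps the wiring independent of how each model is actually implemented.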
According to an embodiment, the first processing module 1220 and the second processing module 1230 each obtain a prediction result about whether the input disc region image is normal, that is, whether the intervertebral disc contained in the disc region image is healthy. When the first probability or the second probability is not less than the corresponding preset value, the intervertebral disc contained in the disc region image can be judged to be healthy. In some embodiments according to the present invention, when the first probability is not less than 0.3, the intervertebral disc contained in the input disc region image is predicted to be healthy; likewise, when the second probability is not less than 0.5, the intervertebral disc contained in the input disc region image is predicted to be healthy. The prediction result serves as a reference to assist a professional doctor in diagnosing the spine sagittal image.
In another embodiment of the present invention, the calculation module 1240 sets a first weight factor and a second weight factor for the first probability and the second probability, respectively, and calculates, through a weighting algorithm, the probability c that the intervertebral disc contained in the disc region image is healthy. The final probability c may be expressed as follows:
$$c = w_1 \times c_1 + w_2 \times c_2$$
where $c_1$ and $c_2$ represent the first probability and the second probability, respectively, $w_1$ and $w_2$ represent the first weight factor and the second weight factor, respectively, and $w_1 + w_2 = 1$.
According to one embodiment, the apparatus 1200 may also include a positioning module 1260 and a determination module 1270, as shown in fig. 12B. The positioning module 1260, coupled to the pre-processing module 1210, may be configured to perform the steps of the method 600 to locate the position of each vertebra in the sagittal image of the spine.
When the probability that the intervertebral disc contained in the disc region image is healthy is smaller than the preset value, the determination module 1270 predicts that the intervertebral disc contained in the disc region image is unhealthy and obtains the position of the unhealthy region from the vertebrae indicated by the disc region image. As in the above embodiment, when the probability is not less than the preset value, the intervertebral disc contained in the disc region image is predicted to be healthy; when the probability is smaller than the preset value, the intervertebral disc contained in the disc region image is predicted to be unhealthy. In one embodiment according to the present invention, $w_1$ and $w_2$ both take 0.5, and when c is not less than 0.4, the intervertebral disc contained in the input disc region image is predicted to be healthy. In another embodiment according to the present invention, $w_1$ takes 0.4 and $w_2$ takes 0.6, and when c is not less than 0.45, the intervertebral disc contained in the input disc region image is predicted to be healthy. The latter weighting is preferred. Of course, different weight factors and preset values may also be set; the embodiments disclosed herein are only examples, and the present invention is not limited thereto.
In summary, according to the solution of the present invention, on the basis of the bone positioning result for the sagittal image of the spine, the sagittal image is further processed by executing the method 200 (or by the apparatus 1200) to obtain, for each disc region image in the sagittal image, the probability that the intervertebral disc it contains is healthy, and a prediction result is given accordingly. The prediction result can be used as a reference to assist a professional doctor in diagnosing various spinal diseases.
Meanwhile, in order to prevent overfitting, the probability of the healthy intervertebral disc is predicted by combining two convolution networks, namely the first prediction model 900 and the second prediction model 1000, so that the calculation accuracy is ensured while the efficiency is improved.
In addition, the scheme according to the invention also provides a scheme for bone positioning in the spine sagittal image. A positioning model 500 is generated by training and is used to locate the sacrum in the sagittal image of the spine and to derive the position coordinates of the other vertebrae. Because the sacrum differs markedly from the other vertebrae, the position and corresponding name of each vertebra can be obtained from the located position coordinates of the sacrum and the other vertebrae. The positioning scheme avoids errors caused by subjective factors, saves labor and time, and can quickly and accurately calculate the number of bones contained in any spine sagittal image together with the position and name of each bone. This provides useful assistance for subsequent disease diagnosis by professional doctors.
It should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
The invention also discloses:
A5, the method of any one of A1-4, further comprising the step of generating a first prediction model and a second prediction model using training images.
A6, the method of A5, further comprising the step of generating a training image set: collecting a plurality of disc region images and marking whether each disc region image is healthy; selecting healthy disc region images and unhealthy disc region images according to a predetermined ratio; and scaling the selected disc region images to a predetermined size to form the training image set.
A7, the method of A6, wherein the predetermined ratio of healthy disc region images to unhealthy disc region images is 1:4.
A8, the method of A6, wherein the predetermined size is 60 × 60.
A9, the method of any one of A1-8, wherein the first prediction model comprises: a first convolution processing layer adapted to perform convolution, activation and pooling processing on an input image; a normalization layer adapted to re-normalize the activation values of the first convolution processing layer; a second convolution processing layer adapted to perform convolution processing on the data output by the normalization layer and to randomly disconnect input neurons with a predetermined probability when the model parameters are updated; and a first classification processing layer adapted to classify the data output by the second convolution processing layer and to output a first probability that the intervertebral disc contained in the input image is healthy.
A10, the method of A9, wherein a fully-connected layer is further included between the second convolution processing layer and the first classification processing layer.
A11, the method of A9 or 10, wherein the first convolution processing layer comprises at least: 13 convolutional layers, 5 pooling layers, and 1 fully-connected layer.
A12, the method of any one of A1-11, wherein the second prediction model comprises: a third convolution processing layer adapted to perform convolution processing on the input image; a first pooling layer adapted to perform max pooling on the output of the third convolution processing layer; a first number of fourth convolution processing layers, each of which comprises a different number of convolution units and is adapted to perform convolution processing on the data output by the previous layer; a second pooling layer adapted to perform average pooling on the data output by the last fourth convolution processing layer; and a second classification processing layer adapted to classify the data output by the second pooling layer and to output a second probability that the intervertebral disc contained in the input image is healthy.
A13, the method of A12, wherein a fully-connected layer is further included between the second pooling layer and the second classification processing layer.
A14, the method of A13, wherein the convolution unit comprises a first convolution module, a second convolution module and a third convolution module connected in sequence, wherein the convolution kernel size of the first convolution module is 1 × 1, the convolution kernel size of the second convolution module is 3 × 3, the convolution kernel size of the third convolution module is 1 × 1, and the output of the third convolution module is combined with the input of the first convolution module to serve as the output of the convolution unit.
B19, the apparatus of any one of B15-18, further comprising: a training module adapted to generate a first prediction model and a second prediction model using training images.
B20, the apparatus of B19, wherein the training module further comprises an image acquisition sub-module adapted to acquire a plurality of disc region images and mark whether each disc region image is healthy, to select healthy disc region images and unhealthy disc region images according to a predetermined ratio, and to scale the selected disc region images to a predetermined size to form a training image set.
B21, the apparatus of B20, wherein the predetermined ratio of healthy disc region images to unhealthy disc region images is 1:4.
B22, the apparatus of B21, wherein the predetermined size is 60 × 60.
B23, the apparatus of any one of B15-22, wherein the first prediction model comprises: a first convolution processing layer adapted to perform convolution, activation and pooling processing on an input image; a normalization layer adapted to re-normalize the activation values of the first convolution processing layer; a second convolution processing layer adapted to perform convolution processing on the data output by the normalization layer and to randomly disconnect input neurons with a predetermined probability when the model parameters are updated; and a first classification processing layer adapted to classify the data output by the second convolution processing layer and to output a first probability that the intervertebral disc contained in the input image is healthy.
B24, the apparatus of B23, wherein a fully-connected layer is further included between the second convolution processing layer and the first classification processing layer.
B25, the apparatus of B23 or 24, wherein the first convolution processing layer comprises at least: 13 convolutional layers, 5 pooling layers, and 1 fully-connected layer.
B26, the apparatus of any one of B15-25, wherein the second prediction model comprises: a third convolution processing layer adapted to perform convolution processing on the input image; a first pooling layer adapted to perform max pooling on the output of the third convolution processing layer; a first number of fourth convolution processing layers, each of which comprises a different number of convolution units and is adapted to perform convolution processing on the data output by the previous layer; a second pooling layer adapted to perform average pooling on the data output by the last fourth convolution processing layer; and a second classification processing layer adapted to classify the data output by the second pooling layer and to output a second probability that the intervertebral disc contained in the input image is healthy.
B27, the apparatus of B26, wherein a fully-connected layer is further included between the second pooling layer and the second classification processing layer.
B28, the apparatus of B27, wherein the convolution unit comprises a first convolution module, a second convolution module and a third convolution module connected in sequence, wherein the convolution kernel size of the first convolution module is 1 × 1, the convolution kernel size of the second convolution module is 3 × 3, the convolution kernel size of the third convolution module is 1 × 1, and the output of the third convolution module is combined with the input of the first convolution module to serve as the output of the convolution unit.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Wherein the memory is configured to store program code; the processor is configured to perform the method of the present invention according to instructions in the program code stored in the memory.
By way of example, and not limitation, computer-readable media include computer storage media and communication media. Computer storage media store information such as computer-readable instructions, data structures, program modules or other data. Communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and include any information delivery media. Combinations of any of the above are also included within the scope of computer-readable media.
Furthermore, some of the described embodiments are described herein as a method or combination of method elements that can be performed by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for carrying out the method or method elements thus forms a means for carrying out the method or method elements. Further, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is used to implement the functions performed by the elements for the purpose of carrying out the invention.
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present invention has been disclosed in an illustrative rather than a restrictive sense, and the scope of the present invention is defined by the appended claims.