WO2020106393A2

WO2020106393A2 - Skeletal maturity determination using radiographs of portions of a hand

Info

Publication number: WO2020106393A2
Application number: PCT/US2019/057245
Authority: WO
Inventors: Nakul Edula REDDY; Jesse Christ RAYAN; Jason Herman KAN; Ananth Annapragada; Wei Zhang
Original assignee: Baylor College Of Medicine; Texas Children's Hospital
Priority date: 2018-10-23
Filing date: 2019-10-21
Publication date: 2020-05-28
Also published as: WO2020106393A3

Abstract

Skeletal age determination using neural networks can use images of only a portion of a patient's hand. For example, rather than train a neural network using entire hand radiographs, the neural network may be trained with radiographs of only digits of the hand, a few digits of the hand, or a single digit of the hand. The neural network can then be used to process radiographs of a patient's digit or digits to obtain a skeletal maturity for the patient. Processing radiographs of individual digits rather than entire hands allows lower-quality (e.g., higher noise) radiographs, such as low-dose radiographs, to be used to accurately determine skeletal maturity.

Description

SKELETAL MATURITY DETERMINATION

USING RADIOGRAPHS OF PORTIONS OF A HAND

CROSS-REFERENCE TO RELATED PATENT APPLICATIONS

[0001] This application claims the benefit of priority of U.S. Provisional Patent No. 62/749,435 filed October 23, 2018, and entitled “Skeletal Maturity Determination Using Radiographs of Portions of a Hand,” which is hereby incorporated by reference.

FIELD OF THE DISCLOSURE

[0002] The instant disclosure relates to radiograph image processing. More specifically, portions of this disclosure relate to determining skeletal maturity of a patient by processing radiograph images.

BACKGROUND

[0003] Conventionally, a clinical determination of skeletal maturity is obtained by radiologists interpreting a radiograph of the left hand. This is sometimes referred to as a bone-age study. Skeletal maturity determined by hand bone-age is used as an adjunct in the care of pediatric patients with endocrine or metabolic conditions, as well as for determining the correct skeletal age to perform physeal-sparing or non-physeal-sparing anterior cruciate ligament repairs and the operative or non-operative treatment of scoliosis. Left-hand radiographs are conventionally analyzed by radiologists using, for example, the Greulich and Pyle method or the Tanner-Whitehouse method. Computational techniques, such as deep learning, can also be used to determine bone age by analyzing the entire hand radiograph. Deep learning, usually through convolutional neural networks (CNNs), represents a set of algorithms in machine learning that attempt to learn in multiple levels, corresponding to different levels of abstraction. In the setting of radiology, the algorithm takes pixel intensities and attempts to combine them to obtain features such as lines and shapes, then tissue features such as cortical/trabecular bone, and then the anatomic relation between these features, using the compositional nature of images. However, the accuracy of these computational techniques depends on the noise level of the radiographs obtained from the patient’s hand: noisier radiographs generally yield less accurate results. The noise level of the radiographs in turn depends on the quantity of radiation exposure used while obtaining the radiographs, with lower exposures producing noisier images. Thus, in computational determination of skeletal maturity, there is a conflict between the speed and cost of obtaining radiographs for the bone age study and the accuracy of the skeletal maturity determination.

SUMMARY

[0004] Skeletal age determination using neural networks can be improved by processing images of only a portion of a patient’s hand. For example, rather than train a neural network using entire hand radiographs, the neural network may be trained with radiographs of only digits of the hand, a few digits of the hand, or a single digit of the hand. The neural network can then be used to process radiographs of a patient’s digit or digits to obtain a skeletal maturity for the patient. Processing radiographs of individual digits rather than entire hands allows lower-quality (e.g., higher noise) radiographs to be used to accurately determine skeletal maturity. This is because there is less of the hand structure in the radiograph for processing and recognition within the neural network. Thus, low-dose radiographs of individual digits can be used to determine skeletal maturity. Using low-dose radiographs reduces the amount of x-ray exposure to the patient and allows cheaper, faster, more portable x-ray sources to be used to obtain the radiographs. Low-dose x-ray machines are not conventionally used in diagnostic radiology because their images are usually not of diagnostic quality to the human eye. Yet, the trained neural network of embodiments of this disclosure can use low-quality radiographs to accurately estimate bone-age. Because the quality of such radiographs is too low for human eyes to determine bone-age, it would be non-obvious to determine bone-age from them. Expert pediatric radiologists rely almost exclusively on the phalanges and interphalangeal joints to determine skeletal maturity. Conventional teaching in pediatric radiology is to evaluate each phalangeal joint, each metacarpal joint, and the carpal bones. However, neural networks according to embodiments of this disclosure focus on individual digits and phalanges, and in some embodiments a single digit and phalanx, while achieving accuracy comparable to the accepted method.

[0005] Advantages of embodiments of the disclosure include convenience for doctors because low-dose radiographs do not require shielding, which allows the radiographs to be obtained on site, such as at a provider’s office. Patients also benefit because, without the need for shielding and dedicated space for the shielding, radiographs can be obtained more quickly and conveniently, in some cases without an appointment. Further, the low-dose radiographs can be obtained using mobile field devices to allow on-site bone-age determination at locations remote from a provider’s office.

[0006] According to one embodiment, a method may include receiving, by a processor from a memory device, patient data comprising one or more images of one or more digits of a hand of a patient; and/or executing, by a processor, code for a convolutional neural network (CNN) using trained weights to determine a skeletal maturity of the patient based on the patient data.

[0007] In certain embodiments of the method, the trained weights were obtained based on sample patient data comprising images of one or more digits of a hand of sample patients in a sample patient data set having predetermined skeletal maturity levels; pre-processing the one or more images of one or more digits of a hand of a patient may be performed before executing code for the CNN with the one or more images; the step of pre-processing the one or more images comprises: identifying a second digit of the hand from the one or more images, and cropping the one or more images to a predetermined size with a cropped region that comprises the second digit of the hand; the one or more images does not include an entirety of the hand of the patient; and/or the one or more images comprises low-dose radiographs of the one or more digits of the hand of the patient.

[0008] According to another embodiment, a method may include training, by a processor, weights of a convolutional neural network (CNN) by executing code of the CNN on a training set of data, wherein the training is performed to establish a CNN for determining a skeletal maturity of a patient, and wherein the training is performed using a training set of data comprising images of one or more digits of a hand of sample patients.

[0009] According to another embodiment, an apparatus may include a memory device configured to store radiograph images and to store a convolutional neural network (CNN) comprising code and training weights; and/or a processor coupled to the memory device and configured to perform steps comprising receiving, by the processor from the memory device, patient data comprising one or more images of one or more digits of a hand of a patient, and/or executing, by the processor, the code for the convolutional neural network (CNN) using the training weights to determine a skeletal maturity of the patient based on the patient data.

[0010] The foregoing has outlined rather broadly certain features and technical advantages of embodiments of the present invention in order that the detailed description that follows may be better understood. Additional features and advantages will be described hereinafter that form the subject of the claims of the invention. It should be appreciated by those having ordinary skill in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same or similar purposes. It should also be realized by those having ordinary skill in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims. Additional features will be better understood from the following description when considered in connection with the accompanying figures. It is to be expressly understood, however, that each of the figures is provided for the purpose of illustration and description only and is not intended to limit the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] For a more complete understanding of the disclosed system and methods, reference is now made to the following descriptions taken in conjunction with the accompanying drawings.

[0012] FIGURE 1 is an example radiograph of a patient’s hand calling out a single digit for processing by a neural network according to embodiments of the disclosure.

[0013] FIGURE 2 is a block diagram illustrating processing of radiographs with a neural network according to embodiments of the disclosure.

[0014] FIGURE 3 is a flow chart illustrating a method of determining skeletal maturity from radiographs according to embodiments of the disclosure.

[0015] FIGURE 4 is a block diagram illustrating training a neural network according to embodiments of the disclosure.

[0016] FIGURE 5 is a flow chart illustrating a method for pre-processing images before being applied to a neural network according to embodiments of the disclosure.

[0017] FIGURE 6A is a graph showing differences in bone-age estimates for a set of patients determined using a neural network processing whole hand radiographs.

[0018] FIGURE 6B is a graph showing differences in bone-age estimates for a set of patients determined using a neural network processing single digit radiographs according to embodiments of the disclosure.

[0019] FIGURE 7A is a table comparing results from whole hand processing and single digit processing of a radiograph to determine bone-age according to embodiments of the disclosure.

[0020] FIGURE 7B is a table comparing absolute errors of bone age determined from an index finger by a radiologist and the neural network processing single digit radiographs according to embodiments of the disclosure.

[0021] FIGURE 7C is a table comparing absolute errors of bone age determined from whole hand radiographs by radiologist and the neural network processing whole hand radiographs according to embodiments of the disclosure.

[0022] FIGURE 8 is a block diagram illustrating an apparatus for x-ray skeletal maturity determinations according to embodiments of the disclosure.

DETAILED DESCRIPTION

[0023] FIGURE 1 is an example radiograph of a patient’s hand calling out a single digit for processing by a neural network according to embodiments of the disclosure. A full hand radiograph 102 is conventionally used for bone-age studies. However, radiographs of smaller portions of a hand are used for skeletal maturity determination according to embodiments of this disclosure. For example, a portion 104 of a hand comprising a single digit and phalanx may be used for determining skeletal maturity. Although a single digit is shown as portion 104, embodiments of this disclosure may use images comprising other portions of a hand, which can include more than one digit.

[0024] Radiographs of portions of a hand can be processed with a neural network to obtain a skeletal maturity. FIGURE 2 is a block diagram illustrating processing of radiographs with a neural network according to embodiments of the disclosure. A convolutional neural network (CNN) 220 may include code 222 for processing images and training weights 224 obtained by training the neural network with training data. One example CNN is Xception-v3. The CNN 220 may run on a computer system and be executed by a processor, such as an ARM-based or x86-based processor. In some embodiments, the CNN 220 may execute on other processors such as a graphics processing unit (GPU) or dedicated circuitry such as an application-specific integrated circuit (ASIC). Patient data 210 may be input to the CNN 220 and processed to obtain a skeletal maturity determination. The patient data 210 may include an image 212 of a single digit or portion of a hand. The patient data 210 may include other contextual information useful for determining skeletal maturity, such as gender 214. In some embodiments, the image 212 may be pre-processed before processing by the CNN 220 to identify a particular digit from the image, such as a second digit from a portion of a hand, and to crop the image 212 to an appropriate portion limited to the second digit.

[0025] A method for determining skeletal maturity is shown in FIGURE 3. FIGURE 3 is a flow chart illustrating a method of determining skeletal maturity from radiographs according to embodiments of the disclosure. A method 300 begins at block 302 with receiving patient data including one or more low-dose radiograph images of one or more digits of a hand of a patient. Then, at block 304, code for the convolutional neural network (CNN) is executed using trained weights to determine a skeletal maturity of the patient based on the received patient data.

[0026] The training weights 224 may be obtained by training the neural network with a set of training data with predetermined skeletal maturities. FIGURE 4 is a block diagram illustrating training a neural network according to embodiments of the disclosure. The CNN 220 may use code 222 to process training data 410. The training data 410 can include data for several sample patients, each of which has a corresponding radiograph image 412, gender 414, and predetermined skeletal maturity 416. The plurality of patients in training data 410 are processed by code 222 to adjust the training weights 224 and improve the accuracy of the determination of skeletal maturity by the CNN 220. In some embodiments, the training data 410 can be high-dose radiographs, even though the CNN 220 will be used to determine skeletal maturity from low-dose radiographs. The higher quality training data may result in more accurate analysis of the low-dose radiographs. “Low-dose radiographs” refer to radiographs obtained with lower x-ray doses than conventional x-ray imaging in healthcare settings. “High-dose radiographs” refer to radiographs with doses higher than the low-dose radiographs, such as used in conventional x-ray imaging in healthcare settings.

[0027] In some embodiments, the CNN, such as Xception, can have an input tailored for the patient models. For example, the CNN can have a single-channel 8-bit input for grayscale radiographs, concatenated with another input layer for gender (which would be either 1 for female or 0 for male). The gender layer can be connected to a two-unit densely-connected layer and then to the Xception grayscale model. By including this bit at the input level, rather than before the final fully connected layers, all layers of the model are provided the gender input. The output of the base Xception model can be fed into two 2048-unit densely-connected layers and then into a single linear activation layer to yield the age. Adam (adaptive moment estimation) with AMSGrad can be used as an optimization algorithm. Such optimizers reduce the loss of a neural network by slowly changing the weights (parameters) of the network to minimize a predefined loss or error function. The loss function may be, for example, a mean square error, that is, the mean of the squared differences between the ground truth age and the predicted age.
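The optimizer and loss described above can be illustrated with a minimal sketch. It is not the disclosed training code: it applies an AMSGrad-style update to a single scalar weight of a toy linear model, omits Adam's bias correction for brevity, and uses hypothetical names (`mse`, `amsgrad_step`, `fit_toy_model`), hypothetical hyperparameters, and made-up data.

```python
import math


def mse(truth, pred):
    """Mean squared error between ground-truth and predicted ages."""
    return sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth)


def amsgrad_step(w, grad, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with the AMSGrad modification for one scalar weight.

    Bias correction is omitted for brevity (an assumption of this sketch).
    """
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad          # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad   # second moment
    state["v_hat"] = max(state["v_hat"], state["v"])              # AMSGrad: running maximum
    return w - lr * state["m"] / (math.sqrt(state["v_hat"]) + eps)


def fit_toy_model(xs, ages, steps=2000):
    """Fit age = w * x on toy data by minimizing the MSE with AMSGrad."""
    w = 0.0
    state = {"m": 0.0, "v": 0.0, "v_hat": 0.0}
    for _ in range(steps):
        # Gradient of the MSE with respect to w for the model pred = w * x.
        grad = sum(2 * (w * x - a) * x for x, a in zip(xs, ages)) / len(xs)
        w = amsgrad_step(w, grad, state)
    return w


xs = [1.0, 2.0, 3.0]    # hypothetical scalar feature per sample
ages = [2.0, 4.0, 6.0]  # hypothetical ground-truth ages
w = fit_toy_model(xs, ages)
print(w, mse(ages, [w * x for x in xs]))
```

In a full model the same update would be applied to every weight, and a deep-learning framework's built-in optimizer would normally be used instead of hand-written code.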

[0028] When processing images, either as training data or patient data, the images may be pre-processed. FIGURE 5 is a flow chart illustrating a method for pre-processing images before being applied to a neural network according to embodiments of the disclosure. Pre-processing may include cropping an image to a predetermined size as shown in block 502. Next, the image may be rotated to set the digit at a particular angle at block 504. Next, the image may again be cropped and resized to a predetermined size, such as 300 x 300 pixels. The image may then be rotated or flipped at block 508 to obtain a pre-processed image for processing by a CNN.
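Some of the steps above can be sketched on a small image stored as a list of pixel rows. This is only an illustration under stated assumptions: the helper names (`crop`, `resize_nearest`, `flip_horizontal`) are hypothetical, nearest-neighbor resizing is an assumed choice, and rotation to an arbitrary angle is left out because it would normally be delegated to an imaging library.

```python
def crop(image, top, left, height, width):
    """Return the height x width region whose upper-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def resize_nearest(image, new_height, new_width):
    """Resize with nearest-neighbor sampling (an assumed interpolation choice)."""
    old_height, old_width = len(image), len(image[0])
    return [
        [image[r * old_height // new_height][c * old_width // new_width]
         for c in range(new_width)]
        for r in range(new_height)
    ]


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


# Hypothetical 4 x 4 grayscale image with pixel values 0..15.
image = [[4 * r + c for c in range(4)] for r in range(4)]
digit = crop(image, top=1, left=1, height=2, width=2)
resized = resize_nearest(digit, 4, 4)
flipped = flip_horizontal(digit)
print(digit, resized, flipped)
```

For real radiographs the target size would be the predetermined size named above, such as 300 x 300 pixels, rather than the 4 x 4 size used here.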

[0029] Pre-processing may include other image processing steps. For example, patient or training data may include 8-bit PNG grayscale images, so every pixel of these 8-bit images is represented by an integer value in the range from 0 to 255. An object-detection convolutional neural network such as RetinaNet can be used to extract the second digit from patient data or training data. RetinaNet is a one-stage object detection architecture, although other object detection architectures, such as two-stage region-based detectors, may be used in some embodiments. In a two-stage detector, the first stage proposes regions of object locations of various scales and aspect ratios (a large number of guesses) in the input image, and the second stage classifies these proposals into foreground or background. After the second digit is extracted by RetinaNet or other techniques, the image is cropped and saved as an image file, such as a portable network graphic (PNG) file, for processing. Images can then undergo augmentation, which involves a series of random transformations prior to being passed into the model throughout training, such as shown in FIGURE 5. Images are then converted into a 300 x 300 array (second digit) of floating-point numbers, where each pixel value is represented in the range of 0.0 to 1.0. Augmentation effectively increases the size of the dataset that is used to train the neural network and is typically done on-the-fly. These random transformations help reduce overfitting, which can be thought of as rote memorization by the CNN, thus increasing the robustness of the model. In some embodiments, zero-value or black pixels are added to the other dimension to create a square image (referred to as padding).
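The padding and value scaling described above can be sketched as follows. The function names (`pad_to_square`, `scale_to_unit`) are hypothetical, and placing the padding on the right and bottom edges is an assumption, since the text does not say where the black pixels are added.

```python
def pad_to_square(image, fill=0):
    """Pad a list-of-rows image with `fill` pixels so that height equals width.

    Padding is appended on the right and bottom edges (an assumption).
    """
    height, width = len(image), len(image[0])
    side = max(height, width)
    padded = [list(row) + [fill] * (side - width) for row in image]
    padded += [[fill] * side for _ in range(side - height)]
    return padded


def scale_to_unit(image):
    """Convert 8-bit integer pixels (0..255) to floats in the range 0.0..1.0."""
    return [[pixel / 255.0 for pixel in row] for row in image]


# Hypothetical 1 x 3 strip standing in for a cropped, elongated digit image.
strip = [[255, 0, 128]]
square = pad_to_square(strip)
unit = scale_to_unit(square)
print(square)
print(unit[0])
```

Padding before resizing keeps the aspect ratio of the digit unchanged when the image is brought to the square input size.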

[0030] Processing radiographs of the second digit provides reproducible and similar bone-age results compared with processing radiographs of the whole hand because the epiphyseal bone maturation grows proportionately with the remainder of the hand in the absence of focal congenital or post-traumatic deformity. Furthermore, CNN training may be able to identify subtleties in second digit bone-age determination that may not be readily appreciated by manual interpretation by a radiologist. For a radiologist manually interpreting a bone age using all phalanges, reviewing all epiphyseal growth centers and phalangeal may improve confidence by repetitively seeing the same finding in all rays, which is necessary for human interpretation. The second digit model performed with a mean absolute difference of 5.8 months on the thirteen cases above 16 years old in the test set, an age after which the physes of the second digit are already fused. This results from the CNN model training on features different from those used by radiologists utilizing the Greulich and Pyle method, including how to evaluate different parts of the hand when the physes are fused, both when evaluating the whole hand and when evaluating the second digit only.

[0031] In some embodiments, the input stimuli for a model can be visualized with techniques similar to convolution in reverse, or “deconvolution.” This method inverts the data flow from the final layer to the input layer. An additional modification, filtering out negative gradients, or the elements of the network which decrease the activation of the higher layers for visualizing, allows the reconstruction of a matrix at the input layer that corresponds to regions of the input image which contribute to the model’s output. Assume a CNN with L layers, with input layer f_0 and output layer f_L. A forward pass may be used to populate f_L, and each of these layers may have a partial derivative, which has been calculated in the forward pass. Next, all the partial derivatives may be initialized to zero except for the final layer, which is set to the output value of interest R_L. Guided backpropagation can be described using the formula:
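
For reference, the guided backpropagation rule at a rectified linear unit is commonly written in the following form. The layer index l, the element index i, and the indicator notation are assumptions of this restatement; f denotes the forward activations and R the reconstructed signal, as in the surrounding text.

```latex
R^{(l)}_{i} \;=\; \mathbb{1}\!\left[f^{(l)}_{i} > 0\right]\,
                  \mathbb{1}\!\left[R^{(l+1)}_{i} > 0\right]\,
                  R^{(l+1)}_{i},
\qquad l = L-1, \dots, 0
```

Through convolutional and densely-connected layers the signal is propagated with the ordinary backward pass; the two indicator terms apply only at the rectifying nonlinearities.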

Working backwards to R_0, the input layer yields an array with dimensions of the input image populated with values corresponding to regions of the image that contribute the most to the specified output of the neural network. This can be mapped to a grayscale image and overlaid over the original input image to produce a saliency map.
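
A minimal sketch of this backward pass on a toy network may make the procedure concrete. The network (one linear layer, a ReLU, and a linear output), its weights, and the function names are hypothetical, and mapping absolute values to the 0 to 255 range is an assumed way of producing the grayscale image.

```python
def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def transpose_matvec(weights, vector):
    """Multiply the transpose of a matrix by a vector."""
    return [sum(weights[j][i] * vector[j] for j in range(len(weights)))
            for i in range(len(weights[0]))]


def guided_backprop(x, w1, w2):
    """Return (output, R_0) for the toy network: linear, ReLU, linear."""
    pre = matvec(w1, x)                          # forward pass
    hidden = [max(0.0, p) for p in pre]
    output = sum(w * h for w, h in zip(w2, hidden))
    r_hidden = [w * output for w in w2]          # start from the output of interest
    # Guided rule at the ReLU: keep a value only where the forward input
    # and the backward signal are both positive.
    r_pre = [r if (p > 0 and r > 0) else 0.0 for p, r in zip(pre, r_hidden)]
    return output, transpose_matvec(w1, r_pre)


def to_grayscale(values):
    """Scale absolute values to integers in 0..255 (an assumed mapping)."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        return [0] * len(values)
    return [round(255 * abs(v) / peak) for v in values]


x = [1.0, 2.0, 0.5]                              # hypothetical three-pixel input
w1 = [[1.0, -1.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, -2.0]]
w2 = [1.0, 2.0, 1.0]
output, r_input = guided_backprop(x, w1, w2)
saliency = to_grayscale(r_input)
print(output, r_input, saliency)
```

Only the second hidden unit is active in this toy input, so the saliency concentrates on the two pixels that feed it.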

[0032] Radiographs of a portion of a hand can be used for bone-age determination because different parts of the hand provide redundant, rather than complementary, data. For example, the whole hand CNN model was attentive to all joints of the hand, including the metacarpophalangeal joints and wrist, and paradoxically not as attentive to the second digit growth centers. The second digit model was not provided this “redundant” anatomic detail and predicted bone age with a similar degree of error.

[0033] An example data set from the RSNA bone age challenge was used to test the results of a CNN operating on the second phalanx only according to embodiments of the disclosure against the results of a CNN operating on the whole hand. Bland-Altman plots were used to compare predictions from the CNN with ground truths. FIGURE 6A is a graph showing differences in bone-age estimates for a set of patients determined using a neural network processing whole hand radiographs. FIGURE 6B is a graph showing differences in bone-age estimates for a set of patients determined using a neural network processing single digit radiographs according to embodiments of the disclosure.
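
The quantities shown in a Bland-Altman plot can be computed as in the sketch below. The function name `bland_altman` is hypothetical, the ages are made-up values in months rather than data from the test set, and the 1.96 standard deviation limits of agreement are the conventional choice, assumed here.

```python
import statistics


def bland_altman(predicted, truth):
    """Return the per-case means and differences plus bias and limits of agreement."""
    diffs = [p - t for p, t in zip(predicted, truth)]
    means = [(p + t) / 2 for p, t in zip(predicted, truth)]
    bias = statistics.mean(diffs)
    spread = 1.96 * statistics.stdev(diffs)   # conventional 95% limits (assumed)
    return {
        "means": means,        # x-axis of the plot
        "diffs": diffs,        # y-axis of the plot
        "bias": bias,
        "lower": bias - spread,
        "upper": bias + spread,
    }


predicted = [100.0, 110.0, 130.0]  # hypothetical CNN bone ages (months)
truth = [98.0, 114.0, 126.0]       # hypothetical ground-truth bone ages (months)
result = bland_altman(predicted, truth)
print(result["bias"], result["lower"], result["upper"])
```

Each case is plotted at its mean on the horizontal axis and its difference on the vertical axis, with horizontal lines drawn at the bias and at the two limits.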

[0034] A summary of the performance of both models on the test set is presented in the tables of FIGURES 7A-C. FIGURE 7A is a table comparing results from whole hand processing and single digit processing of a radiograph to determine bone-age according to embodiments of the disclosure. The mean absolute difference between the ground truth and neural network bone age determination for whole hand and index finger was similar (4.7 vs 5.1 months, p=0.14), and both values were significantly smaller than that for radiologist bone age determination from the single finger radiographs (8.0 months, p<0.0001). FIGURE 7B is a table comparing absolute errors of bone age determined from an index finger by a radiologist and the neural network processing single digit radiographs according to embodiments of the disclosure. Consensus interpretation of the second digit by three human radiologists had a mean absolute error of 8.0 months from the ground truth, which was statistically higher than neural network-determined bone age. FIGURE 7C is a table comparing absolute errors of bone age determined from whole hand radiographs by radiologist and the neural network processing whole hand radiographs according to embodiments of the disclosure. Consensus interpretation for the whole hand by three human radiologists had a mean absolute error of 6.0 months, statistically higher than the neural network-determined bone age for the whole hand. These results may provide the foundation for changing the way radiologists approach bone age interpretation to be based on a single finger or another portion of a hand. Additionally, the neural network-based determination for age based on a single finger or portion of a hand may allow development of future small-footprint point-of-care hardware tailored to only a single digit that is suitable for space restrictions in the ambulatory care setting, particularly when a dedicated pediatric radiographic suite is not readily available or geographically nearby.
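
The mean absolute difference reported in the tables is a simple statistic, sketched below on made-up values. The function name and the example ages are hypothetical, and the sketch does not reproduce the significance tests behind the reported p-values.

```python
def mean_absolute_difference(predicted, truth):
    """Average of |predicted - truth| over all cases, in the units of the inputs."""
    return sum(abs(p - t) for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical bone ages in months for three test cases.
truth_months = [98.0, 114.0, 126.0]
model_months = [100.0, 110.0, 130.0]
reader_months = [92.0, 120.0, 136.0]

model_error = mean_absolute_difference(model_months, truth_months)
reader_error = mean_absolute_difference(reader_months, truth_months)
print(model_error, reader_error)
```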

[0035] The techniques for processing radiographs to determine skeletal maturity according to embodiments of this disclosure may be implemented in an apparatus for on-site diagnosis. FIGURE 8 is a block diagram illustrating an apparatus for x-ray skeletal maturity determinations according to embodiments of the disclosure. An apparatus 800 may include a processor 802 and memory device 804. The processor 802 may retrieve radiograph images and other patient data from memory device 804 and execute code for a neural network to process the patient data and determine skeletal maturity. The CNN code and CNN training weights may be stored in the memory device 804 along with patient data or in other memory. The processor 802 and memory device 804 may be included in an apparatus with x-ray source 808 and x-ray imaging device 810. The processor 802 may execute an application for obtaining radiographs of a patient’s digit using x-ray source 808 and x-ray imaging device 810. The source 808 and imaging device 810 may be operated by the processor 802 through controller 806. The imaging device 810 may store recorded radiographs in memory device 804 to be retrieved by processor 802 for determining skeletal maturity. The apparatus 800 may also include a display (not shown) for generating a user interface (UI) to guide in the obtaining of radiograph images and outputting patient diagnostic information such as skeletal maturity. The x-ray source 808 may be a low-yield source that does not require shielding to protect patients and providers. Despite a low-yield source producing high-noise radiographs, the radiographs may still be processible for skeletal maturity determinations by the neural network even though a radiologist would not be able to evaluate such an image. Thus, the apparatus 800 may be a mobile unit capable of remote site operation.

[0036] The schematic flow chart diagram of FIGURE 2 is generally set forth as a logical flow chart diagram. Likewise, other operations for the circuitry are described without flow charts herein as sequences of ordered steps. The depicted order, labeled steps, and described operations are indicative of aspects of methods of the invention. Other steps and methods may be conceived that are equivalent in function, logic, or effect to one or more steps, or portions thereof, of the illustrated method. Additionally, the format and symbols employed are provided to explain the logical steps of the method and are understood not to limit the scope of the method. Although various arrow types and line types may be employed in the flow chart diagram, they are understood not to limit the scope of the corresponding method. Indeed, some arrows or other connectors may be used to indicate only the logical flow of the method. For instance, an arrow may indicate a waiting or monitoring period of unspecified duration between enumerated steps of the depicted method. Additionally, the order in which a particular method occurs may or may not strictly adhere to the order of the corresponding steps shown.

[0037] Although the present disclosure and certain representative advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the disclosure as defined by the appended claims. Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the present disclosure, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.

Claims

CLAIMS What is claimed is:

1. A method, comprising: receiving, by a processor from a memory device, patient data comprising one or more images of a portion of a hand of a patient; and executing, by a processor, code for a convolutional neural network (CNN) using training weights to determine a skeletal maturity of the patient based on the patient data.

2. The method of claim 1, wherein the trained weights were obtained based on sample patient data comprising images of one or more digits of a hand of sample patients in a sample patient data set having predetermined skeletal maturity levels, wherein the sample patient data comprises radiographs of the hands of a plurality of patients, and wherein the one or more images comprises low-dose radiographs of the portion of the hand of the patient.

3. The method of claim 1, further comprising pre-processing the one or more images before executing code for the CNN with the one or more images.

4. The method of claim 3, wherein the step of pre-processing the one or more images comprises: identifying a second digit of the hand from the one or more images; and cropping the one or more images to a predetermined size with a cropped region that comprises the second digit of the hand.

5. The method of claim 1, wherein the one or more images does not include an entirety of the hand of the patient.

6. The method of claim 1, wherein the one or more images comprises low-dose radiographs of the portion of the hand of the patient.

7. A method, comprising: training, by a processor, weights of a convolutional neural network (CNN) by executing code of the CNN on a training set of data, wherein the training is performed to establish a CNN for determining a skeletal maturity of a patient from an image of a portion of a hand of the patient, and wherein the training is performed using a training set of data comprising images of a portion of a hand of sample patients.

8. The method of claim 7, further comprising executing, by the processor, code for the CNN using the trained weights of the CNN on patient data comprising images of a portion of a hand of the patient to determine the skeletal maturity of a patient.

9. The method of claim 8, further comprising pre-processing the images before training weights of the CNN using the images.

10. The method of claim 9, wherein the step of pre-processing the images comprises identifying a second digit of the hand from the one or more images; and cropping the one or more images to a predetermined size with a cropped region that comprises the second digit of the hand.

11. The method of claim 7, wherein the training set of data comprises images of a second digit of the hand of the patient.

12. The method of claim 7, wherein the training set of data comprises images from one or more low-dose radiographs.

13. The method of claim 12, wherein the training set of data comprises images of a single digit from one or more low-dose radiographs.

14. An apparatus, comprising: a memory device configured to store radiograph images and to store a convolutional neural network (CNN) comprising code and training weights; and a processor coupled to the memory device and configured to perform steps comprising: receiving, by the processor from the memory device, patient data comprising one or more images of a portion of a hand of a patient; and executing, by the processor, the code for the convolutional neural network (CNN) using the training weights to determine a skeletal maturity of the patient based on the patient data.

15. The apparatus of claim 14, wherein the training weights were obtained based on sample patient data comprising images of a portion of a hand of sample patients in a sample patient data set having predetermined skeletal maturity levels.

16. The apparatus of claim 14, wherein the processor is further configured to perform steps comprising pre-processing the one or more images of one or more digits of the hand of the patient before executing code for the CNN with the one or more images.

17. The apparatus of claim 14, wherein the one or more images does not include an entirety of the hand of the patient.

18. The apparatus of claim 14, wherein the one or more images comprises low-dose radiographs of the portion of the hand of the patient.

19. The apparatus of claim 18, further comprising an x-ray source; and an x-ray imaging device, wherein the x-ray imaging device is configured to obtain the low-dose radiographs using the x-ray source and to store the low-dose radiographs in the memory device.

20. The apparatus of claim 19, wherein the apparatus comprises a portable x-ray device for determining skeletal maturity.