CN115423779A - Method for predicting bone age of children - Google Patents

Method for predicting bone age of children

Info

Publication number
CN115423779A
CN115423779A
Authority
CN
China
Prior art keywords
picture
pixel
hand
value
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211076541.3A
Other languages
Chinese (zh)
Inventor
简坤元
杨梦宁
王思敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202211076541.3A priority Critical patent/CN115423779A/en
Publication of CN115423779A publication Critical patent/CN115423779A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/36 Applying a local operator, i.e. means to operate on image points situated in the vicinity of a given point; Non-linear local filtering operations, e.g. median filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10116 X-ray image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20024 Filtering details
    • G06T2207/20028 Bilateral filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30008 Bone

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Nonlinear Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method for predicting the bone age of children, which comprises the following steps: randomly selecting some hand X-ray films from a public hand X-ray data set with bone age labels to form a data set, and resizing all pictures to a specified size; establishing a TENet model and constructing its training set; and taking the training set as the input of the TENet model, training the TENet model with an Adam optimizer, and obtaining the trained TENet model when the maximum number of iterations is reached. The model of the invention can accurately predict the bone age of children.

Description

Method for predicting bone age of children
Technical Field
The invention relates to the field of computer image vision and big data medical image application, in particular to a method for predicting the bone age of children.
Background
Traditional bone age assessment is usually performed by taking radiographs of the subject's hand and wrist, which a doctor then interprets. Interpretation methods include the simple counting method, the atlas method, the scoring method and computerized bone age scoring systems; the most common are the G-P atlas method and the TW3 scoring method. Owing to ethnic differences and the secular trend of growth and development, the bone age standard best suited to contemporary Chinese children is CHN-05; in 2006 the China-05 bone age standard was compiled into an industry standard of the People's Republic of China and became the only industry bone age standard in China to date.
Recently, deep neural networks have been widely used in the medical field, because automatic bone age estimation with deep learning is much faster than manual bone age determination and its accuracy is far better than that of conventional methods. Hyunkwang Lee et al. used GoogLeNet as a backbone and determined final bone age by comparison with three to five reference images of the G&P atlas. Toan Duc Bui et al. used Faster R-CNN and Inception-v4 networks for region-of-interest (ROI) detection and classification, respectively. Chuanbin Liu et al. introduced an attention agent and a recognition agent for proposing discriminative skeletal parts and for feature learning and age assessment, respectively.
Although these methods achieve good performance in BAA, they cannot objectively and accurately evaluate the bone age of contemporary Chinese children in actual testing, because of secular changes over time. There is therefore a need for a method based on the China-05 standardized sample: the China-05 standard was compiled from normal urban Chinese children surveyed in 2005, it truly reflects the development trend of contemporary Chinese children, and it uses the metacarpophalangeal edge and metacarpophalangeal length of Chinese children as reference standards for obtaining the correct bone age.
Disclosure of Invention
Aiming at the problems in the prior art, the technical problems to be solved by the invention are as follows: how to accurately predict the bone age of children.
In order to solve the technical problems, the invention adopts the following technical scheme: a method for pediatric bone age prediction comprising the steps of:
s100: selecting a public data set, wherein the public data set comprises hand X-ray films with bone age labels; randomly selecting some hand X-ray films to form a data set A, and resizing all pictures in A to 560×560;
s200: establishing a TENet model, wherein the TENet model comprises a hand topology module, an edge feature enhancement module and a deep learning network improvement model D; the edge feature enhancement module is an improved Canny edge detection algorithm, and the improved Canny edge detection algorithm uses a bilateral filtering denoising algorithm and an Otsu algorithm;
s300: constructing a training set of a TENet model:
s310: preprocessing the pictures in A with an automatic color balance algorithm to obtain the preprocessed picture corresponding to each picture in A;
s320: performing feature extraction on the pictures in A with the hand topology module to obtain the hand topological graph corresponding to each picture in A;
s330: performing edge enhancement on the preprocessed picture corresponding to each picture in A with the edge feature enhancement module to obtain the edge feature enhancement map corresponding to each picture in A;
s340: repeating S310-S330 until all hand X-ray films in A have been traversed, obtaining the preprocessed picture, hand topological graph and edge feature enhancement map corresponding to each picture in A;
s350: all the preprocessed pictures, hand topological graphs and edge feature enhancement maps obtained in S340 jointly form the training set of the TENet model, wherein the preprocessed picture, hand topological graph and edge feature enhancement map corresponding to each picture form one training sample;
s400: presetting the maximum number of iterations and the initial learning rate μ of D, and training D:
s410: taking all training samples in the training set as the input of D and training D with an Adam optimizer to update the parameters of D, the Adam optimizer using an MAE target loss function;
stopping training when the maximum number of iterations is reached, obtaining the trained deep learning network improved model D';
s420: taking the TENet model that uses D' as the trained TENet model;
s500: inputting a hand X-ray film of a child whose bone age is unknown into the trained TENet model; the output is the predicted value of the child's bone age.
Preferably, the deep learning network improved model D in S200 is structured as follows: a deep learning network serves as the backbone network; the top-most neural network layers are removed in sequence from top to bottom; a 3×3 down-sampling layer, a 3×3 max-pooling layer and a fully connected layer with 32 neurons are then appended in sequence after the bottom layer of the original backbone, and the gender information corresponding to each hand X-ray film in A is blended into the image features through the fully connected layer.
Preferably, the specific steps of obtaining the preprocessed picture corresponding to each picture in A in S310 are as follows:
s311: selecting the c-th picture from A;
s312: calculating the adaptive filtering intermediate variable R_c(p) of any pixel point p in the c-th picture:

R_c(p) = Σ_{q∈Ω, q≠p} S_α(I_c(p) − I_c(q)) / d(p, q); (1)

wherein p and q represent two pixel points on the c-th picture, I_c(p) − I_c(q) represents the gray-level difference between p and q, d(p, q) represents the distance metric function between p and q, and S_α(·) represents the luminance expression function;
s313: calculating the L(p) value of the c-th picture:

L(p) = 255 · [R_c(p) − min R_c(p)] / [max R_c(p) − min R_c(p)]; (2)

wherein L(p) represents the preprocessing result of pixel point p, the mapping interval of pixel p is [0, 255], and [min R_c(p), max R_c(p)] is the full range of R_c(p);
s314: repeating S312-S313 until all pixel points of the c-th picture have been traversed and preprocessed, obtaining the preprocessed picture corresponding to the c-th picture;
preprocessing the pictures in this way alleviates the low contrast between the background area and the ROI of a hand X-ray film.
Preferably, the specific steps of obtaining the hand topological graph corresponding to each picture in A in S320 are as follows:
s321: selecting the c-th picture from A;
s322: performing plane convolution on the t-th pixel point of c with the horizontal direction matrix to obtain the luminance difference approximation G_xt of the t-th pixel point in the transverse direction:

G_xt = [ −1 0 +1; −2 0 +2; −1 0 +1 ] * c_t; (3)

wherein G_xt represents the luminance difference approximation of the t-th pixel point in the transverse direction and c_t represents the t-th pixel point (taken with its 3×3 neighborhood);
performing plane convolution on the t-th pixel point of c with the vertical direction matrix to obtain the luminance difference approximation G_yt of the t-th pixel point in the longitudinal direction:

G_yt = [ +1 +2 +1; 0 0 0; −1 −2 −1 ] * c_t; (4)

wherein G_yt represents the luminance difference approximation of the t-th pixel point in the longitudinal direction;
s323: calculating the gradient approximation G_t of the t-th pixel point of c:

G_t = √(G_xt² + G_yt²); (5)

and calculating the direction of the t-th pixel point of c:

α(x_t, y_t) = arctan(G_yt / G_xt); (6)

wherein α(x_t, y_t) represents the direction of the t-th pixel point of c;
s324: repeating S322-S323 until all pixel points of c have been traversed, obtaining the convolution value of each pixel point; the convolution values of all pixel points form a matrix B, which is the gradient map c';
s325: performing threshold binarization on c' to obtain the rough hand diaphysis foreground edge map of c;
s326: performing joint-point position prediction on the rough hand diaphysis foreground edge map corresponding to c with the MediaPipe application framework to obtain the hand topological graph of c.
The hand topological structure graph obtained from the rough hand diaphysis foreground edge map provides structured ROI information and expresses the corresponding semantic feature in CHN-05, namely the length of the hand phalanges, which benefits the subsequent bone age evaluation.
Preferably, the specific steps of obtaining the edge feature enhancement map corresponding to each picture in A in S330 are as follows:
s331: selecting the c-th picture from A;
s332: calculating the spatial proximity and the pixel-value similarity of each pixel point of the c-th picture and performing bilateral filtering denoising on it, the spatial proximity being expressed as:

w_d(i, j, k, l) = exp(−[(i − k)² + (j − l)²] / (2δ_d²)); (7)

wherein i, j and k, l represent the coordinates of pixel points a and b, i.e. a(i, j) and b(k, l), and δ_d is the Gaussian standard deviation;
the pixel-value similarity being expressed as:

w_r(i, j, k, l) = exp(−‖f(i, j) − f(k, l)‖² / (2δ_r²)); (8)

wherein f(i, j) represents the pixel value of the image at point a(i, j), f(k, l) represents the pixel value at point b(k, l), δ_r is the Gaussian standard deviation, and w_r measures the degree of difference between the neighboring point a and the central point b;
letting c″ denote the picture obtained after bilateral filtering denoising of the c-th picture;
s333: setting a threshold T that divides c″ into background and target;
the optimal pixel threshold of c″ is obtained with the Otsu algorithm, whose calculation formulas are as follows:

p_0 + p_1 = 1; (9)

m = p_0·m_0 + p_1·m_1; (10)

σ² = p_0·(m_0 − m)² + p_1·(m_1 − m)²; (11)

wherein σ² represents the between-class variance and m represents the mean gray value of c″; p_0 represents the proportion of the background in the picture and p_1 represents the proportion of the target:

p_0 = Σ_{n=0}^{k} p_n; (12)

p_1 = Σ_{n=k+1}^{L} p_n; (13)

wherein k = {1, …, T}, n denotes the gray level, p_n represents the frequency of occurrence of gray level n, p_0 represents the sum of the occurrence probabilities of the gray levels between 0 and k, L represents the pixel level of the image, and p_1 represents the sum of the occurrence probabilities of the gray levels between k + 1 and L;
m_0 represents the mean gray value of the background and m_1 represents the mean gray value of the target:

m_0 = (1/p_0) Σ_{n=0}^{k} n·p_n; (14)

m_1 = (1/p_1) Σ_{n=k+1}^{L} n·p_n; (15)

s334: substituting formula (10) into formula (11) and simplifying gives formula (16):

σ² = p_0·p_1·(m_0 − m_1)²; (16)

s335: traversing all pixel points of c″ and evaluating formulas (9)-(16) in turn for the pixel points of each gray level of c″ to obtain N values of σ²;
s336: sorting the N values of σ² in descending order and selecting the largest σ²; the gray level that attains it is taken as the optimal pixel threshold of c″;
s337: taking c″ after the processing of S336 as the edge feature enhancement map corresponding to the c-th picture.
The traditional Canny edge detection algorithm denoises with linear Gaussian filtering, convolving the image with a weighted kernel. Gaussian blurring is a weighted average in which nearer points receive larger weights and farther points smaller ones, so averaging blurs the edge and feature information in the image and many features are lost. Bilateral filtering is a nonlinear filtering method that compromises between the spatial proximity and the pixel-value similarity of an image, considering spatial information and gray-level similarity simultaneously by introducing a spatial-domain kernel and a range-domain kernel. A bilateral filter can therefore retain image edge features well while filtering out noise.
The traditional Canny edge detection algorithm classifies pixel points with a double-threshold scheme, but a fixed threshold cannot generalize to every image, which leads to the inherent problem that operators such as the traditional Canny (TC), Laplacian, Prewitt and Roberts cannot identify the boundary between the ossification center and the metacarpal end. Using the Otsu algorithm to compute the optimal threshold for each different picture, instead of selecting a threshold manually, minimizes the uncertainty of the algorithm's performance and makes the final result better conform to the semantics of CHN-05, namely that the ossification center is clearly visible and disc-shaped, the edges are smooth and continuous, and the degree of fusion between epiphysis and diaphysis can be observed.
Compared with the prior art, the invention has at least the following advantages:
1. The invention constructs a bone age prediction and evaluation model, the TENet model, from a hand topological graph, an edge feature enhancement module and an improved deep learning network. The hand topological graph in the TENet model enhances the extraction of information features from hand X-ray films, increasing the available information. The improved Canny edge detection algorithm, i.e. the edge feature enhancement module, increases the amount and accuracy of feature extraction at the edges of the hand X-ray film, while using the Otsu algorithm instead of manual threshold selection minimizes the uncertainty of the algorithm's performance and ensures that the final result better conforms to the semantics of CHN-05.
2. Impulse noise is often present in hand X-ray films, and traditional denoising methods such as Gaussian filtering, median filtering and box filtering cannot protect high-frequency information well, blurring the edge details of the image. The bilateral filtering denoising algorithm used by the invention balances the spatial proximity and pixel similarity of the image, considers spatial-domain information and gray-level similarity, and retains the edge features of the image while denoising, achieving edge-preserving denoising.
3. Computing the optimal threshold for each different picture with the Otsu algorithm, instead of selecting a threshold manually, minimizes the uncertainty of the algorithm's performance.
4. In data preprocessing, an automatic color balance algorithm is used to alleviate the low contrast between the background area and the ROI of a hand X-ray film.
Drawings
Fig. 1 is an overall framework diagram of the TENet model.
Fig. 2 is a network architecture used by the present invention.
Fig. 3 is a flow chart of a modified Canny algorithm.
FIG. 4 is a diagram of the effect of different algorithms on processing a hand X-ray film.
FIG. 5 is a graph of the processing results of a data set used by the method of the present invention. (a) - (d) are sample images using a private database, and (e) - (h) are sample images using an RSNA database.
Detailed Description
The present invention is described in further detail below. A novel automatic bone age evaluation model, TENet, is proposed; it achieves horizontal fusion of multi-feature expression by abstracting the semantic description of bone age evaluation in CHN-05 into algorithmic features. First, a hand topology module is designed to provide the length and position structure information of the metacarpal and phalangeal bones, matching the semantics of bone age evaluation in CHN-05; second, an enhanced edge feature module is developed to identify the hand structure and extract effective edge information, which meets the requirement of CHN-05 for bone edge definition and also enhances the accuracy of bone age assessment.
Referring to fig. 1-5, a method for pediatric bone age prediction comprising the steps of:
s100: selecting a public data set, wherein the public data set comprises hand X-ray films with bone age labels; randomly selecting some hand X-ray films to form a data set A, and resizing all pictures in A to 560×560;
s200: establishing a TENet model, wherein the TENet model comprises a hand topology module, an edge feature enhancement module and a deep learning network improvement model D, the deep learning network is the prior art, the edge feature enhancement module is an improved Canny edge detection algorithm, the Canny edge detection algorithm is the prior art, the improved Canny edge detection algorithm uses a bilateral filtering denoising algorithm and an Otsu algorithm, and the bilateral filtering denoising algorithm and the Otsu algorithm are the prior art;
The deep learning network improved model D in S200 is structured as follows: a deep learning network serves as the backbone network; the top-most neural network layers are removed in sequence from top to bottom; a 3×3 down-sampling layer, a 3×3 max-pooling layer and a fully connected layer with 32 neurons are then appended in sequence after the bottom layer of the original backbone, and the gender information corresponding to each hand X-ray film in A is blended into the image features through the fully connected layer.
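A minimal Keras sketch of this structure follows, assuming an Xception backbone (the backbone named in the ablation study below) and modern tf.keras; the filter count of the down-sampling layer, the 128-neuron fusion layer and the input/output names are illustrative assumptions, while the 3×3 down-sampling layer, the 3×3 max-pooling layer and the 32-neuron gender branch follow the text.

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_model_d(input_shape=(560, 560, 3)):
    image_in = layers.Input(shape=input_shape, name="hand_xray")
    gender_in = layers.Input(shape=(1,), name="gender")

    # Backbone with its top-most (classification) layers removed.
    backbone = tf.keras.applications.Xception(
        include_top=False, weights=None, input_tensor=image_in)
    x = backbone.output

    # 3x3 down-sampling layer and 3x3 max-pooling layer appended
    # after the bottom layer of the original backbone.
    x = layers.Conv2D(256, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(pool_size=3, strides=1, padding="same")(x)
    x = layers.GlobalAveragePooling2D()(x)

    # Fully connected layer with 32 neurons that embeds the gender
    # information and blends it into the image features.
    g = layers.Dense(32, activation="relu")(gender_in)
    x = layers.Concatenate()([x, g])
    x = layers.Dense(128, activation="relu")(x)
    bone_age = layers.Dense(1, name="bone_age")(x)
    return Model(inputs=[image_in, gender_in], outputs=bone_age)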
S300: constructing a training set of the TENet model:
s310: preprocessing the pictures in A with an automatic color balance algorithm (prior art) to obtain the preprocessed picture corresponding to each picture in A; the algorithm performs region-adaptive filtering on the original image to correct color cast and obtain a spatial-domain reconstructed image, highlighting the features and valuable information in the image so that it better matches human visual perception;
The specific steps of obtaining the preprocessed picture corresponding to each picture in A in S310 are as follows:
s311: selecting the c-th picture from A;
s312: calculating the adaptive filtering intermediate variable R_c(p) of any pixel point p in the c-th picture:

R_c(p) = Σ_{q∈Ω, q≠p} S_α(I_c(p) − I_c(q)) / d(p, q); (1)

wherein p and q represent two pixel points on the c-th picture; I_c(p) − I_c(q) represents the gray-level difference between p and q, modeling quasi-biological lateral inhibition; d(p, q) represents the distance metric function between p and q, which controls the region adaptivity of the filtering; and S_α(·) represents the luminance expression function;
s313: calculating the L(p) value of the c-th picture:

L(p) = 255 · [R_c(p) − min R_c(p)] / [max R_c(p) − min R_c(p)]; (2)

wherein L(p) represents the preprocessing result of pixel point p, the mapping interval of pixel p is [0, 255], and [min R_c(p), max R_c(p)] is the full range of R_c(p); using the full range achieves a global white balance for the image;
s314: repeating S312-S313 until all pixel points of the c-th picture have been traversed and preprocessed, obtaining the preprocessed picture corresponding to the c-th picture;
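As an illustration, a minimal NumPy sketch of this preprocessing step (formulas (1) and (2)) is given below. It assumes S_α is a slope-α response clipped to [−1, 1] and d(p, q) is the Euclidean distance; the name ace_preprocess and the value α = 10 are illustrative, and the naive double loop is only practical for small crops; a production implementation would use a fast approximation.

import numpy as np

def s_alpha(diff, alpha=10.0):
    # Luminance expression function: quasi-biological lateral
    # inhibition, clipped to [-1, 1].
    return np.clip(alpha * diff, -1.0, 1.0)

def ace_preprocess(gray):
    # gray: 2-D uint8 array (a hand X-ray film in grayscale).
    h, w = gray.shape
    img = gray.astype(np.float64) / 255.0
    ys, xs = np.mgrid[0:h, 0:w]
    r = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            d = np.hypot(ys - i, xs - j)   # distance metric d(p, q)
            d[i, j] = np.inf               # exclude q == p
            # Formula (1): lateral-inhibition response weighted by distance.
            r[i, j] = np.sum(s_alpha(img[i, j] - img) / d)
    # Formula (2): stretch R_c over its full range onto [0, 255].
    out = 255.0 * (r - r.min()) / (r.max() - r.min() + 1e-12)
    return out.astype(np.uint8)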
s320: performing feature extraction on the pictures in A with the hand topology module to obtain the hand topological graph corresponding to each picture in A;
The specific steps of obtaining the hand topological graph corresponding to each picture in A in S320 are as follows:
s321: selecting the c-th picture from A;
s322: performing plane convolution on the t-th pixel point of c with the horizontal direction matrix to obtain the luminance difference approximation G_xt of the t-th pixel point in the transverse direction:

G_xt = [ −1 0 +1; −2 0 +2; −1 0 +1 ] * c_t; (3)

wherein G_xt represents the luminance difference approximation of the t-th pixel point in the transverse direction and c_t represents the t-th pixel point (taken with its 3×3 neighborhood);
performing plane convolution on the t-th pixel point of c with the vertical direction matrix to obtain the luminance difference approximation G_yt of the t-th pixel point in the longitudinal direction:

G_yt = [ +1 +2 +1; 0 0 0; −1 −2 −1 ] * c_t; (4)

wherein G_yt represents the luminance difference approximation of the t-th pixel point in the longitudinal direction;
s323: calculating the gradient approximation G_t of the t-th pixel point of c:

G_t = √(G_xt² + G_yt²); (5)

and calculating the direction of the t-th pixel point of c:

α(x_t, y_t) = arctan(G_yt / G_xt); (6)

wherein α(x_t, y_t) represents the direction of the t-th pixel point of c;
s324: repeating S322-S323 until all pixel points of c have been traversed, obtaining the convolution value of each pixel point; the convolution values of all pixel points form a matrix B, which is the gradient map c';
s325: performing threshold binarization (prior art) on c' to obtain the rough hand diaphysis foreground edge map of c;
s326: performing joint-point position prediction on the rough hand diaphysis foreground edge map corresponding to c with the MediaPipe application framework (prior art) to obtain the hand topological graph of c.
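A hedged sketch of S322-S326 follows, using OpenCV's Sobel operator for the plane convolutions of formulas (3)-(6) and the public mediapipe package for joint-point position prediction; the helper name hand_topology, the Otsu-based binarization threshold and the way the landmark detector is combined with the edge map are illustrative assumptions.

import cv2
import numpy as np
import mediapipe as mp

def hand_topology(gray):
    # Luminance difference approximations in the transverse and
    # longitudinal directions (formulas (3) and (4)).
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    # Gradient approximation (formula (5)); the direction of
    # formula (6) would be np.arctan2(gy, gx).
    grad = cv2.convertScaleAbs(np.sqrt(gx ** 2 + gy ** 2))

    # Threshold binarization -> rough hand diaphysis foreground edge map.
    _, edge_map = cv2.threshold(grad, 0, 255,
                                cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Joint-point position prediction with the MediaPipe framework.
    rgb = cv2.cvtColor(edge_map, cv2.COLOR_GRAY2RGB)
    with mp.solutions.hands.Hands(static_image_mode=True,
                                  max_num_hands=1) as hands:
        result = hands.process(rgb)
    landmarks = (result.multi_hand_landmarks[0]
                 if result.multi_hand_landmarks else None)
    return edge_map, landmarks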
S330: performing edge enhancement on the preprocessed image corresponding to the image in the A by using an edge feature enhancement module to obtain an edge feature enhancement image corresponding to the image in the A;
The specific steps of obtaining the edge feature enhancement map corresponding to each picture in A in S330 are as follows:
s331: selecting the c-th picture from A;
s332: calculating the spatial proximity and the pixel-value similarity of each pixel point of the c-th picture and performing bilateral filtering denoising on it, the spatial proximity being expressed as:

w_d(i, j, k, l) = exp(−[(i − k)² + (j − l)²] / (2δ_d²)); (7)

wherein i, j and k, l represent the coordinates of pixel points a and b, i.e. a(i, j) and b(k, l), and δ_d is the Gaussian standard deviation;
the pixel-value similarity being expressed as:

w_r(i, j, k, l) = exp(−‖f(i, j) − f(k, l)‖² / (2δ_r²)); (8)

wherein f(i, j) represents the pixel value of the image at point a(i, j), f(k, l) represents the pixel value at point b(k, l), δ_r is the Gaussian standard deviation, and w_r measures the degree of difference between the neighboring point a and the central point b;
letting c″ denote the picture obtained after bilateral filtering denoising of the c-th picture;
s333: setting a threshold T that divides c″ into background and target;
the optimal pixel threshold of c″ is obtained with the Otsu algorithm, whose calculation formulas are as follows:

p_0 + p_1 = 1; (9)

m = p_0·m_0 + p_1·m_1; (10)

σ² = p_0·(m_0 − m)² + p_1·(m_1 − m)²; (11)

wherein σ² represents the between-class variance and m represents the mean gray value of c″; p_0 represents the proportion of the background in the picture and p_1 represents the proportion of the target:

p_0 = Σ_{n=0}^{k} p_n; (12)

p_1 = Σ_{n=k+1}^{L} p_n; (13)

wherein k = {1, …, T}, n denotes the gray level, p_n represents the frequency of occurrence of gray level n, p_0 represents the sum of the occurrence probabilities of the gray levels between 0 and k, L represents the pixel level of the image (typically 255), and p_1 represents the sum of the occurrence probabilities of the gray levels between k + 1 and L;
m_0 represents the mean gray value of the background and m_1 represents the mean gray value of the target:

m_0 = (1/p_0) Σ_{n=0}^{k} n·p_n; (14)

m_1 = (1/p_1) Σ_{n=k+1}^{L} n·p_n; (15)

s334: substituting formula (10) into formula (11) and simplifying gives formula (16):

σ² = p_0·p_1·(m_0 − m_1)²; (16)

s335: traversing all pixel points of c″ and evaluating formulas (9)-(16) in turn for the pixel points of each gray level of c″ to obtain N values of σ², where the gray level refers to the gray level to which a pixel belongs;
s336: sorting the N values of σ² in descending order and selecting the largest σ²; in the calculation, all pixel points are traversed over the gray range 0-255, so the gray level that maximizes formula (16) is taken as the optimal pixel threshold of the picture;
s337: taking c″ after the processing of S336 as the edge feature enhancement map corresponding to the c-th picture.
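The sketch below pairs the bilateral denoising of formulas (7) and (8), delegated here to cv2.bilateralFilter with illustrative sigma values, with a direct implementation of the Otsu search of formulas (9)-(16); passing the resulting threshold to cv2.Canny as the high hysteresis threshold (with low = high/2) is an assumption, since the hysteresis choice is not stated above.

import cv2
import numpy as np

def otsu_threshold(gray):
    # Gray-level frequencies p_n over the 0-255 range.
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    best_k, best_var = 0, -1.0
    for k in range(1, 255):
        p0, p1 = p[:k + 1].sum(), p[k + 1:].sum()  # formulas (9), (12), (13)
        if p0 == 0 or p1 == 0:
            continue
        m0 = (np.arange(k + 1) * p[:k + 1]).sum() / p0        # formula (14)
        m1 = (np.arange(k + 1, 256) * p[k + 1:]).sum() / p1   # formula (15)
        var = p0 * p1 * (m0 - m1) ** 2                        # formula (16)
        if var > best_var:                 # keep the gray level that
            best_k, best_var = k, var      # maximizes formula (16)
    return best_k

def edge_feature_enhance(gray):
    # Edge-preserving denoising (formulas (7)-(8)); sigmas are illustrative.
    denoised = cv2.bilateralFilter(gray, 9, 75, 75)
    t = otsu_threshold(denoised)
    return cv2.Canny(denoised, t // 2, t)  # assumed hysteresis thresholds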
S340: repeating S310-S330, traversing all the hand X-ray films in A to obtain a preprocessing image corresponding to each picture in A, a hand topological image corresponding to each picture in A and an edge feature enhancement image corresponding to each picture in A;
s350: all the preprocessing graphs, the hand topological graph and the edge feature enhancement graph obtained in the step S340 jointly form a training set of a TENet model, wherein the preprocessing graph, the hand topological graph and the edge feature enhancement graph corresponding to each graph form a training sample;
s400: presetting the maximum number of iterations and the initial learning rate μ of D, and training D:
s410: taking all training samples in the training set as the input of D and training D with an Adam optimizer (prior art) to update the parameters of D, the Adam optimizer using an MAE target loss function;
stopping training when the maximum number of iterations is reached, obtaining the trained deep learning network improved model D';
s420: taking the TENet model that uses D' as the trained TENet model;
s500: inputting a hand X-ray film of a child whose bone age is unknown into the trained TENet model; the output is the predicted value of the child's bone age.
Experimental verification
Pediatric Bone Age Assessment (BAA) is an important physical examination that reflects the growth potential and sexual maturation trend of a child. In clinical practice, the Chinese carpal bone maturity assessment standard for children (CHN-05) is widely used by Chinese radiologists for BAA. CHN-05 uses the Metacarpophalangeal Length (ML) and the Metacarpophalangeal Edge (MM) as reference criteria to estimate bone age. Inspired by the semantic description of CHN-05, the invention proposes a new model, called TENet, for automatic bone age assessment. A hand topology module is designed in the TENet model to identify key positions of the hand, which are used to extract structural semantics; an edge feature enhancement module is designed to provide accurate bone edge information during training. The TENet model detects both global edge information and local topological structure information, thereby evaluating bone age through multi-feature horizontal fusion.
Experimental results show that the TENet model achieves state-of-the-art performance of 5.35 months mean absolute error (MAE) on the public RSNA data set, and, because the design of the TENet model follows the semantic logic of CHN-05, it has very good reliability and interpretability in clinical application.
The experiment evaluated the TENet model on a private data set and on the Radiological Society of North America bone age assessment (RSNA-BAA) public data set. The private data set comprises 4954 hand X-rays in the training set, with 275 hand X-rays in each of the validation and test sets; the RSNA-BAA public data set comprises 5611 hand X-rays in the training set, with 275 hand X-rays in each of the validation and test sets. All hand X-rays were resized to 560×560 before processing by the TENet model. The mean absolute error between the ground-truth bone age and the corresponding predicted bone age is reported.
TABLE 1 ablation study of the model of the invention
EXP.  Data set  Inputs        MAE (months)
1     private   OI            7.15
2     private   PHI           6.92
3     private   PHI + ImCAT   5.76
4     public    PHI + ImCAT   5.80
(EXP. is Experiment; OI is Original Image; PHI is Pre-Processed Hand Image; ImCAT is Improved Adaptive Thresholding Canny Edge Detection and denotes the hand edge enhancement map; MAE is mean absolute error; the input combinations follow the ablation discussion below. The L1 (Loss1, MAE) loss function is used in all experiments.)
Implementation details
TENet was implemented with TensorFlow 1.9, and training was completed in about 8 hours on a system with an NVIDIA TITAN RTX GPU and 32 GB of RAM. We trained the whole TENet for 100 epochs with a batch size of 32. The initial learning rate was 3×10⁻³; it was reduced to 10⁻³ after 50 epochs and dropped a further 10-fold after 80 epochs. The optimizer used was Adam.
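A hedged reconstruction of this setup, written against modern tf.keras for clarity (the original used TensorFlow 1.9), is sketched below: Adam, the MAE loss, batch size 32, 100 epochs and the stepped learning-rate schedule just described. build_model_d is the illustrative builder from the earlier sketch, and train_images, train_genders and train_ages are assumed arrays of resized radiographs, gender flags and labeled bone ages.

import tensorflow as tf

def lr_schedule(epoch, lr):
    # 3e-3 initially, 1e-3 after epoch 50, a further 10x drop after epoch 80.
    if epoch < 50:
        return 3e-3
    if epoch < 80:
        return 1e-3
    return 1e-4

model = build_model_d()
model.compile(optimizer=tf.keras.optimizers.Adam(3e-3), loss="mae")
model.fit([train_images, train_genders], train_ages,
          batch_size=32, epochs=100,
          callbacks=[tf.keras.callbacks.LearningRateScheduler(lr_schedule)])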
TENet model ablation experiment
The role of each module in the TENet model was studied first, i.e. the original image (OI), the Pre-Processed Hand Image (PHI) and the edge feature enhancement map (ImCAT). The experimental results are shown in Table 1. Without these two modules the network degenerates to Xception, giving an MAE of 7.19 months; applying the preprocessed image to the model instead of the original image gives an MAE of 6.92 months; further applying the edge feature enhancement module gives an MAE of 5.76 months, an improvement of 1.16 months. The experimental data illustrate the necessity of acquiring the edge features and the effectiveness of the ImCAT algorithm.
Combining these two components, the TENet model achieves a 5.80-month MAE on the public data set. Taken together, the results on the private and public data sets demonstrate the correctness of the TENet model design derived from the CHN-05 carpal bone standard. Note: smaller MAE values indicate better results.
TABLE 2 Ablation study of the ImCAT algorithm
(The body of Table 2 is rendered as images in the original publication; as described below, it compares PSNR, MSE, SSIM, RMSE and MAE for the Traditional Canny baseline, four edge operators and the ImCAT algorithm.)
Hand edge feature ablation experiment
The TENet model detects hand edge features and combines them with the topology for bone age assessment. By observation, ImCAT also benefits from its modular design. Peak signal-to-noise ratio (PSNR), mean square error (MSE), structural similarity (SSIM), root mean square error (RMSE) and mean absolute error (MAE) are used to measure performance, with the results shown in Table 2. With the Traditional Canny (TC) algorithm and four edge operators as baselines, the ImCAT algorithm used in the method is clearly superior to the other algorithms and better captures the overall edge details of the hand.
The results on the RSNA public data set and the private data set show that the TENet model achieves good performance. Moreover, because the TENet model is designed from CHN-05 prior knowledge, it has good interpretability and reliability for clinical practice. In addition, the invention provides a reliable experimental basis for combining the feature acquisition module with a scoring regression module into an end-to-end automatic bone age assessment framework.
Finally, although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (5)

1. A method for predicting the bone age of children, characterized in that the method comprises the following steps:
s100: selecting a public data set, wherein the public data set comprises hand X-ray films with bone age labels; randomly selecting some hand X-ray films to form a data set A, and resizing all pictures in A to 560×560;
s200: establishing a TENet model, wherein the TENet model comprises a hand topology module, an edge feature enhancement module and a deep learning network improved model D; the edge feature enhancement module is an improved Canny edge detection algorithm, and the improved Canny edge detection algorithm uses a bilateral filtering denoising algorithm and an Otsu algorithm;
s300: constructing the training set of the TENet model:
s310: preprocessing the pictures in A with an automatic color balance algorithm to obtain the preprocessed picture corresponding to each picture in A;
s320: performing feature extraction on the pictures in A with the hand topology module to obtain the hand topological graph corresponding to each picture in A;
s330: performing edge enhancement on the preprocessed picture corresponding to each picture in A with the edge feature enhancement module to obtain the edge feature enhancement map corresponding to each picture in A;
s340: repeating S310-S330 until all hand X-ray films in A have been traversed, obtaining the preprocessed picture, hand topological graph and edge feature enhancement map corresponding to each picture in A;
s350: all the preprocessed pictures, hand topological graphs and edge feature enhancement maps obtained in S340 jointly form the training set of the TENet model, wherein the preprocessed picture, hand topological graph and edge feature enhancement map corresponding to each picture form one training sample;
s400: presetting the maximum number of iterations and the initial learning rate μ of D, and training D:
s410: taking all training samples in the training set as the input of D and training D with an Adam optimizer to update the parameters of D, the Adam optimizer using an MAE target loss function;
stopping training when the maximum number of iterations is reached, obtaining the trained deep learning network improved model D';
s420: taking the TENet model that uses D' as the trained TENet model;
s500: inputting a hand X-ray film of a child whose bone age is unknown into the trained TENet model; the output is the predicted value of the child's bone age.
2. The method for predicting the bone age of children according to claim 1, characterized in that: the deep learning network improved model D in S200 is structured as follows: a deep learning network serves as the backbone network; the top-most neural network layers are removed in sequence from top to bottom; a 3×3 down-sampling layer, a 3×3 max-pooling layer and a fully connected layer with 32 neurons are then appended in sequence after the bottom layer of the original backbone, and the gender information corresponding to each hand X-ray film in A is blended into the image features through the fully connected layer.
3. The method for predicting the bone age of children according to claim 2, characterized in that the specific steps of obtaining the preprocessed picture corresponding to each picture in A in S310 are as follows:
s311: selecting the c-th picture from A;
s312: calculating the adaptive filtering intermediate variable R_c(p) of any pixel point p in the c-th picture:

R_c(p) = Σ_{q∈Ω, q≠p} S_α(I_c(p) − I_c(q)) / d(p, q); (1)

wherein p and q represent two pixel points on the c-th picture, I_c(p) − I_c(q) represents the gray-level difference between p and q, d(p, q) represents the distance metric function between p and q, and S_α(·) represents the luminance expression function;
s313: calculating the L(p) value of the c-th picture:

L(p) = 255 · [R_c(p) − min R_c(p)] / [max R_c(p) − min R_c(p)]; (2)

wherein L(p) represents the preprocessing result of pixel point p, the mapping interval of pixel p is [0, 255], and [min R_c(p), max R_c(p)] is the full range of R_c(p);
s314: repeating S312-S313 until all pixel points of the c-th picture have been traversed and preprocessed, obtaining the preprocessed picture corresponding to the c-th picture.
4. The method for predicting the bone age of children according to claim 3, characterized in that the specific steps of obtaining the hand topological graph corresponding to each picture in A in S320 are as follows:
s321: selecting the c-th picture from A;
s322: performing plane convolution on the t-th pixel point of c with the horizontal direction matrix to obtain the luminance difference approximation G_xt of the t-th pixel point in the transverse direction:

G_xt = [ −1 0 +1; −2 0 +2; −1 0 +1 ] * c_t; (3)

wherein G_xt represents the luminance difference approximation of the t-th pixel point in the transverse direction and c_t represents the t-th pixel point;
performing plane convolution on the t-th pixel point of c with the vertical direction matrix to obtain the luminance difference approximation G_yt of the t-th pixel point in the longitudinal direction:

G_yt = [ +1 +2 +1; 0 0 0; −1 −2 −1 ] * c_t; (4)

wherein G_yt represents the luminance difference approximation of the t-th pixel point in the longitudinal direction;
s323: calculating the gradient approximation G_t of the t-th pixel point of c:

G_t = √(G_xt² + G_yt²); (5)

calculating the direction of the t-th pixel point of c:

α(x_t, y_t) = arctan(G_yt / G_xt); (6)

wherein α(x_t, y_t) represents the direction of the t-th pixel point of c;
s324: repeating S322-S323 until all pixel points of c have been traversed, obtaining the convolution value of each pixel point; the convolution values of all pixel points form a matrix B, which is the gradient map c';
s325: performing threshold binarization on c' to obtain the rough hand diaphysis foreground edge map of c;
s326: performing joint-point position prediction on the rough hand diaphysis foreground edge map corresponding to c with the MediaPipe application framework to obtain the hand topological graph of c.
5. The method for predicting the bone age of children according to claim 4, characterized in that the specific steps of obtaining the edge feature enhancement map corresponding to each picture in A in S330 are as follows:
s331: selecting the c-th picture from A;
s332: calculating the spatial proximity and the pixel-value similarity of each pixel point of the c-th picture and performing bilateral filtering denoising on it, the spatial proximity being expressed as:

w_d(i, j, k, l) = exp(−[(i − k)² + (j − l)²] / (2δ_d²)); (7)

wherein i, j and k, l represent the coordinates of pixel points a and b, i.e. a(i, j) and b(k, l), and δ_d is the Gaussian standard deviation;
the pixel-value similarity being expressed as:

w_r(i, j, k, l) = exp(−‖f(i, j) − f(k, l)‖² / (2δ_r²)); (8)

wherein f(i, j) represents the pixel value of the image at point a(i, j), f(k, l) represents the pixel value at point b(k, l), δ_r is the Gaussian standard deviation, and w_r measures the degree of difference between the neighboring point a and the central point b;
letting c″ denote the picture obtained after bilateral filtering denoising of the c-th picture;
s333: setting a threshold T that divides c″ into background and target;
the optimal pixel threshold of c″ is obtained with the Otsu algorithm, whose calculation formulas are as follows:

p_0 + p_1 = 1; (9)

m = p_0·m_0 + p_1·m_1; (10)

σ² = p_0·(m_0 − m)² + p_1·(m_1 − m)²; (11)

wherein σ² represents the between-class variance and m represents the mean gray value of c″; p_0 represents the proportion of the background in the picture and p_1 represents the proportion of the target:

p_0 = Σ_{n=0}^{k} p_n; (12)

p_1 = Σ_{n=k+1}^{L} p_n; (13)

wherein k = {1, …, T}, n denotes the gray level, p_n represents the frequency of occurrence of gray level n, p_0 represents the sum of the occurrence probabilities of the gray levels between 0 and k, L represents the pixel level of the image, and p_1 represents the sum of the occurrence probabilities of the gray levels between k + 1 and L;
m_0 represents the mean gray value of the background and m_1 represents the mean gray value of the target:

m_0 = (1/p_0) Σ_{n=0}^{k} n·p_n; (14)

m_1 = (1/p_1) Σ_{n=k+1}^{L} n·p_n; (15)

s334: substituting formula (10) into formula (11) and simplifying gives formula (16):

σ² = p_0·p_1·(m_0 − m_1)²; (16)

s335: traversing all pixel points of c″ and evaluating formulas (9)-(16) in turn for the pixel points of each gray level of c″ to obtain N values of σ²;
s336: sorting the N values of σ² in descending order and selecting the largest σ²; the gray level that attains it is taken as the optimal pixel threshold of c″;
s337: taking c″ after the processing of S336 as the edge feature enhancement map corresponding to the c-th picture.
CN202211076541.3A 2022-09-05 2022-09-05 Method for predicting bone age of children Pending CN115423779A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211076541.3A CN115423779A (en) 2022-09-05 2022-09-05 Method for predicting bone age of children

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211076541.3A CN115423779A (en) 2022-09-05 2022-09-05 Method for predicting bone age of children

Publications (1)

Publication Number Publication Date
CN115423779A (en) 2022-12-02

Family

ID=84202426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211076541.3A Pending CN115423779A (en) 2022-09-05 2022-09-05 Method for predicting bone age of children

Country Status (1)

Country Link
CN (1) CN115423779A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117351325A (en) * 2023-12-06 2024-01-05 浙江省建筑设计研究院 Model training method, building effect graph generation method, equipment and medium
CN117351325B (en) * 2023-12-06 2024-03-01 浙江省建筑设计研究院 Model training method, building effect graph generation method, equipment and medium

Similar Documents

Publication Publication Date Title
CN107369160B (en) Choroid neogenesis blood vessel segmentation algorithm in OCT image
CN107610087B (en) Tongue coating automatic segmentation method based on deep learning
Gao et al. A deep learning based approach to classification of CT brain images
CN106940816A (en) Connect the CT image Lung neoplasm detecting systems of convolutional neural networks entirely based on 3D
CN109035172B (en) Non-local mean ultrasonic image denoising method based on deep learning
CN109886965B (en) Retina layer segmentation method and system combining level set with deep learning
Koitka et al. Mimicking the radiologists’ workflow: Estimating pediatric hand bone age with stacked deep neural networks
JP7427080B2 (en) Weakly supervised multitask learning for cell detection and segmentation
Bibiloni et al. A real-time fuzzy morphological algorithm for retinal vessel segmentation
CN112465905A (en) Characteristic brain region positioning method of magnetic resonance imaging data based on deep learning
Pan et al. Prostate segmentation from 3d mri using a two-stage model and variable-input based uncertainty measure
Chen et al. Skin lesion segmentation using recurrent attentional convolutional networks
CN111062936B (en) Quantitative index evaluation method for facial deformation diagnosis and treatment effect
JP2006507579A (en) Histological evaluation of nuclear polymorphism
CN115423779A (en) Method for predicting bone age of children
Zhao et al. Attention residual convolution neural network based on U-net (AttentionResU-Net) for retina vessel segmentation
CN116579975A (en) Brain age prediction method and system of convolutional neural network
CN112036298A (en) Cell detection method based on double-segment block convolutional neural network
Wang et al. Automatic consecutive context perceived transformer GAN for serial sectioning image blind inpainting
CN117274278A (en) Retina image focus part segmentation method and system based on simulated receptive field
CN114862799B (en) Full-automatic brain volume segmentation method for FLAIR-MRI sequence
CN113177499A (en) Tongue crack shape identification method and system based on computer vision
CN112949585A (en) Identification method and device for blood vessels of fundus image, electronic equipment and storage medium
EP3905129A1 (en) Method for identifying bone images
CN112381767A (en) Cornea reflection image screening method and device, intelligent terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination