CN115578373A - Bone age assessment method, device, equipment and medium based on global and local feature cooperation - Google Patents

Bone age assessment method, device, equipment and medium based on global and local feature cooperation

Info

Publication number
CN115578373A
CN115578373A
Authority
CN
China
Prior art keywords
image
global
features
feature
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211350501.3A
Other languages
Chinese (zh)
Inventor
惠庆磊
洪源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Bozhao Technology Co ltd
Original Assignee
Hangzhou Bozhao Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Bozhao Technology Co ltd filed Critical Hangzhou Bozhao Technology Co ltd
Priority to CN202211350501.3A priority Critical patent/CN115578373A/en
Publication of CN115578373A publication Critical patent/CN115578373A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10116X-ray image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30008Bone

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a bone age assessment method, device, equipment and medium based on global and local feature cooperation, relating to the field of image processing and comprising the following steps: establishing an initial evaluation model and training it to obtain a target evaluation model; acquiring a bone image to be evaluated, preprocessing it, and inputting the preprocessed image into the target evaluation model; performing feature extraction with a first convolutional network to obtain global features; performing recognition and cropping with a pre-trained target detection model to obtain a plurality of sub-images containing ROI regions of preset categories; extracting features from each sub-image with a second convolutional network to obtain a plurality of local features; performing convolution and normalization on the global features and the local features to obtain a global context local feature; and, after fusing each local feature with the global context local feature, connecting the global and local features and processing them through a fully connected layer to obtain a bone age assessment result. This solves the problem that the prior art lacks a fully automatic bone age assessment method that fully mines data features.

Description

Bone age assessment method, device, equipment and medium based on global and local feature cooperation
Technical Field
The invention relates to the field of image processing, in particular to a bone age assessment method, device, equipment and medium based on global and local feature cooperation.
Background
Human growth and development can be expressed in terms of two "ages": the chronological age (calendar age) and the biological age (bone age). Skeletal development follows a broadly similar course in all humans, and the development of each bone is continuous and staged. Bones at different stages have different morphological characteristics, so bone age assessment can accurately reflect an individual's level of growth, development and maturity. Bone age was first applied in pediatrics: it can determine a child's biological age, reveal growth potential and the trend of sexual maturity early, and predict adult height. When the difference between bone age and chronological age is within ±1 year, development is normal; a difference greater than +1 year indicates advanced development (early maturity), and a difference below -1 year indicates delayed development (late maturity). Abnormal bone age is often one aspect of certain pediatric endocrine disorders, so bone age assessment is also of great help in diagnosing them. In addition, bone age provides a scientific, objective identification of biological age and is often used in the selection of athletes and in forensic age determination.
Bone age assessment is usually performed by taking X-ray photographs of the subject's hand and wrist and having a doctor interpret them. Interpretation methods can be classified into the GP atlas method, which compares against a standard atlas, and the TW scoring method, which grades bone development stages. Clinical bone age interpretation depends on expert experience and suffers from heavy workload, long measurement cycles, poor repeatability, strong subjectivity, instability and unreliability. The GP atlas is based on standardized radiographs from studies of children's growth; bone age is assessed by directly comparing the subject's X-ray image with the standard atlas. The GP method is simple, intuitive and easy to use, and is widely applied internationally, but it is highly subjective and its accuracy cannot be guaranteed. The TW method scores the maturity of each particular epiphysis, and bone age is determined from a maturity score scale (currently revised as TW3). Because the TW3 method scores the maturity of each epiphysis independently, it is more objective and robust than the GP method. It is noteworthy, however, that the TW assessment process is very complex, requiring an experienced pediatric radiologist to spend considerable time on each assessment. Therefore, many researchers have been working to develop rapid, accurate and more objective methods of bone age assessment.
Because manual interpretation of bone age radiographs is highly subjective, interpretation results vary greatly between doctors of different experience levels. Moreover, medical images are costly to acquire, and labeling them requires specialized radiologists and is time-consuming and labor-intensive, so data sets dedicated to bone age prediction with high-quality labels are very limited. Most traditional automatic evaluation methods require manually designed features as input, cannot meet the requirements of automation, and their performance cannot satisfy practical applications. Therefore, it is necessary to provide a fully automatic bone age assessment method that both meets clinical requirements and fully mines data characteristics.
Disclosure of Invention
In order to overcome the technical defects, the invention aims to provide a bone age assessment method, a bone age assessment device, bone age assessment equipment and a bone age assessment medium based on global and local feature cooperation, which are used for solving the problem that a full-automatic bone age assessment method which meets clinical requirements and can also fully mine data features is lacked in the prior art.
The invention discloses a bone age assessment method based on global and local feature cooperation, which comprises the following steps:
establishing an initial evaluation model and training by adopting a training sample to obtain a target evaluation model; the target evaluation model comprises a first convolution network, a target detection model, a second convolution network and a Transformer network;
acquiring a bone image to be evaluated, preprocessing the bone image to be evaluated to obtain an image to be processed, and inputting the image to be processed into the target evaluation model;
performing feature extraction on the image to be processed by adopting a first convolution network to obtain global features;
adopting a pre-trained target detection model to identify and cut the image to be processed to obtain a plurality of sub-images containing ROI areas of preset categories;
extracting the features of each sub-image by adopting a second convolution network to obtain a plurality of local features;
performing convolution and normalization on the global features and the local features by using a Transformer network to obtain global context local features;
and after fusing each local feature with the global context local feature, connecting the global feature and the local feature, and processing through a full connection layer to obtain a bone age evaluation result.
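The claimed processing flow can be sketched as a single forward pass. The following is a minimal, hypothetical sketch in PyTorch: the class name BoneAgePipeline and all module interfaces are illustrative assumptions, not the patented implementation.

```python
import torch
import torch.nn as nn

class BoneAgePipeline(nn.Module):
    """Hypothetical sketch of the claimed target evaluation model:
    first convolutional network (global), pre-trained target detection
    model, second convolutional network (local), Transformer-style
    fusion block, and a fully connected head."""
    def __init__(self, global_cnn, detector, local_cnn, fusion, feat_dim, n_rois=18):
        super().__init__()
        self.global_cnn = global_cnn
        self.detector = detector
        self.local_cnn = local_cnn
        self.fusion = fusion
        # one global feature vector plus n_rois fused local feature vectors
        self.head = nn.Linear(feat_dim * (n_rois + 1), 1)

    def forward(self, image):
        g = self.global_cnn(image)                  # global features
        sub_images = self.detector(image)           # ROI sub-images of preset categories
        locals_ = [self.local_cnn(s) for s in sub_images]
        gcl = self.fusion(g, locals_)               # global context local feature
        fused = [lf + gcl for lf in locals_]        # fuse each local with the context
        feats = torch.cat([g] + fused, dim=-1)      # connect global and local features
        return self.head(feats)                     # bone age assessment result
```

Here feat_dim must match the width of the backbone feature vectors, and n_rois the number of preset ROI categories.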
Preferably, the performing convolution and normalization on the global feature and the local feature to obtain a global context local feature includes:
processing each local feature by using a 1 x 1 convolution layer to obtain a plurality of first feature data;
processing the global features by adopting a 1 x 1 convolution layer to obtain second feature data;
processing the second characteristic data by adopting the 1 x 1 convolution layer again to obtain third characteristic data;
and multiplying the first feature data corresponding to each local feature by the second feature data, then normalizing to obtain a global context feature corresponding to each local feature, adding the global context features corresponding to each local feature, and then multiplying the summed global context features by the third feature data to obtain a global context local feature.
Preferably, the global context local feature may be expressed as:

F_gcl = V · Σ_i softmax_j( Q_i^T K / √d )

wherein Q_i is a matrix representation of the first feature data corresponding to the i-th local feature; K is a matrix representation of the second feature data; V is a matrix representation of the third feature data; T denotes the transpose of a matrix; i is the category index of the ROI of a preset category; j is the position index of the image to be processed; and d is the number of channels.
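A minimal sketch of the computation described above, assuming the three 1 × 1 convolutions play query/key/value roles over pooled per-ROI local features and the flattened global feature map (the shapes and pooling conventions are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalContextBlock(nn.Module):
    """Sketch of the described global-local fusion: 1x1 convolutions produce
    first (per-ROI), second and third feature data; the softmax-normalised
    products of first and second feature data are summed over ROI categories
    and applied to the third feature data."""
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, kernel_size=1)  # first feature data
        self.k = nn.Conv2d(channels, channels, kernel_size=1)  # second feature data
        self.v = nn.Conv2d(channels, channels, kernel_size=1)  # third feature data

    def forward(self, global_feat, local_feats):
        # global_feat: (B, C, H, W); local_feats: list of (B, C) pooled ROI features
        B, C, H, W = global_feat.shape
        k = self.k(global_feat).flatten(2)                   # (B, C, HW)
        v = self.v(global_feat).flatten(2)                   # (B, C, HW)
        attn_sum = 0
        for lf in local_feats:
            q = self.q(lf.view(B, C, 1, 1)).flatten(2)       # (B, C, 1)
            attn = F.softmax((q.transpose(1, 2) @ k) / C ** 0.5, dim=-1)  # (B, 1, HW)
            attn_sum = attn_sum + attn                       # sum over ROI categories i
        gcl = v @ attn_sum.transpose(1, 2)                   # (B, C, 1)
        return gcl.squeeze(-1)                               # global context local feature
```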
Preferably, the training with the training samples to obtain the target evaluation model includes:
collecting bone images marked with bone age information from a database, and classifying the bone images by natural-year periods based on the bone age information, so as to form a data set comprising at least one group whose bone age information lies in an interval [a, b), wherein a and b are positive integers;
performing histogram equalization processing and size adjustment on each image in the data set;
randomly selecting images in the data set to generate a training set, a verification set and a test set so as to generate a training sample, and training the initial evaluation model to obtain a target evaluation model.
Preferably, before generating the training sample, the method further comprises:
performing data augmentation on the training set and the validation set;
wherein the data augmentation comprises image transposition, horizontal mirroring, rotation, translation, or scaling.
Preferably, before the image to be processed is identified using the pre-trained target detection model, training the target detection model includes:
acquiring a plurality of bone images from a database, and marking ROI (region of interest) areas of preset categories in advance to generate training data;
establishing a target detection model based on a YOLO network, and setting model parameters according to a ROI (region of interest) of a preset category;
and training the target detection model by adopting training data, and updating the loss function and the weight parameter to obtain the pre-trained target detection model.
Preferably, the recognizing and cutting the image to be processed by using the pre-trained target detection model to obtain a plurality of sub-images including the ROI of the preset category includes:
identifying the image to be processed by using a pre-trained target detection model to obtain the image to be processed with a plurality of prediction frames, wherein each prediction frame corresponds to an ROI (region of interest) of a preset category;
when the number of ROI (region of interest) areas in the image to be processed is lower than a preset value, discarding the ROI areas;
and when the number of the ROI areas in the image to be processed is not lower than a preset value, cutting according to the ROI areas of a preset category to obtain a plurality of sub-images.
The invention also provides a bone age assessment device based on global and local feature cooperation, which comprises:
the training module is used for establishing an initial evaluation model and training by adopting a training sample to obtain a target evaluation model; the target evaluation model comprises a first convolutional network, a target detection model, a second convolutional network and a Transformer network;
the preprocessing module is used for acquiring a bone image to be evaluated, and acquiring an image to be processed after preprocessing so as to input the image to the target evaluation model;
the global feature extraction module is used for extracting features of the image to be processed by adopting a first convolutional network to obtain global features;
the local feature extraction module is used for recognizing and cutting the image to be processed by adopting a pre-trained target detection model to obtain a plurality of sub-images containing ROI areas of preset categories; extracting the features of each sub-image by adopting a second convolution network to obtain a plurality of local features;
the processing module is used for performing convolution and normalization on the global features and the local features by using a Transformer network to obtain a global context local feature; and, after fusing each local feature with the global context local feature, connecting the global and local features and processing them through a full connection layer to obtain a bone age evaluation result.
The invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the evaluation method when executing the computer program.
The invention also provides a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the evaluation method.
After the technical scheme is adopted, compared with the prior art, the method has the following beneficial effects:
the invention provides a bone age assessment method based on a global-local cooperative Transformer, which realizes efficient and full-automatic assessment of bone age. Firstly, detecting and extracting a region of interest (ROI) concerned in a doctor film reading process, namely local information, wherein the whole image is global information, extracting global and local characteristic information respectively by utilizing convolution, and then extracting the characteristic information between the global and the local by adopting a Transformer to predict the bone age and improve the efficiency and the accuracy of bone age evaluation.
Drawings
FIG. 1 is a flow chart of a first embodiment of a bone age assessment method based on global and local feature cooperation according to the present invention;
FIG. 2 is a schematic network structure diagram of a target evaluation model in an embodiment of the bone age evaluation method based on global and local feature cooperation according to the present invention;
FIG. 3 is a schematic structural diagram of a first convolutional network and/or a second convolutional network in an embodiment of the bone age assessment method based on global and local feature cooperation according to the present invention;
FIG. 4 is a schematic processing diagram of a Transformer network obtaining global context local features in the embodiment of the bone age estimation method based on global and local feature cooperation according to the present invention;
FIG. 5 is a schematic structural diagram of a second embodiment of the bone age assessment device based on global and local feature cooperation according to the present invention;
fig. 6 is a schematic diagram of a computer apparatus according to the present invention.
Reference numerals:
8-bone age assessment means based on global and local feature cooperation; 81-a training module; 82-a pre-processing module; 83-a global feature extraction module; 84-local feature extraction module; 85-a processing module; 9-a computer device; 91-a memory; 92-processor.
Detailed Description
The advantages of the invention are further illustrated in the following description of specific embodiments in conjunction with the accompanying drawings.
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings, in which like numerals refer to the same or similar elements throughout the different views, unless otherwise specified. The implementations described in the following exemplary examples do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if," as used herein, may be interpreted as "when," "upon," or "in response to a determination," depending on the context.
In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on those shown in the drawings, and are merely for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, are not to be construed as limiting the present invention.
In the description of the present invention, unless otherwise specified and limited, it is to be noted that the terms "mounted," "connected," and "connected" are used in a broad sense, and for example, they may be mechanically or electrically connected, or they may be connected internally to two elements, directly or indirectly through an intermediate, and those skilled in the art will understand the specific meaning of the terms as they are used in the specific case.
In the following description, suffixes such as "module", "part", or "unit" used to indicate elements are used only to facilitate the explanation of the present invention and have no specific meaning in themselves. Thus, "module" and "component" may be used interchangeably.
The first embodiment is as follows: the embodiment provides a bone age assessment method based on global and local feature cooperation, referring to fig. 1 and 2, including:
s100: establishing an initial evaluation model and training by adopting a training sample to obtain a target evaluation model; wherein the target evaluation model comprises a first convolutional network, a target detection model, a second convolutional network and a Transformer network (i.e. global-local Transformer Blocks in fig. 2 described below);
in this embodiment, the initial evaluation model and the target evaluation model have the same structure, the model parameters are adjusted during the training process, and the target evaluation model is obtained only after the training is completed, specifically, the training with the training samples to obtain the target evaluation model includes:
s110: collecting bone images marked with bone age information from a database, classifying the bone images according to the bone age information and the natural year cycles to form a bone image which comprises at least one group of bone age information positioned in a [ a, b) interval, wherein a and b are positive integers and used as a data set;
In the above step, the database may be formed by collecting X-ray images from a hospital radiology department, with the male and female proportions set to one half each; the data are classified in units of 12 months according to the calibrated bone age, e.g. [7, 8) years into one group and [8, 9) years into another, thereby generating a data set comprising several groups.
S120: performing histogram equalization processing and size adjustment on each image in the data set;
In the above step, each image (X-ray image) in the data set is histogram-equalized to enhance contrast, and then digitally resampled to a size of 224 × 224. The preprocessing in step S200 described below performs the same histogram equalization and resizing, to facilitate subsequent processing by the model.
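The equalization-plus-resampling step can be sketched with numpy alone; a production pipeline would typically use cv2.equalizeHist and cv2.resize instead, and the function names here are illustrative:

```python
import numpy as np

def equalize_hist(img: np.ndarray) -> np.ndarray:
    """Histogram equalization for a non-constant 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    # map each gray level through the normalised cumulative distribution
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[img]

def resize_nearest(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Nearest-neighbour resampling to size x size, standing in for the
    digital sampling step of the embodiment."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]
```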
S130: randomly selecting images in the data set to generate a training set, a verification set and a test set so as to generate a training sample, and training an initial evaluation model to obtain a target evaluation model.
In the above step, the training set is used as input to the model (the initial evaluation model), the verification set is compared with the model output to adjust the model parameters, and after adjustment the test set is used for testing; if the test succeeds, the model parameters are fixed and the target evaluation model is obtained. Specifically, 80% of the collected wrist bone data may be randomly selected as the training set, 10% as the verification set, and 10% as the test set.
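The 80/10/10 random split described above can be sketched as follows; the function name and seed are illustrative:

```python
import random

def split_dataset(image_paths, seed=42):
    """Randomly split the collected wrist-bone images into 80% training,
    10% validation and 10% test, as described in the embodiment."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])
```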
To further increase the amount of data in the training samples and make the trained target evaluation model more accurate, before generating the training samples the method further includes performing data augmentation on the training set and the validation set, where the augmentation includes, but is not limited to, any one or more of image transposition, horizontal mirroring, rotation, translation, or scaling. This increases the breadth of the training samples and improves the accuracy of the trained model.
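The interpolation-free transforms among those listed (transposition, horizontal mirroring, right-angle rotations) can be sketched directly in numpy; translation and scaling would require an imaging library and are omitted here:

```python
import numpy as np

def augment(img: np.ndarray) -> list:
    """Generate augmented copies of one grayscale training image using the
    transforms named in the embodiment that need no interpolation."""
    return [
        img,               # original
        img.T,             # transposition
        img[:, ::-1],      # horizontal mirror
        np.rot90(img, 1),  # rotation by 90 degrees
        np.rot90(img, 2),  # rotation by 180 degrees
    ]
```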
S200: acquiring a bone image to be evaluated, preprocessing the bone image to be evaluated to obtain an image to be processed, and inputting the image to be processed into the target evaluation model;
In the above step, it should be noted that the bone image to be evaluated is a bone image of a subject. One subject may be associated with one or more bone images, and each bone image is processed one by one, so several images to be processed may be generated; accordingly, several global features may be obtained, and any ROI region may correspond to several local features, from which the bone age is estimated, improving accuracy. As with the sample data, the preprocessing consists of histogram equalization to enhance contrast, followed by digital resampling to a size of 224 × 224.
S300: performing feature extraction on the image to be processed by adopting a first convolution network to obtain global features;
In the above step, the first convolutional network performs feature extraction using an S-VGGNet (see fig. 3). Specifically, it may comprise four downsampling stages, each consisting of two 3 × 3 convolution + BN + activation layers followed by a max pooling layer (MaxPooling), and is used to extract the global features, where j denotes a position index of the global image.
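The described backbone — four stages of two 3 × 3 convolution + BN + activation layers followed by max pooling — can be sketched as below. The channel widths are assumptions, since the description does not specify them:

```python
import torch
import torch.nn as nn

def s_vgg_stage(in_ch, out_ch):
    """One downsampling stage: two 3x3 conv + BN + activation, then max pooling."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class SVGGNet(nn.Module):
    """Hypothetical S-VGGNet sketch: four such stages, as in the description."""
    def __init__(self, channels=(1, 32, 64, 128, 256)):
        super().__init__()
        self.stages = nn.Sequential(*[
            s_vgg_stage(channels[i], channels[i + 1]) for i in range(4)
        ])

    def forward(self, x):  # (B, 1, 224, 224) -> (B, 256, 14, 14)
        return self.stages(x)
```

Each stage halves the spatial resolution, so a 224 × 224 input yields a 14 × 14 feature map after four stages.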
S400: adopting a pre-trained target detection model to recognize and cut the image to be processed to obtain a plurality of sub-images containing ROI areas of preset categories;
In the above step, the target detection network is mainly used to detect the ROIs, so that corresponding sub-images can subsequently be obtained from the ROIs to collect local features. Specifically, training the target detection model used to identify the image to be processed includes:
acquiring a plurality of bone images from a database, and marking ROI (region of interest) areas of preset categories in advance to generate training data; establishing a target detection model based on a YOLO network, and setting model parameters according to a ROI (region of interest) of a preset category; and training the target detection model by adopting training data, and updating the loss function and the weight parameter to obtain the pre-trained target detection model.
During training, 1000 images of each age are randomly selected, with the ROI regions designated by the standard TW-based method manually marked: 800 images are used for training the object detection network and 200 for testing. The object detection model uses a YOLO network with the number of categories (i.e. the number of preset ROI categories) set to 18; each predicted frame comprises a confidence for the object and the probabilities of the frame region over the categories, and redundant windows are eliminated by non-maximum suppression. The prepared training set and image annotations (i.e. the training data) are input into the designed deep convolutional neural network (i.e. the object detection model) for training, reducing the loss function value and updating the network weight parameters; after several rounds of training, the learned network weights are obtained and the pre-trained object detection model is generated.
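The non-maximum suppression used to eliminate redundant windows can be sketched as a plain-Python routine over (box, score, class) detections; the IoU threshold is an assumed value:

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(detections, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring detection
    and drop any same-class detection overlapping it above the threshold."""
    keep = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(det[2] != k[2] or iou(det[0], k[0]) < iou_thresh for k in keep):
            keep.append(det)
    return keep
```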
Specifically, recognizing and cutting the image to be processed with the pre-trained target detection model to obtain a plurality of sub-images containing ROI regions of preset categories requires image correction, which specifically includes:
s410: identifying the image to be processed by using a pre-trained target detection model to obtain the image to be processed with a plurality of prediction frames, wherein each prediction frame corresponds to an ROI (region of interest) of a preset category;
In the above step, after passing through the target detection model, the output image to be processed carries prediction boxes, each corresponding to one of the preset ROI categories; as described above, the number of preset ROI categories is set to 18, so ideally there should be 18 prediction boxes.
S420: when the number of ROI (region of interest) regions in the image to be processed is lower than a preset value, discarding the image to be processed;
After detection in step S410, each prediction box corresponds to one ROI region. Images in which fewer than 14 ROI categories are detected (i.e., with fewer than 14 prediction boxes) are deleted, which excludes a portion of radiographs showing dysplasia or lesions; discarding these poorly developed images improves the accuracy of the subsequent evaluation result.
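The discard rule above (keep an image only when at least 14 of the 18 preset ROI classes are detected) can be sketched as a one-line check; the function name and default threshold are illustrative:

```python
def passes_roi_check(detected_classes, min_count=14):
    """Keep a radiograph only if enough distinct preset ROI classes were detected;
    images failing the check are treated as poorly developed or diseased and dropped."""
    return len(set(detected_classes)) >= min_count
```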
S430: and when the number of the ROI areas in the image to be processed is not lower than a preset value, cutting according to the ROI areas of a preset category to obtain a plurality of sub-images.
Specifically, the 18 detected ROI regions are cropped from the corrected image (i.e., after the above discarding operation), and the ROI blocks (i.e., the image patch containing each ROI region) are resampled to a uniform size of 64 × 64 by digital sampling, for the feature extraction of step S500 described below.
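The cropping and "digital sampling" to a uniform 64 × 64 size can be sketched with nearest-neighbour resampling in NumPy; the sampling scheme is an assumption, not the patent's exact method:

```python
import numpy as np

def crop_and_resize(image, box, size=64):
    """Crop one ROI given as [x1, y1, x2, y2] from a greyscale image and
    resample it to size x size with nearest-neighbour sampling."""
    x1, y1, x2, y2 = box
    patch = image[y1:y2, x1:x2]
    h, w = patch.shape
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return patch[np.ix_(rows, cols)]
```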
S500: extracting the features of each sub-image by adopting a second convolution network to obtain a plurality of local features;
In the above step, the second convolutional network may be set identical to the first convolutional network (i.e., S-VGGNet), or may be any neural network capable of feature extraction. As described above, each sub-image contains one preset-category ROI, so performing feature extraction on each sub-image yields one local feature, which may be denoted $f_L^i$, where $i$ indexes the preset ROI categories.
S600: performing convolution and normalization on the global features and the local features by using a Transformer network to obtain global context local features;
In this embodiment, referring to fig. 4, the global feature obtained in step S300 and the local features obtained in step S500 are used, and the Transformer then extracts the feature information between global and local to predict bone age. Specifically, performing convolution and normalization on the global feature and the local features to obtain the global context local features comprises:
S610: processing each local feature by using a 1 × 1 convolution layer to obtain a plurality of first feature data;
In this step, each local feature $f_L^i$ is mapped into a new space by one 1 × 1 convolution, giving the first feature data, denoted $q_i = W_q f_L^i$.
S620: processing the global features by adopting a 1 x 1 convolution layer to obtain second feature data;
In this step, the global feature $f_G$ is mapped into a space by one 1 × 1 convolution, giving the second feature data, denoted $K = W_k f_G$, whose entry at position $j$ is $k_j$.
S630: processing the second characteristic data by adopting the 1 x 1 convolution layer again to obtain third characteristic data;
In this step a further 1 × 1 convolution is applied, giving the third feature data, denoted $V = W_v f_G$, with entry $v_j$ at position $j$; that is, the global feature $f_G$ undergoes two 1 × 1 convolution operations and is mapped into two spaces, $K$ (the second feature data) and $V$ (the third feature data).
S640: multiplying the first feature data corresponding to each local feature by the second feature data, then normalizing to obtain a global context feature corresponding to each local feature, adding the global context features corresponding to each local feature, and then multiplying the summed global context features by the third feature data to obtain the global context local feature.
In the above step, the convolution operations of S610–S630 yield $q_i$, $K$ and $V$; normalization is then carried out, namely:

$$\alpha_{ij} = \frac{\exp\!\left(q_i^{\top} k_j / \sqrt{d}\right)}{\sum_{j'} \exp\!\left(q_i^{\top} k_{j'} / \sqrt{d}\right)}$$

which is the global context feature corresponding to each local feature. After the set of global context features corresponding to each local feature is obtained, the global context local feature can be computed as:

$$F_{G\text{-}L}^{i} = \sum_{j} \alpha_{ij}\, v_j$$
The above feature computations can all be summarized as a matrix operation; specifically, the global context local features may be expressed as:

$$F_{G\text{-}L} = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V$$

wherein $Q$ is the matrix representation of the first feature data (one row $q_i$ per preset ROI category); $K$ is the matrix representation of the second feature data; $V$ is the matrix representation of the third feature data; $\top$ (T) represents the transpose of the matrix; $i$ is the category index of the preset ROI categories; $j$ is the position index of the image to be processed; and $d$ is the number of channels.
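Written out as code, the matrix operation above is a standard scaled-dot-product attention. The NumPy sketch below is offered as an illustration, not the patent's implementation; the row layout of Q, K, V and the √d scaling (suggested by the channel count d in the claim) are assumptions.

```python
import numpy as np

def global_context_local_features(Q, K, V):
    """F_{G-L} = softmax(Q K^T / sqrt(d)) V.

    Q: (n_roi, d)  first feature data, one row per preset ROI category i
    K: (n_pos, d)  second feature data, one row per position j of the global map
    V: (n_pos, d)  third feature data, same positions
    Returns: (n_roi, d) global-context local features.
    """
    d = Q.shape[1]
    logits = Q @ K.T / np.sqrt(d)                  # (n_roi, n_pos)
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)  # normalise over positions j
    return weights @ V
```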
Specifically, the fusion computation of the local and global features is implemented with the encoder–decoder framework of the Transformer network; during training on the samples of S100, the computation implemented by this network structure is continuously adjusted within the target evaluation model until the parameters of the network structure are determined.
S700: and after fusing each local feature with the global context local feature, connecting the global feature and the local feature, and processing through a full connection layer to obtain a bone age evaluation result.
In the above step, the global context local features obtained in S600 are fused with the local features, the result is concatenated with the global feature and the local features, and the concatenation is mapped to a bone-age range through the fully connected layer. The fully connected layer sits at the end of the whole model and is responsible for converting the two-dimensional feature maps output by the convolutions into a one-dimensional vector and the bone age evaluation result. It should also be noted that it is the local features that are fused with the global context local features, and the global and local features are concatenated after the fused features: that is, each local feature $f_L^i$ is first fused with its global context local feature $F_{G\text{-}L}^i$, and $f_G$ and $f_L^i$ are then appended, rather than fusing all features together, thereby improving the accuracy of the evaluation result.
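The fusion-then-concatenation step can be sketched as follows; the element-wise addition used for fusion, the concatenation order, and the single-output fully connected layer are all illustrative assumptions:

```python
import numpy as np

def bone_age_head(local_feats, gcl_feats, global_feat, W, b):
    """Fuse each local feature with its global-context local feature,
    concatenate with the global and local features, and map through one
    fully connected layer to a bone-age estimate."""
    fused = local_feats + gcl_feats                         # element-wise fusion, (n_roi, d)
    x = np.concatenate([fused.ravel(), global_feat, local_feats.ravel()])
    return W @ x + b                                        # bone-age score(s)
```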
This embodiment provides a bone age assessment method based on a global–local cooperative Transformer, which enables efficient, fully automatic assessment of bone age. First, the regions of interest (ROI) that a physician attends to when reading a radiograph are detected and extracted as local information, while the whole image serves as global information; global and local feature information are extracted by convolution (the first and second convolutional networks, respectively), and a Transformer then extracts the feature information between global and local to predict bone age, improving the efficiency and accuracy of bone age evaluation and reducing the burden on expert physicians.
Example two: the invention also provides a bone age assessment device 8 based on global and local feature cooperation, referring to fig. 5, comprising:
the training module 81 is used for establishing an initial evaluation model and training by adopting a training sample to obtain a target evaluation model; wherein the target evaluation model comprises a first convolutional network, a target detection model, a second convolutional network and a Transformer network (i.e. global-local Transformer Blocks in fig. 2 described below);
Specifically, the database may be formed by collecting X-ray images from a hospital radiology department with an equal proportion of male and female subjects, and the data may be grouped in 12-month intervals according to the calibrated bone age (for example, ages [7, 8) form one group and ages [8, 9) another), thereby generating a data set comprising a plurality of groups. 80% of the collected wrist-bone data may be randomly selected as the training set, 10% as the validation set and 10% as the test set, thereby generating the training samples for training.
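The 80%/10%/10% split described above can be sketched as follows; the seed and function name are illustrative:

```python
import random

def split_dataset(samples, seed=0):
    """Random 80/10/10 split into training, validation and test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)   # deterministic shuffle for a fixed seed
    n = len(items)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```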
The preprocessing module 82 is used for acquiring a bone image to be evaluated, and acquiring an image to be processed after preprocessing so as to input the image to the target evaluation model;
Specifically, the preprocessing operation includes histogram equalization for enhancing contrast, or digital sampling for adjusting the image size for subsequent processing in the target evaluation model; a normalization preprocessing step may also be included.
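The histogram-equalization preprocessing mentioned above can be sketched for an 8-bit greyscale radiograph as follows; this is the textbook CDF-based method, offered as an illustration rather than the patent's exact procedure:

```python
import numpy as np

def equalize_histogram(image):
    """Classic histogram equalisation for a non-constant 8-bit greyscale image:
    build the cumulative histogram, rescale it to [0, 255], use it as a lookup table."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                   # first non-zero CDF value
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[image]
```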
A global feature extraction module 83, configured to perform feature extraction on the image to be processed by using a first convolutional network to obtain a global feature;
specifically, the S-VGGNet can be used for feature extraction, and the global feature is the feature extracted from the whole image.
The local feature extraction module 84 is configured to identify and cut the to-be-processed image by using a pre-trained target detection model, and obtain a plurality of sub-images including a preset type of ROI region; extracting the features of each sub-image by adopting a second convolution network to obtain a plurality of local features;
Specifically, the second convolutional network may or may not be set identical to the first convolutional network. The target detection network is mainly used for detecting the preset ROI categories, so that the corresponding sub-images can subsequently be obtained from the ROI regions to acquire the local features. After target detection, an image to be processed with prediction boxes containing the preset ROI categories is obtained; when the number of ROI regions is lower than the preset value, the image is likely poorly developed or diseased and is discarded, which reduces its influence on local feature extraction and hence on the accuracy of the evaluation result.
The processing module 85 is configured to perform convolution and normalization on the global features and the local features by using a Transformer network to obtain global context local features; and after fusing each local feature with the global context local feature, connecting the global feature and the local feature, and processing through a full connection layer to obtain a bone age evaluation result.
Specifically, each local feature is subjected to one 1 × 1 convolution, and the global feature to two 1 × 1 convolutions, yielding the first, second and third feature data respectively. The first feature data corresponding to each local feature is multiplied by the second feature data and normalized to obtain the global context feature corresponding to each local feature; the global context features corresponding to the local features are then summed and multiplied by the third feature data to obtain the global context local features.
The training module establishes the initial evaluation model and trains it to obtain the target evaluation model, and the preprocessing module preprocesses the bone image acquired from the subject so that it can be input into the target evaluation model. Within the target evaluation model, the local feature extraction module and the global feature extraction module detect and extract the regions of interest (ROI) that a physician attends to when reading a radiograph, i.e., the local images (sub-images), while the whole image serves as the global image; global and local feature information are extracted by convolution (the first and second convolutional networks, respectively). The Transformer network in the processing module then extracts the feature information between global and local and predicts the bone age, improving the efficiency and accuracy of bone age evaluation.
Example three: to achieve the above object, the present invention further provides a computer device 9. The computer device may comprise a plurality of computer devices, and the components of the bone age assessment apparatus 8 based on global and local feature cooperation according to the second embodiment may be distributed across different computer devices 9. The computer device 9 may be a smartphone, tablet computer, laptop computer, desktop computer, rack-mounted server, or the like, executing a program. The computer device of this embodiment at least includes, but is not limited to, a memory 91, a processor 92 and the bone age assessment apparatus 8 based on global and local feature cooperation, which may be communicatively connected to each other via a system bus, with reference to fig. 6. It is noted that fig. 6 only illustrates a computer device having these components, but it is to be understood that not all illustrated components are required to be implemented; more or fewer components may alternatively be implemented.
In this embodiment, the memory 91 may include a program storage area and a data storage area, wherein the program storage area may store an application program required for at least one function of the system, and the data storage area may store data of a user on the computer device. Further, the memory 91 may include high-speed random access memory and may also include non-volatile memory; in some embodiments the memory 91 may optionally include memory located remotely from the processor, connected via a network. Examples of such networks include, but are not limited to, the internet and local area networks.
Processor 92 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 92 is typically used to control the overall operation of the computer device. In this embodiment, the processor 92 is configured to run the program code stored in the memory 91 or process data, for example, to run the bone age assessment apparatus 8 based on global and local feature cooperation, so as to implement the bone age assessment method based on global and local feature cooperation according to the first embodiment.
It is noted that only a computer device 9 having components 91-92 is shown, but it is understood that not all of the shown components need be implemented, and that more or fewer components may be implemented instead.
Example four:
to achieve the above objects, the present invention also provides a computer-readable storage medium including a plurality of storage media such as a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic disk, an optical disk, a server, etc., on which a computer program is stored, which when executed by the processor 92, implements corresponding functions. The computer-readable storage medium of the present embodiment is used for storing the bone age assessment apparatus 8 based on global and local feature cooperation, and when being executed by the processor 92, the bone age assessment method based on global and local feature cooperation of the first embodiment is implemented.
It should be noted that the embodiments of the present invention are described above by way of preferred examples and are not limited to any particular form; those skilled in the art may modify or vary the above-described embodiments in accordance with the principles of the present invention without departing from the scope of the invention.

Claims (10)

1. A bone age assessment method based on global and local feature cooperation is characterized by comprising the following steps:
establishing an initial evaluation model and training by adopting a training sample to obtain a target evaluation model; the target evaluation model comprises a first convolution network, a target detection model, a second convolution network and a Transformer network;
acquiring a bone image to be evaluated, preprocessing the bone image to be evaluated to obtain an image to be processed, and inputting the image to be processed into the target evaluation model;
performing feature extraction on the image to be processed by adopting a first convolution network to obtain global features;
adopting a pre-trained target detection model to recognize and cut the image to be processed to obtain a plurality of sub-images containing ROI areas of preset categories;
extracting the features of each sub-image by adopting a second convolution network to obtain a plurality of local features;
performing convolution and normalization on the global features and the local features by using a Transformer network to obtain global context local features;
and after fusing each local feature with the global context local feature, connecting the global feature and the local feature, and processing through a full connection layer to obtain a bone age evaluation result.
2. The evaluation method according to claim 1, wherein the performing convolution and normalization on the global feature and the local feature to obtain a global context local feature comprises:
processing each local feature by using a 1 × 1 convolution layer to obtain a plurality of first feature data;
processing the global features by adopting a 1 x 1 convolution layer to obtain second feature data;
processing the second characteristic data by adopting the 1 × 1 convolution layer again to obtain third characteristic data;
and multiplying the first feature data corresponding to each local feature by the second feature data, then normalizing to obtain a global context feature corresponding to each local feature, adding the global context features corresponding to each local feature, and then multiplying the summed global context features by the third feature data to obtain a global context local feature.
3. The evaluation method according to claim 2, wherein:
the global context local feature may be expressed as:

$$F_{G\text{-}L} = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V$$

wherein Q is a matrix representation of the first feature data; K is a matrix representation of the second feature data; V is a matrix representation of the third feature data; T represents the transpose of the matrix; i is a category index of the ROI of a preset category; j is the position index of the image to be processed; d is the number of channels.
4. The method of claim 1, wherein the training with the training samples to obtain the target evaluation model comprises:
collecting bone images marked with bone age information from a database, classifying the bone images into groups by natural-year periods based on the bone age information, and forming a data set comprising at least one group whose bone age information lies in an interval [a, b), wherein a and b are positive integers;
performing histogram equalization processing and size adjustment on each image in the data set;
randomly selecting images in the data set to generate a training set, a verification set and a test set so as to generate a training sample, and training the initial evaluation model to obtain a target evaluation model.
5. The evaluation method of claim 4, further comprising, prior to generating the training samples:
performing data augmentation on the training set and the validation set;
wherein the data augmentation comprises image transposition, horizontal mirroring, rotation, translation, or scaling.
6. The evaluation method according to claim 1, wherein training the pre-trained target detection model used for identifying the image to be processed comprises:
acquiring a plurality of bone images from a database, and marking ROI (region of interest) areas of preset categories in advance to generate training data;
establishing a target detection model based on a YOLO network, and setting model parameters according to a ROI (region of interest) of a preset category;
and training the target detection model by adopting training data, and updating the loss function and the weight parameter to obtain the pre-trained target detection model.
7. The evaluation method according to claim 1, wherein identifying and cropping the image to be processed by using the pre-trained target detection model to obtain a plurality of sub-images containing ROI regions of preset categories comprises:
identifying the image to be processed by using a pre-trained target detection model to obtain the image to be processed with a plurality of prediction frames, wherein each prediction frame corresponds to an ROI (region of interest) of a preset category;
when the number of ROI (region of interest) regions in the image to be processed is lower than a preset value, discarding the image to be processed;
and when the number of the ROI areas in the image to be processed is not lower than a preset value, cutting according to the ROI areas of a preset category to obtain a plurality of sub-images.
8. An automatic bone age assessment device, comprising:
the training module is used for establishing an initial evaluation model and training by adopting a training sample to obtain a target evaluation model; the target evaluation model comprises a first convolutional network, a target detection model, a second convolutional network and a Transformer network;
the preprocessing module is used for acquiring a bone image to be evaluated, and acquiring an image to be processed after preprocessing so as to input the image to the target evaluation model;
the global feature extraction module is used for extracting features of the image to be processed by adopting a first convolutional network so as to obtain global features;
the local feature extraction module is used for recognizing and cutting the image to be processed by adopting a pre-trained target detection model to obtain a plurality of sub-images containing ROI areas of preset categories; extracting the features of each sub-image by adopting a second convolution network to obtain a plurality of local features;
the processing module is used for performing convolution and normalization on the global features and the local features by using a Transformer network to obtain global context local features; and after fusing each local feature with the global context local feature, connecting the global feature and the local feature, and processing through a full connection layer to obtain a bone age evaluation result.
9. A computer device, characterized in that the computer device comprises a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the evaluation method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the evaluation method of any one of claims 1 to 7.
CN202211350501.3A 2022-10-31 2022-10-31 Bone age assessment method, device, equipment and medium based on global and local feature cooperation Pending CN115578373A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211350501.3A CN115578373A (en) 2022-10-31 2022-10-31 Bone age assessment method, device, equipment and medium based on global and local feature cooperation


Publications (1)

Publication Number Publication Date
CN115578373A true CN115578373A (en) 2023-01-06

Family

ID=84589371

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211350501.3A Pending CN115578373A (en) 2022-10-31 2022-10-31 Bone age assessment method, device, equipment and medium based on global and local feature cooperation

Country Status (1)

Country Link
CN (1) CN115578373A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116245832A (en) * 2023-01-30 2023-06-09 北京医准智能科技有限公司 Image processing method, device, equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116245832A (en) * 2023-01-30 2023-06-09 北京医准智能科技有限公司 Image processing method, device, equipment and storage medium
CN116245832B (en) * 2023-01-30 2023-11-14 浙江医准智能科技有限公司 Image processing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109389584A (en) Multiple dimensioned rhinopharyngeal neoplasm dividing method based on CNN
CN112184617B (en) Spine MRI image key point detection method based on deep learning
CN111553892B (en) Lung nodule segmentation calculation method, device and system based on deep learning
CN110021425B (en) Comparison detector, construction method thereof and cervical cancer cell detection method
CN108062749B (en) Identification method and device for levator ani fissure hole and electronic equipment
CN111598875A (en) Method, system and device for building thyroid nodule automatic detection model
CN115578372A (en) Bone age assessment method, device and medium based on target detection and convolution transformation
CN111784704B (en) MRI hip joint inflammation segmentation and classification automatic quantitative classification sequential method
CN114565613B (en) Pancreas postoperative diabetes prediction system based on there is study of supervision degree of depth subspace
CN115131642B (en) Multi-modal medical data fusion system based on multi-view subspace clustering
CN112699868A (en) Image identification method and device based on deep convolutional neural network
US20230005138A1 (en) Lumbar spine annatomical annotation based on magnetic resonance images using artificial intelligence
CN112508884A (en) Comprehensive detection device and method for cancerous region
CN112241961A (en) Chest X-ray film auxiliary diagnosis method and system based on deep convolutional neural network
CN111325754B (en) Automatic lumbar vertebra positioning method based on CT sequence image
CN115578373A (en) Bone age assessment method, device, equipment and medium based on global and local feature cooperation
CN116579975A (en) Brain age prediction method and system of convolutional neural network
CN111383222A (en) Intervertebral disc MRI image intelligent diagnosis system based on deep learning
CN111028940A (en) Multi-scale lung nodule detection method, device, equipment and medium
CN112819765A (en) Liver image processing method
CN114972278A (en) Training method based on complementary attention
CN115393351B (en) Method and device for judging cornea immune state based on Langerhans cells
CN116664932A (en) Colorectal cancer pathological tissue image classification method based on active learning
CN116862836A (en) System and computer equipment for detecting extensive organ lymph node metastasis cancer
CN116228660A (en) Method and device for detecting abnormal parts of chest film

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination