CN111340760A - Knee joint positioning method based on multitask two-stage convolutional neural network - Google Patents
- Publication number
- CN111340760A (application CN202010097868.3A)
- Authority
- CN
- China
- Prior art keywords
- knee
- knee joint
- image
- feature map
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- All classifications fall under G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING:
- G06T7/0012 — Image analysis; inspection of images; biomedical image inspection
- G06N3/045 — Neural networks; architecture; combinations of networks
- G06N3/08 — Neural networks; learning methods
- G06T7/11 — Segmentation; edge detection; region-based segmentation
- G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
- G06T2207/10116 — Image acquisition modality: X-ray image
- G06T2207/20081 — Special algorithmic details: training; learning
- G06T2207/20084 — Special algorithmic details: artificial neural networks [ANN]
- G06T2207/30008 — Subject of image: biomedical image processing; bone
Abstract
The invention discloses a knee joint positioning method based on a multitask two-stage convolutional neural network, aiming to improve knee joint detection accuracy. A knee joint region positioning network is constructed from two stages of networks: the first-stage network consists of a first feature extraction module and a preliminary target detection module, and the second-stage network consists of a second feature extraction module and a final target detection module. The positioning network is trained to obtain a trained knee joint region positioning network model. A double-knee X-ray film to be detected is preprocessed, and the double-knee image is divided into single-knee images. Knee joint positioning is then performed on each single-knee image with the trained network to obtain the final bounding box and the final key points of the knee joint region of the single-knee image. The invention improves the detection accuracy of knee joint positioning.
Description
Technical Field
The invention relates to the fields of image processing, computer vision and target positioning, and in particular to a knee joint positioning method based on a multitask two-stage convolutional neural network.
Background
Gonarthritis (knee osteoarthritis) is a very common joint disease that can even lead to disability. It deserves close attention because its morbidity is high among the elderly, the obese and sedentary populations; in severe cases it causes pain and may end in total joint replacement, and the annual knee treatment cost per patient can reach 19,000 euros. Clinically, physicians currently diagnose gonarthritis mainly on the basis of X-ray film; other medical imaging modalities exist, such as magnetic resonance imaging and ultrasound imaging, but X-ray film is cheap and widely used. The X-ray images taken of patients are medical images that include the legs, while gonarthritis mainly presents the pathological features of knee joint space narrowing, osteophyte formation and sclerosis, so only the knee joint region needs attention. Reviewing X-ray films by eye is subjective, time-consuming and labor-intensive for doctors, and with the development of computer technology, computer-aided methods have entered the medical field. For X-ray films that include the knees, however, whether a doctor performs visual diagnosis or computer-aided diagnosis, localization of the knee joint region is a prerequisite for diagnosis and a crucial step.
At present, knee joint regions in X-ray films containing both knees are located by manual labeling, i.e., each film is inspected by eye and the knee joint regions are marked by hand, which is time-consuming and labor-intensive. Computer-aided methods for locating the knee joint region appeared later. For example, the document "Early detection of radiographic knee osteoarthritis using computer-aided analysis [J]. Osteoarthritis and Cartilage, 2009, 17(10): 1307-" proposes a template matching method to automatically detect knee joint parts; this method is slow on large data sets and its knee joint detection accuracy is low. The document "Quantifying radiographic knee osteoarthritis severity using deep convolutional neural networks [C] // 2016 23rd International Conference on Pattern Recognition (ICPR). IEEE, 2016: 1195-" proposes a method based on Sobel horizontal image gradients and a support vector machine (SVM): the knee joint center is first detected automatically, and a fixed region around the detected center is then extracted as the knee joint part of interest. Its highest detection rate on 4496 knee X-ray films from the public OAI database (https://oai.epi-ucsf.org/datarelease/) was 81.8%, whereas the highest detection rate of the template matching method was 54.4%.
The document "A novel method for automatic localization of joint area on knee plain radiographs [C] // Scandinavian Conference on Image Analysis. Springer, Cham, 2017: 290-" proposes a method combining a histogram of oriented gradients (HOG) with a support vector machine (SVM) to locate the knee joint region. Its mean IoU (intersection over union) measured on 473 knee X-ray films from the public data set MOST (http://most.ucsf.edu) was 0.84. Such methods, which locate the knee joint region by extracting traditional features and combining them with a traditional classifier, suffer from high missed-detection and false-detection rates, so the knee joint detection rate still needs improvement. With the development of deep learning, deep learning techniques can represent image features effectively and have been widely applied to target recognition, target detection and related fields.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in order to further improve the knee joint detection accuracy, a knee joint positioning method based on a multitask two-stage convolutional neural network is provided.
The technical scheme of the invention is as follows: the knee joint positioning method based on the multitask two-stage convolutional neural network mainly comprises network construction, network training, preprocessing of the X-ray film to be detected, and knee joint positioning in the X-ray film to be detected. The overall steps are shown in figure 1:
firstly, a knee joint area positioning network based on a multitask two-stage convolutional neural network is built.
The knee joint region positioning network based on the multitask two-stage convolutional neural network comprises two stages: a first-level network and a second-level network, where the output of the first-level network serves as the input of the second-level network.
The first-level network is composed of a first feature extraction module and a preliminary target detection module. The first feature extraction module receives the single knee image I from the outside, extracts first image features of the single knee image I and sends the first image features to the preliminary target detection module; and the preliminary target detection module detects the first image characteristics and outputs a preliminary knee joint area in the single knee image I.
The first feature extraction module consists of 5 convolutional layers (4 of size 3 × 3 and 1 of size 2 × 2) and 3 max pooling layers (2 of size 2 × 2 and 1 of size 3 × 3).
The first 3 × 3 convolutional layer convolves the single knee image I, and the first 2 × 2 max pooling layer pools the convolved result to obtain feature map F1. The second 3 × 3 convolutional layer convolves F1, and the 3 × 3 max pooling layer pools the convolved result to obtain feature map F2. The third 3 × 3 convolutional layer convolves F2, and the second 2 × 2 max pooling layer pools the convolved result to obtain feature map F3. The fourth 3 × 3 convolutional layer convolves F3 to obtain feature map F4, and the fifth (2 × 2) convolutional layer convolves F4 to obtain feature map F5. Feature map F5 is the extracted first image feature.
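As a check on this layer stack, the spatial sizes it produces can be traced with a short sketch. The patent text does not state strides or padding, so the "valid" (no-padding) convolutions with stride 1 and pooling with stride 2 below are illustrative assumptions:

```python
# Sketch: spatial-size arithmetic for the first feature extraction module.
# Assumes valid (no-padding) convolutions with stride 1 and max pooling
# with stride 2 -- these strides/paddings are assumptions, not stated
# in the patent text.

def conv(side: int, kernel: int, stride: int = 1) -> int:
    """Output side length of a valid convolution."""
    return (side - kernel) // stride + 1

def pool(side: int, kernel: int, stride: int = 2) -> int:
    """Output side length of a valid max-pooling layer."""
    return (side - kernel) // stride + 1

def first_stage_feature_size(side: int) -> int:
    side = conv(side, 3)   # first 3x3 conv
    side = pool(side, 2)   # first 2x2 max pool   -> F1
    side = conv(side, 3)   # second 3x3 conv
    side = pool(side, 3)   # 3x3 max pool          -> F2
    side = conv(side, 3)   # third 3x3 conv
    side = pool(side, 2)   # second 2x2 max pool   -> F3
    side = conv(side, 3)   # fourth 3x3 conv       -> F4
    side = conv(side, 2)   # fifth 2x2 conv        -> F5
    return side

# A 48x48 training crop (the sample size of step 2.2.3) reduces to 1x1:
print(first_stage_feature_size(48))   # -> 1
```

Under these assumptions the 48 × 48 training crops of step 2.2.3 map to a single spatial position in F5, which is consistent with the fully-convolutional, sliding-window use of the network on larger pyramid images.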
The preliminary target detection module comprises a 1 × 1 convolutional layer, a knee joint bounding box coordinate operation layer and a first non-maximum suppression screening layer. The 1 × 1 convolutional layer convolves feature map F5 to obtain two groups of vectors: a knee joint probability vector A1 and a knee joint bounding box coordinate offset vector B1. The bounding box coordinate operation layer determines knee joint bounding boxes from A1 and B1 to obtain the knee joint regions in the single knee image I. The first non-maximum suppression screening layer performs non-maximum suppression on these regions to obtain the preliminary knee joint regions in the single knee image I, and sends the single knee image I together with its preliminary knee joint regions to the second-level network.
The second-level network is composed of a second feature extraction module and a final target detection module. The second feature extraction module performs feature extraction on the single knee image I received from the first-level network and on the preliminary knee joint regions in the single knee image I to obtain second image features; the final target detection module performs target detection on the second image features to obtain the final knee joint bounding box coordinates and the knee joint region key point coordinates of the single knee image I.
The second feature extraction module is composed of 4 convolutional layers (3 of size 3 × 3 and 1 of size 2 × 2), 3 max pooling layers (2 of size 2 × 2 and 1 of size 3 × 3), and 1 fully-connected layer.
The first 3 × 3 convolutional layer in the second feature extraction module convolves the preliminary knee joint region of the single knee image I, and the first 2 × 2 max pooling layer pools the convolved result to obtain feature map F6. The second 3 × 3 convolutional layer convolves F6, and the 3 × 3 max pooling layer pools the convolved result to obtain feature map F7. The third 3 × 3 convolutional layer convolves F7, and the second 2 × 2 max pooling layer pools the convolved result to obtain feature map F8. The fourth (2 × 2) convolutional layer convolves F8 to obtain feature map F9, and the first fully-connected layer performs a full connection on F9 to obtain feature map F10. Feature map F10 is the extracted second image feature.
The final target detection module comprises a second fully-connected layer, a coordinate operation layer for the knee joint bounding box and knee joint region key points, and a second non-maximum suppression screening layer. The second fully-connected layer performs a full connection on feature map F10 to obtain three groups of vectors: a knee joint probability vector A2, a knee joint bounding box coordinate offset vector B2, and a coordinate offset vector C for the six key points of the knee joint region. The coordinate operation layer determines the knee joint bounding box coordinates and the knee joint region key point coordinates of the single knee image I from A2, B2 and C. The second non-maximum suppression screening layer performs non-maximum suppression on the knee joint regions and key point coordinates in the single knee image I, and outputs the single knee image I together with its final knee joint region and knee joint region key point coordinates.
Second, train the knee joint region positioning network based on the multitask two-stage convolutional neural network.
2.1 prepare the data of the knee joint area location network based on the multitask two-stage convolution neural network.
2.1.1 Preprocess M (M is a positive integer, M > 2000) original images to obtain 2M histogram-equalized single knee images, as follows:
2.1.1.1 Randomly select M original images from the OAI baseline public database (https://oai.epi-ucsf.org/datarelease/, November 2008 release). Each original image is an X-ray medical image containing both the left and right knees. Because of different illumination during acquisition, an image may have a bright background with dark legs, or a dark background with bright legs.
2.1.1.2 Uniformly transform the M original images into images with a dark background and bright legs. First select the double-knee images with a bright background and dark legs from the M original images, then invert their pixels, i.e., subtract each original pixel value from 255, so that all M original images uniformly become images with a dark background and bright legs.
2.1.1.3 Convert the pixels of the M dark-background, bright-leg images to the range [0, 255], i.e., process them into uint8 images, by applying the following to the pixel value of every pixel: P_new = 255 × (P − P_min) / (P_max − P_min), where P is the pixel value of any pixel in the pre-processed image, P_max is the maximum pixel value in the pre-processed image, P_min is the minimum pixel value in the pre-processed image, and P_new is the corresponding pixel value in the processed image.
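The min-max rescaling of this step can be sketched on a flat list of pixel values (rounding to the nearest integer is an assumption; the patent only gives the continuous formula):

```python
# Sketch of the min-max rescaling in step 2.1.1.3: map each pixel P to
# P_new = 255 * (P - P_min) / (P_max - P_min), then round into uint8 range.
def to_uint8(pixels):
    p_min, p_max = min(pixels), max(pixels)
    scale = p_max - p_min  # assumed non-zero (image is not constant)
    return [round(255 * (p - p_min) / scale) for p in pixels]

print(to_uint8([100, 300, 500]))   # -> [0, 128, 255]
```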
2.1.1.4 Convert the M uint8 images into 2M single knee images as follows: first obtain the width W of each uint8 image and compute half the width, W/2; round it to an integer, denoted [W/2]; then cut each uint8 image at the width coordinate [W/2], dividing it into 2 single knee images. This finally yields 2M single knee images.
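The split at the half-width column can be sketched as below; the image is represented as a list of rows, and rounding the half-width down is an assumption (the patent only says the half-width is rounded):

```python
# Sketch of step 2.1.1.4: split a double-knee image into two single-knee
# halves at the rounded half-width column.
def split_double_knee(image):
    """image: list of rows (H x W); returns (left_half, right_half)."""
    width = len(image[0])
    half = width // 2                      # rounded half-width [W/2]
    left = [row[:half] for row in image]   # left-knee image
    right = [row[half:] for row in image]  # right-knee image
    return left, right

img = [[1, 2, 3, 4], [5, 6, 7, 8]]  # tiny 2x4 stand-in for a uint8 image
left, right = split_double_knee(img)
print(left, right)   # -> [[1, 2], [5, 6]] [[3, 4], [7, 8]]
```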
2.1.1.5 Perform histogram equalization on each of the 2M single knee images, following the standard histogram equalization method described in the literature.
2.1.2 annotate the knee joint real bounding box in the 2M single knee images after histogram equalization, the method is as follows:
2.1.2.1 initializing variable m ═ 1;
2.1.2.2 Manually label 6 key points of the knee joint region of the mth single knee image. Since doctors and computers mainly focus on the knee joint gap and the osteophytes, 6 points on the boundary of the joint gap and the osteophytes are manually marked as the main key points of the knee joint region: for the right knee, as shown in fig. 3(a), the femoral medial osteophyte point (FM), femoral lateral osteophyte point (FL), tibial medial osteophyte point (TM), tibial lateral osteophyte point (TL), joint space medial point (JSM) and joint space lateral point (JSL). As shown in fig. 3(b), the 6 key points of the left knee are the same six points; the key points of the left and right knees are mirror images of each other. The subsequent steps process the 6 key points without distinguishing the left knee from the right knee.
2.1.2.3 marking the boundary box of the knee joint in the mth single knee image according to the key points marked manually, the method is:
2.1.2.3.1 Calculate the center point coordinates (x_mid, y_mid) of the 6 key points in the mth single knee image. Let the coordinates of the 6 key points be (x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4), (x_5, y_5), (x_6, y_6); then x_mid = (x_1 + x_2 + x_3 + x_4 + x_5 + x_6) / 6 and y_mid = (y_1 + y_2 + y_3 + y_4 + y_5 + y_6) / 6.
2.1.2.3.2 Calculate the knee width w_knee: take the maximum abscissa x_max and the minimum abscissa x_min of the 6 key points, and use their difference as the knee joint width, i.e.:

x_max = max(x_1, x_2, x_3, x_4, x_5, x_6)

x_min = min(x_1, x_2, x_3, x_4, x_5, x_6)

w_knee = x_max − x_min
2.1.2.3.3 Label the knee joint region to obtain the real knee joint region bounding box coordinates, representing (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate). Taking the center point of the 6 key points as the center, expand the region by 0.65 times the knee width up, down, left and right (an expansion of 0.65 times is enough to frame the knee joint region on every image); this expanded region is the knee joint region of real interest, i.e., the labeled bounding box. The calculation formula is: upper-left abscissa = x_mid − 0.65 w_knee, upper-left ordinate = y_mid − 0.65 w_knee, lower-right abscissa = x_mid + 0.65 w_knee, lower-right ordinate = y_mid + 0.65 w_knee.
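The key-point-to-bounding-box construction of steps 2.1.2.3.1-2.1.2.3.3 can be sketched as follows (the six key-point coordinates below are made-up illustrative values, not real annotations):

```python
# Sketch of steps 2.1.2.3.1-2.1.2.3.3: center of the six key points,
# knee width from the extreme abscissas, and a box expanded by
# 0.65 * w_knee in all four directions.
def knee_bbox(points):
    """points: six (x, y) key points; returns (x1, y1, x2, y2)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_mid = sum(xs) / 6
    y_mid = sum(ys) / 6
    w_knee = max(xs) - min(xs)      # knee width
    r = 0.65 * w_knee               # expansion in each direction
    return (x_mid - r, y_mid - r, x_mid + r, y_mid + r)

pts = [(10, 20), (30, 22), (50, 21), (12, 40), (32, 42), (46, 41)]
print(knee_bbox(pts))
```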
2.1.2.4 If m ≥ 2M, go to 2.2; if m < 2M, let m = m + 1 and go to 2.1.2.2.
2.2 prepare training samples for the first stage network in the knee joint area positioning network based on the multitask two-stage convolution neural network, the method is as follows:
2.2.1 initializing variable m ═ 1;
2.2.2 prepare training samples for the first level network on the mth single knee image by:
2.2.2.1 initializing variable k ═ 1;
2.2.2.2 Randomly take the kth point on the mth single knee image; let its coordinates be (x_k, y_k) and use this point as the upper-left coordinate point of a randomly selected bounding box.
Take the width w_k and height h_k of the kth randomly selected bounding box. Let the width of the mth single knee image be w_m and its height h_m; take half of the minimum of w_m and h_m and round it, i.e., int(min(w_m, h_m)/2), where int(·) denotes rounding min(w_m, h_m)/2. Randomly take an integer in the range [48, int(min(w_m, h_m)/2)] as the width w_k and the height h_k of the randomly selected bounding box.
2.2.2.3 Determine the coordinates of the kth randomly selected bounding box. Let the point (x_k + w_k, y_k + h_k) be the lower-right coordinate point of the randomly selected bounding box; its coordinates are then (x_k, y_k, x_k + w_k, y_k + h_k), representing the kth randomly selected bounding box as (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate).
2.2.2.4 Calculate the intersection-over-union (IoU) between the kth randomly selected bounding box and the labeled bounding box on the mth single knee image. If the IoU is not lower than the positive-sample threshold, take the kth randomly selected bounding box as a positive training sample and go to 2.2.2.5; if the IoU falls in the intermediate range, take it as a partial training sample and go to 2.2.2.5; if the IoU is below the negative-sample threshold, take it as a negative training sample and go to 2.2.2.5.
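The IoU used to sort random boxes into positive, partial and negative samples can be sketched directly from the corner-coordinate convention above:

```python
# Sketch of the intersection-over-union (IoU) of step 2.2.2.4. Boxes are
# (x1, y1, x2, y2) = (upper-left, lower-right) corners, as in the text.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))   # -> 1/7, about 0.1429
```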
2.2.2.5 If k < K (K > 50; K is the total number of randomly selected bounding boxes on the mth single knee image), let k = k + 1 and go to 2.2.2.2; if k ≥ K, go to 2.2.2.6.
2.2.2.6 If m ≥ 2M, go to 2.2.3; if m < 2M, let m = m + 1 and go to 2.2.2.
2.2.3 Resize the training samples (positive, partial and negative) generated in step 2.2.2 to 48 × 48 × 3 (image height × image width × image channels); since they are RGB images, the number of channels is 3.
2.2.4 Augment the resized training samples from step 2.2.3 by a mirror operation: flip each training sample horizontally (left-right), i.e., mirror it, generating the training samples of the first-level network, which comprise N11 positive samples, N12 partial samples and N13 negative samples. N11, N12 and N13 are all positive integers whose values are determined by the random bounding box selection on the knee images and the computed IoU values.
2.3 training the first-stage network by adopting the first-stage network training sample to obtain the trained first-stage network, wherein the method comprises the following steps:
2.3.1 initializing variable q to 1, setting values of model parameters, including setting learning rate to 0.001 and batch size to 500;
2.3.2 Input the training samples of the first-level network into the first-level network and compute the overall loss function L_q = Σ_{i=1}^{N} Σ_{j=1}^{3} α_j L_i^j, where N is the number of training samples of the first-level network, N = N11 + N12 + N13; L_i^j denotes the loss of the ith training sample on the jth task (1 ≤ j ≤ 3; the 3 tasks are knee joint detection, knee joint bounding box positioning and knee joint key point positioning); and α_j is the importance coefficient of the jth task, with α_1 = 1 for the knee joint detection task, α_2 = 0.5 for the knee joint bounding box positioning task, and α_3 = 0 for the knee joint key point positioning task. Under the model parameters set in step 2.3.1, back-propagate the value of L_q to update the parameters of the first-level network, obtaining the first-level network NET_q with parameters updated for the qth time.
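The weighted multitask sum can be illustrated with a minimal sketch; the per-sample task losses below are placeholder numbers, not outputs of a real forward pass:

```python
# Sketch of the overall loss L = sum_i sum_j alpha_j * L_i^j with the task
# weights given in the text: alpha = (1, 0.5, 0) for knee detection,
# bounding-box positioning and key-point positioning in the first stage.
ALPHA = (1.0, 0.5, 0.0)

def overall_loss(per_sample_task_losses):
    """per_sample_task_losses: list of (L_det, L_box, L_kpt) triples."""
    return sum(a * l
               for losses in per_sample_task_losses
               for a, l in zip(ALPHA, losses))

batch = [(0.2, 0.4, 0.9), (0.1, 0.2, 0.5)]
# 1*0.2 + 0.5*0.4 + 0*0.9 + 1*0.1 + 0.5*0.2 + 0*0.5 = 0.6
print(overall_loss(batch))
```

Setting α_3 = 0 means the key-point task does not contribute to the first-stage loss; the key points are only learned by the second-stage network.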
2.3.3 If q + 1 ≤ Q (Q is the number of training iterations, Q = 10), let q = q + 1 and go to 2.3.4; otherwise training is complete and the trained first-level network NET_Q is obtained; go to 2.4.

2.3.4 Input the training samples of the first-level network into NET_{q−1} and output the qth overall loss function L_q. Under the model parameters set in step 2.3.1, back-propagate the value of L_q to update the parameters of NET_{q−1}, obtaining the first-level network NET_q with parameters updated for the qth time; go to 2.3.3.
2.4 prepare training samples for the second-stage network in the knee joint area positioning network based on the multitask two-stage convolutional neural network, the method comprises the following steps:
2.4.1 initializing variable m ═ 1;
2.4.2 positioning the preliminary knee joint region of the mth single knee image I by using the trained first-level network, wherein the method comprises the following steps:
2.4.2.1 Generate a picture pyramid of the mth single knee image I: multiply the image by different scaling factors s (s is a positive real number, 0 < s ≤ 1) to scale it to different sizes, obtaining PP (PP is a positive integer, 0 < PP < 100) single knee images of different sizes, which form the picture pyramid of the single knee image I. The size of each image in the picture pyramid is recorded as H × W × 3 (image height × image width × number of channels; they are RGB images, so there are 3 channels).
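The scaling factors s are not specified in the text; a geometric sequence is the common convention for such pyramids, so the factor 0.7 and the 48-pixel minimum side below are illustrative assumptions only:

```python
# Sketch of the picture pyramid of step 2.4.2.1: geometric scaling factors
# s (0 < s <= 1), stopping once the shorter image side would fall below
# the assumed 48-pixel network input. Factor 0.7 is an assumption.
def pyramid_scales(height, width, factor=0.7, min_side=48):
    scales, s = [], 1.0
    while min(height, width) * s >= min_side:
        scales.append(s)
        s *= factor
    return scales

# A 200x150 single-knee image yields PP = 4 pyramid levels here:
print(pyramid_scales(200, 150))
```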
2.4.2.2 initializing variable p ═ 1;
2.4.2.3 The trained first feature extraction module extracts the features of the pth picture in the picture pyramid to obtain a first image feature F5, whose size is (feature map height × feature map width × number of feature map channels).
2.4.2.4 the preliminary target detection module detects the first image feature F5 by using a knee joint region detection method to obtain a knee joint region in the single knee image I, wherein the knee joint region detection method is as follows:
2.4.2.4.1 The 1 × 1 convolutional layer in the preliminary target detection module performs a 1 × 1 convolution on the first image feature F5 of the pth picture in the picture pyramid and outputs two groups of vectors. The first is the knee joint probability vector A1_p, which stores, for each position, the probability (in the range [0, 1]) that the position is a knee joint; a position is considered a knee joint when its probability is not lower than a first threshold α, α ∈ [0.5, 1], with α = 0.6 preferred. The second is the knee joint bounding box coordinate offset vector B1_p, in which each entry is a 4-dimensional vector (x1, y1, x2, y2) representing the bounding box offsets (upper-left abscissa offset, upper-left ordinate offset, lower-right abscissa offset, lower-right ordinate offset).
2.4.2.4.2 The knee joint bounding box coordinate operation layer of the preliminary target detection module screens and computes on A1_p and B1_p to obtain the knee joint regions in the single knee image I. Find the N vector positions in A1_p whose probability is not lower than α, find the corresponding N vector positions in B1_p, and read the 4-dimensional coordinate offset vectors at those N positions, where the nth offset vector gives the coordinate offsets of the nth bounding box (upper-left abscissa offset, upper-left ordinate offset, lower-right abscissa offset, lower-right ordinate offset). Then compute the coordinates of the N knee joint region bounding boxes in the original single knee image I (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate) by mapping each feature-map position back to the pth pyramid image, applying the predicted coordinate offsets, and dividing by the scaling factor s (0 < s ≤ 1) of the original single knee image I; finally frame the N knee joint regions on the original image I according to these bounding box coordinates.
2.4.2.5 If p ≥ PP, go to 2.4.2.6; if p < PP, let p = p + 1 and go to 2.4.2.3.
2.4.2.6 The first non-maximum suppression screening layer applies the non-maximum suppression (NMS) algorithm to filter all knee joint regions of the mth single knee picture I. For the NMS algorithm, see the document "Efficient Non-Maximum Suppression [C] // 18th International Conference on Pattern Recognition (ICPR 2006), 20-24 August 2006, Hong Kong, China". Set the non-maximum suppression filter threshold δ to 0.6 and screen the knee joint boundary regions with this threshold to obtain the N1 (N1 is a positive integer, 0 ≤ N1 ≤ 100) preliminary knee joint regions of the mth single knee image I.
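The screening performed by this layer can be sketched as the standard greedy NMS loop (keep the highest-scoring box, drop every remaining box overlapping it beyond the threshold, repeat), using the δ = 0.6 value from the text:

```python
# Sketch of the greedy non-maximum suppression used by the first screening
# layer, with the filter threshold delta = 0.6 given in the text.
def nms(boxes, scores, delta=0.6):
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter)

    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        best = order.pop(0)          # highest-scoring remaining box
        keep.append(best)
        order = [i for i in order    # drop boxes overlapping it too much
                 if iou(boxes[best], boxes[i]) <= delta]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))   # -> [0, 2]: box 1 overlaps box 0 too much
```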
2.4.3 prepare training samples for the second level network on the mth single knee image by:
2.4.3.1 initializing variable n1=1,k=1;
2.4.3.2 Calculate the IoU between the n1th preliminary knee joint region of the mth single knee image I and the knee joint region labeled on the mth single knee image I. If the IoU is not lower than the positive-sample threshold, take the n1th preliminary knee joint region as a positive sample for second-level network training and go to 2.4.3.3; if the IoU falls in the intermediate range, take it as a partial sample for second-level network training and go to 2.4.3.3; if the IoU is below the negative-sample threshold, take it as a negative sample for second-level network training and go to 2.4.3.3.
2.4.3.3 If n1 ≥ N1 (N1 is the number of preliminary knee joint regions in the mth single knee image I, 0 ≤ N1 ≤ 100), go to 2.4.3.4; if n1 < N1, let n1 = n1 + 1 and go to 2.4.3.2.
2.4.3.4 Randomly select the kth point (x_k, y_k) on the mth single knee image and use it as the upper-left coordinate point of a randomly selected bounding box.
2.4.3.5 Take the width w_k and height h_k of the kth randomly selected bounding box. Let the width of the mth single knee image be w_m and its height h_m; take half of the minimum of w_m and h_m and round it, i.e., int(min(w_m, h_m)/2). Randomly take an integer in [48, int(min(w_m, h_m)/2)] as the width w_k and the height h_k of the randomly selected bounding box.
2.4.3.6 Determine the coordinates of the kth randomly selected bounding box: the point obtained by adding the chosen width and height to the upper-left coordinate point is marked as the lower-right coordinate point of the kth randomly selected bounding box, and the kth randomly selected bounding box is recorded as (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate).
2.4.3.7 Calculate the intersection-over-union of the kth randomly selected bounding box and the labeled bounding box on the mth single knee image. If the intersection-over-union is not lower than the key-point training threshold, take the kth randomly selected bounding box as training data for locating the key points of the knee joint region and turn to 2.4.3.8; otherwise, go directly to 2.4.3.8.
2.4.3.8 If k < K, let k = k + 1 and turn to 2.4.3.4; if k ≥ K, turn to 2.4.4.
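Steps 2.4.3.4-2.4.3.6 can be sketched as follows; keeping the sampled box fully inside the image is an added assumption, and the function name is illustrative:

```python
import random

def random_square_box(w_m, h_m, seed=None):
    # sample one random box on an m-th single knee image of size w_m x h_m,
    # following steps 2.4.3.4-2.4.3.6: side length drawn from
    # [48, int(min(w_m, h_m) / 2)], width and height taken equal
    rng = random.Random(seed)
    side_max = int(min(w_m, h_m) / 2)
    side = rng.randint(48, side_max)
    x1 = rng.randint(0, w_m - side)   # upper-left point, kept inside the image
    y1 = rng.randint(0, h_m - side)   # (an assumption; the patent only says "random")
    return (x1, y1, x1 + side, y1 + side)
```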
2.4.4 If m ≥ 2M, turn to 2.4.5; if m < 2M, let m = m + 1 and turn to 2.4.2.
2.4.5 The positive samples, partial samples, and negative samples generated in steps 2.4.3-2.4.4 are scaled to 48 × 48 × 3 (image height × image width × number of image channels); since they are RGB images, the number of image channels is 3.
2.4.6 The positive samples, partial samples, and negative samples scaled in step 2.4.5 are augmented by a mirroring operation. The method is: flip each training sample in the left-right direction, i.e., perform a mirroring operation, to obtain the training samples of the second-level network, comprising N21 positive samples, N22 partial samples, and N23 negative samples. N21, N22, and N23 are all positive integers, determined by the preliminary knee joint regions selected on the single knee images and the computed intersection-over-union values.
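A minimal sketch of the mirroring augmentation of step 2.4.6, assuming the samples are stored as 48 × 48 × 3 NumPy arrays:

```python
import numpy as np

def mirror_augment(samples):
    # left-right flip (mirror) of each 48x48x3 training sample,
    # doubling the training set as in step 2.4.6
    out = []
    for img in samples:
        out.append(img)
        out.append(img[:, ::-1, :].copy())  # flip along the width axis
    return out
```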
2.5 training the second-level network by adopting the second-level network training sample generated in the step 2.4.6 to obtain the trained second-level network, wherein the method comprises the following steps:
2.5.1 Initialize variable t = 1; set the values of the model parameters, including setting the learning rate to 0.0001 and the batch size batch_size to 500;
2.5.2 Input the training samples of the second-level network into the second-level network and output the overall loss function Lt, which is calculated as shown below; under the model parameters set in step 2.5.1, update the parameters in the second-level network through back propagation of the Lt value to obtain the second-level network NETt with parameters updated for the tth time;
where N2 is the number of training samples, N2 = N21 + N22 + N23; L(i, j) represents the loss of the ith training sample on the jth task, 1 ≤ j ≤ 3, and there are 3 tasks in total: knee joint detection, knee joint bounding box localization, and knee joint key point localization; αj denotes the importance coefficient of the jth task, the importance coefficient of the knee joint detection task being α1 = 0.8, that of the knee joint localization task α2 = 0.6, and that of the knee joint key point localization task α3 = 1.5; the overall loss is the weighted sum Lt = (1/N2) Σi Σj αj L(i, j);
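Assuming the overall loss is the weighted per-task sum suggested by the definitions above (the patent's exact formula is not reproduced in this text), it can be sketched as:

```python
def overall_loss(task_losses, alphas=(0.8, 0.6, 1.5)):
    # task_losses[j][i] = loss of training sample i on task j, for the
    # 3 tasks: detection, bounding box localization, key point localization;
    # the weighted-sum form is a reconstruction consistent with the
    # definitions above, not the patent's literal formula
    n2 = len(task_losses[0])
    return sum(a * sum(l) for a, l in zip(alphas, task_losses)) / n2
```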
2.5.3 Let t = t + 1; if 2 ≤ t ≤ T (T is the number of network training iterations, T = 10), turn to 2.5.4; otherwise training ends, the trained second-level network NETT is obtained, and the third step follows;
2.5.4 Input the training samples of the second-level network into NETt-1 and output the tth overall loss function Lt; under the model parameters set in step 2.5.1, update the parameters in NETt-1 through back propagation of the Lt value to obtain the second-level network NETt with parameters updated for the tth time; turn to 2.5.3.
Thirdly, preprocessing the X-ray film of the double knees to be detected to obtain 2 single knee images to be detected which are processed by histogram equalization, wherein the method comprises the following steps:
3.1 If the double-knee X-ray film to be detected has a bright background and dark legs, convert it into an image with a dark background and bright legs: subtract each original pixel value of the film from 255 and take the result as the new pixel value, which converts the film into an image with a dark background and bright legs; then turn to 3.2. If the double-knee X-ray film to be detected already has a dark background and bright legs, turn directly to 3.2.
3.2 Convert the pixel values of the double-knee X-ray film to be detected into [0, 255], i.e., process the double-knee X-ray film to be detected into a uint8 double-knee image. The new pixel value corresponding to each pixel of the uint8 double-knee image is Pnew = 255 × (P − Pmin)/(Pmax − Pmin), where P is the pixel value of each pixel of the pre-processed image, Pmax is the maximum pixel value of the pre-processed image, and Pmin is the minimum pixel value of the pre-processed image.
3.3 Convert the uint8 double-knee image into single knee images. The method is: first find the width W of the uint8 double-knee image and half of the width, W/2; then round half the width and denote it int(W/2); finally, cut from the double-knee image according to the width coordinates [0, int(W/2)] and [int(W/2)+1, W−1], i.e., the uint8 double-knee image is divided into 2 single knee images.
3.4, performing histogram equalization processing on 2 single knee images, wherein the 2 single knee images subjected to the histogram equalization processing are used as the single knee images to be detected.
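The preprocessing of steps 3.1-3.3 can be sketched as follows; the histogram equalization of step 3.4 is omitted here, and treating the raw film as 8-bit for the inversion is an added assumption:

```python
import numpy as np

def preprocess_double_knee(img, bright_background):
    # img: 2-D array of raw X-ray pixel values (assumed non-constant)
    x = img.astype(np.float64)
    if bright_background:               # step 3.1: make background dark, legs bright
        x = 255.0 - x                   # assumes an 8-bit film; hedged assumption
    pmin, pmax = x.min(), x.max()       # step 3.2: min-max rescale into [0, 255]
    x = (255.0 * (x - pmin) / (pmax - pmin)).astype(np.uint8)
    w = x.shape[1]                      # step 3.3: split at half width
    half = int(w / 2)
    return x[:, :half + 1], x[:, half + 1:]   # 2 single knee images
```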
Fourthly, performing knee joint positioning on the 2 to-be-detected single knee images obtained in the third step based on the trained knee joint area positioning network of the multitask two-stage convolutional neural network, wherein the method comprises the following steps of:
4.1 initializing variable d = 1;
4.2 Generate the picture pyramid of the dth single knee image to be detected: scale the single knee image at different scales, with scaling factor s (0 < s ≤ 1), to obtain PP (0 < PP < 100) single knee images of different sizes, each of size H × W × 3 (image height × image width × number of image channels); these images together are called the picture pyramid of the single knee image to be detected.
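A sketch of the picture pyramid size schedule; the scale factor 0.709 and the 48-pixel lower bound are illustrative assumptions, since the patent only requires 0 < s ≤ 1:

```python
def pyramid_sizes(h, w, s=0.709, min_side=48):
    # sizes (H, W) of the picture pyramid: repeatedly scale by factor s
    # until a side would fall below min_side; both s = 0.709 and the
    # 48-pixel floor (the second-level input size) are assumptions
    sizes = []
    while min(h, w) >= min_side:
        sizes.append((int(h), int(w)))
        h, w = h * s, w * s
    return sizes
```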
4.3 processing the image pyramid of the d-th single knee image to be detected by the first-stage network in the knee joint area positioning network based on the trained multitask two-stage convolutional neural network to obtain a preliminary knee joint area, wherein the method comprises the following steps:
4.3.1 initializing variable p = 1;
4.3.2 the first feature extraction module extracts the features of the p image in the image pyramid of the d single knee image to be detected, and the method comprises the following steps:
4.3.2.1 The first 3 × 3 convolutional layer in the first feature extraction module performs a convolution operation on the pth picture in the picture pyramid of the dth single knee image to be detected, and the first 2 × 2 maximum pooling layer performs a pooling operation on the pth picture after the convolution operation, outputting feature map F1. The convolution operation stride is 1 and the maximum pooling operation stride is 2. Let the size of the pth picture in the picture pyramid be H × W × 3 (picture height × picture width × number of picture channels, which is 3 because the pictures are RGB images); through the first convolution and maximum pooling layers, feature map F1 (of size feature map height × feature map width × number of channels of F1) is obtained.
4.3.2.2 The second 3 × 3 convolutional layer in the first feature extraction module performs a convolution operation on feature map F1, and the second 3 × 3 maximum pooling layer performs a pooling operation on feature map F1 after the convolution operation, obtaining feature map F2. The convolution operation stride is 1 and the maximum pooling operation stride is 2; feature map F1 (feature map height × feature map width × number of channels of F1) passes through the second convolution and maximum pooling layers to obtain feature map F2 (feature map height × feature map width × number of channels of F2).
4.3.2.3 The third 3 × 3 convolutional layer in the first feature extraction module performs a convolution operation on feature map F2, and the third 2 × 2 maximum pooling layer performs a pooling operation on feature map F2 after the convolution operation, obtaining feature map F3. The convolution operation stride is 1 and the maximum pooling operation stride is 2; feature map F2 (feature map height × feature map width × number of channels of F2) passes through the third convolution and maximum pooling layers to obtain feature map F3 (feature map height × feature map width × number of channels of F3).
4.3.2.4 The fourth 3 × 3 convolutional layer in the first feature extraction module performs a convolution operation on feature map F3 to obtain feature map F4. The convolution operation stride is 1; feature map F3 (feature map height × feature map width × number of channels of F3) passes through the fourth convolution layer to obtain feature map F4 (feature map height × feature map width × number of channels of F4).
4.3.2.5 The fifth 2 × 2 convolutional layer in the first feature extraction module performs a convolution operation on feature map F4 to obtain feature map F5. The convolution operation stride is 1; feature map F4 (feature map height × feature map width × number of channels of F4) passes through the fifth convolution layer to obtain feature map F5 (feature map height × feature map width × number of channels of F5). Feature map F5 is the first image feature extracted by the first-level network from the pth picture of the picture pyramid of the dth single knee image to be detected.
4.3.3 the preliminary target detection module of the first-level network detects F5 by adopting the knee joint region detection method described in 2.4.2.4 to obtain the knee joint region of the single knee image to be detected;
4.3.4 If p < PP, let p = p + 1 and turn to 4.3.2; if p ≥ PP, turn to 4.3.5.
4.3.5 The first non-maximum suppression screening layer filters all knee joint bounding boxes in the dth single knee image to be detected by the non-maximum suppression (NMS) algorithm. The filtering threshold δ of the NMS algorithm is set to 0.7; the multiple knee joint bounding boxes in the single knee image to be detected are screened through this threshold, the filtered bounding boxes frame the knee joint regions on the dth single knee image to be detected, and the first-level network outputs N1 preliminary knee joint regions.
4.4 The second-level network processes the N1 preliminary knee joint regions of the dth single knee image to be detected to obtain the final knee joint region and the key points of that region. The method is:
4.4.1 initializing variable n1=1;
4.4.2 Normalize the size of the n1th preliminary knee joint region output by the first-level network for the dth single knee image to be detected to 48 × 48 × 3 (image height × image width × number of image channels).
4.4.3 The second-level network extracts the features of the n1th preliminary knee joint region image on the dth single knee image to be detected. The method is:
4.4.3.1 The first 3 × 3 convolutional layer in the second feature extraction module performs a convolution operation on the n1th preliminary knee joint region image, and the first 2 × 2 maximum pooling layer performs a pooling operation on the n1th preliminary knee joint region image after the convolution operation, outputting feature map F6. The convolution operation stride is 1 and the maximum pooling operation stride is 2. The size of the scale-normalized preliminary knee joint region image is 48 × 48 × 3 (image height × image width × number of image channels, which is 3 because it is an RGB image); through the first convolution and maximum pooling layers, feature map F6 of size 23 × 23 × 32 (feature map height × feature map width × number of channels of F6) is obtained.
4.4.3.2 The second 3 × 3 convolutional layer in the second feature extraction module performs a convolution operation on feature map F6, and the second 3 × 3 maximum pooling layer performs a pooling operation on feature map F6 after the convolution operation, outputting feature map F7. The convolution operation stride is 1 and the maximum pooling operation stride is 2; feature map F6 of size 23 × 23 × 32 (feature map height × feature map width × number of channels of F6) passes through the second convolution and maximum pooling layers to obtain feature map F7 of size 10 × 10 × 64 (feature map height × feature map width × number of channels of F7).
4.4.3.3 The third 3 × 3 convolutional layer in the second feature extraction module performs a convolution operation on feature map F7, and the third 2 × 2 maximum pooling layer performs a pooling operation on feature map F7 after the convolution operation, outputting feature map F8. The convolution operation stride is 1 and the maximum pooling operation stride is 2; feature map F7 of size 10 × 10 × 64 (feature map height × feature map width × number of channels of F7) passes through the third convolution and maximum pooling layers to obtain feature map F8 of size 4 × 4 × 64 (feature map height × feature map width × number of channels of F8).
4.4.3.4 The fourth 3 × 3 convolutional layer in the second feature extraction module performs a convolution operation on feature map F8, outputting feature map F9. The convolution operation stride is 1; feature map F8 of size 4 × 4 × 64 (feature map height × feature map width × number of channels of F8) passes through the fourth convolution layer to obtain feature map F9 of size 2 × 2 × 128 (feature map height × feature map width × number of channels of F9).
4.4.3.5 The first fully-connected layer in the second feature extraction module performs a fully-connected operation on feature map F9, outputting feature map F10. Feature map F9 of size 2 × 2 × 128 (feature map height × feature map width × number of channels of F9) passes through the first fully-connected layer to obtain feature map F10 containing a 256-dimensional vector; feature map F10 is the extracted second image feature.
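The feature map sizes stated in steps 4.4.3.1-4.4.3.4 (48 → 23 → 10 → 4 → 2) are consistent with "valid" (unpadded) convolutions of stride 1 and pooling of stride 2, which can be checked with a short trace:

```python
def conv_out(size, k, stride=1):
    # side length after a 'valid' (no padding) convolution or pooling
    return (size - k) // stride + 1

def second_module_trace(size=48):
    # spatial sizes through the second feature extraction module:
    # three (3x3 conv, stride 1) + (pool, stride 2) blocks with pooling
    # kernels 2x2, 3x3, 2x2 as in the text, then a final 3x3 convolution;
    # 'valid' padding is an assumption the patent does not state explicitly
    sizes = []
    for pool_k in (2, 3, 2):
        size = conv_out(size, 3)          # 3x3 convolution, stride 1
        size = conv_out(size, pool_k, 2)  # maximum pooling, stride 2
        sizes.append(size)
    sizes.append(conv_out(size, 3))       # fourth 3x3 convolution
    return sizes
```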
4.4.4 The final target detection module of the second-level network processes the second image feature, i.e., feature map F10, and outputs the final knee joint region and key point coordinates of the dth single knee image to be detected. The method is:
4.4.4.1 The second fully-connected layer of the final target detection module of the second-level network performs a full connection on feature map F10 and outputs three groups of vectors: the knee joint probability vector (a 1-dimensional vector), the knee joint bounding box coordinate offset vector (a 4-dimensional vector), and the knee joint key point coordinate offset vector (a 12-dimensional vector).
4.4.4.2 The knee joint bounding box coordinate and knee joint region key point coordinate operation layer of the final target detection module of the second-level network screens and computes the three groups of vectors to obtain the knee joint region of the dth single knee image to be detected and the key points of the knee joint region. The probability vector stores the probability value (in the range [0, 1]) that the region is considered a knee joint; when the probability value is not lower than β, the region is considered a knee joint. β is the second threshold, β ∈ [0.5, 1], and β preferably takes the value 0.7. If the probability value is greater than β, the n1th preliminary knee joint region is kept; the bounding box coordinates of the n1th preliminary knee joint region (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate) are kept, together with the knee joint bounding box coordinate offset vector (offset of the upper-left abscissa, offset of the upper-left ordinate, offset of the lower-right abscissa, offset of the lower-right ordinate) and the knee joint key point coordinate offset vector, which stores the offset coordinates (offset abscissa, offset ordinate) of the six key points FM, FL, TM, TL, JSM, and JSL of the n1th knee joint region. The final bounding box coordinates of the n1th knee joint (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate) and the coordinates of the 6 key points (the FM, FL, TM, TL, JSM, and JSL points of the n1th knee joint region) are calculated as follows:
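The patent's coordinate formulas are given as equations that are not reproduced in this text; the sketch below shows one common (MTCNN-style) way of decoding such offsets, purely as an illustrative assumption:

```python
def decode_box(box, box_offset):
    # box: (x1, y1, x2, y2) of the preserved preliminary region;
    # box_offset: predicted offsets, assumed normalized by the box width and
    # height as in MTCNN-style regression (an assumption, not the patent's
    # literal formula)
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    dx1, dy1, dx2, dy2 = box_offset
    return (x1 + dx1 * w, y1 + dy1 * h, x2 + dx2 * w, y2 + dy2 * h)

def decode_keypoints(box, kp_offsets):
    # kp_offsets: 12 values = 6 (x, y) pairs for FM, FL, TM, TL, JSM, JSL,
    # assumed relative to the box's upper-left corner, scaled by its size
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return [(x1 + kp_offsets[2 * i] * w, y1 + kp_offsets[2 * i + 1] * h)
            for i in range(6)]
```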
4.4.4.3 If n1 < N1, let n1 = n1 + 1 and turn to 4.4.2; if n1 ≥ N1, turn to 4.4.4.4.
4.4.4.4 The second non-maximum suppression screening layer screens the knee joint bounding boxes in the dth single knee image to be detected by the non-maximum suppression (NMS) algorithm to obtain the final bounding box and the final knee joint region key points of the dth single knee image to be detected; the filtering threshold δ of the NMS algorithm is 0.7.
4.4.4.5 If d < 2, let d = d + 1 and turn to 4.2; if d ≥ 2, turn to the fifth step.
Fifthly, the process ends.
The invention can achieve the following beneficial effects: the knee joint positioning method based on the multitask two-stage convolutional neural network improves the detection rate of knee joint positioning. The invention is compared with the HOG + SVM method (see "A novel method for automatic localization of joint area on knee plain radiographs" [C]// Scandinavian Conference on Image Analysis. Springer, Cham, 2017). The experimental environment is Ubuntu 16.04, GPU 2080Ti, Python 3.6, and PyTorch 0.4. When tested on the OAI database (45110 single knee images), the average detection accuracy of the method is 99.93%, while that of the HOG + SVM method is 91.73%; when tested on the MOST database (19383 single knee images), the average detection accuracy of the method is 99.02%, while that of the HOG + SVM method is 98.20%. Therefore, the method effectively improves the knee joint positioning detection rate.
Drawings
FIG. 1 is an overall flowchart of the present invention for positioning a knee joint based on a multitasking two-stage convolutional neural network;
FIG. 2 is a logical structure diagram of the knee joint area positioning network based on the multitask two-stage convolutional neural network of the present invention;
FIG. 3 is an illustration of the present invention showing 6 key points of the left and right knees at step 2.1.2.2.
Detailed Description
FIG. 1 is an overall flowchart of the present invention for positioning a knee joint based on a multitasking two-stage convolutional neural network; as shown in fig. 1, the present invention comprises the steps of:
firstly, a knee joint area positioning network based on a multitask two-stage convolutional neural network is built.
The knee joint area positioning network based on the multitask two-stage convolutional neural network is shown in fig. 2 and comprises two stages of networks: the system comprises a first-level network and a second-level network, wherein the output of the first-level network is used as the input of the second-level network.
The first-level network is composed of a first feature extraction module and a preliminary target detection module. The first feature extraction module receives the single knee image I from the outside, extracts first image features of the single knee image I and sends the first image features to the preliminary target detection module; and the preliminary target detection module detects the first image characteristics and outputs a preliminary knee joint area in the single knee image I.
The first feature extraction module consists of 5 convolutional layers, including 4 convolutional layers of 3 × 3 and 1 convolutional layer of 2 × 2, and 3 maximum pooling layers, including 2 maximum pooling layers of 2 × 2 and 1 maximum pooling layer of 3 × 3.
The first 3 × 3 convolutional layer performs a convolution operation on the single knee image I, and the first 2 × 2 maximum pooling layer performs a pooling operation on the single knee image I after the convolution operation to obtain feature map F1; the second 3 × 3 convolutional layer performs a convolution operation on feature map F1, and the first 3 × 3 maximum pooling layer performs a pooling operation on feature map F1 after the convolution operation to obtain feature map F2; the third 3 × 3 convolutional layer performs a convolution operation on feature map F2, and the second 2 × 2 maximum pooling layer performs a pooling operation on feature map F2 after the convolution operation to obtain feature map F3; the fourth 3 × 3 convolutional layer performs a convolution operation on feature map F3 to obtain feature map F4; the fifth 2 × 2 convolutional layer performs a convolution operation on feature map F4 to obtain feature map F5; feature map F5 is the extracted first image feature.
The preliminary target detection module comprises a 1 × 1 convolutional layer, a knee joint bounding box coordinate operation layer, and a first non-maximum suppression screening layer. The 1 × 1 convolutional layer convolves feature map F5 to obtain two groups of vectors: the knee joint probability vector A1 and the knee joint bounding box coordinate offset vector B1. The knee joint bounding box coordinate operation layer determines the knee joint bounding boxes according to A1 and B1 to obtain the knee joint regions in the single knee image I. The first non-maximum suppression screening layer performs non-maximum suppression screening on the knee joint regions in the single knee image I to obtain the preliminary knee joint regions in the single knee image I, and sends the single knee image I and its preliminary knee joint regions to the second-level network.
The second-level network is composed of a second feature extraction module and a final target detection module. The second feature extraction module is used for carrying out feature extraction on the single knee image I received from the first-level network and the preliminary knee joint region in the single knee image I to obtain second image features; and the final target detection module performs target detection on the second image characteristics to obtain final knee joint boundary frame coordinates of the single knee image I and knee joint region key point coordinates of the single knee image I.
The second feature extraction module is composed of 4 convolutional layers, 3 maximum pooling layers, and 1 fully-connected layer; the 4 convolutional layers include 3 convolutional layers of 3 × 3 and 1 convolutional layer of 2 × 2, and the 3 maximum pooling layers include 2 maximum pooling layers of 2 × 2 and 1 maximum pooling layer of 3 × 3.
The first 3 × 3 convolutional layer in the second feature extraction module performs a convolution operation on the preliminary knee joint region in the single knee image I, and the first 2 × 2 maximum pooling layer performs a pooling operation on the preliminary knee joint region in the single knee image I after the convolution operation to obtain feature map F6; the second 3 × 3 convolutional layer performs a convolution operation on feature map F6, and the second 3 × 3 maximum pooling layer performs a pooling operation on feature map F6 after the convolution operation to obtain feature map F7; the third 3 × 3 convolutional layer performs a convolution operation on feature map F7, and the third 2 × 2 maximum pooling layer performs a pooling operation on feature map F7 after the convolution operation to obtain feature map F8; the fourth 3 × 3 convolutional layer performs a convolution operation on feature map F8 to obtain feature map F9; the first fully-connected layer performs a fully-connected operation on feature map F9 to obtain feature map F10; feature map F10 is the extracted second image feature.
The final target detection module comprises a second fully-connected layer, a knee joint bounding box and knee joint region key point coordinate operation layer, and a second non-maximum suppression screening layer. The second fully-connected layer performs a full connection on feature map F10 to obtain three groups of vectors: the knee joint probability vector A2, the knee joint bounding box coordinate offset vector B2, and the coordinate offset vector C of the six key points of the knee joint region. The knee joint bounding box and knee joint region key point coordinate operation layer determines the knee joint bounding box coordinates and the knee joint region key point coordinates for the single knee image I according to A2, B2, and C. The second non-maximum suppression screening layer performs non-maximum suppression screening on the knee joint regions and knee joint key point coordinates in the single knee image I, and outputs the single knee image I, the final knee joint region of the single knee image I, and the knee joint region key point coordinates of the single knee image I.
And secondly, training the knee joint area positioning network based on the multitask two-stage convolutional neural network.
2.1 prepare the data of the knee joint area location network based on the multitask two-stage convolution neural network.
2.1.1 preprocessing M (M is a positive integer and M is more than 2000) original images to obtain 2M single knee images subjected to histogram equalization processing, wherein the method comprises the following steps:
2.1.1.1 Randomly select M raw images from the OAI baseline public database (https://oai.epi-ucsf.org/datarelease/, November 2008 version). The original images are X-ray medical images containing the left and right knees.
2.1.1.2 uniformly transforms M original images into an image with dark background and bright legs. Firstly, double knee images with bright background and dark legs are selected from M original images, then image pixel inversion is carried out, namely, 255 is used for subtracting original pixel values, and the M original images are uniformly converted into images with dark background and bright legs.
2.1.1.3 Convert the pixels of the M images with dark background and bright legs to [0, 255], i.e., process the dark-background, bright-leg images into uint8 images. The method is: the pixel value of each pixel in the M images with dark background and bright legs is processed as Pnew = 255 × (P − Pmin)/(Pmax − Pmin), where P is the pixel value of any pixel in the pre-processed image, Pmax is the maximum pixel value in the pre-processed image, Pmin is the minimum pixel value in the pre-processed image, and Pnew is the pixel value of the corresponding pixel in the processed image.
2.1.1.4 Convert the M uint8 images into 2M single knee images. The method is: first find the width W of each uint8 image and half of the width, W/2; then round half the width and denote it int(W/2); finally, cut from each uint8 image according to the width coordinates [0, int(W/2)] and [int(W/2)+1, W−1], i.e., divide each uint8 image into 2 single knee images, finally obtaining 2M single knee images.
2.1.1.5 Perform histogram equalization processing on each of the 2M single knee images. For the histogram equalization processing method, see the document "Application of histogram equalization to image processing [J]", Science and Technology Information, 2007(04): 39-40.
2.1.2 annotate the knee joint real bounding box in the 2M single knee images after histogram equalization, the method is as follows:
2.1.2.1 initializing variable m ═ 1;
2.1.2.2 Manually label 6 key points of the knee joint region of the mth single knee image. Since the doctor or computer is primarily concerned with the knee joint space and osteophyte locations, 6 points on the knee joint space and osteophyte boundaries are manually marked as the main key points of the knee joint region. Fig. 3 illustrates labeling the 6 key points of the left and right knees at step 2.1.2.2. The 6 key points of the right knee shown in fig. 3(a) are the femoral medial osteophyte point (FM), the femoral lateral osteophyte point (FL), the tibial medial osteophyte point (TM), the tibial lateral osteophyte point (TL), the joint space medial point (JSM), and the joint space lateral point (JSL). As shown in fig. 3(b), the 6 key points of the left knee are likewise the femoral medial osteophyte point (FM), the femoral lateral osteophyte point (FL), the tibial medial osteophyte point (TM), the tibial lateral osteophyte point (TL), the joint space medial point (JSM), and the joint space lateral point (JSL). As shown, the 6 key points of the left and right knees are mirror images of each other. The subsequent steps process the 6 key points without distinguishing the left knee from the right knee.
2.1.2.3 marking the boundary box of the knee joint in the mth single knee image according to the key points marked manually, the method is:
2.1.2.3.1 Calculate the center point coordinates (xmid, ymid) of the 6 key points in the mth single knee image. Let the coordinates of the 6 key points be (x1, y1), (x2, y2), (x3, y3), (x4, y4), (x5, y5), (x6, y6); the center point coordinates (xmid, ymid) of the 6 key points are the means xmid = (x1 + x2 + x3 + x4 + x5 + x6)/6 and ymid = (y1 + y2 + y3 + y4 + y5 + y6)/6.
2.1.2.3.2 Compute the knee width w_knee: find the maximum abscissa x_max and the minimum abscissa x_min of the 6 key points, and take their difference as the knee width w_knee, namely:
x_max = max(x1, x2, x3, x4, x5, x6)
x_min = min(x1, x2, x3, x4, x5, x6)
w_knee = x_max - x_min
2.1.2.3.3 Mark the knee joint region to obtain the ground-truth knee joint bounding-box coordinates, which represent the true knee joint region as (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate). Centered on the center point of the 6 key points, expand the region up, down, left and right by 0.65 times the knee width (0.65 times the width suffices to frame the knee joint region in each image); the resulting box is the knee joint region of interest, i.e. the annotated bounding box:
(x_mid - 0.65·w_knee, y_mid - 0.65·w_knee, x_mid + 0.65·w_knee, y_mid + 0.65·w_knee)
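Steps 2.1.2.3.1 to 2.1.2.3.3 can be condensed into a short sketch; the 0.65 expansion factor comes from the text, while the symmetric reading of "expand up, down, left and right by 0.65 times the knee width" is our interpretation:

```python
import numpy as np

def keypoints_to_bbox(pts, factor=0.65):
    """Derive the annotated box from the six key points (steps 2.1.2.3.1-3)."""
    pts = np.asarray(pts, dtype=float)            # shape (6, 2): (x, y) pairs
    x_mid, y_mid = pts.mean(axis=0)               # centre of the six key points
    w_knee = pts[:, 0].max() - pts[:, 0].min()    # knee width from x extremes
    d = factor * w_knee                           # expansion in every direction
    return (x_mid - d, y_mid - d, x_mid + d, y_mid + d)

# Illustrative key points (FM, FL, TM, TL, JSM, JSL) in pixel coordinates.
pts = [(10, 50), (90, 52), (12, 70), (88, 71), (30, 60), (70, 61)]
box = keypoints_to_bbox(pts)
```

The box is square with side 1.3·w_knee, centered on the key-point centroid.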
2.1.2.4 If m ≥ 2M, go to 2.2; if m < 2M, let m = m + 1 and go to 2.1.2.2.
2.2 prepare training samples for the first stage network in the knee joint area positioning network based on the multitask two-stage convolution neural network, the method is as follows:
2.2.1 Initialize variable m = 1;
2.2.2 prepare training samples for the first level network on the mth single knee image by:
2.2.2.1 Initialize variable k = 1;
2.2.2.2 Randomly select the kth point on the mth single-knee image and denote its coordinates (x_k, y_k); take this point as the upper-left corner of the kth randomly selected bounding box.
Take the width w_k and height h_k of the kth randomly selected bounding box: let the mth single-knee image have width w_m and height h_m; take half of the minimum of w_m and h_m and round it, i.e. int(min(w_m, h_m)/2), where int(min(w_m, h_m)/2) denotes min(w_m, h_m)/2 rounded to an integer; then randomly choose an integer in the range [48, int(min(w_m, h_m)/2)] as both the width w_k and the height h_k of the randomly selected bounding box.
2.2.2.3 Determine the coordinates of the kth randomly selected bounding box: with upper-left corner (x_k, y_k) and the width w_k and height h_k chosen above, the point (x_k + w_k, y_k + h_k) is the lower-right corner, so the kth randomly selected bounding box has coordinates (x_k, y_k, x_k + w_k, y_k + h_k), i.e. (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate).
2.2.2.4 Compute the intersection-over-union (IoU) between the kth randomly selected bounding box and the annotated bounding box on the mth single-knee image. If the IoU reaches the positive-sample threshold, take the kth randomly selected bounding box as a positive training sample and go to 2.2.2.5; if the IoU lies in the intermediate (part-sample) range, take it as a part training sample and go to 2.2.2.5; if the IoU falls below the negative-sample threshold, take it as a negative training sample and go to 2.2.2.5.
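The intersection-over-union used throughout steps 2.2.2.4, 2.4.3.2 and 2.4.3.7 is the standard box IoU; a minimal sketch (the sample-assignment thresholds themselves are left to the patent's formulas and are not reproduced here):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

For example, two unit-offset 2 × 2 boxes overlap with IoU 1/7.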
2.2.2.5 If k < K (K > 50, K being the total number of randomly selected bounding boxes on the mth single-knee image), let k = k + 1 and go to 2.2.2.2; if k ≥ K, go to 2.2.2.6.
2.2.2.6 If m ≥ 2M, go to 2.2.3; if m < 2M, let m = m + 1 and go to 2.2.2.
2.2.3 Scale the training samples (positive, part and negative samples) generated in step 2.2.2 down to 48 × 48 × 3 (image height × image width × number of image channels).
2.2.4 Augment the size-normalized training samples from step 2.2.3 by mirroring: flip each training sample horizontally (left-right), i.e. apply a mirror operation, to generate the training samples of the first-stage network. The first-stage training set then contains N11 positive samples, N12 part samples and N13 negative samples, where N11, N12 and N13 are all positive integers.
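The mirror augmentation of step 2.2.4 is a horizontal flip of each 48 × 48 × 3 sample; a minimal sketch:

```python
import numpy as np

def mirror(sample):
    """Horizontal flip along the width axis (axis 1)."""
    return sample[:, ::-1, :].copy()

sample = np.arange(2 * 3 * 1).reshape(2, 3, 1)
flipped = mirror(sample)
```

Flipping is an involution, so applying the mirror twice returns the original sample, and the augmented set is exactly twice the size of the original.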
2.3 training the first-stage network by adopting the first-stage network training sample to obtain the trained first-stage network, wherein the method comprises the following steps:
2.3.1 Initialize variable q = 1 and set the model parameters, including the learning rate, set to 0.001, and the batch size batch_size, set to 500;
2.3.2 Input the training samples of the first-stage network into the first-stage network and output the overall loss function Lq, computed as below; under the model parameters set in step 2.3.1, update the parameters of the first-stage network by back-propagating the value of Lq, obtaining the first-stage network NETq with parameters updated for the qth time;
where N is the number of first-stage training samples, N = N11 + N12 + N13; Li,j denotes the loss of the ith training sample on the jth task, 1 ≤ j ≤ 3, there being 3 tasks in total: knee joint detection, knee joint bounding-box localization and knee joint key-point localization; αj is the importance coefficient of the jth task, with α1 = 1 for the knee joint detection task, α2 = 0.5 for the knee joint bounding-box localization task and α3 = 0 for the knee joint key-point localization task; that is, Lq = (1/N)·Σ(i=1..N) Σ(j=1..3) αj·Li,j;
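The overall loss Lq is a task-weighted average over the batch; a sketch with the first-stage weights α = (1, 0.5, 0) from the text, where the per-task losses Li,j are assumed to be precomputed:

```python
import numpy as np

def overall_loss(task_losses, alphas=(1.0, 0.5, 0.0)):
    """Weighted multi-task loss: L = (1/N) * sum_i sum_j alpha_j * L[i, j].

    task_losses: (N, 3) array of per-sample losses for the detection,
    bounding-box and key-point tasks; alphas defaults to the first-stage
    weights given in the text."""
    task_losses = np.asarray(task_losses, dtype=float)
    return float((task_losses * np.asarray(alphas)).sum() / task_losses.shape[0])
```

Passing alphas = (0.8, 0.6, 1.5) gives the second-stage weighting of step 2.5.2, where key-point localization dominates.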
2.3.3 Let q = q + 1. If 2 ≤ q ≤ Q, where Q is the number of training iterations and Q = 10, go to 2.3.4; otherwise training is complete, the trained first-stage network NETQ is obtained, and go to 2.4;
2.3.4 Input the training samples of the first-stage network into NETq-1 and output the qth overall loss function Lq; under the model parameters set in step 2.3.1, back-propagate the value of Lq to update the parameters of NETq-1, obtaining the first-stage network NETq with parameters updated for the qth time; go to 2.3.3;
2.4 prepare training samples for the second-stage network in the knee joint area positioning network based on the multitask two-stage convolutional neural network, the method comprises the following steps:
2.4.1 Initialize variable m = 1;
2.4.2 positioning the preliminary knee joint region of the mth single knee image I by using the trained first-level network, wherein the method comprises the following steps:
2.4.2.1 Generate the picture pyramid of the mth single-knee image I: multiply the image by different scaling factors s (s a positive real number, 0 < s ≤ 1) to scale it to different sizes, obtaining PP (PP a positive integer, 0 < PP < 100) single-knee images of different sizes, which form the picture pyramid of the single-knee image I. The size of each image in the picture pyramid is denoted H × W × 3 (image height × image width × number of channels; the images are RGB, so there are 3 channels).
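Picture-pyramid generation amounts to choosing a sequence of scaling factors s; a sketch where the per-level factor 0.7 and the 48-pixel stopping size are illustrative assumptions (the text only requires 0 < s ≤ 1 and PP < 100 levels):

```python
def pyramid_scales(h, w, factor=0.7, min_size=48):
    """Scaling factors s for the picture pyramid: keep shrinking until the
    smaller image side would fall below the network input size."""
    scales, s = [], 1.0
    while min(h, w) * s >= min_size:
        scales.append(s)
        s *= factor
    return scales

scales = pyramid_scales(480, 240, factor=0.5, min_size=48)
```

Each scale produces one pyramid level; resizing the image by each factor yields the PP images of different sizes.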
2.4.2.2 Initialize variable p = 1;
2.4.2.3 The trained first feature extraction module extracts the features of the pth picture in the picture pyramid to obtain the first image feature F5, whose size is (feature map height × feature map width × number of feature map channels).
2.4.2.4 the preliminary target detection module detects the first image feature F5 by using a knee joint region detection method to obtain a knee joint region in the single knee image I, wherein the knee joint region detection method is as follows:
2.4.2.4.1 The 1 × 1 convolution layer in the preliminary target detection module performs a 1 × 1 convolution on the first image feature F5 of the pth picture in the picture pyramid and outputs two groups of vectors. The first is the knee joint probability vector A1p, in which each vector is 1-dimensional; A1p stores the probability that a position is a knee joint (range [0, 1]). When the probability value is not lower than α the position is considered a knee joint, where α is a first threshold, α ∈ [0.5, 1], with α preferably 0.6. The second is the knee joint bounding-box coordinate offset vector B1p, in which each vector is 4-dimensional; the 4 dimensions (x1, y1, x2, y2) of each vector represent the bounding-box offsets (upper-left abscissa offset, upper-left ordinate offset, lower-right abscissa offset, lower-right ordinate offset).
2.4.2.4.2 The knee joint bounding-box coordinate operation layer of the preliminary target detection module screens A1p and B1p and computes the knee joint regions in the single-knee image I. Find the N vector positions in A1p whose probability value is not lower than α; at the same time find the corresponding N vector positions in B1p and the 4-dimensional coordinate offset vectors at those N positions, where the nth offset vector stores the coordinate offsets of the nth bounding box (upper-left abscissa offset, upper-left ordinate offset, lower-right abscissa offset, lower-right ordinate offset). Compute the coordinates of the N knee joint bounding boxes in the original single-knee image I, where the nth set of coordinates represents the nth bounding box (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate) and the offsets are mapped back to the original image using the scaling factor s (0 < s ≤ 1) of the original single-knee image I, and frame the N knee joint regions on the original image I according to these bounding-box coordinates.
2.4.2.5 If p ≥ PP, go to 2.4.2.6; if p < PP, let p = p + 1 and go to 2.4.2.3.
2.4.2.6 The first non-maximum suppression screening layer filters all knee joint regions of the mth single-knee picture I with the non-maximum suppression (NMS) algorithm. For the NMS algorithm see the document "Efficient Non-Maximum Suppression [C] // 18th International Conference on Pattern Recognition (ICPR 2006), 20-24 August 2006, Hong Kong, China". Set the non-maximum suppression filtering threshold δ to 0.6; screening the knee joint boundary regions with this threshold yields the N1 (N1 a positive integer, 0 ≤ N1 ≤ 100) preliminary knee joint regions of the mth single-knee image I.
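A minimal sketch of the greedy non-maximum suppression used by the screening layers, with the overlap threshold δ = 0.6 of step 2.4.2.6:

```python
import numpy as np

def nms(boxes, scores, thresh=0.6):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size:
        i = int(order[0])
        keep.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        overlap = inter / (area_i + areas - inter)
        order = rest[overlap <= thresh]       # drop boxes overlapping too much
    return keep

keep = nms([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]],
           [0.9, 0.8, 0.7])
```

Here the second box overlaps the top-scoring box with IoU ≈ 0.68 > 0.6 and is suppressed, while the distant third box survives.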
2.4.3 prepare training samples for the second level network on the mth single knee image by:
2.4.3.1 Initialize variables n1 = 1 and k = 1;
2.4.3.2 Compute the intersection-over-union between the n1th preliminary knee joint region of the mth single-knee image I and the annotated knee joint region of the mth single-knee image I. If the IoU reaches the positive-sample threshold, take the n1th preliminary knee joint region as a positive sample for second-stage training and go to 2.4.3.3; if the IoU lies in the intermediate (part-sample) range, take it as a part sample for second-stage training and go to 2.4.3.3; if the IoU falls below the negative-sample threshold, take it as a negative sample for second-stage training and go to 2.4.3.3.
2.4.3.3 If n1 ≥ N1 (N1 being the number of preliminary knee joint regions in the mth single-knee image I, 0 ≤ N1 ≤ 100), go to 2.4.3.4; if n1 < N1, let n1 = n1 + 1 and go to 2.4.3.2.
2.4.3.4 Randomly select the kth point on the mth single-knee image and take it as the upper-left corner of the kth randomly selected bounding box.
2.4.3.5 Take the width and height of the kth randomly selected bounding box: let the mth single-knee image have width w_m and height h_m; take half of the minimum of w_m and h_m and round it, i.e. int(min(w_m, h_m)/2); then randomly choose an integer in the range [48, int(min(w_m, h_m)/2)] as both the width and the height of the randomly selected bounding box.
2.4.3.6 Determine the coordinates of the kth randomly selected bounding box: the point obtained by adding the chosen width and height to the upper-left corner is marked as the lower-right corner of the kth randomly selected bounding box, so the kth randomly selected bounding box is represented by its coordinates (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate).
2.4.3.7 Compute the intersection-over-union between the kth randomly selected bounding box and the annotated bounding box on the mth single-knee image. If the IoU reaches the key-point training threshold, take the kth randomly selected bounding box as training data for knee joint key-point localization and go to 2.4.3.8; otherwise go directly to 2.4.3.8.
2.4.3.8 If k < K, let k = k + 1 and go to 2.4.3.4; if k ≥ K, go to 2.4.4.
2.4.4 If m ≥ 2M, go to 2.4.5; if m < 2M, let m = m + 1 and go to 2.4.2.
2.4.5 Scale the positive, part and negative samples generated in steps 2.4.3-2.4.4 down to 48 × 48 × 3 (image height × image width × number of image channels); since the images are RGB, the number of channels is 3.
2.4.6 Augment the size-normalized positive, part and negative samples from step 2.4.5 by mirroring: flip each training sample horizontally, i.e. apply a mirror operation, to obtain the training samples of the second-stage network. The second-stage training set then contains N21 positive samples, N22 part samples and N23 negative samples, where N21, N22 and N23 are all positive integers.
2.5 training the second-level network by adopting the second-level network training sample generated in the step 2.4.6 to obtain the trained second-level network, wherein the method comprises the following steps:
2.5.1 Initialize variable t = 1 and set the model parameters, including the learning rate, set to 0.0001, and the batch size batch_size, set to 500;
2.5.2 Input the training samples of the second-stage network into the second-stage network and output the overall loss function Lt, computed as below; under the model parameters set in step 2.5.1, update the parameters of the second-stage network by back-propagating the value of Lt, obtaining the second-stage network NETt with parameters updated for the tth time;
where N2 is the number of second-stage training samples, N2 = N21 + N22 + N23; Li,j denotes the loss of the ith training sample on the jth task, 1 ≤ j ≤ 3, there being 3 tasks in total: knee joint detection, knee joint bounding-box localization and knee joint key-point localization; αj is the importance coefficient of the jth task, with α1 = 0.8 for the knee joint detection task, α2 = 0.6 for the knee joint bounding-box localization task and α3 = 1.5 for the knee joint key-point localization task; that is, Lt = (1/N2)·Σ(i=1..N2) Σ(j=1..3) αj·Li,j;
2.5.3 Let t = t + 1. If 2 ≤ t ≤ T, where T is the number of training iterations and T = 10, go to 2.5.4; otherwise training is complete, the trained second-stage network NETT is obtained, and go to the third step;
2.5.4 Input the training samples of the second-stage network into NETt-1 and output the tth overall loss function Lt; under the model parameters set in step 2.5.1, back-propagate the value of Lt to update the parameters of NETt-1, obtaining the second-stage network NETt with parameters updated for the tth time; go to 2.5.3.
Third, preprocess the double-knee X-ray film to be detected to obtain 2 histogram-equalized single-knee images to be detected, as follows:
3.1 If the double-knee X-ray film to be detected has a bright background and dark legs, convert it into an image with a dark background and bright legs: subtract each original pixel value of the film from 255 and use the result as the new pixel value, converting the film into a dark-background, bright-leg image; then go to 3.2. If the double-knee X-ray film to be detected already has a dark background and bright legs, go directly to 3.2.
3.2 Normalize the pixels of the X-ray film to be detected into [0, 255] and process the double-knee X-ray film into a uint8 double-knee image. The new pixel value of each pixel in the uint8 double-knee image is P_new = 255 × (P - P_min)/(P_max - P_min), where P is the pixel value of each pixel of the pre-processed image, P_max is the maximum pixel value of the pre-processed image, and P_min is the minimum pixel value of the pre-processed image.
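Step 3.2 is a min-max rescaling into the uint8 range; a sketch, assuming the formula P_new = 255·(P - P_min)/(P_max - P_min) implied by the definitions of P, P_max and P_min:

```python
import numpy as np

def to_uint8(img):
    """Min-max rescale arbitrary pixel values into [0, 255] and cast to uint8."""
    img = np.asarray(img, dtype=np.float64)
    p_min, p_max = img.min(), img.max()
    return (255.0 * (img - p_min) / (p_max - p_min)).astype(np.uint8)

out = to_uint8([[0, 2000], [1000, 500]])
```

The cast truncates toward zero, so a mid-range value such as 1000/2000 maps to 127 rather than 128.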
3.3 Split the uint8 double-knee image into single-knee images: find the width W of the uint8 double-knee image and half of it, round half the width to the integer int(W/2), and finally cut the double-knee image along the width coordinates [0, int(W/2)] and [int(W/2)+1, W-1], i.e. divide the uint8 double-knee image into 2 single-knee images.
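The split of step 3.3 can be sketched directly; note that for even W the two halves differ in width under the stated coordinate ranges, since column int(W/2) goes to the left half:

```python
import numpy as np

def split_knees(img):
    """Cut an H x W (x C) double-knee image into left and right halves along
    the width coordinates [0, int(W/2)] and [int(W/2)+1, W-1]."""
    w = img.shape[1]
    half = int(w / 2)
    left = img[:, :half + 1]       # columns 0 .. int(W/2)
    right = img[:, half + 1:]      # columns int(W/2)+1 .. W-1
    return left, right

left, right = split_knees(np.zeros((4, 10, 3), dtype=np.uint8))
```

Each half is then histogram-equalized (step 3.4) before being fed to the positioning network.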
3.4 Perform histogram equalization on the 2 single-knee images; the 2 histogram-equalized single-knee images serve as the single-knee images to be detected.
Fourth, perform knee joint positioning on the 2 single-knee images to be detected obtained in the third step, using the trained knee joint region positioning network based on the multi-task two-stage convolutional neural network, as follows:
4.1 Initialize variable d = 1;
4.2 Generate the picture pyramid of the dth single-knee image to be detected: scale the single-knee image to different sizes with scaling factor s (0 < s ≤ 1), obtaining PP (0 < PP < 100) single-knee images of different sizes, each of size H × W × 3 (image height × image width × number of channels); these form what is called the picture pyramid of the single-knee image to be detected.
4.3 processing the image pyramid of the d-th single knee image to be detected by the first-stage network in the knee joint area positioning network based on the trained multitask two-stage convolutional neural network to obtain a preliminary knee joint area, wherein the method comprises the following steps:
4.3.1 Initialize variable p = 1;
4.3.2 The first feature extraction module extracts the features of the pth picture in the picture pyramid of the dth single-knee image to be detected, as follows:
4.3.2.1 The first 3 × 3 convolution layer in the first feature extraction module convolves the pth picture in the picture pyramid of the dth single-knee image to be detected, and the first 2 × 2 max-pooling layer pools the convolved picture, outputting feature map F1. The convolution stride is 1 and the max-pooling stride is 2. Let the pth picture in the picture pyramid have size H × W × 3 (image height × image width × number of channels; 3 because the images are RGB); after this first convolution and max-pooling layer, the feature map F1 (feature map height × feature map width × number of channels of F1) is obtained.
4.3.2.2 The second 3 × 3 convolution layer in the first feature extraction module convolves feature map F1, and the second max-pooling layer, of size 3 × 3, pools the convolved F1 to obtain feature map F2. The convolution stride is 1 and the max-pooling stride is 2; F1 (feature map height × feature map width × number of channels of F1) passes through this second convolution and max-pooling layer to give feature map F2 (feature map height × feature map width × number of channels of F2).
4.3.2.3 The third 3 × 3 convolution layer in the first feature extraction module convolves feature map F2, and the third 2 × 2 max-pooling layer pools the convolved F2 to obtain feature map F3. The convolution stride is 1 and the max-pooling stride is 2; F2 passes through this third convolution and max-pooling layer to give feature map F3 (feature map height × feature map width × number of channels of F3).
4.3.2.4 The fourth 3 × 3 convolution layer in the first feature extraction module convolves feature map F3 to obtain feature map F4; the convolution stride is 1, and F3 passes through this fourth convolution layer to give feature map F4 (feature map height × feature map width × number of channels of F4).
4.3.2.5 The fifth 2 × 2 convolution layer in the first feature extraction module convolves feature map F4 to obtain feature map F5; the convolution stride is 1, and F4 passes through this fifth convolution layer to give feature map F5 (feature map height × feature map width × number of channels of F5). Feature map F5 is the first image feature extracted by the first-stage network from the pth picture of the picture pyramid of the dth single-knee image to be detected.
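The feature-map sizes of steps 4.3.2.1-4.3.2.5 follow from simple stride arithmetic. The sketch below assumes unpadded ("valid") convolutions with stride 1 and unpadded max pooling with stride 2; under these assumptions a 48 × 48 input (the training-sample size of step 2.2.3) shrinks to a 1 × 1 output, consistent with a sliding-window first stage:

```python
def conv_out(size, k):
    """Output side length of an unpadded ('valid') convolution, stride 1."""
    return size - k + 1

def pool_out(size, k, stride=2):
    """Output side length of unpadded max pooling with the given stride."""
    return (size - k) // stride + 1

s = 48                              # training-sample side length (step 2.2.3)
f1 = pool_out(conv_out(s, 3), 2)    # F1: 3x3 conv + 2x2 pool
f2 = pool_out(conv_out(f1, 3), 3)   # F2: 3x3 conv + 3x3 pool
f3 = pool_out(conv_out(f2, 3), 2)   # F3: 3x3 conv + 2x2 pool
f4 = conv_out(f3, 3)                # F4: 3x3 conv
f5 = conv_out(f4, 2)                # F5: 2x2 conv
```

Larger pyramid levels produce correspondingly larger F5 maps, one detection position per map cell.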
4.3.3 the preliminary target detection module of the first-level network detects F5 by adopting the knee joint region detection method described in 2.4.2.4 to obtain the knee joint region of the single knee image to be detected;
4.3.4 If p < PP, let p = p + 1 and go to 4.3.2; if p ≥ PP, go to 4.3.5.
4.3.5 The first non-maximum suppression screening layer filters all knee joint bounding boxes in the dth single-knee image to be detected with the non-maximum suppression (NMS) algorithm. Set the NMS filtering threshold δ to 0.7; the knee joint bounding boxes in the single-knee image to be detected are screened with this threshold, the filtered bounding boxes frame the knee joint regions on the dth single-knee image to be detected, and the first-stage network outputs N1 preliminary knee joint regions.
4.4 The second-stage network processes the N1 preliminary knee joint regions of the dth single-knee image to be detected to obtain the final knee joint region and the key points of that region, as follows:
4.4.1 Initialize variable n1 = 1;
4.4.2 Normalize the n1th preliminary knee joint region output by the first-stage network for the dth single-knee image to be detected to size 48 × 48 × 3 (image height × image width × number of channels).
4.4.3 The second-stage network extracts the features of the n1th preliminary knee joint region image on the dth single-knee image to be detected, as follows:
4.4.3.1 The first 3 × 3 convolution layer in the second feature extraction module convolves the n1th preliminary knee joint region image, and the first 2 × 2 max-pooling layer pools the convolved image, outputting feature map F6. The convolution stride is 1 and the max-pooling stride is 2; the size-normalized preliminary knee joint region image of size 48 × 48 × 3 (image height × image width × number of channels; 3 because the image is RGB) passes through this first convolution and max-pooling layer to give feature map F6 of size 23 × 23 × 32 (feature map height × feature map width × number of channels of F6).
4.4.3.2 The second 3 × 3 convolution layer in the second feature extraction module convolves feature map F6, and the second 3 × 3 max-pooling layer pools the convolved F6, outputting feature map F7. The convolution stride is 1 and the max-pooling stride is 2; F6 of size 23 × 23 × 32 passes through this second convolution and max-pooling layer to give feature map F7 of size 10 × 10 × 64 (feature map height × feature map width × number of channels of F7).
4.4.3.3 The third 3 × 3 convolution layer in the second feature extraction module convolves feature map F7, and the third 2 × 2 max-pooling layer pools the convolved F7, outputting feature map F8. The convolution stride is 1 and the max-pooling stride is 2; F7 of size 10 × 10 × 64 passes through this third convolution and max-pooling layer to give feature map F8 of size 4 × 4 × 64 (feature map height × feature map width × number of channels of F8).
4.4.3.4 The fourth 3 × 3 convolution layer in the second feature extraction module convolves feature map F8, outputting feature map F9. The convolution stride is 1; F8 of size 4 × 4 × 64 passes through this fourth convolution layer to give feature map F9 of size 2 × 2 × 128 (feature map height × feature map width × number of channels of F9).
4.4.3.5 The first fully connected layer in the second feature extraction module applies a full connection to feature map F9, outputting feature map F10: F9 of size 2 × 2 × 128 passes through this first fully connected layer to give F10, a 256-dimensional vector. Feature map F10 is the extracted second image feature.
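The sizes quoted in steps 4.4.3.1-4.4.3.5 (48 → 23 → 10 → 4 → 2) are reproduced exactly by unpadded stride arithmetic, which supports the stated layer configuration; flattening the 2 × 2 × 128 map gives the 512 values fed to the 256-dimensional fully connected layer:

```python
def conv_out(size, k):
    """Output side length of an unpadded ('valid') convolution, stride 1."""
    return size - k + 1

def pool_out(size, k, stride=2):
    """Output side length of unpadded max pooling with the given stride."""
    return (size - k) // stride + 1

f6 = pool_out(conv_out(48, 3), 2)   # F6: 3x3 conv + 2x2 pool
f7 = pool_out(conv_out(f6, 3), 3)   # F7: 3x3 conv + 3x3 pool
f8 = pool_out(conv_out(f7, 3), 2)   # F8: 3x3 conv + 2x2 pool
f9 = conv_out(f8, 3)                # F9: 3x3 conv
flat = f9 * f9 * 128                # F9 flattened before the fully connected layer
```

The "valid"/no-padding assumption is ours; it is the only common convention that matches all four stated sizes.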
4.4.4 The final target detection module of the second-stage network processes the second image feature, i.e. feature map F10, and outputs the final knee joint region and key-point coordinates of the dth single-knee image to be detected, as follows:
4.4.4.1 The second fully connected layer of the final target detection module of the second-stage network applies a full connection to feature map F10 and outputs three groups of vectors: the knee joint probability vector (1-dimensional), the knee joint bounding-box coordinate offset vector (4-dimensional) and the knee joint key-point coordinate offset vector (12-dimensional).
4.4.4.2 The knee joint bounding-box coordinate and knee joint region key-point coordinate operation layer of the final target detection module of the second-stage network screens and computes these vectors to obtain the knee joint region of the dth single-knee image to be detected and the key points of that region. The probability vector stores the probability that a region is a knee joint (range [0, 1]); when the probability value is not lower than β the region is considered a knee joint, where β is a second threshold, β ∈ [0.5, 1], with β preferably 0.7. If the probability of the n1th preliminary knee joint region is greater than β, the n1th preliminary knee joint region is retained, its bounding-box coordinates representing the n1th preliminary knee joint region as (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate), and the corresponding knee joint bounding-box coordinate offset vector (upper-left abscissa offset, upper-left ordinate offset, lower-right abscissa offset, lower-right ordinate offset) and knee joint key-point coordinate offset vector are retained as well. The key-point offset vector stores, for the n1th knee joint region, the offset coordinates (offset abscissa, offset ordinate) of the key points FM, FL, TM, TL, JSM and JSL. The final bounding-box coordinates of the n1th knee joint (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate) and the coordinates of its 6 key points (the FM, FL, TM, TL, JSM and JSL points of the n1th knee joint region) are then calculated from these offsets.
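The final decoding of step 4.4.4.2 combines each retained preliminary box with its regressed offsets; the exact formulas are given in the patent's figures, so the sketch below uses the usual cascaded-detector convention (offsets expressed in units of the box width and height) as an assumption:

```python
def decode(box, box_offset, kp_offsets):
    """Apply regressed offsets to a preliminary box. Offsets are assumed to be
    expressed in units of the box width/height (a hypothetical convention)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    dx1, dy1, dx2, dy2 = box_offset
    final_box = (x1 + dx1 * w, y1 + dy1 * h, x2 + dx2 * w, y2 + dy2 * h)
    # Twelve key-point offsets -> six (x, y) points: FM, FL, TM, TL, JSM, JSL.
    kps = [(x1 + kp_offsets[2 * i] * w, y1 + kp_offsets[2 * i + 1] * h)
           for i in range(6)]
    return final_box, kps

final_box, kps = decode((10, 10, 20, 30), (0.1, 0.1, -0.1, -0.1), [0.5, 0.5] * 6)
```

Positive offsets shrink or shift the box inward here; the six decoded points land inside the refined region.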
4.4.4.3 If n1 < N1, let n1 = n1 + 1 and go to 4.4.2; if n1 ≥ N1, go to 4.4.4.4.
4.4.4.4 The second non-maximum suppression screening layer screens the knee joint bounding boxes in the dth single-knee image to be detected with the non-maximum suppression (NMS) algorithm to obtain the final bounding box of the dth single-knee image to be detected and the final key points of the knee joint region; the filtering threshold δ of the NMS algorithm is 0.7.
4.4.4.5 If d < 2, let d = d + 1 and go to 4.2; if d ≥ 2, go to the fifth step.
Fifth, end.
Claims (11)
1. A knee joint positioning method based on a multitask two-stage convolution neural network is characterized by comprising the following steps:
the method comprises the following steps that firstly, a knee joint area positioning network based on a multitask two-stage convolutional neural network is built, the knee joint area positioning network based on the multitask two-stage convolutional neural network comprises a first-stage network and a second-stage network, and the output of the first-stage network is used as the input of the second-stage network;
the first-level network consists of a first feature extraction module and a preliminary target detection module; the first feature extraction module receives the single knee image I from the outside, extracts first image features of the single knee image I and sends the first image features to the preliminary target detection module; the preliminary target detection module detects the first image characteristics and outputs a preliminary knee joint area in the single knee image I;
the first feature extraction module is composed of 5 convolutional layers and 3 maximum pooling layers, wherein the 5 convolutional layers comprise 4 convolutional layers of 3 × 3 and 1 convolutional layer of 2 × 2, and the 3 maximum pooling layers comprise 2 maximum pooling layers of 2 × 2 and 1 maximum pooling layer of 3 × 3;
the first 3 × 3 convolutional layer performs a convolution operation on the single knee image I, and the first 2 × 2 maximum pooling layer performs a pooling operation on the convolved single knee image I to obtain feature map F1; the second 3 × 3 convolutional layer performs a convolution operation on feature map F1, and the first 3 × 3 maximum pooling layer pools the convolved feature map F1 to obtain feature map F2; the third 3 × 3 convolutional layer performs a convolution operation on feature map F2, and the second 2 × 2 maximum pooling layer pools the convolved feature map F2 to obtain feature map F3; the fourth 3 × 3 convolutional layer performs a convolution operation on feature map F3 to obtain feature map F4; the fifth convolutional layer (2 × 2) performs a convolution operation on feature map F4 to obtain feature map F5, and feature map F5 is the extracted first image feature;
the preliminary target detection module comprises a 1 × 1 convolutional layer, a knee joint bounding box coordinate operation layer and a first non-maximum suppression screening layer; the 1 × 1 convolutional layer convolves feature map F5 to obtain two groups of vectors, namely a knee joint probability vector A1 and a knee joint bounding box coordinate offset vector B1; the knee joint bounding box coordinate operation layer determines the knee joint bounding boxes according to A1 and B1 to obtain the knee joint regions in the single knee image I; the first non-maximum suppression screening layer performs non-maximum suppression screening on the knee joint regions in the single knee image I to obtain the preliminary knee joint regions in the single knee image I, and the single knee image I and its preliminary knee joint regions are sent to the second-level network;
the second-level network consists of a second feature extraction module and a final target detection module; the second feature extraction module is used for carrying out feature extraction on the single knee image I received from the first-level network and the preliminary knee joint region in the single knee image I to obtain second image features; the final target detection module performs target detection on the second image characteristics to obtain final knee joint boundary frame coordinates of the single knee image I and knee joint area key point coordinates of the single knee image I;
the second feature extraction module is composed of 4 convolutional layers, 3 maximum pooling layers and 1 fully-connected layer, wherein the 4 convolutional layers are 3 × 3 convolutional layers, and the 3 maximum pooling layers comprise 2 2 × 2 maximum pooling layers and 1 3 × 3 maximum pooling layer;
the first 3 × 3 convolutional layer in the second feature extraction module performs a convolution operation on the preliminary knee joint region in the single knee image I, and the first 2 × 2 maximum pooling layer performs a pooling operation on the convolved preliminary knee joint region to obtain feature map F6; the second 3 × 3 convolutional layer performs a convolution operation on feature map F6, and the 3 × 3 maximum pooling layer pools the convolved feature map F6 to obtain feature map F7; the third 3 × 3 convolutional layer performs a convolution operation on feature map F7, and the second 2 × 2 maximum pooling layer pools the convolved feature map F7 to obtain feature map F8; the fourth 3 × 3 convolutional layer performs a convolution operation on feature map F8 to obtain feature map F9; the first fully-connected layer performs a fully-connected operation on feature map F9 to obtain feature map F10, and feature map F10 is the second image feature;
the final target detection module comprises a second fully-connected layer, a knee joint bounding box and knee joint region key point coordinate operation layer, and a second non-maximum suppression screening layer; the second fully-connected layer performs a fully-connected operation on feature map F10 to obtain three groups of vectors, namely a knee joint probability vector A2, a knee joint bounding box coordinate offset vector B2 and a coordinate offset vector C of the six key points of the knee joint region; the knee joint bounding box and knee joint region key point coordinate operation layer determines the knee joint bounding box coordinates and the knee joint region key point coordinates of the single knee image I according to A2, B2 and C; the second non-maximum suppression screening layer performs non-maximum suppression screening on the knee joint regions and knee joint key point coordinates in the single knee image I, and outputs the single knee image I, the final knee joint region of the single knee image I and the knee joint region key point coordinates of the single knee image I;
secondly, training a knee joint area positioning network based on a multitask two-stage convolutional neural network, wherein the method comprises the following steps:
2.1 prepare the data of the knee joint area location network based on the multitask two-stage convolution neural network:
2.1.1, preprocessing M original images, namely X-ray medical images containing left and right knees, to obtain 2M single knee images subjected to histogram equalization, wherein M is a positive integer;
2.1.2 annotate the knee joint real bounding box in the 2M single knee images after histogram equalization, the method is as follows:
2.1.2.1 initializing variable m = 1;
2.1.2.2 manually labeling 6 key points of the knee joint region of the m-th single knee image; the 6 key points refer to 6 points on the boundaries of the knee joint gaps and osteophytes; for each of the left knee and the right knee they are the femoral medial osteophyte point FM, the femoral lateral osteophyte point FL, the tibial medial osteophyte point TM, the tibial lateral osteophyte point TL, the joint space medial point JSM and the joint space lateral point JSL; the 6 key points of the left knee and the right knee are mirror-symmetric;
2.1.2.3 marking the boundary box of the knee joint in the mth single knee image according to the key points marked manually, the method is:
2.1.2.3.1 calculating the center point coordinates (xmid, ymid) of the 6 key points in the m-th single knee image; denoting the coordinates of the 6 key points as (x1, y1), (x2, y2), (x3, y3), (x4, y4), (x5, y5), (x6, y6), the center point coordinates (xmid, ymid) of the 6 key points are:
2.1.2.3.2 calculating the knee joint width wknee as follows: find the maximum abscissa xmax and the minimum abscissa xmin of the 6 key points; the knee joint width wknee is the difference between the maximum abscissa xmax and the minimum abscissa xmin, namely:
xmax = max(x1, x2, x3, x4, x5, x6);
xmin = min(x1, x2, x3, x4, x5, x6);
wknee = xmax − xmin;
2.1.2.3.3 marking the knee joint region to obtain the real knee joint region bounding box coordinates, which represent the real knee joint region bounding box (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate); taking the center point of the 6 key points as the center and expanding by 0.65 times the knee joint width wknee gives the knee joint region of real interest, i.e. the annotated bounding box; the calculation formula is as follows:
2.1.2.4 if m ≥ 2M, go to 2.2; if m < 2M, let m = m + 1 and go to 2.1.2.2;
2.2 prepare training samples for the first stage network in the knee joint area positioning network based on the multitask two-stage convolution neural network, the method is as follows:
2.2.1 initializing variable m = 1;
2.2.2 prepare training samples for the first level network on the mth single knee image by:
2.2.2.1 initializing variable k = 1;
2.2.2.2 randomly taking the k-th point on the m-th single knee image and taking this point as the upper-left coordinate point of the randomly selected bounding box;
2.2.2.3 determining the coordinates of the k-th randomly selected bounding box as follows: take the corresponding point as the lower-right coordinate point of the randomly selected bounding box; the randomly selected bounding box coordinates represent the k-th randomly selected bounding box (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate);
2.2.2.4 calculating the intersection-over-union (IoU) of the k-th randomly selected bounding box and the annotated bounding box on the m-th single knee image; according to the IoU value, take the k-th randomly selected bounding box as a positive training sample, a partial training sample or a negative training sample, and go to 2.2.2.5;
2.2.2.5 if k < K (K is a positive integer, the total number of randomly selected bounding boxes on the m-th single knee image), let k = k + 1 and go to 2.2.2.2; if k ≥ K, go to 2.2.2.6;
2.2.2.6 if m ≥ 2M, go to 2.2.3; if m < 2M, let m = m + 1 and go to 2.2.2;
2.2.3 reducing the scale of the training samples generated in step 2.2.2, namely the positive samples, partial samples and negative samples, to obtain images of length 48 and width 48 with 3 image channels, i.e. of size 48 × 48 × 3;
2.2.4 carrying out a mirror operation on the size-reduced training samples of step 2.2.3 for augmentation, namely horizontally flipping each training sample left-to-right, i.e. performing the mirror operation on each training sample, to generate the first-level network training samples, wherein the number of first-level network positive samples is N11, the number of partial samples is N12 and the number of negative samples is N13; N11, N12 and N13 are all positive integers;
2.3, training the first-stage network by adopting a first-stage network training sample to obtain a trained first-stage network;
2.4 prepare training samples for the second-stage network in the knee joint area positioning network based on the multitask two-stage convolutional neural network, the method comprises the following steps:
2.4.1 initializing variable m = 1;
2.4.2 positioning the preliminary knee joint region of the mth single knee image I by using the trained first-level network, wherein the method comprises the following steps:
2.4.2.1 generating a picture pyramid of the m-th single knee image I as follows: multiply the m-th single knee image I by different scaling factors s in sequence (s is a positive real number), scaling the m-th single knee image I at different scales to obtain PP single knee images of different sizes, which form the picture pyramid of the single knee image I; PP is a positive integer;
2.4.2.2 initializing variable p = 1;
2.4.2.3 the trained first feature extraction module extracts the features of the p-th picture in the picture pyramid to obtain the first image feature F5;
2.4.2.4 the preliminary target detection module detects the first image characteristic F5 by adopting a knee joint region detection method to obtain a knee joint region in the single knee image I;
2.4.2.5 if p ≥ PP, go to 2.4.2.6; if p < PP, let p = p + 1 and go to 2.4.2.3;
2.4.2.6 the first non-maximum suppression screening layer filters all knee joint regions of the m-th single knee image I with the non-maximum suppression algorithm (NMS algorithm) to obtain the N1 preliminary knee joint regions of the m-th single knee image I; N1 is a positive integer;
2.4.3 prepare training samples for the second level network on the mth single knee image by:
2.4.3.1 initializing variables n1 = 1, k = 1;
2.4.3.2 calculating the intersection-over-union of the n1-th preliminary knee joint region of the m-th single knee image I and the annotated knee joint region of the m-th single knee image I; according to the intersection-over-union value, take the n1-th preliminary knee joint region as a positive sample, a partial sample or a negative sample of the second-level network training, and go to 2.4.3.3;
2.4.3.3 if n1 ≥ N1, go to 2.4.3.4; if n1 < N1, let n1 = n1 + 1 and go to 2.4.3.2;
2.4.3.4 randomly selecting the k-th point on the m-th single knee image and taking this point as the upper-left coordinate point of the randomly selected bounding box;
2.4.3.6 determining the coordinates of the k-th randomly selected bounding box: take the corresponding point as the lower-right coordinate point of the k-th randomly selected bounding box; the k-th randomly selected bounding box coordinates represent the k-th randomly selected bounding box (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate);
2.4.3.7 calculating the intersection-over-union of the k-th randomly selected bounding box and the annotated bounding box on the m-th single knee image; if the intersection-over-union is sufficiently high, take the k-th randomly selected bounding box as training data for knee joint region key point localization and go to 2.4.3.8; otherwise go directly to 2.4.3.8;
2.4.3.8 if k < K, let k = k + 1 and go to 2.4.3.4; if k ≥ K, go to 2.4.4;
2.4.4 if m ≥ 2M, go to 2.4.5; if m < 2M, let m = m + 1 and go to 2.4.2;
2.4.5 reducing the sizes of the positive samples, partial samples and negative samples generated in steps 2.4.3–2.4.4 to 48 × 48 × 3;
2.4.6 augmenting the size-reduced positive samples, partial samples and negative samples of step 2.4.5 by a mirror operation as follows: flip each training sample left-to-right, i.e. perform the mirror operation, to obtain the training samples of the second-level network, wherein the number of second-level network positive samples is N21, the number of partial samples is N22 and the number of negative samples is N23; N21, N22 and N23 are all positive integers;
2.5 training the second-level network by adopting the second-level network training sample generated in the step 2.4.6 to obtain a trained second-level network;
thirdly, preprocessing the double-knee X-ray film to be detected to obtain 2 single-knee images to be detected, which are subjected to histogram equalization processing;
fourthly, performing knee joint positioning on the 2 single knee images to be detected based on the trained knee joint area positioning network of the multitask two-stage convolution neural network, wherein the method comprises the following steps:
4.1 initializing variable d = 1;
4.2 generating a picture pyramid of the d-th single knee image to be detected as follows: scale the single knee image at different scales with scaling factor s to obtain PP single knee images of different sizes, denoting the size of each image as H × W × 3; the PP images are called the picture pyramid of the single knee image to be detected;
4.3 processing the image pyramid of the d-th single knee image to be detected by the first-stage network in the knee joint area positioning network based on the trained multitask two-stage convolutional neural network to obtain a preliminary knee joint area, wherein the method comprises the following steps:
4.3.1 initializing variable p = 1;
4.3.2 the first feature extraction module extracts the features of the p image in the image pyramid of the d single knee image to be detected, and the method comprises the following steps:
4.3.2.1 the first 3 × 3 convolutional layer in the first feature extraction module performs a convolution operation on the p-th picture in the picture pyramid of the d-th single knee image to be detected, and the first 2 × 2 maximum pooling layer performs a pooling operation on the convolved p-th picture and outputs feature map F1, wherein the convolution stride is 1, the maximum pooling stride is 2, and the size of the p-th picture in the picture pyramid is H × W × 3; the p-th picture gives, after the convolution and pooling operations in the first feature extraction module, feature map F1;
4.3.2.2 the second 3 × 3 convolutional layer in the first feature extraction module performs a convolution operation on feature map F1, and the second 3 × 3 maximum pooling layer performs a pooling operation on the convolved feature map F1 to obtain feature map F2, wherein the convolution stride is 1 and the maximum pooling stride is 2; feature map F1 gives, after the second convolution and maximum pooling layers, feature map F2;
4.3.2.3 the third 3 × 3 convolutional layer in the first feature extraction module performs a convolution operation on feature map F2, and the third 2 × 2 maximum pooling layer performs a pooling operation on the convolved feature map F2 to obtain feature map F3, wherein the convolution stride is 1 and the maximum pooling stride is 2; feature map F2 gives, after the third convolution and maximum pooling layers, feature map F3;
4.3.2.4 the fourth 3 × 3 convolutional layer in the first feature extraction module performs a convolution operation on feature map F3 to obtain feature map F4, wherein the convolution stride is 1; feature map F3 gives, after the fourth convolutional layer, feature map F4;
4.3.2.5 the fifth 2 × 2 convolutional layer in the first feature extraction module performs a convolution operation on feature map F4 to obtain feature map F5, wherein the convolution stride is 1; feature map F4 gives, after the fifth convolutional layer, feature map F5, and feature map F5 is the first image feature extracted by the first-level network from the p-th picture of the picture pyramid of the d-th single knee image to be detected;
4.3.3 the primary target detection module of the first-level network detects the F5 by adopting a knee joint region detection method to obtain the knee joint region of the single knee image to be detected;
4.3.4 if p < PP, let p = p + 1 and go to 4.3.2; if p ≥ PP, go to 4.3.5;
4.3.5 the first non-maximum suppression screening layer filters all knee joint bounding boxes in the d-th single knee image to be detected with the non-maximum suppression algorithm; the filtered bounding boxes frame the knee joint regions on the d-th single knee image to be detected, and the first-level network outputs the N1 preliminary knee joint regions;
4.4 the second-level network processes the N1 preliminary knee joint regions of the d-th single knee image to be detected to obtain the final knee joint region and the key points of the region, as follows:
4.4.1 initializing variable n1 = 1;
4.4.2 normalizing the dimensions of the n1-th preliminary knee joint region output by the first-level network for the d-th single knee image to be detected to 48 × 48 × 3;
4.4.3 the second-level network extracts the features of the n1-th preliminary knee joint region image on the d-th single knee image to be detected as follows:
4.4.3.1 the first 3 × 3 convolutional layer in the second feature extraction module performs a convolution operation on the n1-th preliminary knee joint region image, and the first 2 × 2 maximum pooling layer performs a pooling operation on the convolved n1-th preliminary knee joint region image and outputs feature map F6, wherein the convolution stride is 1 and the maximum pooling stride is 2; the scale-normalized preliminary knee joint region image of size 48 × 48 × 3 gives, after the first convolution and maximum pooling layers, a feature map F6 of size 23 × 23 × 32;
4.4.3.2 the second 3 × 3 convolutional layer in the second feature extraction module performs convolution operation on the feature map F6, the second 3 × 3 maximal pooling layer performs pooling operation on the feature map F6 which completes the convolution operation, and the feature map F7 is output, wherein the convolution operation step is 1, the maximal pooling operation step is 2, and the feature map F6 with the size of 23 × 23 × 32 is subjected to second convolution and maximal pooling operation to obtain a feature map F7 with the size of 10 × 10 × 64;
4.4.3.3 the third 3 × 3 convolutional layer in the second feature extraction module performs convolution operation on the feature map F7, the third 2 × 2 maximal pooling layer performs pooling operation on the feature map F7 which completes the convolution operation, and the feature map F8 is output, wherein the convolution operation step is 1, the maximal pooling operation step is 2, and the feature map F7 with the size of 10 × 10 × 64 is subjected to third convolution and maximal pooling operation to obtain the feature map F8 with the size of 4 × 4 × 64;
4.4.3.4 the fourth 3 × 3 convolutional layer in the second feature extraction module performs a convolution operation on feature map F8 and outputs feature map F9, wherein the convolution stride of the fourth layer is 1; the feature map F8 of size 4 × 4 × 64 gives, after the fourth convolutional layer, a feature map F9 of size 2 × 2 × 128;
4.4.3.5 the first fully-connected layer in the second feature extraction module performs a fully-connected operation on feature map F9 and outputs feature map F10; the feature map F9 of size 2 × 2 × 128 gives, after the first fully-connected layer, a feature map F10 containing a 256-dimensional vector; feature map F10 is the extracted second image feature;
4.4.4 the final target detection module of the second level network processes F10, outputs the final knee joint region and the key point coordinates of the d-th single knee image to be detected, the method is:
4.4.4.1 the second fully-connected layer of the final target detection module of the second-level network performs a fully-connected operation on F10 and outputs three groups of vectors, namely the knee joint probability vector (a 1-dimensional vector), the knee joint bounding box coordinate offset vector (a 4-dimensional vector) and the knee joint key point coordinate offset vector (a 12-dimensional vector);
4.4.4.2 the knee joint bounding box coordinate and knee joint region key point coordinate operation layer of the final target detection module of the second-level network screens and computes the vectors to obtain the knee joint region of the d-th single knee image to be detected and the key points of the knee joint region; the probability vector stores the probability value of a knee joint, and a knee joint is identified when the probability value is not lower than β, the second threshold, β ∈ [0.5, 1]; if the probability value is greater than β, the n1-th preliminary knee joint region is retained, its bounding box coordinates denoting the bounding box of the n1-th preliminary knee joint region (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate); the retained knee joint bounding box coordinate offset vector denotes, for the n1-th knee joint, the offset coordinates of the bounding box (offset of the upper-left abscissa, offset of the upper-left ordinate, offset of the lower-right abscissa, offset of the lower-right ordinate), and the retained knee joint key point coordinate offset vector denotes the offset coordinates (offset abscissa, offset ordinate) of the key points FM, FL, TM, TL, JSM and JSL of the n1-th knee joint region; the final bounding box coordinates of the n1-th knee joint (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate) and the coordinates of the 6 key points, namely the coordinates of the FM, FL, TM, TL, JSL and JSM points of the n1-th knee joint region, are calculated as follows:
4.4.4.3 if n1 < N1, let n1 = n1 + 1 and go to 4.4.2; if n1 ≥ N1, go to 4.4.4.4;
4.4.4.4 the second non-maximum suppression screening layer screens the knee joint bounding boxes in the d-th single knee image to be detected with the non-maximum suppression NMS algorithm to obtain the final bounding box of the d-th single knee image to be detected and the final knee joint region key points;
4.4.4.5 if d < 2, let d = d + 1 and go to 4.2; if d ≥ 2, go to the fifth step;
and fifthly, ending.
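As a check on the layer arithmetic of claim 1, the spatial sizes produced by the first feature extraction module (valid 3 × 3 / 2 × 2 convolutions with stride 1 and maximum pooling with stride 2, as recited in steps 4.3.2.1–4.3.2.5) can be traced with a small helper; the channel counts of the first-level network are not recited, so only spatial sizes are computed, and the 48 × 48 input matches the training sample size of step 2.2.3:

```python
def conv_out(size, kernel, stride=1):
    # output size of a 'valid' (no padding) convolution
    return (size - kernel) // stride + 1

def pool_out(size, kernel, stride=2):
    # output size of a maximum pooling layer
    return (size - kernel) // stride + 1

def first_extractor_sizes(h):
    # trace one spatial dimension through conv1..conv5 of the first module
    sizes = []
    h = pool_out(conv_out(h, 3), 2)   # conv1 3x3 + pool 2x2 -> F1
    sizes.append(h)
    h = pool_out(conv_out(h, 3), 3)   # conv2 3x3 + pool 3x3 -> F2
    sizes.append(h)
    h = pool_out(conv_out(h, 3), 2)   # conv3 3x3 + pool 2x2 -> F3
    sizes.append(h)
    h = conv_out(h, 3)                # conv4 3x3 -> F4
    sizes.append(h)
    h = conv_out(h, 2)                # conv5 2x2 -> F5
    sizes.append(h)
    return sizes
```

With h = 48 this yields [23, 10, 4, 2, 1], consistent with the 23 × 23, 10 × 10, 4 × 4 and 2 × 2 spatial sizes recited for the second feature extraction module (steps 4.4.3.1–4.4.3.4), which uses the same kernel and stride arithmetic.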
2. The knee joint positioning method based on the multitask two-stage convolution neural network as claimed in claim 1, wherein the method for preprocessing the original image in the step 2.1.1 is as follows:
2.1.1.1 randomly selecting M original images, namely X-ray medical images containing the left and right knees, from the OAI baseline public database;
2.1.1.2 uniformly transforms M original images into dark background and light leg images: firstly, selecting a double knee image with a bright background and dark legs from M original images, and then carrying out image pixel inversion, namely subtracting an original pixel value from 255 to uniformly convert the M original images into an image with a dark background and bright legs;
2.1.1.3 converting the pixels of the M images with dark background and bright legs to [0, 255], i.e. processing the dark-background, bright-leg images into uint8 images as follows: the pixel value of each pixel in the M images with dark background and bright legs is processed as Pnew = 255 × (P − Pmin)/(Pmax − Pmin), where P is the pixel value of any pixel in the image before processing, Pmax is the maximum pixel value in the image before processing, Pmin is the minimum pixel value in the image before processing, and Pnew is the pixel value of the pixel in the processed image;
2.1.1.4 converting the M uint8 images into 2M single knee images as follows: first, find the width W of each of the M uint8 images and half of the width; round half of the width to an integer; finally, cut each uint8 image at this width coordinate, dividing each uint8 picture into 2 single knee images and finally obtaining 2M single knee images;
2.1.1.5 histogram equalization is performed on each of the 2M single knee images to obtain 2M single knee images that have been histogram equalized.
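Steps 2.1.1.2–2.1.1.4 of claim 2 can be sketched as below; the min–max rescaling formula is the natural reading of step 2.1.1.3 (the exact formula is an image not legible in this copy), and histogram equalization (step 2.1.1.5) is omitted from the sketch:

```python
def preprocess(img, bright_background):
    # img: 2-D list of grey values (rows of pixels) for one double-knee X-ray
    if bright_background:
        # step 2.1.1.2: invert so the background is dark (255 - original pixel)
        img = [[255 - p for p in row] for row in img]
    flat = [p for row in img for p in row]
    p_min, p_max = min(flat), max(flat)
    # step 2.1.1.3: min-max rescale to the uint8 range [0, 255] (assumed formula)
    img = [[round(255 * (p - p_min) / (p_max - p_min)) for p in row] for row in img]
    # step 2.1.1.4: cut at half the width into 2 single knee images
    half = int(len(img[0]) / 2)
    left = [row[:half] for row in img]
    right = [row[half:] for row in img]
    return left, right
```

Each original image thus contributes two single knee images, giving the 2M images that step 2.1.1.5 then histogram-equalizes.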
3. The knee joint positioning method based on the multitask two-stage convolutional neural network of claim 1, wherein in steps 2.2.2.2 and 2.4.3.5 the width and height of the k-th randomly selected bounding box are taken as follows: let the width of the m-th single knee image be wm and its height hm; take half of the minimum of wm and hm and round it, i.e. int(min(wm, hm)/2), where int(min(wm, hm)/2) denotes min(wm, hm)/2 rounded; randomly take an integer in the value range [48, int(min(wm, hm)/2)] as the width and the height of the randomly selected bounding box.
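The random bounding box sampling of claim 3, combined with the IoU labeling of steps 2.2.2.2–2.2.2.4, can be sketched as below; the IoU thresholds are not legible in this copy, so the commonly used MTCNN-style values (0.65 / 0.4 / 0.3) are assumed purely for illustration:

```python
import random

def iou(a, b):
    # intersection-over-union of two boxes (x1, y1, x2, y2)
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if inter else 0.0

def random_box(w_m, h_m, rng=random):
    # claim 3: square side drawn from [48, int(min(w_m, h_m) / 2)]
    side = rng.randint(48, int(min(w_m, h_m) / 2))
    x1 = rng.randint(0, w_m - side)
    y1 = rng.randint(0, h_m - side)
    return (x1, y1, x1 + side, y1 + side)

def label_sample(box, annotated, pos=0.65, part=0.4, neg=0.3):
    # assumed MTCNN-style thresholds; the patent's exact values are not legible here
    v = iou(box, annotated)
    if v >= pos:
        return "positive"
    if v >= part:
        return "partial"
    if v < neg:
        return "negative"
    return None  # IoU between neg and part: sample unused
```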
4. The method of claim 1, wherein M satisfies M > 2000, K satisfies K > 50, PP satisfies 0 < PP < 100, s satisfies 0 < s ≤ 1, and N1 satisfies 0 ≤ N1 ≤ 100.
5. The knee joint positioning method based on the multitask two-stage convolutional neural network as claimed in claim 1, wherein the method for training the first-stage network in the 2.3 steps is:
2.3.1 initializing variable q to 1, setting values of model parameters, including setting learning rate to 0.001 and batch size to 500;
2.3.2 inputting the training samples of the first-level network into the first-level network and outputting the overall loss function Lq, calculated as follows; under the model parameters set in step 2.3.1, the parameters in the first-level network are updated by back-propagation of the Lq value, giving the first-level network NETq with parameters updated for the q-th time;
where N is the number of training samples of the first-level network, N = N11 + N12 + N13; the per-sample term denotes the loss of the i-th training sample on the j-th task, 1 ≤ j ≤ 3, there being 3 tasks in total: knee joint detection, knee joint bounding box localization and knee joint key point localization; αj denotes the importance coefficient of the j-th task, the importance coefficient of the knee joint detection task being α1 = 1, the importance coefficient of the knee joint bounding box localization task being α2 = 0.5, and the importance coefficient of the knee joint key point localization task being α3 = 0;
2.3.3 if 2 ≤ q + 1 ≤ Q, where Q is the number of network training iterations and Q = 10, let q = q + 1 and go to 2.3.4; otherwise training is complete, the trained first-level network, namely NETQ, is obtained, and the procedure ends;
2.3.4 inputting the training samples of the first-level network into NETq−1 and outputting the q-th overall loss function Lq; under the model parameters set in step 2.3.1, the parameters in NETq−1 are updated by back-propagation of the Lq value, giving the first-level network NETq with parameters updated for the q-th time; go to 2.3.3;
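The overall loss Lq of claim 5 can be sketched as below, assuming (as the variable definitions suggest) that Lq averages the importance-weighted task losses over the N training samples; the exact summation formula is an image not legible in this copy:

```python
def overall_loss(task_losses, alphas=(1.0, 0.5, 0.0)):
    # task_losses[j][i]: loss of the i-th training sample on the j-th task
    # (j = 0: knee detection, 1: bounding box localization, 2: key point localization)
    # alphas: importance coefficients alpha_1 = 1, alpha_2 = 0.5, alpha_3 = 0
    n = len(task_losses[0])
    total = 0.0
    for alpha_j, losses_j in zip(alphas, task_losses):
        total += alpha_j * sum(losses_j)
    return total / n
```

With α3 = 0, the first-level network's training ignores the key point localization task, which is handled by the second-level network.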
6. The knee joint positioning method based on the multitask two-stage convolutional neural network as claimed in claim 1, wherein in step 2.4.2.4 the preliminary target detection module applies the knee joint region detection method to the first image feature F5 to obtain the knee joint region in the single-knee image I, the knee joint region detection method being:
2.4.2.4.1 the 1 × 1 convolution layer in the preliminary target detection module performs a 1 × 1 convolution on the first image feature F5 of the p-th picture in the picture pyramid and outputs two groups of vectors: the knee joint probability vector A1p, consisting of a set of 1-dimensional vectors (the first dimension being the number of vectors, the second the dimension of each vector), each element of A1p storing the probability that the corresponding position is a knee joint, a position being judged to be a knee joint when its probability value is not lower than a first threshold α, α ∈ [0.5, 1]; and the knee joint bounding-box coordinate offset B1p, consisting of a set of 4-dimensional vectors (the first dimension being the number of vectors, the second the dimension of each vector), the 4 dimensions (x1, y1, x2, y2) of each vector representing the bounding-box offsets (upper-left horizontal coordinate offset, upper-left vertical coordinate offset, lower-right horizontal coordinate offset, lower-right vertical coordinate offset);
2.4.2.4.2 the knee joint bounding-box coordinate operation layer of the preliminary target detection module screens and operates on A1p and B1p to obtain the knee joint regions in the single-knee image I: find the N vector positions in A1p whose probability values are not lower than α, N ≥ 1; find the corresponding N vector positions in B1p and the 4-dimensional coordinate-offset vectors at those N positions, the n-th offset vector representing the coordinate offsets (upper-left horizontal coordinate offset, upper-left vertical coordinate offset, lower-right horizontal coordinate offset, lower-right vertical coordinate offset) of the n-th bounding box; compute the coordinates of the N knee joint region bounding boxes in the original single-knee image I and frame the N knee joint regions on the original image I according to the bounding-box coordinates, where the coordinates (upper-left horizontal coordinate, upper-left vertical coordinate, lower-right horizontal coordinate, lower-right vertical coordinate) of the n-th bounding box are calculated as follows:
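The claim's exact formula for mapping feature-map positions and predicted offsets back to original-image bounding-box coordinates appears only as a figure in this extract. The sketch below shows one common MTCNN-style mapping under assumed hyperparameters: `stride`, `cell_size`, and the pyramid `scale` are illustrative values, not taken from the patent.

```python
# Hedged sketch: map a feature-map cell plus 4 predicted offsets back to an
# (x1, y1, x2, y2) box in the original image. Each cell corresponds to a
# sliding window of side cell_size placed at stride pixels in the (scaled)
# pyramid image; dividing by scale returns to original-image coordinates.

def box_from_offsets(col, row, offsets, stride=2, cell_size=12, scale=1.0):
    dx1, dy1, dx2, dy2 = offsets  # offsets are fractions of the window size
    x1 = (col * stride + dx1 * cell_size) / scale
    y1 = (row * stride + dy1 * cell_size) / scale
    x2 = (col * stride + cell_size + dx2 * cell_size) / scale
    y2 = (row * stride + cell_size + dy2 * cell_size) / scale
    return (x1, y1, x2, y2)
```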
7. The method of claim 1, wherein the value of α in step 2.4.2.4.1 is 0.6.
8. The knee joint positioning method based on the multitask two-stage convolutional neural network as claimed in claim 1, wherein the method for training the second-stage network in step 2.5 is:
2.5.1 initialize the variable t = 1 and set the values of the model parameters, including a learning rate of 0.0001 and a batch size (batch_size) of 500;
2.5.2 input the training samples of the second-stage network into the second-stage network and compute the overall loss function Lt as Lt = (1/N2) Σi Σj αj · L(i,j), summing i from 1 to N2 and j from 1 to 3; under the model parameters set in step 2.5.1, back-propagate the Lt value to update the parameters of the second-stage network, obtaining the second-stage network NETt with parameters updated for the t-th time;
where N2 is the number of training samples of the second-stage network, N2 = N21 + N22 + N23; L(i,j) denotes the loss of the i-th training sample on the j-th task, 1 ≤ j ≤ 3, the three tasks being knee joint detection, knee joint bounding-box localization, and knee joint key-point localization; αj is the importance coefficient of the j-th task: for the knee joint detection task α1 = 0.8, for the knee joint bounding-box localization task α2 = 0.6, and for the knee joint key-point localization task α3 = 1.5;
2.5.3 let t = t + 1; if 2 ≤ t ≤ T, where T is the number of network training iterations and T = 10, go to 2.5.4; otherwise training is complete, the trained second-stage network NETT is obtained, and the procedure ends;
2.5.4 input the training samples of the second-stage network into NETt-1 and compute the t-th overall loss function Lt; under the model parameters set in step 2.5.1, back-propagate the Lt value to update the parameters of NETt-1, obtaining the second-stage network NETt with parameters updated for the t-th time; go to 2.5.3.
9. The knee joint positioning method based on the multitask two-stage convolutional neural network as claimed in claim 1, wherein the preprocessing of the double-knee X-ray film to be detected in step three is:
3.1 if the double-knee X-ray film to be detected has a bright background and dark legs, convert it into an image with a dark background and bright legs: subtract each original pixel value of the film from 255 and use the result as the new pixel value; then go to 3.2; if the double-knee X-ray film to be detected already has a dark background and bright legs, go directly to 3.2;
3.2 scale the pixels of the double-knee X-ray film to be detected to [0, 255], converting it into a uint8 double-knee image; the new value of each pixel in the uint8 double-knee image is 255 × (P − Pmin)/(Pmax − Pmin), where P is the pixel value of that pixel before processing, Pmax is the maximum pixel value of the image before processing, and Pmin is the minimum pixel value of the image before processing;
3.3 convert the uint8 double-knee image into single-knee images as follows: find the width W of the uint8 double-knee image and half of that width, W/2; round W/2 down to the integer int(W/2); finally, crop the double-knee image according to the width-coordinate ranges [0, int(W/2)] and [int(W/2)+1, W−1], splitting the uint8 double-knee image into 2 single-knee images;
3.4 perform histogram equalization on the 2 single-knee images; the 2 histogram-equalized single-knee images serve as the single-knee images to be detected.
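Steps 3.1-3.3 of this preprocessing can be sketched with NumPy as follows. The function name `preprocess` and the `bright_background` flag are assumptions for illustration; histogram equalization (step 3.4) would follow, e.g. with OpenCV's `equalizeHist`.

```python
import numpy as np

def preprocess(xray, bright_background):
    """Sketch of claim-9 steps 3.1-3.3: invert, min-max scale to uint8, split."""
    # 3.1: bright-background films are inverted so the legs become bright
    img = 255.0 - xray if bright_background else xray.astype(float)
    # 3.2: min-max scale to [0, 255] and convert to uint8
    p_min, p_max = img.min(), img.max()
    img = (255.0 * (img - p_min) / (p_max - p_min)).astype(np.uint8)
    # 3.3: split at half the width, columns [0, int(W/2)] and [int(W/2)+1, W-1]
    half = int(img.shape[1] / 2)
    left, right = img[:, :half + 1], img[:, half + 1:]
    return left, right
```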
10. The method of claim 1, wherein in step 2.4.2.6 the first non-maximum suppression screening layer filters all knee joint regions of the m-th single-knee image I with a non-maximum suppression algorithm, the filtering threshold being δ = 0.6; in step 4.3.5, the second non-maximum suppression screening layer filters all knee joint bounding boxes in the d-th single-knee image to be detected with a non-maximum suppression algorithm, the filtering threshold being δ = 0.7; and in step 4.4.4.4, when the second non-maximum suppression screening layer screens the knee joint bounding boxes in the d-th single-knee image to be detected with a non-maximum suppression algorithm to obtain the final knee joint bounding box and the final key points of the knee joint region, the filtering threshold is δ = 0.7.
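Greedy IoU-based non-maximum suppression is the standard algorithm behind screening layers such as these; the exact patented variant is not spelled out in the claim, so the sketch below is a generic implementation with the claim's filtering threshold δ passed as `delta`.

```python
# Generic greedy NMS sketch: boxes are (x1, y1, x2, y2); a box is suppressed
# when its IoU with an already-kept, higher-scoring box exceeds delta.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, delta=0.7):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= delta for j in keep):
            keep.append(i)
    return keep
```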
11. The method of claim 1, wherein the value of β in step 4.4.4.2 is 0.7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010097868.3A CN111340760B (en) | 2020-02-17 | 2020-02-17 | Knee joint positioning method based on multitask two-stage convolution neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111340760A true CN111340760A (en) | 2020-06-26 |
CN111340760B CN111340760B (en) | 2022-11-08 |
Family
ID=71186920
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010097868.3A Active CN111340760B (en) | 2020-02-17 | 2020-02-17 | Knee joint positioning method based on multitask two-stage convolution neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111340760B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106845430A (en) * | 2017-02-06 | 2017-06-13 | 东华大学 | Pedestrian detection and tracking based on acceleration region convolutional neural networks |
CN108564029A (en) * | 2018-04-12 | 2018-09-21 | 厦门大学 | Face character recognition methods based on cascade multi-task learning deep neural network |
CN109902806A (en) * | 2019-02-26 | 2019-06-18 | 清华大学 | Method is determined based on the noise image object boundary frame of convolutional neural networks |
Non-Patent Citations (1)
Title |
---|
MIAO Guang et al., "CT image pulmonary nodule detection method combining 2D and 3D convolutional neural networks", Laser & Optoelectronics Progress * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111767857A (en) * | 2020-06-30 | 2020-10-13 | 电子科技大学 | Pedestrian detection method based on lightweight two-stage neural network |
CN113076987A (en) * | 2021-03-29 | 2021-07-06 | 北京长木谷医疗科技有限公司 | Osteophyte identification method, device, electronic equipment and storage medium |
WO2022205928A1 (en) * | 2021-03-29 | 2022-10-06 | 北京长木谷医疗科技有限公司 | Osteophyte identification method and apparatus, and electronic device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111340760B (en) | 2022-11-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Saeedi et al. | Infrared and visible image fusion using fuzzy logic and population-based optimization | |
Lakshminarayanan et al. | Deep Learning-Based Hookworm Detection in Wireless Capsule Endoscopic Image Using AdaBoost Classifier. | |
Choi et al. | Boosting proximal dental caries detection via combination of variational methods and convolutional neural network | |
Lin et al. | Automatic retinal vessel segmentation via deeply supervised and smoothly regularized network | |
Liang et al. | United snakes | |
CN106940816A (en) | CT image pulmonary nodule detection system based on 3D fully connected convolutional neural network | |
Reddy et al. | A novel computer-aided diagnosis framework using deep learning for classification of fatty liver disease in ultrasound imaging | |
Rajalakshmi et al. | Deeply supervised u-net for mass segmentation in digital mammograms | |
Le et al. | Liver tumor segmentation from MR images using 3D fast marching algorithm and single hidden layer feedforward neural network | |
Chen et al. | Blood vessel enhancement via multi-dictionary and sparse coding: Application to retinal vessel enhancing | |
CN111709446B (en) | X-ray chest radiography classification device based on improved dense connection network | |
CN111340760B (en) | Knee joint positioning method based on multitask two-stage convolution neural network | |
Mehdizadeh et al. | Deep feature loss to denoise OCT images using deep neural networks | |
Bala | Intracardiac mass detection and classification using double convolutional neural network classifier | |
Wang et al. | Generative image deblurring based on multi-scaled residual adversary network driven by composed prior-posterior loss | |
Kora et al. | Automatic segmentation of polyps using U-net from colonoscopy images | |
Hasegawa et al. | Convolution neural-network-based detection of lung structures | |
CN107146202A (en) | Image blind deblurring method based on L0 regularization and blur kernel post-processing | |
Ramana | Alzheimer disease detection and classification on magnetic resonance imaging (MRI) brain images using improved expectation maximization (IEM) and convolutional neural network (CNN) | |
Antony | Automatic quantification of radiographic knee osteoarthritis severity and associated diagnostic features using deep convolutional neural networks | |
Suresh et al. | Improving the mammogram images by intelligibility mammogram enhancement method | |
Sivakumar et al. | A novel method on earlier detection of bone cancer using Markov random field segmentation | |
Hassanien et al. | Digital mammogram segmentation algorithm using pulse coupled neural networks | |
Cheng et al. | Spherical transformer for quality assessment of pediatric cortical surfaces | |
Sharma et al. | Solving image processing critical problems using machine learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||