CN111340760A - Knee joint positioning method based on multitask two-stage convolutional neural network - Google Patents
- Publication number
- CN111340760A (application CN202010097868.3A)
- Authority
- CN
- China
- Prior art keywords
- knee
- knee joint
- image
- feature map
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- All classifications fall under G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING:
- G06T7/0012 — Image analysis; inspection of images; biomedical image inspection
- G06N3/045 — Neural networks; architecture; combinations of networks
- G06N3/08 — Neural networks; learning methods
- G06T7/11 — Segmentation; edge detection; region-based segmentation
- G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
- G06T2207/10116 — Image acquisition modality: X-ray image
- G06T2207/20081 — Special algorithmic details: training; learning
- G06T2207/20084 — Special algorithmic details: artificial neural networks [ANN]
- G06T2207/30008 — Subject of image: biomedical image processing; bone
Abstract
The invention discloses a knee joint positioning method based on a multitask two-stage convolutional neural network, aiming to improve knee joint detection accuracy. A knee joint region positioning network is constructed from two stages of networks: the first-stage network consists of a first feature extraction module and a preliminary target detection module, and the second-stage network consists of a second feature extraction module and a final target detection module. The positioning network is trained to obtain a trained knee joint region positioning network model. A double-knee X-ray film to be detected is preprocessed, and the double-knee image is divided into single-knee images. Knee joint positioning is then performed on each single-knee image with the trained network to obtain the final bounding box and the final key points of the knee joint region of the single-knee image. The invention improves the detection accuracy of knee joint positioning.
Description
Technical Field
The invention relates to the fields of image processing, computer vision and target positioning, and in particular to a knee joint positioning method based on a multitask two-stage convolutional neural network.
Background
Gonarthritis (knee osteoarthritis) is a very common joint disease that can even lead to disability. It deserves close attention because its morbidity is high among the elderly, the obese and sedentary populations; in severe cases it causes pain and may end in total joint replacement, and the annual knee treatment cost per patient can reach 19,000 euros. Clinically, physicians currently diagnose gonarthritis mainly on the basis of X-ray film; other medical imaging modalities exist, such as magnetic resonance imaging and ultrasound imaging, but X-ray film is cheap and widely used. The X-ray images taken of patients are medical images that include the legs, while gonarthritis mainly presents the pathological features of knee joint space narrowing, osteophyte formation and sclerosis, so only the knee joint region needs attention. Reviewing X-ray films by eye is subjective, time-consuming and labor-intensive for doctors, and with the development of computer technology, computer-aided methods have entered the medical field. For X-ray films that include the knees, however, whether a doctor performs visual diagnosis or computer-aided diagnosis, localization of the knee joint region is a prerequisite for diagnosis and a crucial step.
At present, knee joint regions in X-ray films containing both knees are located by manual labeling, i.e., each film is inspected by eye and the knee joint regions are marked by hand, which is time-consuming and labor-intensive. Computer-aided methods for locating the knee joint region appeared later. For example, the document "Early detection of radiographic knee osteoarthritis using computer-aided analysis [J]. Osteoarthritis and Cartilage, 2009, 17(10): 1307-" proposes a template matching method to automatically detect knee joint parts; this method is slow on large data sets and its knee joint detection accuracy is low. The document "Quantifying radiographic knee osteoarthritis severity using deep convolutional neural networks [C] // 2016 23rd International Conference on Pattern Recognition (ICPR). IEEE, 2016: 1195-" proposes a method based on Sobel horizontal image gradients and a support vector machine (SVM): the knee joint center is first detected automatically, and a fixed region around the detected center is then extracted as the knee joint part of interest. Its highest detection rate on 4496 knee X-ray films from the public OAI database (https://oai.epi-ucsf.org/datarelease/) was 81.8%, whereas the highest detection rate of the template matching method was 54.4%.
The document "A novel method for automatic localization of joint area on knee plain radiographs [C] // Scandinavian Conference on Image Analysis. Springer, Cham, 2017: 290-" proposes a method combining a histogram of oriented gradients (HOG) with a support vector machine (SVM) to locate the knee joint region. Its mean IoU (intersection over union) measured on 473 knee X-ray films from the public data set MOST (http://most.ucsf.edu) was 0.84. Such methods, which locate the knee joint region by extracting traditional features and combining them with a traditional classifier, suffer from high missed-detection and false-detection rates, so the knee joint detection rate still needs improvement. With the development of deep learning, deep learning techniques can represent image features effectively and have been widely applied to target recognition, target detection and related fields.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in order to further improve the knee joint detection accuracy, a knee joint positioning method based on a multitask two-stage convolutional neural network is provided.
The technical scheme of the invention is as follows: the knee joint positioning method based on the multitask two-stage convolutional neural network mainly comprises network construction, network training, preprocessing of the X-ray film to be detected, and knee joint positioning in the X-ray film to be detected. The overall steps are shown in figure 1:
firstly, a knee joint area positioning network based on a multitask two-stage convolutional neural network is built.
The knee joint region positioning network based on the multitask two-stage convolutional neural network comprises two stages: a first-level network and a second-level network, where the output of the first-level network serves as the input of the second-level network.
The first-level network is composed of a first feature extraction module and a preliminary target detection module. The first feature extraction module receives the single knee image I from the outside, extracts first image features of the single knee image I and sends the first image features to the preliminary target detection module; and the preliminary target detection module detects the first image characteristics and outputs a preliminary knee joint area in the single knee image I.
The first feature extraction module consists of 5 convolutional layers (4 of size 3 × 3 and 1 of size 2 × 2) and 3 max pooling layers (2 of size 2 × 2 and 1 of size 3 × 3).
The first 3 × 3 convolutional layer convolves the single knee image I, and the first 2 × 2 max pooling layer pools the convolved result to obtain feature map F1. The second 3 × 3 convolutional layer convolves F1, and the 3 × 3 max pooling layer pools the convolved result to obtain feature map F2. The third 3 × 3 convolutional layer convolves F2, and the second 2 × 2 max pooling layer pools the convolved result to obtain feature map F3. The fourth 3 × 3 convolutional layer convolves F3 to obtain feature map F4, and the fifth (2 × 2) convolutional layer convolves F4 to obtain feature map F5. Feature map F5 is the extracted first image feature.
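As a check on this layer stack, the spatial sizes it produces can be traced with a short sketch. The patent text does not state strides or padding, so the "valid" (no-padding) convolutions with stride 1 and pooling with stride 2 below are illustrative assumptions:

```python
# Sketch: spatial-size arithmetic for the first feature extraction module.
# Assumes valid (no-padding) convolutions with stride 1 and max pooling
# with stride 2 -- these strides/paddings are assumptions, not stated
# in the patent text.

def conv(side: int, kernel: int, stride: int = 1) -> int:
    """Output side length of a valid convolution."""
    return (side - kernel) // stride + 1

def pool(side: int, kernel: int, stride: int = 2) -> int:
    """Output side length of a valid max-pooling layer."""
    return (side - kernel) // stride + 1

def first_stage_feature_size(side: int) -> int:
    side = conv(side, 3)   # first 3x3 conv
    side = pool(side, 2)   # first 2x2 max pool   -> F1
    side = conv(side, 3)   # second 3x3 conv
    side = pool(side, 3)   # 3x3 max pool          -> F2
    side = conv(side, 3)   # third 3x3 conv
    side = pool(side, 2)   # second 2x2 max pool   -> F3
    side = conv(side, 3)   # fourth 3x3 conv       -> F4
    side = conv(side, 2)   # fifth 2x2 conv        -> F5
    return side

# A 48x48 training crop (the sample size of step 2.2.3) reduces to 1x1:
print(first_stage_feature_size(48))   # -> 1
```

Under these assumptions the 48 × 48 training crops of step 2.2.3 map to a single spatial position in F5, which is consistent with the fully-convolutional, sliding-window use of the network on larger pyramid images.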
The preliminary target detection module comprises a 1 × 1 convolutional layer, a knee joint bounding box coordinate operation layer and a first non-maximum suppression screening layer. The 1 × 1 convolutional layer convolves feature map F5 to obtain two groups of vectors: a knee joint probability vector A1 and a knee joint bounding box coordinate offset vector B1. The bounding box coordinate operation layer determines knee joint bounding boxes from A1 and B1 to obtain the knee joint regions in the single knee image I. The first non-maximum suppression screening layer performs non-maximum suppression on these regions to obtain the preliminary knee joint regions in the single knee image I, and sends the single knee image I together with its preliminary knee joint regions to the second-level network.
The second-level network is composed of a second feature extraction module and a final target detection module. The second feature extraction module performs feature extraction on the single knee image I received from the first-level network and on the preliminary knee joint regions in the single knee image I to obtain second image features; the final target detection module performs target detection on the second image features to obtain the final knee joint bounding box coordinates and the knee joint region key point coordinates of the single knee image I.
The second feature extraction module is composed of 4 convolutional layers (3 of size 3 × 3 and 1 of size 2 × 2), 3 max pooling layers (2 of size 2 × 2 and 1 of size 3 × 3), and 1 fully-connected layer.
The first 3 × 3 convolutional layer in the second feature extraction module convolves the preliminary knee joint region of the single knee image I, and the first 2 × 2 max pooling layer pools the convolved result to obtain feature map F6. The second 3 × 3 convolutional layer convolves F6, and the 3 × 3 max pooling layer pools the convolved result to obtain feature map F7. The third 3 × 3 convolutional layer convolves F7, and the second 2 × 2 max pooling layer pools the convolved result to obtain feature map F8. The fourth (2 × 2) convolutional layer convolves F8 to obtain feature map F9, and the first fully-connected layer performs a full connection on F9 to obtain feature map F10. Feature map F10 is the extracted second image feature.
The final target detection module comprises a second fully-connected layer, a coordinate operation layer for the knee joint bounding box and knee joint region key points, and a second non-maximum suppression screening layer. The second fully-connected layer performs a full connection on feature map F10 to obtain three groups of vectors: a knee joint probability vector A2, a knee joint bounding box coordinate offset vector B2, and a coordinate offset vector C for the six key points of the knee joint region. The coordinate operation layer determines the knee joint bounding box coordinates and the knee joint region key point coordinates of the single knee image I from A2, B2 and C. The second non-maximum suppression screening layer performs non-maximum suppression on the knee joint regions and key point coordinates in the single knee image I, and outputs the single knee image I together with its final knee joint region and knee joint region key point coordinates.
Second, train the knee joint region positioning network based on the multitask two-stage convolutional neural network.
2.1 prepare the data of the knee joint area location network based on the multitask two-stage convolution neural network.
2.1.1 Preprocess M (M is a positive integer, M > 2000) original images to obtain 2M histogram-equalized single knee images, as follows:
2.1.1.1 Randomly select M original images from the OAI baseline public database (https://oai.epi-ucsf.org/datarelease/, November 2008 release). Each original image is an X-ray medical image containing both the left and right knees. Because of different illumination during acquisition, an image may have a bright background with dark legs, or a dark background with bright legs.
2.1.1.2 Uniformly transform the M original images into images with a dark background and bright legs. First select the double-knee images with a bright background and dark legs from the M original images, then invert their pixels, i.e., subtract each original pixel value from 255, so that all M original images uniformly become images with a dark background and bright legs.
2.1.1.3 Convert the pixels of the M dark-background, bright-leg images to the range [0, 255], i.e., process them into uint8 images, by applying the following to the pixel value of every pixel: P_new = 255 × (P − P_min) / (P_max − P_min), where P is the pixel value of any pixel in the pre-processed image, P_max is the maximum pixel value in the pre-processed image, P_min is the minimum pixel value in the pre-processed image, and P_new is the corresponding pixel value in the processed image.
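The min-max rescaling of this step can be sketched on a flat list of pixel values (rounding to the nearest integer is an assumption; the patent only gives the continuous formula):

```python
# Sketch of the min-max rescaling in step 2.1.1.3: map each pixel P to
# P_new = 255 * (P - P_min) / (P_max - P_min), then round into uint8 range.
def to_uint8(pixels):
    p_min, p_max = min(pixels), max(pixels)
    scale = p_max - p_min  # assumed non-zero (image is not constant)
    return [round(255 * (p - p_min) / scale) for p in pixels]

print(to_uint8([100, 300, 500]))   # -> [0, 128, 255]
```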
2.1.1.4 Convert the M uint8 images into 2M single knee images as follows: first obtain the width W of each uint8 image and compute half the width, W/2; round it to an integer, denoted [W/2]; then cut each uint8 image at the width coordinate [W/2], dividing it into 2 single knee images. This finally yields 2M single knee images.
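The split at the half-width column can be sketched as below; the image is represented as a list of rows, and rounding the half-width down is an assumption (the patent only says the half-width is rounded):

```python
# Sketch of step 2.1.1.4: split a double-knee image into two single-knee
# halves at the rounded half-width column.
def split_double_knee(image):
    """image: list of rows (H x W); returns (left_half, right_half)."""
    width = len(image[0])
    half = width // 2                      # rounded half-width [W/2]
    left = [row[:half] for row in image]   # left-knee image
    right = [row[half:] for row in image]  # right-knee image
    return left, right

img = [[1, 2, 3, 4], [5, 6, 7, 8]]  # tiny 2x4 stand-in for a uint8 image
left, right = split_double_knee(img)
print(left, right)   # -> [[1, 2], [5, 6]] [[3, 4], [7, 8]]
```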
2.1.1.5 Perform histogram equalization on each of the 2M single knee images, following the standard histogram equalization method described in the literature.
2.1.2 annotate the knee joint real bounding box in the 2M single knee images after histogram equalization, the method is as follows:
2.1.2.1 initializing variable m ═ 1;
2.1.2.2 Manually label 6 key points of the knee joint region of the mth single knee image. Since doctors and computers mainly focus on the knee joint gap and the osteophytes, 6 points on the boundary of the joint gap and the osteophytes are manually marked as the main key points of the knee joint region: for the right knee, as shown in fig. 3(a), the femoral medial osteophyte point (FM), femoral lateral osteophyte point (FL), tibial medial osteophyte point (TM), tibial lateral osteophyte point (TL), joint space medial point (JSM) and joint space lateral point (JSL). As shown in fig. 3(b), the 6 key points of the left knee are the same six points; the key points of the left and right knees are mirror images of each other. The subsequent steps process the 6 key points without distinguishing the left knee from the right knee.
2.1.2.3 marking the boundary box of the knee joint in the mth single knee image according to the key points marked manually, the method is:
2.1.2.3.1 Calculate the center point coordinates (x_mid, y_mid) of the 6 key points in the mth single knee image. Let the coordinates of the 6 key points be (x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4), (x_5, y_5), (x_6, y_6); then x_mid = (x_1 + x_2 + x_3 + x_4 + x_5 + x_6) / 6 and y_mid = (y_1 + y_2 + y_3 + y_4 + y_5 + y_6) / 6.
2.1.2.3.2 Calculate the knee width w_knee: take the maximum abscissa x_max and the minimum abscissa x_min of the 6 key points, and use their difference as the knee joint width, i.e.:

x_max = max(x_1, x_2, x_3, x_4, x_5, x_6)

x_min = min(x_1, x_2, x_3, x_4, x_5, x_6)

w_knee = x_max − x_min
2.1.2.3.3 Label the knee joint region to obtain the real knee joint region bounding box coordinates, representing (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate). Taking the center point of the 6 key points as the center, expand the region by 0.65 times the knee width up, down, left and right (an expansion of 0.65 times is enough to frame the knee joint region on every image); this expanded region is the knee joint region of real interest, i.e., the labeled bounding box. The calculation formula is: upper-left abscissa = x_mid − 0.65 w_knee, upper-left ordinate = y_mid − 0.65 w_knee, lower-right abscissa = x_mid + 0.65 w_knee, lower-right ordinate = y_mid + 0.65 w_knee.
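The key-point-to-bounding-box construction of steps 2.1.2.3.1-2.1.2.3.3 can be sketched as follows (the six key-point coordinates below are made-up illustrative values, not real annotations):

```python
# Sketch of steps 2.1.2.3.1-2.1.2.3.3: center of the six key points,
# knee width from the extreme abscissas, and a box expanded by
# 0.65 * w_knee in all four directions.
def knee_bbox(points):
    """points: six (x, y) key points; returns (x1, y1, x2, y2)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_mid = sum(xs) / 6
    y_mid = sum(ys) / 6
    w_knee = max(xs) - min(xs)      # knee width
    r = 0.65 * w_knee               # expansion in each direction
    return (x_mid - r, y_mid - r, x_mid + r, y_mid + r)

pts = [(10, 20), (30, 22), (50, 21), (12, 40), (32, 42), (46, 41)]
print(knee_bbox(pts))
```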
2.1.2.4 If m ≥ 2M, go to 2.2; if m < 2M, let m = m + 1 and go to 2.1.2.2.
2.2 prepare training samples for the first stage network in the knee joint area positioning network based on the multitask two-stage convolution neural network, the method is as follows:
2.2.1 initializing variable m ═ 1;
2.2.2 prepare training samples for the first level network on the mth single knee image by:
2.2.2.1 initializing variable k ═ 1;
2.2.2.2 Randomly take the kth point on the mth single knee image; let its coordinates be (x_k, y_k) and use this point as the upper-left coordinate point of a randomly selected bounding box.
Take the width w_k and height h_k of the kth randomly selected bounding box. Let the width of the mth single knee image be w_m and its height h_m; take half of the minimum of w_m and h_m and round it, i.e., int(min(w_m, h_m)/2), where int(·) denotes rounding min(w_m, h_m)/2. Randomly take an integer in the range [48, int(min(w_m, h_m)/2)] as the width w_k and the height h_k of the randomly selected bounding box.
2.2.2.3 Determine the coordinates of the kth randomly selected bounding box. Let the point (x_k + w_k, y_k + h_k) be the lower-right coordinate point of the randomly selected bounding box; its coordinates are then (x_k, y_k, x_k + w_k, y_k + h_k), representing the kth randomly selected bounding box as (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate).
2.2.2.4 Calculate the intersection-over-union (IoU) between the kth randomly selected bounding box and the labeled bounding box on the mth single knee image. If the IoU is not lower than the positive-sample threshold, take the kth randomly selected bounding box as a positive training sample and go to 2.2.2.5; if the IoU falls in the intermediate range, take it as a partial training sample and go to 2.2.2.5; if the IoU is below the negative-sample threshold, take it as a negative training sample and go to 2.2.2.5.
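The IoU used to sort random boxes into positive, partial and negative samples can be sketched directly from the corner-coordinate convention above:

```python
# Sketch of the intersection-over-union (IoU) of step 2.2.2.4. Boxes are
# (x1, y1, x2, y2) = (upper-left, lower-right) corners, as in the text.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))   # -> 1/7, about 0.1429
```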
2.2.2.5 If k < K (K > 50; K is the total number of randomly selected bounding boxes on the mth single knee image), let k = k + 1 and go to 2.2.2.2; if k ≥ K, go to 2.2.2.6.
2.2.2.6 If m ≥ 2M, go to 2.2.3; if m < 2M, let m = m + 1 and go to 2.2.2.
2.2.3 Resize the training samples (positive, partial and negative) generated in step 2.2.2 to 48 × 48 × 3 (image height × image width × image channels); since they are RGB images, the number of channels is 3.
2.2.4 Augment the resized training samples from step 2.2.3 by a mirror operation: flip each training sample horizontally (left-right), i.e., mirror it, generating the training samples of the first-level network, which comprise N11 positive samples, N12 partial samples and N13 negative samples. N11, N12 and N13 are all positive integers whose values are determined by the random bounding box selection on the knee images and the computed IoU values.
2.3 training the first-stage network by adopting the first-stage network training sample to obtain the trained first-stage network, wherein the method comprises the following steps:
2.3.1 initializing variable q to 1, setting values of model parameters, including setting learning rate to 0.001 and batch size to 500;
2.3.2 Input the training samples of the first-level network into the first-level network and compute the overall loss function L_q = Σ_{i=1}^{N} Σ_{j=1}^{3} α_j L_i^j, where N is the number of training samples of the first-level network, N = N11 + N12 + N13; L_i^j denotes the loss of the ith training sample on the jth task (1 ≤ j ≤ 3; the 3 tasks are knee joint detection, knee joint bounding box positioning and knee joint key point positioning); and α_j is the importance coefficient of the jth task, with α_1 = 1 for the knee joint detection task, α_2 = 0.5 for the knee joint bounding box positioning task, and α_3 = 0 for the knee joint key point positioning task. Under the model parameters set in step 2.3.1, back-propagate the value of L_q to update the parameters of the first-level network, obtaining the first-level network NET_q with parameters updated for the qth time.
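The weighted multitask sum can be illustrated with a minimal sketch; the per-sample task losses below are placeholder numbers, not outputs of a real forward pass:

```python
# Sketch of the overall loss L = sum_i sum_j alpha_j * L_i^j with the task
# weights given in the text: alpha = (1, 0.5, 0) for knee detection,
# bounding-box positioning and key-point positioning in the first stage.
ALPHA = (1.0, 0.5, 0.0)

def overall_loss(per_sample_task_losses):
    """per_sample_task_losses: list of (L_det, L_box, L_kpt) triples."""
    return sum(a * l
               for losses in per_sample_task_losses
               for a, l in zip(ALPHA, losses))

batch = [(0.2, 0.4, 0.9), (0.1, 0.2, 0.5)]
# 1*0.2 + 0.5*0.4 + 0*0.9 + 1*0.1 + 0.5*0.2 + 0*0.5 = 0.6
print(overall_loss(batch))
```

Setting α_3 = 0 means the key-point task does not contribute to the first-stage loss; the key points are only learned by the second-stage network.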
2.3.3 If q + 1 ≤ Q (Q is the number of training iterations, Q = 10), let q = q + 1 and go to 2.3.4; otherwise training is complete and the trained first-level network NET_Q is obtained; go to 2.4.

2.3.4 Input the training samples of the first-level network into NET_{q−1} and output the qth overall loss function L_q. Under the model parameters set in step 2.3.1, back-propagate the value of L_q to update the parameters of NET_{q−1}, obtaining the first-level network NET_q with parameters updated for the qth time; go to 2.3.3.
2.4 prepare training samples for the second-stage network in the knee joint area positioning network based on the multitask two-stage convolutional neural network, the method comprises the following steps:
2.4.1 initializing variable m ═ 1;
2.4.2 positioning the preliminary knee joint region of the mth single knee image I by using the trained first-level network, wherein the method comprises the following steps:
2.4.2.1 Generate a picture pyramid of the mth single knee image I: multiply the image by different scaling factors s (s is a positive real number, 0 < s ≤ 1) to scale it to different sizes, obtaining PP (PP is a positive integer, 0 < PP < 100) single knee images of different sizes, which form the picture pyramid of the single knee image I. The size of each image in the picture pyramid is recorded as H × W × 3 (image height × image width × number of channels; they are RGB images, so there are 3 channels).
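The scaling factors s are not specified in the text; a geometric sequence is the common convention for such pyramids, so the factor 0.7 and the 48-pixel minimum side below are illustrative assumptions only:

```python
# Sketch of the picture pyramid of step 2.4.2.1: geometric scaling factors
# s (0 < s <= 1), stopping once the shorter image side would fall below
# the assumed 48-pixel network input. Factor 0.7 is an assumption.
def pyramid_scales(height, width, factor=0.7, min_side=48):
    scales, s = [], 1.0
    while min(height, width) * s >= min_side:
        scales.append(s)
        s *= factor
    return scales

# A 200x150 single-knee image yields PP = 4 pyramid levels here:
print(pyramid_scales(200, 150))
```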
2.4.2.2 initializing variable p ═ 1;
2.4.2.3 The trained first feature extraction module extracts the features of the pth picture in the picture pyramid to obtain a first image feature F5, whose size is (feature map height × feature map width × number of feature map channels).
2.4.2.4 the preliminary target detection module detects the first image feature F5 by using a knee joint region detection method to obtain a knee joint region in the single knee image I, wherein the knee joint region detection method is as follows:
2.4.2.4.1 The 1 × 1 convolutional layer in the preliminary target detection module performs a 1 × 1 convolution on the first image feature F5 of the pth picture in the picture pyramid and outputs two groups of vectors. The first is the knee joint probability vector A1_p, which stores, for each position, the probability (in the range [0, 1]) that the position is a knee joint; a position is considered a knee joint when its probability is not lower than a first threshold α, α ∈ [0.5, 1], with α = 0.6 preferred. The second is the knee joint bounding box coordinate offset vector B1_p, in which each entry is a 4-dimensional vector (x1, y1, x2, y2) representing the bounding box offsets (upper-left abscissa offset, upper-left ordinate offset, lower-right abscissa offset, lower-right ordinate offset).
2.4.2.4.2 The knee joint bounding box coordinate operation layer of the preliminary target detection module screens and computes on A1_p and B1_p to obtain the knee joint regions in the single knee image I. Find the N vector positions in A1_p whose probability is not lower than α, find the corresponding N vector positions in B1_p, and read the 4-dimensional coordinate offset vectors at those N positions, where the nth offset vector gives the coordinate offsets of the nth bounding box (upper-left abscissa offset, upper-left ordinate offset, lower-right abscissa offset, lower-right ordinate offset). Then compute the coordinates of the N knee joint region bounding boxes in the original single knee image I (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate) by mapping each feature-map position back to the pth pyramid image, applying the predicted coordinate offsets, and dividing by the scaling factor s (0 < s ≤ 1) of the original single knee image I; finally frame the N knee joint regions on the original image I according to these bounding box coordinates.
2.4.2.5 If p ≥ PP, go to 2.4.2.6; if p < PP, let p = p + 1 and go to 2.4.2.3.
2.4.2.6 The first non-maximum suppression screening layer applies the non-maximum suppression (NMS) algorithm to filter all knee joint regions of the mth single knee picture I. For the NMS algorithm, see the document "Efficient Non-Maximum Suppression [C] // 18th International Conference on Pattern Recognition (ICPR 2006), 20-24 August 2006, Hong Kong, China". Set the non-maximum suppression filter threshold δ to 0.6 and screen the knee joint boundary regions with this threshold to obtain the N1 (N1 is a positive integer, 0 ≤ N1 ≤ 100) preliminary knee joint regions of the mth single knee image I.
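The screening performed by this layer can be sketched as the standard greedy NMS loop (keep the highest-scoring box, drop every remaining box overlapping it beyond the threshold, repeat), using the δ = 0.6 value from the text:

```python
# Sketch of the greedy non-maximum suppression used by the first screening
# layer, with the filter threshold delta = 0.6 given in the text.
def nms(boxes, scores, delta=0.6):
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter)

    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        best = order.pop(0)          # highest-scoring remaining box
        keep.append(best)
        order = [i for i in order    # drop boxes overlapping it too much
                 if iou(boxes[best], boxes[i]) <= delta]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))   # -> [0, 2]: box 1 overlaps box 0 too much
```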
2.4.3 prepare training samples for the second level network on the mth single knee image by:
2.4.3.1 initializing variable n1=1,k=1;
2.4.3.2 Calculate the IoU between the n1th preliminary knee joint region of the mth single knee image I and the knee joint region labeled on the mth single knee image I. If the IoU is not lower than the positive-sample threshold, take the n1th preliminary knee joint region as a positive sample for second-level network training and go to 2.4.3.3; if the IoU falls in the intermediate range, take it as a partial sample for second-level network training and go to 2.4.3.3; if the IoU is below the negative-sample threshold, take it as a negative sample for second-level network training and go to 2.4.3.3.
2.4.3.3 If n1 ≥ N1 (N1 is the number of preliminary knee joint regions in the mth single knee image I, 0 ≤ N1 ≤ 100), go to 2.4.3.4; if n1 < N1, let n1 = n1 + 1 and go to 2.4.3.2.
2.4.3.4 Randomly select the kth point (x_k, y_k) on the mth single knee image and use it as the upper-left coordinate point of a randomly selected bounding box.
2.4.3.5 Take the width w_k and height h_k of the kth randomly selected bounding box. Let the width of the mth single knee image be w_m and its height h_m; take half of the minimum of w_m and h_m and round it, i.e., int(min(w_m, h_m)/2). Randomly take an integer in [48, int(min(w_m, h_m)/2)] as the width w_k and the height h_k of the randomly selected bounding box.
2.4.3.6 Determine the coordinates of the kth randomly selected bounding box: the point obtained by adding the chosen width and height to the upper-left coordinate point is marked as the lower-right coordinate point of the kth randomly selected bounding box, and the kth randomly selected bounding box is recorded as (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate).
2.4.3.7 Calculate the intersection-over-union of the kth randomly selected bounding box and the labeled bounding box on the mth single knee image. If the intersection-over-union is not lower than the key-point training threshold, take the kth randomly selected bounding box as training data for locating the key points of the knee joint region and turn to 2.4.3.8; otherwise, go directly to 2.4.3.8.
2.4.3.8 If k < K, let k = k + 1 and turn to 2.4.3.4; if k ≥ K, turn to 2.4.4.
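Steps 2.4.3.4-2.4.3.6 can be sketched as follows; keeping the sampled box fully inside the image is an added assumption, and the function name is illustrative:

```python
import random

def random_square_box(w_m, h_m, seed=None):
    # sample one random box on an m-th single knee image of size w_m x h_m,
    # following steps 2.4.3.4-2.4.3.6: side length drawn from
    # [48, int(min(w_m, h_m) / 2)], width and height taken equal
    rng = random.Random(seed)
    side_max = int(min(w_m, h_m) / 2)
    side = rng.randint(48, side_max)
    x1 = rng.randint(0, w_m - side)   # upper-left point, kept inside the image
    y1 = rng.randint(0, h_m - side)   # (an assumption; the patent only says "random")
    return (x1, y1, x1 + side, y1 + side)
```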
2.4.4 If m ≥ 2M, turn to 2.4.5; if m < 2M, let m = m + 1 and turn to 2.4.2.
2.4.5 The positive samples, partial samples, and negative samples generated in steps 2.4.3-2.4.4 are scaled to 48 × 48 × 3 (image height × image width × number of image channels); since they are RGB images, the number of image channels is 3.
2.4.6 The positive samples, partial samples, and negative samples scaled in step 2.4.5 are augmented by a mirroring operation. The method is: flip each training sample in the left-right direction, i.e., perform a mirroring operation, to obtain the training samples of the second-level network, comprising N21 positive samples, N22 partial samples, and N23 negative samples. N21, N22, and N23 are all positive integers, determined by the preliminary knee joint regions selected on the single knee images and the computed intersection-over-union values.
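A minimal sketch of the mirroring augmentation of step 2.4.6, assuming the samples are stored as 48 × 48 × 3 NumPy arrays:

```python
import numpy as np

def mirror_augment(samples):
    # left-right flip (mirror) of each 48x48x3 training sample,
    # doubling the training set as in step 2.4.6
    out = []
    for img in samples:
        out.append(img)
        out.append(img[:, ::-1, :].copy())  # flip along the width axis
    return out
```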
2.5 training the second-level network by adopting the second-level network training sample generated in the step 2.4.6 to obtain the trained second-level network, wherein the method comprises the following steps:
2.5.1 Initialize variable t = 1; set the values of the model parameters, including setting the learning rate to 0.0001 and the batch size batch_size to 500;
2.5.2 Input the training samples of the second-level network into the second-level network and output the overall loss function Lt, which is calculated as shown below; under the model parameters set in step 2.5.1, update the parameters in the second-level network through back propagation of the Lt value to obtain the second-level network NETt with parameters updated for the tth time;
where N2 is the number of training samples, N2 = N21 + N22 + N23; L(i, j) represents the loss of the ith training sample on the jth task, 1 ≤ j ≤ 3, and there are 3 tasks in total: knee joint detection, knee joint bounding box localization, and knee joint key point localization; αj denotes the importance coefficient of the jth task, the importance coefficient of the knee joint detection task being α1 = 0.8, that of the knee joint localization task α2 = 0.6, and that of the knee joint key point localization task α3 = 1.5; the overall loss is the weighted sum Lt = (1/N2) Σi Σj αj L(i, j);
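Assuming the overall loss is the weighted per-task sum suggested by the definitions above (the patent's exact formula is not reproduced in this text), it can be sketched as:

```python
def overall_loss(task_losses, alphas=(0.8, 0.6, 1.5)):
    # task_losses[j][i] = loss of training sample i on task j, for the
    # 3 tasks: detection, bounding box localization, key point localization;
    # the weighted-sum form is a reconstruction consistent with the
    # definitions above, not the patent's literal formula
    n2 = len(task_losses[0])
    return sum(a * sum(l) for a, l in zip(alphas, task_losses)) / n2
```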
2.5.3 Let t = t + 1; if 2 ≤ t ≤ T (T is the number of network training iterations, T = 10), turn to 2.5.4; otherwise training ends, the trained second-level network NETT is obtained, and the third step follows;
2.5.4 Input the training samples of the second-level network into NETt-1 and output the tth overall loss function Lt; under the model parameters set in step 2.5.1, update the parameters in NETt-1 through back propagation of the Lt value to obtain the second-level network NETt with parameters updated for the tth time; turn to 2.5.3.
Thirdly, preprocessing the X-ray film of the double knees to be detected to obtain 2 single knee images to be detected which are processed by histogram equalization, wherein the method comprises the following steps:
3.1 If the double-knee X-ray film to be detected has a bright background and dark legs, convert it into an image with a dark background and bright legs: subtract each original pixel value of the film from 255 and take the result as the new pixel value, which converts the film into an image with a dark background and bright legs; then turn to 3.2. If the double-knee X-ray film to be detected already has a dark background and bright legs, turn directly to 3.2.
3.2 Convert the pixel values of the double-knee X-ray film to be detected into [0, 255], i.e., process the double-knee X-ray film to be detected into a uint8 double-knee image. The new pixel value corresponding to each pixel of the uint8 double-knee image is Pnew = 255 × (P − Pmin)/(Pmax − Pmin), where P is the pixel value of each pixel of the pre-processed image, Pmax is the maximum pixel value of the pre-processed image, and Pmin is the minimum pixel value of the pre-processed image.
3.3 Convert the uint8 double-knee image into single knee images. The method is: first find the width W of the uint8 double-knee image and half of the width, W/2; then round half the width and denote it int(W/2); finally, cut from the double-knee image according to the width coordinates [0, int(W/2)] and [int(W/2)+1, W−1], i.e., the uint8 double-knee image is divided into 2 single knee images.
3.4, performing histogram equalization processing on 2 single knee images, wherein the 2 single knee images subjected to the histogram equalization processing are used as the single knee images to be detected.
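The preprocessing of steps 3.1-3.3 can be sketched as follows; the histogram equalization of step 3.4 is omitted here, and treating the raw film as 8-bit for the inversion is an added assumption:

```python
import numpy as np

def preprocess_double_knee(img, bright_background):
    # img: 2-D array of raw X-ray pixel values (assumed non-constant)
    x = img.astype(np.float64)
    if bright_background:               # step 3.1: make background dark, legs bright
        x = 255.0 - x                   # assumes an 8-bit film; hedged assumption
    pmin, pmax = x.min(), x.max()       # step 3.2: min-max rescale into [0, 255]
    x = (255.0 * (x - pmin) / (pmax - pmin)).astype(np.uint8)
    w = x.shape[1]                      # step 3.3: split at half width
    half = int(w / 2)
    return x[:, :half + 1], x[:, half + 1:]   # 2 single knee images
```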
Fourthly, performing knee joint positioning on the 2 to-be-detected single knee images obtained in the third step based on the trained knee joint area positioning network of the multitask two-stage convolutional neural network, wherein the method comprises the following steps of:
4.1 initializing variable d = 1;
4.2 Generate the picture pyramid of the dth single knee image to be detected: scale the single knee image at different scales, with scaling factor s (0 < s ≤ 1), to obtain PP (0 < PP < 100) single knee images of different sizes, each of size H × W × 3 (image height × image width × number of image channels); these images together are called the picture pyramid of the single knee image to be detected.
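A sketch of the picture pyramid size schedule; the scale factor 0.709 and the 48-pixel lower bound are illustrative assumptions, since the patent only requires 0 < s ≤ 1:

```python
def pyramid_sizes(h, w, s=0.709, min_side=48):
    # sizes (H, W) of the picture pyramid: repeatedly scale by factor s
    # until a side would fall below min_side; both s = 0.709 and the
    # 48-pixel floor (the second-level input size) are assumptions
    sizes = []
    while min(h, w) >= min_side:
        sizes.append((int(h), int(w)))
        h, w = h * s, w * s
    return sizes
```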
4.3 processing the image pyramid of the d-th single knee image to be detected by the first-stage network in the knee joint area positioning network based on the trained multitask two-stage convolutional neural network to obtain a preliminary knee joint area, wherein the method comprises the following steps:
4.3.1 initializing variable p = 1;
4.3.2 the first feature extraction module extracts the features of the p image in the image pyramid of the d single knee image to be detected, and the method comprises the following steps:
4.3.2.1 The first 3 × 3 convolutional layer in the first feature extraction module performs a convolution operation on the pth picture in the picture pyramid of the dth single knee image to be detected, and the first 2 × 2 maximum pooling layer performs a pooling operation on the pth picture after the convolution operation, outputting feature map F1. The convolution operation stride is 1 and the maximum pooling operation stride is 2. Let the size of the pth picture in the picture pyramid be H × W × 3 (picture height × picture width × number of picture channels, which is 3 because the pictures are RGB images); through the first convolution and maximum pooling layers, feature map F1 (of size feature map height × feature map width × number of channels of F1) is obtained.
4.3.2.2 The second 3 × 3 convolutional layer in the first feature extraction module performs a convolution operation on feature map F1, and the second 3 × 3 maximum pooling layer performs a pooling operation on feature map F1 after the convolution operation, obtaining feature map F2. The convolution operation stride is 1 and the maximum pooling operation stride is 2; feature map F1 (feature map height × feature map width × number of channels of F1) passes through the second convolution and maximum pooling layers to obtain feature map F2 (feature map height × feature map width × number of channels of F2).
4.3.2.3 The third 3 × 3 convolutional layer in the first feature extraction module performs a convolution operation on feature map F2, and the third 2 × 2 maximum pooling layer performs a pooling operation on feature map F2 after the convolution operation, obtaining feature map F3. The convolution operation stride is 1 and the maximum pooling operation stride is 2; feature map F2 (feature map height × feature map width × number of channels of F2) passes through the third convolution and maximum pooling layers to obtain feature map F3 (feature map height × feature map width × number of channels of F3).
4.3.2.4 The fourth 3 × 3 convolutional layer in the first feature extraction module performs a convolution operation on feature map F3 to obtain feature map F4. The convolution operation stride is 1; feature map F3 (feature map height × feature map width × number of channels of F3) passes through the fourth convolution layer to obtain feature map F4 (feature map height × feature map width × number of channels of F4).
4.3.2.5 The fifth 2 × 2 convolutional layer in the first feature extraction module performs a convolution operation on feature map F4 to obtain feature map F5. The convolution operation stride is 1; feature map F4 (feature map height × feature map width × number of channels of F4) passes through the fifth convolution layer to obtain feature map F5 (feature map height × feature map width × number of channels of F5). Feature map F5 is the first image feature extracted by the first-level network from the pth picture of the picture pyramid of the dth single knee image to be detected.
4.3.3 the preliminary target detection module of the first-level network detects F5 by adopting the knee joint region detection method described in 2.4.2.4 to obtain the knee joint region of the single knee image to be detected;
4.3.4 If p < PP, let p = p + 1 and turn to 4.3.2; if p ≥ PP, turn to 4.3.5.
4.3.5 The first non-maximum suppression screening layer filters all knee joint bounding boxes in the dth single knee image to be detected by the non-maximum suppression (NMS) algorithm. The filtering threshold δ of the NMS algorithm is set to 0.7; the multiple knee joint bounding boxes in the single knee image to be detected are screened through this threshold, the filtered bounding boxes frame the knee joint regions on the dth single knee image to be detected, and the first-level network outputs N1 preliminary knee joint regions.
4.4 The second-level network processes the N1 preliminary knee joint regions of the dth single knee image to be detected to obtain the final knee joint region and the key points of that region. The method is:
4.4.1 initializing variable n1=1;
4.4.2 Normalize the size of the n1th preliminary knee joint region output by the first-level network for the dth single knee image to be detected to 48 × 48 × 3 (image height × image width × number of image channels).
4.4.3 The second-level network extracts the features of the n1th preliminary knee joint region image on the dth single knee image to be detected. The method is:
4.4.3.1 The first 3 × 3 convolutional layer in the second feature extraction module performs a convolution operation on the n1th preliminary knee joint region image, and the first 2 × 2 maximum pooling layer performs a pooling operation on the n1th preliminary knee joint region image after the convolution operation, outputting feature map F6. The convolution operation stride is 1 and the maximum pooling operation stride is 2. The size of the scale-normalized preliminary knee joint region image is 48 × 48 × 3 (image height × image width × number of image channels, which is 3 because it is an RGB image); through the first convolution and maximum pooling layers, feature map F6 of size 23 × 23 × 32 (feature map height × feature map width × number of channels of F6) is obtained.
4.4.3.2 The second 3 × 3 convolutional layer in the second feature extraction module performs a convolution operation on feature map F6, and the second 3 × 3 maximum pooling layer performs a pooling operation on feature map F6 after the convolution operation, outputting feature map F7. The convolution operation stride is 1 and the maximum pooling operation stride is 2; feature map F6 of size 23 × 23 × 32 (feature map height × feature map width × number of channels of F6) passes through the second convolution and maximum pooling layers to obtain feature map F7 of size 10 × 10 × 64 (feature map height × feature map width × number of channels of F7).
4.4.3.3 The third 3 × 3 convolutional layer in the second feature extraction module performs a convolution operation on feature map F7, and the third 2 × 2 maximum pooling layer performs a pooling operation on feature map F7 after the convolution operation, outputting feature map F8. The convolution operation stride is 1 and the maximum pooling operation stride is 2; feature map F7 of size 10 × 10 × 64 (feature map height × feature map width × number of channels of F7) passes through the third convolution and maximum pooling layers to obtain feature map F8 of size 4 × 4 × 64 (feature map height × feature map width × number of channels of F8).
4.4.3.4 The fourth 3 × 3 convolutional layer in the second feature extraction module performs a convolution operation on feature map F8, outputting feature map F9. The convolution operation stride is 1; feature map F8 of size 4 × 4 × 64 (feature map height × feature map width × number of channels of F8) passes through the fourth convolution layer to obtain feature map F9 of size 2 × 2 × 128 (feature map height × feature map width × number of channels of F9).
4.4.3.5 The first fully-connected layer in the second feature extraction module performs a fully-connected operation on feature map F9, outputting feature map F10. Feature map F9 of size 2 × 2 × 128 (feature map height × feature map width × number of channels of F9) passes through the first fully-connected layer to obtain feature map F10 containing a 256-dimensional vector; feature map F10 is the extracted second image feature.
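The feature map sizes stated in steps 4.4.3.1-4.4.3.4 (48 → 23 → 10 → 4 → 2) are consistent with "valid" (unpadded) convolutions of stride 1 and pooling of stride 2, which can be checked with a short trace:

```python
def conv_out(size, k, stride=1):
    # side length after a 'valid' (no padding) convolution or pooling
    return (size - k) // stride + 1

def second_module_trace(size=48):
    # spatial sizes through the second feature extraction module:
    # three (3x3 conv, stride 1) + (pool, stride 2) blocks with pooling
    # kernels 2x2, 3x3, 2x2 as in the text, then a final 3x3 convolution;
    # 'valid' padding is an assumption the patent does not state explicitly
    sizes = []
    for pool_k in (2, 3, 2):
        size = conv_out(size, 3)          # 3x3 convolution, stride 1
        size = conv_out(size, pool_k, 2)  # maximum pooling, stride 2
        sizes.append(size)
    sizes.append(conv_out(size, 3))       # fourth 3x3 convolution
    return sizes
```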
4.4.4 The final target detection module of the second-level network processes the second image feature, i.e., feature map F10, and outputs the final knee joint region and key point coordinates of the dth single knee image to be detected. The method is:
4.4.4.1 The second fully-connected layer of the final target detection module of the second-level network performs a full connection on feature map F10 and outputs three groups of vectors: the knee joint probability vector (a 1-dimensional vector), the knee joint bounding box coordinate offset vector (a 4-dimensional vector), and the knee joint key point coordinate offset vector (a 12-dimensional vector).
4.4.4.2 The knee joint bounding box coordinate and knee joint region key point coordinate operation layer of the final target detection module of the second-level network screens and computes the three groups of vectors to obtain the knee joint region of the dth single knee image to be detected and the key points of the knee joint region. The probability vector stores the probability value (in the range [0, 1]) that the region is considered a knee joint; when the probability value is not lower than β, the region is considered a knee joint. β is the second threshold, β ∈ [0.5, 1], and β preferably takes the value 0.7. If the probability value is greater than β, the n1th preliminary knee joint region is kept; the bounding box coordinates of the n1th preliminary knee joint region (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate) are kept, together with the knee joint bounding box coordinate offset vector (offset of the upper-left abscissa, offset of the upper-left ordinate, offset of the lower-right abscissa, offset of the lower-right ordinate) and the knee joint key point coordinate offset vector, which stores the offset coordinates (offset abscissa, offset ordinate) of the six key points FM, FL, TM, TL, JSM, and JSL of the n1th knee joint region. The final bounding box coordinates of the n1th knee joint (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate) and the coordinates of the 6 key points (the FM, FL, TM, TL, JSM, and JSL points of the n1th knee joint region) are calculated as follows:
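The patent's coordinate formulas are given as equations that are not reproduced in this text; the sketch below shows one common (MTCNN-style) way of decoding such offsets, purely as an illustrative assumption:

```python
def decode_box(box, box_offset):
    # box: (x1, y1, x2, y2) of the preserved preliminary region;
    # box_offset: predicted offsets, assumed normalized by the box width and
    # height as in MTCNN-style regression (an assumption, not the patent's
    # literal formula)
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    dx1, dy1, dx2, dy2 = box_offset
    return (x1 + dx1 * w, y1 + dy1 * h, x2 + dx2 * w, y2 + dy2 * h)

def decode_keypoints(box, kp_offsets):
    # kp_offsets: 12 values = 6 (x, y) pairs for FM, FL, TM, TL, JSM, JSL,
    # assumed relative to the box's upper-left corner, scaled by its size
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return [(x1 + kp_offsets[2 * i] * w, y1 + kp_offsets[2 * i + 1] * h)
            for i in range(6)]
```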
4.4.4.3 If n1 < N1, let n1 = n1 + 1 and turn to 4.4.2; if n1 ≥ N1, turn to 4.4.4.4.
4.4.4.4 The second non-maximum suppression screening layer screens the knee joint bounding boxes in the dth single knee image to be detected by the non-maximum suppression (NMS) algorithm to obtain the final bounding box and the final knee joint region key points of the dth single knee image to be detected; the filtering threshold δ of the NMS algorithm is 0.7.
4.4.4.5 If d < 2, let d = d + 1 and turn to 4.2; if d ≥ 2, turn to the fifth step.
Fifthly, the process ends.
The invention can achieve the following beneficial effects: the knee joint positioning method based on the multitask two-stage convolutional neural network improves the detection rate of knee joint positioning. The invention is compared with the HOG + SVM method (see "A novel method for automatic localization of joint area on knee plain radiographs" [C]// Scandinavian Conference on Image Analysis. Springer, Cham, 2017). The experimental environment is Ubuntu 16.04, GPU 2080Ti, Python 3.6, and PyTorch 0.4. When tested on the OAI database (45110 single knee images), the average detection accuracy of the method is 99.93%, while that of the HOG + SVM method is 91.73%; when tested on the MOST database (19383 single knee images), the average detection accuracy of the method is 99.02%, while that of the HOG + SVM method is 98.20%. Therefore, the method effectively improves the knee joint positioning detection rate.
Drawings
FIG. 1 is an overall flowchart of the present invention for positioning a knee joint based on a multitasking two-stage convolutional neural network;
FIG. 2 is a logical structure diagram of the knee joint area positioning network based on the multitask two-stage convolutional neural network of the present invention;
FIG. 3 is an illustration of the present invention showing 6 key points of the left and right knees at step 2.1.2.2.
Detailed Description
FIG. 1 is an overall flowchart of the present invention for positioning a knee joint based on a multitasking two-stage convolutional neural network; as shown in fig. 1, the present invention comprises the steps of:
firstly, a knee joint area positioning network based on a multitask two-stage convolutional neural network is built.
The knee joint area positioning network based on the multitask two-stage convolutional neural network is shown in fig. 2 and comprises two stages of networks: the system comprises a first-level network and a second-level network, wherein the output of the first-level network is used as the input of the second-level network.
The first-level network is composed of a first feature extraction module and a preliminary target detection module. The first feature extraction module receives the single knee image I from the outside, extracts first image features of the single knee image I and sends the first image features to the preliminary target detection module; and the preliminary target detection module detects the first image characteristics and outputs a preliminary knee joint area in the single knee image I.
The first feature extraction module consists of 5 convolutional layers, including 4 convolutional layers of 3 × 3 and 1 convolutional layer of 2 × 2, and 3 maximum pooling layers, including 2 maximum pooling layers of 2 × 2 and 1 maximum pooling layer of 3 × 3.
The first 3 × 3 convolutional layer performs a convolution operation on the single knee image I, and the first 2 × 2 maximum pooling layer performs a pooling operation on the single knee image I after the convolution operation to obtain feature map F1; the second 3 × 3 convolutional layer performs a convolution operation on feature map F1, and the first 3 × 3 maximum pooling layer performs a pooling operation on feature map F1 after the convolution operation to obtain feature map F2; the third 3 × 3 convolutional layer performs a convolution operation on feature map F2, and the second 2 × 2 maximum pooling layer performs a pooling operation on feature map F2 after the convolution operation to obtain feature map F3; the fourth 3 × 3 convolutional layer performs a convolution operation on feature map F3 to obtain feature map F4; the fifth 2 × 2 convolutional layer performs a convolution operation on feature map F4 to obtain feature map F5; feature map F5 is the extracted first image feature.
The preliminary target detection module comprises a 1 × 1 convolutional layer, a knee joint bounding box coordinate operation layer, and a first non-maximum suppression screening layer. The 1 × 1 convolutional layer convolves feature map F5 to obtain two groups of vectors: the knee joint probability vector A1 and the knee joint bounding box coordinate offset vector B1. The knee joint bounding box coordinate operation layer determines the knee joint bounding boxes according to A1 and B1 to obtain the knee joint regions in the single knee image I. The first non-maximum suppression screening layer performs non-maximum suppression screening on the knee joint regions in the single knee image I to obtain the preliminary knee joint regions in the single knee image I, and sends the single knee image I and its preliminary knee joint regions to the second-level network.
The second-level network is composed of a second feature extraction module and a final target detection module. The second feature extraction module is used for carrying out feature extraction on the single knee image I received from the first-level network and the preliminary knee joint region in the single knee image I to obtain second image features; and the final target detection module performs target detection on the second image characteristics to obtain final knee joint boundary frame coordinates of the single knee image I and knee joint region key point coordinates of the single knee image I.
The second feature extraction module is composed of 4 convolutional layers, 3 maximum pooling layers, and 1 fully-connected layer; the 4 convolutional layers include 3 convolutional layers of 3 × 3 and 1 convolutional layer of 2 × 2, and the 3 maximum pooling layers include 2 maximum pooling layers of 2 × 2 and 1 maximum pooling layer of 3 × 3.
The first 3 × 3 convolutional layer in the second feature extraction module performs a convolution operation on the preliminary knee joint region in the single knee image I, and the first 2 × 2 maximum pooling layer performs a pooling operation on the preliminary knee joint region in the single knee image I after the convolution operation to obtain feature map F6; the second 3 × 3 convolutional layer performs a convolution operation on feature map F6, and the second 3 × 3 maximum pooling layer performs a pooling operation on feature map F6 after the convolution operation to obtain feature map F7; the third 3 × 3 convolutional layer performs a convolution operation on feature map F7, and the third 2 × 2 maximum pooling layer performs a pooling operation on feature map F7 after the convolution operation to obtain feature map F8; the fourth 3 × 3 convolutional layer performs a convolution operation on feature map F8 to obtain feature map F9; the first fully-connected layer performs a fully-connected operation on feature map F9 to obtain feature map F10; feature map F10 is the extracted second image feature.
The final target detection module comprises a second fully-connected layer, a knee joint bounding box and knee joint region key point coordinate operation layer, and a second non-maximum suppression screening layer. The second fully-connected layer performs a full connection on feature map F10 to obtain three groups of vectors: the knee joint probability vector A2, the knee joint bounding box coordinate offset vector B2, and the coordinate offset vector C of the six key points of the knee joint region. The knee joint bounding box and knee joint region key point coordinate operation layer determines the knee joint bounding box coordinates and the knee joint region key point coordinates for the single knee image I according to A2, B2, and C. The second non-maximum suppression screening layer performs non-maximum suppression screening on the knee joint regions and knee joint key point coordinates in the single knee image I, and outputs the single knee image I, the final knee joint region of the single knee image I, and the knee joint region key point coordinates of the single knee image I.
And secondly, training the knee joint area positioning network based on the multitask two-stage convolutional neural network.
2.1 prepare the data of the knee joint area location network based on the multitask two-stage convolution neural network.
2.1.1 preprocessing M (M is a positive integer and M is more than 2000) original images to obtain 2M single knee images subjected to histogram equalization processing, wherein the method comprises the following steps:
2.1.1.1 Randomly select M raw images from the OAI baseline public database (https://oai.epi-ucsf.org/datarelease/, November 2008 version). The original images are X-ray medical images containing the left and right knees.
2.1.1.2 uniformly transforms M original images into an image with dark background and bright legs. Firstly, double knee images with bright background and dark legs are selected from M original images, then image pixel inversion is carried out, namely, 255 is used for subtracting original pixel values, and the M original images are uniformly converted into images with dark background and bright legs.
2.1.1.3 Convert the pixels of the M images with dark background and bright legs to [0, 255], i.e., process the dark-background, bright-leg images into uint8 images. The method is: the pixel value of each pixel in the M images with dark background and bright legs is processed as Pnew = 255 × (P − Pmin)/(Pmax − Pmin), where P is the pixel value of any pixel in the pre-processed image, Pmax is the maximum pixel value in the pre-processed image, Pmin is the minimum pixel value in the pre-processed image, and Pnew is the pixel value of the corresponding pixel in the processed image.
2.1.1.4 Convert the M uint8 images into 2M single knee images. The method is: first find the width W of each uint8 image and half of the width, W/2; then round half the width and denote it int(W/2); finally, cut from each uint8 image according to the width coordinates [0, int(W/2)] and [int(W/2)+1, W−1], i.e., divide each uint8 image into 2 single knee images, finally obtaining 2M single knee images.
2.1.1.5 Perform histogram equalization processing on each of the 2M single knee images. For the histogram equalization processing method, see the document "Application of histogram equalization to image processing [J]", Science and Technology Information, 2007(04): 39-40.
2.1.2 annotate the knee joint real bounding box in the 2M single knee images after histogram equalization, the method is as follows:
2.1.2.1 initializing variable m ═ 1;
2.1.2.2 Manually label 6 key points of the knee joint region of the mth single knee image. Since the doctor or computer is primarily concerned with the knee joint space and osteophyte locations, 6 points on the knee joint space and osteophyte boundaries are manually marked as the main key points of the knee joint region. Fig. 3 illustrates labeling the 6 key points of the left and right knees at step 2.1.2.2. The 6 key points of the right knee shown in fig. 3(a) are the femoral medial osteophyte point (FM), the femoral lateral osteophyte point (FL), the tibial medial osteophyte point (TM), the tibial lateral osteophyte point (TL), the joint space medial point (JSM), and the joint space lateral point (JSL). As shown in fig. 3(b), the 6 key points of the left knee are likewise the femoral medial osteophyte point (FM), the femoral lateral osteophyte point (FL), the tibial medial osteophyte point (TM), the tibial lateral osteophyte point (TL), the joint space medial point (JSM), and the joint space lateral point (JSL). As shown, the 6 key points of the left and right knees are mirror images of each other. The subsequent steps process the 6 key points without distinguishing the left knee from the right knee.
2.1.2.3 marking the boundary box of the knee joint in the mth single knee image according to the key points marked manually, the method is:
2.1.2.3.1 Calculate the center point coordinates (xmid, ymid) of the 6 key points in the mth single knee image. Let the coordinates of the 6 key points be (x1, y1), (x2, y2), (x3, y3), (x4, y4), (x5, y5), (x6, y6); the center point coordinates (xmid, ymid) of the 6 key points are the means xmid = (x1 + x2 + x3 + x4 + x5 + x6)/6 and ymid = (y1 + y2 + y3 + y4 + y5 + y6)/6.
2.1.2.3.2 Compute the knee width w_knee: find the maximum abscissa x_max and the minimum abscissa x_min of the 6 key points, and take their difference as the knee width w_knee, namely:
x_max = max(x1, x2, x3, x4, x5, x6)
x_min = min(x1, x2, x3, x4, x5, x6)
w_knee = x_max - x_min
2.1.2.3.3 Mark the knee joint region to obtain the ground-truth knee joint bounding-box coordinates, which represent the true knee joint region as (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate). Centered on the center point of the 6 key points, expand the region up, down, left and right by 0.65 times the knee width (0.65 times the width suffices to frame the knee joint region in each image); the resulting box is the knee joint region of interest, i.e. the annotated bounding box:
(x_mid - 0.65·w_knee, y_mid - 0.65·w_knee, x_mid + 0.65·w_knee, y_mid + 0.65·w_knee)
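Steps 2.1.2.3.1 to 2.1.2.3.3 can be condensed into a short sketch; the 0.65 expansion factor comes from the text, while the symmetric reading of "expand up, down, left and right by 0.65 times the knee width" is our interpretation:

```python
import numpy as np

def keypoints_to_bbox(pts, factor=0.65):
    """Derive the annotated box from the six key points (steps 2.1.2.3.1-3)."""
    pts = np.asarray(pts, dtype=float)            # shape (6, 2): (x, y) pairs
    x_mid, y_mid = pts.mean(axis=0)               # centre of the six key points
    w_knee = pts[:, 0].max() - pts[:, 0].min()    # knee width from x extremes
    d = factor * w_knee                           # expansion in every direction
    return (x_mid - d, y_mid - d, x_mid + d, y_mid + d)

# Illustrative key points (FM, FL, TM, TL, JSM, JSL) in pixel coordinates.
pts = [(10, 50), (90, 52), (12, 70), (88, 71), (30, 60), (70, 61)]
box = keypoints_to_bbox(pts)
```

The box is square with side 1.3·w_knee, centered on the key-point centroid.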
2.1.2.4 If m ≥ 2M, go to 2.2; if m < 2M, let m = m + 1 and go to 2.1.2.2.
2.2 prepare training samples for the first stage network in the knee joint area positioning network based on the multitask two-stage convolution neural network, the method is as follows:
2.2.1 Initialize variable m = 1;
2.2.2 prepare training samples for the first level network on the mth single knee image by:
2.2.2.1 Initialize variable k = 1;
2.2.2.2 Randomly select the kth point on the mth single-knee image and denote its coordinates (x_k, y_k); take this point as the upper-left corner of the kth randomly selected bounding box.
Take the width w_k and height h_k of the kth randomly selected bounding box: let the mth single-knee image have width w_m and height h_m; take half of the minimum of w_m and h_m and round it, i.e. int(min(w_m, h_m)/2), where int(min(w_m, h_m)/2) denotes min(w_m, h_m)/2 rounded to an integer; then randomly choose an integer in the range [48, int(min(w_m, h_m)/2)] as both the width w_k and the height h_k of the randomly selected bounding box.
2.2.2.3 Determine the coordinates of the kth randomly selected bounding box: with upper-left corner (x_k, y_k) and the width w_k and height h_k chosen above, the point (x_k + w_k, y_k + h_k) is the lower-right corner, so the kth randomly selected bounding box has coordinates (x_k, y_k, x_k + w_k, y_k + h_k), i.e. (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate).
2.2.2.4 Compute the intersection-over-union (IoU) between the kth randomly selected bounding box and the annotated bounding box on the mth single-knee image. If the IoU reaches the positive-sample threshold, take the kth randomly selected bounding box as a positive training sample and go to 2.2.2.5; if the IoU lies in the intermediate (part-sample) range, take it as a part training sample and go to 2.2.2.5; if the IoU falls below the negative-sample threshold, take it as a negative training sample and go to 2.2.2.5.
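The intersection-over-union used throughout steps 2.2.2.4, 2.4.3.2 and 2.4.3.7 is the standard box IoU; a minimal sketch (the sample-assignment thresholds themselves are left to the patent's formulas and are not reproduced here):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

For example, two unit-offset 2 × 2 boxes overlap with IoU 1/7.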
2.2.2.5 If k < K (K > 50, K being the total number of randomly selected bounding boxes on the mth single-knee image), let k = k + 1 and go to 2.2.2.2; if k ≥ K, go to 2.2.2.6.
2.2.2.6 If m ≥ 2M, go to 2.2.3; if m < 2M, let m = m + 1 and go to 2.2.2.
2.2.3 Scale the training samples (positive, part and negative samples) generated in step 2.2.2 down to 48 × 48 × 3 (image height × image width × number of image channels).
2.2.4 Augment the size-normalized training samples from step 2.2.3 by mirroring: flip each training sample horizontally (left-right), i.e. apply a mirror operation, to generate the training samples of the first-stage network. The first-stage training set then contains N11 positive samples, N12 part samples and N13 negative samples, where N11, N12 and N13 are all positive integers.
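The mirror augmentation of step 2.2.4 is a horizontal flip of each 48 × 48 × 3 sample; a minimal sketch:

```python
import numpy as np

def mirror(sample):
    """Horizontal flip along the width axis (axis 1)."""
    return sample[:, ::-1, :].copy()

sample = np.arange(2 * 3 * 1).reshape(2, 3, 1)
flipped = mirror(sample)
```

Flipping is an involution, so applying the mirror twice returns the original sample, and the augmented set is exactly twice the size of the original.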
2.3 training the first-stage network by adopting the first-stage network training sample to obtain the trained first-stage network, wherein the method comprises the following steps:
2.3.1 Initialize variable q = 1 and set the model parameters, including the learning rate, set to 0.001, and the batch size batch_size, set to 500;
2.3.2 Input the training samples of the first-stage network into the first-stage network and output the overall loss function Lq, computed as below; under the model parameters set in step 2.3.1, update the parameters of the first-stage network by back-propagating the value of Lq, obtaining the first-stage network NETq with parameters updated for the qth time;
where N is the number of first-stage training samples, N = N11 + N12 + N13; Li,j denotes the loss of the ith training sample on the jth task, 1 ≤ j ≤ 3, there being 3 tasks in total: knee joint detection, knee joint bounding-box localization and knee joint key-point localization; αj is the importance coefficient of the jth task, with α1 = 1 for the knee joint detection task, α2 = 0.5 for the knee joint bounding-box localization task and α3 = 0 for the knee joint key-point localization task; that is, Lq = (1/N)·Σ(i=1..N) Σ(j=1..3) αj·Li,j;
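The overall loss Lq is a task-weighted average over the batch; a sketch with the first-stage weights α = (1, 0.5, 0) from the text, where the per-task losses Li,j are assumed to be precomputed:

```python
import numpy as np

def overall_loss(task_losses, alphas=(1.0, 0.5, 0.0)):
    """Weighted multi-task loss: L = (1/N) * sum_i sum_j alpha_j * L[i, j].

    task_losses: (N, 3) array of per-sample losses for the detection,
    bounding-box and key-point tasks; alphas defaults to the first-stage
    weights given in the text."""
    task_losses = np.asarray(task_losses, dtype=float)
    return float((task_losses * np.asarray(alphas)).sum() / task_losses.shape[0])
```

Passing alphas = (0.8, 0.6, 1.5) gives the second-stage weighting of step 2.5.2, where key-point localization dominates.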
2.3.3 Let q = q + 1. If 2 ≤ q ≤ Q, where Q is the number of training iterations and Q = 10, go to 2.3.4; otherwise training is complete, the trained first-stage network NETQ is obtained, and go to 2.4;
2.3.4 Input the training samples of the first-stage network into NETq-1 and output the qth overall loss function Lq; under the model parameters set in step 2.3.1, back-propagate the value of Lq to update the parameters of NETq-1, obtaining the first-stage network NETq with parameters updated for the qth time; go to 2.3.3;
2.4 prepare training samples for the second-stage network in the knee joint area positioning network based on the multitask two-stage convolutional neural network, the method comprises the following steps:
2.4.1 Initialize variable m = 1;
2.4.2 positioning the preliminary knee joint region of the mth single knee image I by using the trained first-level network, wherein the method comprises the following steps:
2.4.2.1 Generate the picture pyramid of the mth single-knee image I: multiply the image by different scaling factors s (s a positive real number, 0 < s ≤ 1) to scale it to different sizes, obtaining PP (PP a positive integer, 0 < PP < 100) single-knee images of different sizes, which form the picture pyramid of the single-knee image I. The size of each image in the picture pyramid is denoted H × W × 3 (image height × image width × number of channels; the images are RGB, so there are 3 channels).
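Picture-pyramid generation amounts to choosing a sequence of scaling factors s; a sketch where the per-level factor 0.7 and the 48-pixel stopping size are illustrative assumptions (the text only requires 0 < s ≤ 1 and PP < 100 levels):

```python
def pyramid_scales(h, w, factor=0.7, min_size=48):
    """Scaling factors s for the picture pyramid: keep shrinking until the
    smaller image side would fall below the network input size."""
    scales, s = [], 1.0
    while min(h, w) * s >= min_size:
        scales.append(s)
        s *= factor
    return scales

scales = pyramid_scales(480, 240, factor=0.5, min_size=48)
```

Each scale produces one pyramid level; resizing the image by each factor yields the PP images of different sizes.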
2.4.2.2 Initialize variable p = 1;
2.4.2.3 The trained first feature extraction module extracts the features of the pth picture in the picture pyramid to obtain the first image feature F5, whose size is (feature map height × feature map width × number of feature map channels).
2.4.2.4 the preliminary target detection module detects the first image feature F5 by using a knee joint region detection method to obtain a knee joint region in the single knee image I, wherein the knee joint region detection method is as follows:
2.4.2.4.1 The 1 × 1 convolution layer in the preliminary target detection module performs a 1 × 1 convolution on the first image feature F5 of the pth picture in the picture pyramid and outputs two groups of vectors. The first is the knee joint probability vector A1p, in which each vector is 1-dimensional; A1p stores the probability that a position is a knee joint (range [0, 1]). When the probability value is not lower than α the position is considered a knee joint, where α is a first threshold, α ∈ [0.5, 1], with α preferably 0.6. The second is the knee joint bounding-box coordinate offset vector B1p, in which each vector is 4-dimensional; the 4 dimensions (x1, y1, x2, y2) of each vector represent the bounding-box offsets (upper-left abscissa offset, upper-left ordinate offset, lower-right abscissa offset, lower-right ordinate offset).
2.4.2.4.2 The knee joint bounding-box coordinate operation layer of the preliminary target detection module screens A1p and B1p and computes the knee joint regions in the single-knee image I. Find the N vector positions in A1p whose probability value is not lower than α; at the same time find the corresponding N vector positions in B1p and the 4-dimensional coordinate offset vectors at those N positions, where the nth offset vector stores the coordinate offsets of the nth bounding box (upper-left abscissa offset, upper-left ordinate offset, lower-right abscissa offset, lower-right ordinate offset). Compute the coordinates of the N knee joint bounding boxes in the original single-knee image I, where the nth set of coordinates represents the nth bounding box (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate) and the offsets are mapped back to the original image using the scaling factor s (0 < s ≤ 1) of the original single-knee image I, and frame the N knee joint regions on the original image I according to these bounding-box coordinates.
2.4.2.5 If p ≥ PP, go to 2.4.2.6; if p < PP, let p = p + 1 and go to 2.4.2.3.
2.4.2.6 The first non-maximum suppression screening layer filters all knee joint regions of the mth single-knee picture I with the non-maximum suppression (NMS) algorithm. For the NMS algorithm see the document "Efficient Non-Maximum Suppression [C] // 18th International Conference on Pattern Recognition (ICPR 2006), 20-24 August 2006, Hong Kong, China". Set the non-maximum suppression filtering threshold δ to 0.6; screening the knee joint boundary regions with this threshold yields the N1 (N1 a positive integer, 0 ≤ N1 ≤ 100) preliminary knee joint regions of the mth single-knee image I.
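A minimal sketch of the greedy non-maximum suppression used by the screening layers, with the overlap threshold δ = 0.6 of step 2.4.2.6:

```python
import numpy as np

def nms(boxes, scores, thresh=0.6):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size:
        i = int(order[0])
        keep.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        overlap = inter / (area_i + areas - inter)
        order = rest[overlap <= thresh]       # drop boxes overlapping too much
    return keep

keep = nms([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]],
           [0.9, 0.8, 0.7])
```

Here the second box overlaps the top-scoring box with IoU ≈ 0.68 > 0.6 and is suppressed, while the distant third box survives.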
2.4.3 prepare training samples for the second level network on the mth single knee image by:
2.4.3.1 Initialize variables n1 = 1 and k = 1;
2.4.3.2 Compute the intersection-over-union between the n1th preliminary knee joint region of the mth single-knee image I and the annotated knee joint region of the mth single-knee image I. If the IoU reaches the positive-sample threshold, take the n1th preliminary knee joint region as a positive sample for second-stage training and go to 2.4.3.3; if the IoU lies in the intermediate (part-sample) range, take it as a part sample for second-stage training and go to 2.4.3.3; if the IoU falls below the negative-sample threshold, take it as a negative sample for second-stage training and go to 2.4.3.3.
2.4.3.3 If n1 ≥ N1 (N1 being the number of preliminary knee joint regions in the mth single-knee image I, 0 ≤ N1 ≤ 100), go to 2.4.3.4; if n1 < N1, let n1 = n1 + 1 and go to 2.4.3.2.
2.4.3.4 Randomly select the kth point on the mth single-knee image and take it as the upper-left corner of the kth randomly selected bounding box.
2.4.3.5 Take the width and height of the kth randomly selected bounding box: let the mth single-knee image have width w_m and height h_m; take half of the minimum of w_m and h_m and round it, i.e. int(min(w_m, h_m)/2); then randomly choose an integer in the range [48, int(min(w_m, h_m)/2)] as both the width and the height of the randomly selected bounding box.
2.4.3.6 Determine the coordinates of the kth randomly selected bounding box: the point obtained by adding the chosen width and height to the upper-left corner is marked as the lower-right corner of the kth randomly selected bounding box, so the kth randomly selected bounding box is represented by its coordinates (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate).
2.4.3.7 Compute the intersection-over-union between the kth randomly selected bounding box and the annotated bounding box on the mth single-knee image. If the IoU reaches the key-point training threshold, take the kth randomly selected bounding box as training data for knee joint key-point localization and go to 2.4.3.8; otherwise go directly to 2.4.3.8.
2.4.3.8 If k < K, let k = k + 1 and go to 2.4.3.4; if k ≥ K, go to 2.4.4.
2.4.4 If m ≥ 2M, go to 2.4.5; if m < 2M, let m = m + 1 and go to 2.4.2.
2.4.5 Scale the positive, part and negative samples generated in steps 2.4.3-2.4.4 down to 48 × 48 × 3 (image height × image width × number of image channels); since the images are RGB, the number of channels is 3.
2.4.6 Augment the size-normalized positive, part and negative samples from step 2.4.5 by mirroring: flip each training sample horizontally, i.e. apply a mirror operation, to obtain the training samples of the second-stage network. The second-stage training set then contains N21 positive samples, N22 part samples and N23 negative samples, where N21, N22 and N23 are all positive integers.
2.5 training the second-level network by adopting the second-level network training sample generated in the step 2.4.6 to obtain the trained second-level network, wherein the method comprises the following steps:
2.5.1 Initialize variable t = 1 and set the model parameters, including the learning rate, set to 0.0001, and the batch size batch_size, set to 500;
2.5.2 Input the training samples of the second-stage network into the second-stage network and output the overall loss function Lt, computed as below; under the model parameters set in step 2.5.1, update the parameters of the second-stage network by back-propagating the value of Lt, obtaining the second-stage network NETt with parameters updated for the tth time;
where N2 is the number of second-stage training samples, N2 = N21 + N22 + N23; Li,j denotes the loss of the ith training sample on the jth task, 1 ≤ j ≤ 3, there being 3 tasks in total: knee joint detection, knee joint bounding-box localization and knee joint key-point localization; αj is the importance coefficient of the jth task, with α1 = 0.8 for the knee joint detection task, α2 = 0.6 for the knee joint bounding-box localization task and α3 = 1.5 for the knee joint key-point localization task; that is, Lt = (1/N2)·Σ(i=1..N2) Σ(j=1..3) αj·Li,j;
2.5.3 Let t = t + 1. If 2 ≤ t ≤ T, where T is the number of training iterations and T = 10, go to 2.5.4; otherwise training is complete, the trained second-stage network NETT is obtained, and go to the third step;
2.5.4 Input the training samples of the second-stage network into NETt-1 and output the tth overall loss function Lt; under the model parameters set in step 2.5.1, back-propagate the value of Lt to update the parameters of NETt-1, obtaining the second-stage network NETt with parameters updated for the tth time; go to 2.5.3.
Third, preprocess the double-knee X-ray film to be detected to obtain 2 histogram-equalized single-knee images to be detected, as follows:
3.1 If the double-knee X-ray film to be detected has a bright background and dark legs, convert it into an image with a dark background and bright legs: subtract each original pixel value of the film from 255 and use the result as the new pixel value, converting the film into a dark-background, bright-leg image; then go to 3.2. If the double-knee X-ray film to be detected already has a dark background and bright legs, go directly to 3.2.
3.2 Normalize the pixels of the X-ray film to be detected into [0, 255] and process the double-knee X-ray film into a uint8 double-knee image. The new pixel value of each pixel in the uint8 double-knee image is P_new = 255 × (P - P_min)/(P_max - P_min), where P is the pixel value of each pixel of the pre-processed image, P_max is the maximum pixel value of the pre-processed image, and P_min is the minimum pixel value of the pre-processed image.
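Step 3.2 is a min-max rescaling into the uint8 range; a sketch, assuming the formula P_new = 255·(P - P_min)/(P_max - P_min) implied by the definitions of P, P_max and P_min:

```python
import numpy as np

def to_uint8(img):
    """Min-max rescale arbitrary pixel values into [0, 255] and cast to uint8."""
    img = np.asarray(img, dtype=np.float64)
    p_min, p_max = img.min(), img.max()
    return (255.0 * (img - p_min) / (p_max - p_min)).astype(np.uint8)

out = to_uint8([[0, 2000], [1000, 500]])
```

The cast truncates toward zero, so a mid-range value such as 1000/2000 maps to 127 rather than 128.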
3.3 Split the uint8 double-knee image into single-knee images: find the width W of the uint8 double-knee image and half of it, round half the width to the integer int(W/2), and finally cut the double-knee image along the width coordinates [0, int(W/2)] and [int(W/2)+1, W-1], i.e. divide the uint8 double-knee image into 2 single-knee images.
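The split of step 3.3 can be sketched directly; note that for even W the two halves differ in width under the stated coordinate ranges, since column int(W/2) goes to the left half:

```python
import numpy as np

def split_knees(img):
    """Cut an H x W (x C) double-knee image into left and right halves along
    the width coordinates [0, int(W/2)] and [int(W/2)+1, W-1]."""
    w = img.shape[1]
    half = int(w / 2)
    left = img[:, :half + 1]       # columns 0 .. int(W/2)
    right = img[:, half + 1:]      # columns int(W/2)+1 .. W-1
    return left, right

left, right = split_knees(np.zeros((4, 10, 3), dtype=np.uint8))
```

Each half is then histogram-equalized (step 3.4) before being fed to the positioning network.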
3.4 Perform histogram equalization on the 2 single-knee images; the 2 histogram-equalized single-knee images serve as the single-knee images to be detected.
Fourth, perform knee joint positioning on the 2 single-knee images to be detected obtained in the third step, using the trained knee joint region positioning network based on the multi-task two-stage convolutional neural network, as follows:
4.1 Initialize variable d = 1;
4.2 Generate the picture pyramid of the dth single-knee image to be detected: scale the single-knee image to different sizes with scaling factor s (0 < s ≤ 1), obtaining PP (0 < PP < 100) single-knee images of different sizes, each of size H × W × 3 (image height × image width × number of channels); these form what is called the picture pyramid of the single-knee image to be detected.
4.3 processing the image pyramid of the d-th single knee image to be detected by the first-stage network in the knee joint area positioning network based on the trained multitask two-stage convolutional neural network to obtain a preliminary knee joint area, wherein the method comprises the following steps:
4.3.1 Initialize variable p = 1;
4.3.2 The first feature extraction module extracts the features of the pth picture in the picture pyramid of the dth single-knee image to be detected, as follows:
4.3.2.1 The first 3 × 3 convolution layer in the first feature extraction module convolves the pth picture in the picture pyramid of the dth single-knee image to be detected, and the first 2 × 2 max-pooling layer pools the convolved picture, outputting feature map F1. The convolution stride is 1 and the max-pooling stride is 2. Let the pth picture in the picture pyramid have size H × W × 3 (image height × image width × number of channels; 3 because the images are RGB); after this first convolution and max-pooling layer, the feature map F1 (feature map height × feature map width × number of channels of F1) is obtained.
4.3.2.2 The second 3 × 3 convolution layer in the first feature extraction module convolves feature map F1, and the second max-pooling layer, of size 3 × 3, pools the convolved F1 to obtain feature map F2. The convolution stride is 1 and the max-pooling stride is 2; F1 (feature map height × feature map width × number of channels of F1) passes through this second convolution and max-pooling layer to give feature map F2 (feature map height × feature map width × number of channels of F2).
4.3.2.3 The third 3 × 3 convolution layer in the first feature extraction module convolves feature map F2, and the third 2 × 2 max-pooling layer pools the convolved F2 to obtain feature map F3. The convolution stride is 1 and the max-pooling stride is 2; F2 passes through this third convolution and max-pooling layer to give feature map F3 (feature map height × feature map width × number of channels of F3).
4.3.2.4 The fourth 3 × 3 convolution layer in the first feature extraction module convolves feature map F3 to obtain feature map F4; the convolution stride is 1, and F3 passes through this fourth convolution layer to give feature map F4 (feature map height × feature map width × number of channels of F4).
4.3.2.5 The fifth 2 × 2 convolution layer in the first feature extraction module convolves feature map F4 to obtain feature map F5; the convolution stride is 1, and F4 passes through this fifth convolution layer to give feature map F5 (feature map height × feature map width × number of channels of F5). Feature map F5 is the first image feature extracted by the first-stage network from the pth picture of the picture pyramid of the dth single-knee image to be detected.
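The feature-map sizes of steps 4.3.2.1-4.3.2.5 follow from simple stride arithmetic. The sketch below assumes unpadded ("valid") convolutions with stride 1 and unpadded max pooling with stride 2; under these assumptions a 48 × 48 input (the training-sample size of step 2.2.3) shrinks to a 1 × 1 output, consistent with a sliding-window first stage:

```python
def conv_out(size, k):
    """Output side length of an unpadded ('valid') convolution, stride 1."""
    return size - k + 1

def pool_out(size, k, stride=2):
    """Output side length of unpadded max pooling with the given stride."""
    return (size - k) // stride + 1

s = 48                              # training-sample side length (step 2.2.3)
f1 = pool_out(conv_out(s, 3), 2)    # F1: 3x3 conv + 2x2 pool
f2 = pool_out(conv_out(f1, 3), 3)   # F2: 3x3 conv + 3x3 pool
f3 = pool_out(conv_out(f2, 3), 2)   # F3: 3x3 conv + 2x2 pool
f4 = conv_out(f3, 3)                # F4: 3x3 conv
f5 = conv_out(f4, 2)                # F5: 2x2 conv
```

Larger pyramid levels produce correspondingly larger F5 maps, one detection position per map cell.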
4.3.3 the preliminary target detection module of the first-level network detects F5 by adopting the knee joint region detection method described in 2.4.2.4 to obtain the knee joint region of the single knee image to be detected;
4.3.4 If p < PP, let p = p + 1 and go to 4.3.2; if p ≥ PP, go to 4.3.5.
4.3.5 The first non-maximum suppression screening layer filters all knee joint bounding boxes in the dth single-knee image to be detected with the non-maximum suppression (NMS) algorithm. Set the NMS filtering threshold δ to 0.7; the knee joint bounding boxes in the single-knee image to be detected are screened with this threshold, the filtered bounding boxes frame the knee joint regions on the dth single-knee image to be detected, and the first-stage network outputs N1 preliminary knee joint regions.
4.4 The second-stage network processes the N1 preliminary knee joint regions of the dth single-knee image to be detected to obtain the final knee joint region and the key points of that region, as follows:
4.4.1 Initialize variable n1 = 1;
4.4.2 Normalize the n1th preliminary knee joint region output by the first-stage network for the dth single-knee image to be detected to size 48 × 48 × 3 (image height × image width × number of channels).
4.4.3 The second-stage network extracts the features of the n1th preliminary knee joint region image on the dth single-knee image to be detected, as follows:
4.4.3.1 The first 3 × 3 convolution layer in the second feature extraction module convolves the n1th preliminary knee joint region image, and the first 2 × 2 max-pooling layer pools the convolved image, outputting feature map F6. The convolution stride is 1 and the max-pooling stride is 2; the size-normalized preliminary knee joint region image of size 48 × 48 × 3 (image height × image width × number of channels; 3 because the image is RGB) passes through this first convolution and max-pooling layer to give feature map F6 of size 23 × 23 × 32 (feature map height × feature map width × number of channels of F6).
4.4.3.2 The second 3 × 3 convolution layer in the second feature extraction module convolves feature map F6, and the second 3 × 3 max-pooling layer pools the convolved F6, outputting feature map F7. The convolution stride is 1 and the max-pooling stride is 2; F6 of size 23 × 23 × 32 passes through this second convolution and max-pooling layer to give feature map F7 of size 10 × 10 × 64 (feature map height × feature map width × number of channels of F7).
4.4.3.3 The third 3 × 3 convolution layer in the second feature extraction module convolves feature map F7, and the third 2 × 2 max-pooling layer pools the convolved F7, outputting feature map F8. The convolution stride is 1 and the max-pooling stride is 2; F7 of size 10 × 10 × 64 passes through this third convolution and max-pooling layer to give feature map F8 of size 4 × 4 × 64 (feature map height × feature map width × number of channels of F8).
4.4.3.4 The fourth 3 × 3 convolution layer in the second feature extraction module convolves feature map F8, outputting feature map F9. The convolution stride is 1; F8 of size 4 × 4 × 64 passes through this fourth convolution layer to give feature map F9 of size 2 × 2 × 128 (feature map height × feature map width × number of channels of F9).
4.4.3.5 The first fully connected layer in the second feature extraction module applies a full connection to feature map F9, outputting feature map F10: F9 of size 2 × 2 × 128 passes through this first fully connected layer to give F10, a 256-dimensional vector. Feature map F10 is the extracted second image feature.
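The sizes quoted in steps 4.4.3.1-4.4.3.5 (48 → 23 → 10 → 4 → 2) are reproduced exactly by unpadded stride arithmetic, which supports the stated layer configuration; flattening the 2 × 2 × 128 map gives the 512 values fed to the 256-dimensional fully connected layer:

```python
def conv_out(size, k):
    """Output side length of an unpadded ('valid') convolution, stride 1."""
    return size - k + 1

def pool_out(size, k, stride=2):
    """Output side length of unpadded max pooling with the given stride."""
    return (size - k) // stride + 1

f6 = pool_out(conv_out(48, 3), 2)   # F6: 3x3 conv + 2x2 pool
f7 = pool_out(conv_out(f6, 3), 3)   # F7: 3x3 conv + 3x3 pool
f8 = pool_out(conv_out(f7, 3), 2)   # F8: 3x3 conv + 2x2 pool
f9 = conv_out(f8, 3)                # F9: 3x3 conv
flat = f9 * f9 * 128                # F9 flattened before the fully connected layer
```

The "valid"/no-padding assumption is ours; it is the only common convention that matches all four stated sizes.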
4.4.4 The final target detection module of the second-stage network processes the second image feature, i.e. feature map F10, and outputs the final knee joint region and key-point coordinates of the dth single-knee image to be detected, as follows:
4.4.4.1 The second fully connected layer of the final target detection module of the second-stage network applies a full connection to feature map F10 and outputs three groups of vectors: the knee joint probability vector (1-dimensional), the knee joint bounding-box coordinate offset vector (4-dimensional) and the knee joint key-point coordinate offset vector (12-dimensional).
4.4.4.2 The knee joint bounding-box coordinate and knee joint region key-point coordinate operation layer of the final target detection module of the second-stage network screens and computes these vectors to obtain the knee joint region of the dth single-knee image to be detected and the key points of that region. The probability vector stores the probability that a region is a knee joint (range [0, 1]); when the probability value is not lower than β the region is considered a knee joint, where β is a second threshold, β ∈ [0.5, 1], with β preferably 0.7. If the probability of the n1th preliminary knee joint region is greater than β, the n1th preliminary knee joint region is retained, its bounding-box coordinates representing the n1th preliminary knee joint region as (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate), and the corresponding knee joint bounding-box coordinate offset vector (upper-left abscissa offset, upper-left ordinate offset, lower-right abscissa offset, lower-right ordinate offset) and knee joint key-point coordinate offset vector are retained as well. The key-point offset vector stores, for the n1th knee joint region, the offset coordinates (offset abscissa, offset ordinate) of the key points FM, FL, TM, TL, JSM and JSL. The final bounding-box coordinates of the n1th knee joint (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate) and the coordinates of its 6 key points (the FM, FL, TM, TL, JSM and JSL points of the n1th knee joint region) are then calculated from these offsets.
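The final decoding of step 4.4.4.2 combines each retained preliminary box with its regressed offsets; the exact formulas are given in the patent's figures, so the sketch below uses the usual cascaded-detector convention (offsets expressed in units of the box width and height) as an assumption:

```python
def decode(box, box_offset, kp_offsets):
    """Apply regressed offsets to a preliminary box. Offsets are assumed to be
    expressed in units of the box width/height (a hypothetical convention)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    dx1, dy1, dx2, dy2 = box_offset
    final_box = (x1 + dx1 * w, y1 + dy1 * h, x2 + dx2 * w, y2 + dy2 * h)
    # Twelve key-point offsets -> six (x, y) points: FM, FL, TM, TL, JSM, JSL.
    kps = [(x1 + kp_offsets[2 * i] * w, y1 + kp_offsets[2 * i + 1] * h)
           for i in range(6)]
    return final_box, kps

final_box, kps = decode((10, 10, 20, 30), (0.1, 0.1, -0.1, -0.1), [0.5, 0.5] * 6)
```

Positive offsets shrink or shift the box inward here; the six decoded points land inside the refined region.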
4.4.4.3 If n1 < N1, let n1 = n1 + 1 and go to 4.4.2; if n1 ≥ N1, go to 4.4.4.4.
4.4.4.4 The second non-maximum suppression screening layer screens the knee joint bounding boxes in the dth single-knee image to be detected with the non-maximum suppression (NMS) algorithm to obtain the final bounding box of the dth single-knee image to be detected and the final key points of the knee joint region; the filtering threshold δ of the NMS algorithm is 0.7.
4.4.4.5 If d < 2, let d = d + 1 and go to 4.2; if d ≥ 2, go to the fifth step.
Fifth, end.
Claims (11)
1. A knee joint positioning method based on a multitask two-stage convolution neural network is characterized by comprising the following steps:
the method comprises the following steps that firstly, a knee joint area positioning network based on a multitask two-stage convolutional neural network is built, the knee joint area positioning network based on the multitask two-stage convolutional neural network comprises a first-stage network and a second-stage network, and the output of the first-stage network is used as the input of the second-stage network;
the first-level network consists of a first feature extraction module and a preliminary target detection module; the first feature extraction module receives the single knee image I from the outside, extracts first image features of the single knee image I and sends the first image features to the preliminary target detection module; the preliminary target detection module detects the first image characteristics and outputs a preliminary knee joint area in the single knee image I;
the first feature extraction module is composed of 5 convolutional layers and 3 maximum pooling layers, wherein the 5 convolutional layers comprise 4 convolutional layers of 3 × 3 and 1 convolutional layer of 2 × 2, and the 3 maximum pooling layers comprise 2 maximum pooling layers of 2 × 2 and 1 maximum pooling layer of 3 × 3;
the first 3 × 3 convolutional layer performs a convolution operation on the single knee image I, and the first 2 × 2 maximum pooling layer performs a pooling operation on the convolved single knee image I to obtain feature map F1; the second 3 × 3 convolutional layer performs a convolution operation on feature map F1, and the first 3 × 3 maximum pooling layer pools the convolved feature map F1 to obtain feature map F2; the third 3 × 3 convolutional layer performs a convolution operation on feature map F2, and the second 2 × 2 maximum pooling layer pools the convolved feature map F2 to obtain feature map F3; the fourth 3 × 3 convolutional layer performs a convolution operation on feature map F3 to obtain feature map F4; the fifth convolutional layer (2 × 2) performs a convolution operation on feature map F4 to obtain feature map F5, and feature map F5 is the extracted first image feature;
the preliminary target detection module comprises a 1 × 1 convolutional layer, a knee joint bounding box coordinate operation layer and a first non-maximum suppression screening layer; the 1 × 1 convolutional layer convolves feature map F5 to obtain two groups of vectors, namely a knee joint probability vector A1 and a knee joint bounding box coordinate offset vector B1; the knee joint bounding box coordinate operation layer determines the knee joint bounding boxes according to A1 and B1 to obtain the knee joint regions in the single knee image I; the first non-maximum suppression screening layer performs non-maximum suppression screening on the knee joint regions in the single knee image I to obtain the preliminary knee joint regions in the single knee image I, and the single knee image I and its preliminary knee joint regions are sent to the second-level network;
the second-level network consists of a second feature extraction module and a final target detection module; the second feature extraction module is used for carrying out feature extraction on the single knee image I received from the first-level network and the preliminary knee joint region in the single knee image I to obtain second image features; the final target detection module performs target detection on the second image characteristics to obtain final knee joint boundary frame coordinates of the single knee image I and knee joint area key point coordinates of the single knee image I;
the second feature extraction module is composed of 4 convolutional layers, 3 maximum pooling layers and 1 fully-connected layer, wherein the 4 convolutional layers are 3 × 3 convolutional layers, and the 3 maximum pooling layers comprise 2 2 × 2 maximum pooling layers and 1 3 × 3 maximum pooling layer;
the first 3 × 3 convolutional layer in the second feature extraction module performs a convolution operation on the preliminary knee joint region in the single knee image I, and the first 2 × 2 maximum pooling layer performs a pooling operation on the convolved preliminary knee joint region to obtain feature map F6; the second 3 × 3 convolutional layer performs a convolution operation on feature map F6, and the 3 × 3 maximum pooling layer pools the convolved feature map F6 to obtain feature map F7; the third 3 × 3 convolutional layer performs a convolution operation on feature map F7, and the second 2 × 2 maximum pooling layer pools the convolved feature map F7 to obtain feature map F8; the fourth 3 × 3 convolutional layer performs a convolution operation on feature map F8 to obtain feature map F9; the first fully-connected layer performs a fully-connected operation on feature map F9 to obtain feature map F10, and feature map F10 is the second image feature;
the final target detection module comprises a second fully-connected layer, a knee joint bounding box and knee joint region key point coordinate operation layer, and a second non-maximum suppression screening layer; the second fully-connected layer performs a fully-connected operation on feature map F10 to obtain three groups of vectors, namely a knee joint probability vector A2, a knee joint bounding box coordinate offset vector B2 and a coordinate offset vector C of the six key points of the knee joint region; the knee joint bounding box and knee joint region key point coordinate operation layer determines the knee joint bounding box coordinates and the knee joint region key point coordinates of the single knee image I according to A2, B2 and C; the second non-maximum suppression screening layer performs non-maximum suppression screening on the knee joint regions and knee joint key point coordinates in the single knee image I, and outputs the single knee image I, the final knee joint region of the single knee image I and the knee joint region key point coordinates of the single knee image I;
secondly, training a knee joint area positioning network based on a multitask two-stage convolutional neural network, wherein the method comprises the following steps:
2.1 prepare the data of the knee joint area location network based on the multitask two-stage convolution neural network:
2.1.1, preprocessing M original images, namely X-ray medical images containing left and right knees, to obtain 2M single knee images subjected to histogram equalization, wherein M is a positive integer;
2.1.2 annotate the knee joint real bounding box in the 2M single knee images after histogram equalization, the method is as follows:
2.1.2.1 initializing variable m = 1;
2.1.2.2 manually labeling 6 key points of the knee joint region of the m-th single knee image; the 6 key points refer to 6 points on the boundaries of the knee joint gaps and osteophytes; for each of the left knee and the right knee they are the femoral medial osteophyte point FM, the femoral lateral osteophyte point FL, the tibial medial osteophyte point TM, the tibial lateral osteophyte point TL, the joint space medial point JSM and the joint space lateral point JSL; the 6 key points of the left knee and the right knee are mirror-symmetric;
2.1.2.3 marking the boundary box of the knee joint in the mth single knee image according to the key points marked manually, the method is:
2.1.2.3.1 calculating the center point coordinates (xmid, ymid) of the 6 key points in the m-th single knee image; denoting the coordinates of the 6 key points as (x1, y1), (x2, y2), (x3, y3), (x4, y4), (x5, y5), (x6, y6), the center point coordinates (xmid, ymid) of the 6 key points are:
2.1.2.3.2 calculating the knee joint width wknee as follows: find the maximum abscissa xmax and the minimum abscissa xmin of the 6 key points; the knee joint width wknee is the difference between the maximum abscissa xmax and the minimum abscissa xmin, namely:
xmax = max(x1, x2, x3, x4, x5, x6);
xmin = min(x1, x2, x3, x4, x5, x6);
wknee = xmax − xmin;
2.1.2.3.3 marking the knee joint region to obtain the real knee joint region bounding box coordinates, which represent the real knee joint region bounding box (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate); taking the center point of the 6 key points as the center and expanding by 0.65 times the knee joint width wknee gives the knee joint region of real interest, i.e. the annotated bounding box; the calculation formula is as follows:
2.1.2.4 if m ≥ 2M, go to 2.2; if m < 2M, let m = m + 1 and go to 2.1.2.2;
2.2 prepare training samples for the first stage network in the knee joint area positioning network based on the multitask two-stage convolution neural network, the method is as follows:
2.2.1 initializing variable m = 1;
2.2.2 prepare training samples for the first level network on the mth single knee image by:
2.2.2.1 initializing variable k = 1;
2.2.2.2 randomly taking the k-th point on the m-th single knee image and taking this point as the upper-left coordinate point of the randomly selected bounding box;
2.2.2.3 determining the coordinates of the k-th randomly selected bounding box as follows: take the corresponding point as the lower-right coordinate point of the randomly selected bounding box; the randomly selected bounding box coordinates represent the k-th randomly selected bounding box (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate);
2.2.2.4 calculating the intersection-over-union (IoU) of the k-th randomly selected bounding box and the annotated bounding box on the m-th single knee image; according to the IoU value, take the k-th randomly selected bounding box as a positive training sample, a partial training sample or a negative training sample, and go to 2.2.2.5;
2.2.2.5 if k < K (K is a positive integer, the total number of randomly selected bounding boxes on the m-th single knee image), let k = k + 1 and go to 2.2.2.2; if k ≥ K, go to 2.2.2.6;
2.2.2.6 if m ≥ 2M, go to 2.2.3; if m < 2M, let m = m + 1 and go to 2.2.2;
2.2.3 reducing the scale of the training samples generated in step 2.2.2, namely the positive samples, partial samples and negative samples, to obtain images of length 48 and width 48 with 3 image channels, i.e. of size 48 × 48 × 3;
2.2.4 carrying out a mirror operation on the size-reduced training samples of step 2.2.3 for augmentation, namely horizontally flipping each training sample left-to-right, i.e. performing the mirror operation on each training sample, to generate the first-level network training samples, wherein the number of first-level network positive samples is N11, the number of partial samples is N12 and the number of negative samples is N13; N11, N12 and N13 are all positive integers;
2.3, training the first-stage network by adopting a first-stage network training sample to obtain a trained first-stage network;
2.4 prepare training samples for the second-stage network in the knee joint area positioning network based on the multitask two-stage convolutional neural network, the method comprises the following steps:
2.4.1 initializing variable m = 1;
2.4.2 positioning the preliminary knee joint region of the mth single knee image I by using the trained first-level network, wherein the method comprises the following steps:
2.4.2.1 generating a picture pyramid of the m-th single knee image I as follows: multiply the m-th single knee image I by different scaling factors s in sequence (s is a positive real number), scaling the m-th single knee image I at different scales to obtain PP single knee images of different sizes, which form the picture pyramid of the single knee image I; PP is a positive integer;
2.4.2.2 initializing variable p = 1;
2.4.2.3 the trained first feature extraction module extracts the features of the p-th picture in the picture pyramid to obtain the first image feature F5;
2.4.2.4 the preliminary target detection module detects the first image characteristic F5 by adopting a knee joint region detection method to obtain a knee joint region in the single knee image I;
2.4.2.5 if p ≥ PP, go to 2.4.2.6; if p < PP, let p = p + 1 and go to 2.4.2.3;
2.4.2.6 the first non-maximum suppression screening layer filters all knee joint regions of the m-th single knee image I with the non-maximum suppression algorithm (NMS algorithm) to obtain the N1 preliminary knee joint regions of the m-th single knee image I; N1 is a positive integer;
2.4.3 prepare training samples for the second level network on the mth single knee image by:
2.4.3.1 initializing variables n1 = 1, k = 1;
2.4.3.2 calculating the intersection-over-union of the n1-th preliminary knee joint region of the m-th single knee image I and the annotated knee joint region of the m-th single knee image I; according to the intersection-over-union value, take the n1-th preliminary knee joint region as a positive sample, a partial sample or a negative sample of the second-level network training, and go to 2.4.3.3;
2.4.3.3 if n1 ≥ N1, go to 2.4.3.4; if n1 < N1, let n1 = n1 + 1 and go to 2.4.3.2;
2.4.3.4 randomly selecting the k-th point on the m-th single knee image and taking this point as the upper-left coordinate point of the randomly selected bounding box;
2.4.3.6 determining the coordinates of the k-th randomly selected bounding box: take the corresponding point as the lower-right coordinate point of the k-th randomly selected bounding box; the k-th randomly selected bounding box coordinates represent the k-th randomly selected bounding box (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate);
2.4.3.7 calculating the intersection-over-union of the k-th randomly selected bounding box and the annotated bounding box on the m-th single knee image; if the intersection-over-union is sufficiently high, take the k-th randomly selected bounding box as training data for knee joint region key point localization and go to 2.4.3.8; otherwise go directly to 2.4.3.8;
2.4.3.8 if k < K, let k = k + 1 and go to 2.4.3.4; if k ≥ K, go to 2.4.4;
2.4.4 if m ≥ 2M, go to 2.4.5; if m < 2M, let m = m + 1 and go to 2.4.2;
2.4.5 reducing the sizes of the positive samples, partial samples and negative samples generated in steps 2.4.3–2.4.4 to 48 × 48 × 3;
2.4.6 augmenting the size-reduced positive samples, partial samples and negative samples of step 2.4.5 by a mirror operation as follows: flip each training sample left-to-right, i.e. perform the mirror operation, to obtain the training samples of the second-level network, wherein the number of second-level network positive samples is N21, the number of partial samples is N22 and the number of negative samples is N23; N21, N22 and N23 are all positive integers;
2.5 training the second-level network by adopting the second-level network training sample generated in the step 2.4.6 to obtain a trained second-level network;
thirdly, preprocessing the double-knee X-ray film to be detected to obtain 2 single-knee images to be detected, which are subjected to histogram equalization processing;
fourthly, performing knee joint positioning on the 2 single knee images to be detected based on the trained knee joint area positioning network of the multitask two-stage convolution neural network, wherein the method comprises the following steps:
4.1 initializing variable d = 1;
4.2 generating a picture pyramid of the d-th single knee image to be detected as follows: scale the single knee image at different scales with scaling factor s to obtain PP single knee images of different sizes, denoting the size of each image as H × W × 3; the PP images are called the picture pyramid of the single knee image to be detected;
4.3 processing the image pyramid of the d-th single knee image to be detected by the first-stage network in the knee joint area positioning network based on the trained multitask two-stage convolutional neural network to obtain a preliminary knee joint area, wherein the method comprises the following steps:
4.3.1 initializing variable p = 1;
4.3.2 the first feature extraction module extracts the features of the p image in the image pyramid of the d single knee image to be detected, and the method comprises the following steps:
4.3.2.1 the first 3 × 3 convolutional layer in the first feature extraction module performs a convolution operation on the p-th picture in the picture pyramid of the d-th single knee image to be detected, and the first 2 × 2 maximum pooling layer performs a pooling operation on the convolved p-th picture and outputs feature map F1, wherein the convolution stride is 1, the maximum pooling stride is 2, and the size of the p-th picture in the picture pyramid is H × W × 3; the p-th picture gives, after the convolution and pooling operations in the first feature extraction module, feature map F1;
4.3.2.2 the second 3 × 3 convolutional layer in the first feature extraction module performs a convolution operation on feature map F1, and the second 3 × 3 maximum pooling layer performs a pooling operation on the convolved feature map F1 to obtain feature map F2, wherein the convolution stride is 1 and the maximum pooling stride is 2; feature map F1 gives, after the second convolution and maximum pooling layers, feature map F2;
4.3.2.3 the third 3 × 3 convolutional layer in the first feature extraction module performs a convolution operation on feature map F2, and the third 2 × 2 maximum pooling layer performs a pooling operation on the convolved feature map F2 to obtain feature map F3, wherein the convolution stride is 1 and the maximum pooling stride is 2; feature map F2 gives, after the third convolution and maximum pooling layers, feature map F3;
4.3.2.4 the fourth 3 × 3 convolutional layer in the first feature extraction module performs a convolution operation on feature map F3 to obtain feature map F4, wherein the convolution stride is 1; feature map F3 gives, after the fourth convolutional layer, feature map F4;
4.3.2.5 the fifth 2 × 2 convolutional layer in the first feature extraction module performs a convolution operation on feature map F4 to obtain feature map F5, wherein the convolution stride is 1; feature map F4 gives, after the fifth convolutional layer, feature map F5, and feature map F5 is the first image feature extracted by the first-level network from the p-th picture of the picture pyramid of the d-th single knee image to be detected;
4.3.3 the primary target detection module of the first-level network detects the F5 by adopting a knee joint region detection method to obtain the knee joint region of the single knee image to be detected;
4.3.4 if p < PP, let p = p + 1 and go to 4.3.2; if p ≥ PP, go to 4.3.5;
4.3.5 the first non-maximum suppression screening layer filters all knee joint bounding boxes in the d-th single knee image to be detected with the non-maximum suppression algorithm; the filtered bounding boxes frame the knee joint regions on the d-th single knee image to be detected, and the first-level network outputs the N1 preliminary knee joint regions;
4.4 the second-level network processes the N1 preliminary knee joint regions of the d-th single knee image to be detected to obtain the final knee joint region and the key points of the region, as follows:
4.4.1 initializing variable n1 = 1;
4.4.2 normalizing the dimensions of the n1-th preliminary knee joint region output by the first-level network for the d-th single knee image to be detected to 48 × 48 × 3;
4.4.3 the second-level network extracts the features of the n1-th preliminary knee joint region image on the d-th single knee image to be detected as follows:
4.4.3.1 the first 3 × 3 convolutional layer in the second feature extraction module performs a convolution operation on the n1-th preliminary knee joint region image, and the first 2 × 2 maximum pooling layer performs a pooling operation on the convolved n1-th preliminary knee joint region image and outputs feature map F6, wherein the convolution stride is 1 and the maximum pooling stride is 2; the scale-normalized preliminary knee joint region image of size 48 × 48 × 3 gives, after the first convolution and maximum pooling layers, a feature map F6 of size 23 × 23 × 32;
4.4.3.2 the second 3 × 3 convolutional layer in the second feature extraction module performs convolution operation on the feature map F6, the second 3 × 3 maximal pooling layer performs pooling operation on the feature map F6 which completes the convolution operation, and the feature map F7 is output, wherein the convolution operation step is 1, the maximal pooling operation step is 2, and the feature map F6 with the size of 23 × 23 × 32 is subjected to second convolution and maximal pooling operation to obtain a feature map F7 with the size of 10 × 10 × 64;
4.4.3.3 the third 3 × 3 convolutional layer in the second feature extraction module performs convolution operation on the feature map F7, the third 2 × 2 maximal pooling layer performs pooling operation on the feature map F7 which completes the convolution operation, and the feature map F8 is output, wherein the convolution operation step is 1, the maximal pooling operation step is 2, and the feature map F7 with the size of 10 × 10 × 64 is subjected to third convolution and maximal pooling operation to obtain the feature map F8 with the size of 4 × 4 × 64;
4.4.3.4 the fourth 3 × 3 convolutional layer in the second feature extraction module performs a convolution operation on feature map F8 and outputs feature map F9, wherein the convolution stride of the fourth layer is 1; the feature map F8 of size 4 × 4 × 64 gives, after the fourth convolutional layer, a feature map F9 of size 2 × 2 × 128;
4.4.3.5 the first fully-connected layer in the second feature extraction module performs a fully-connected operation on feature map F9 and outputs feature map F10; the feature map F9 of size 2 × 2 × 128 gives, after the first fully-connected layer, a feature map F10 containing a 256-dimensional vector; feature map F10 is the extracted second image feature;
4.4.4 the final target detection module of the second level network processes F10, outputs the final knee joint region and the key point coordinates of the d-th single knee image to be detected, the method is:
4.4.4.1 the second fully-connected layer of the final target detection module of the second-level network performs a fully-connected operation on F10 and outputs three groups of vectors, namely the knee joint probability vector (a 1-dimensional vector), the knee joint bounding box coordinate offset vector (a 4-dimensional vector) and the knee joint key point coordinate offset vector (a 12-dimensional vector);
4.4.4.2 the knee joint bounding box coordinate and knee joint region key point coordinate operation layer of the final target detection module of the second-level network screens and computes the vectors to obtain the knee joint region of the d-th single knee image to be detected and the key points of the knee joint region; the probability vector stores the probability value of a knee joint, and a knee joint is identified when the probability value is not lower than β, the second threshold, β ∈ [0.5, 1]; if the probability value is greater than β, the n1-th preliminary knee joint region is retained, its bounding box coordinates denoting the bounding box of the n1-th preliminary knee joint region (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate); the retained knee joint bounding box coordinate offset vector denotes, for the n1-th knee joint, the offset coordinates of the bounding box (offset of the upper-left abscissa, offset of the upper-left ordinate, offset of the lower-right abscissa, offset of the lower-right ordinate), and the retained knee joint key point coordinate offset vector denotes the offset coordinates (offset abscissa, offset ordinate) of the key points FM, FL, TM, TL, JSM and JSL of the n1-th knee joint region; the final bounding box coordinates of the n1-th knee joint (upper-left abscissa, upper-left ordinate, lower-right abscissa, lower-right ordinate) and the coordinates of the 6 key points, namely the coordinates of the FM, FL, TM, TL, JSL and JSM points of the n1-th knee joint region, are calculated as follows:
4.4.4.3 if n1 < N1, let n1 = n1 + 1 and go to 4.4.2; if n1 ≥ N1, go to 4.4.4.4;
4.4.4.4 the second non-maximum suppression screening layer screens the knee joint bounding boxes in the d-th single knee image to be detected with the non-maximum suppression NMS algorithm to obtain the final bounding box of the d-th single knee image to be detected and the final knee joint region key points;
4.4.4.5 if d < 2, let d = d + 1 and go to 4.2; if d ≥ 2, go to the fifth step;
and fifthly, ending.
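As a check on the layer arithmetic of claim 1, the spatial sizes produced by the first feature extraction module (valid 3 × 3 / 2 × 2 convolutions with stride 1 and maximum pooling with stride 2, as recited in steps 4.3.2.1–4.3.2.5) can be traced with a small helper; the channel counts of the first-level network are not recited, so only spatial sizes are computed, and the 48 × 48 input matches the training sample size of step 2.2.3:

```python
def conv_out(size, kernel, stride=1):
    # output size of a 'valid' (no padding) convolution
    return (size - kernel) // stride + 1

def pool_out(size, kernel, stride=2):
    # output size of a maximum pooling layer
    return (size - kernel) // stride + 1

def first_extractor_sizes(h):
    # trace one spatial dimension through conv1..conv5 of the first module
    sizes = []
    h = pool_out(conv_out(h, 3), 2)   # conv1 3x3 + pool 2x2 -> F1
    sizes.append(h)
    h = pool_out(conv_out(h, 3), 3)   # conv2 3x3 + pool 3x3 -> F2
    sizes.append(h)
    h = pool_out(conv_out(h, 3), 2)   # conv3 3x3 + pool 2x2 -> F3
    sizes.append(h)
    h = conv_out(h, 3)                # conv4 3x3 -> F4
    sizes.append(h)
    h = conv_out(h, 2)                # conv5 2x2 -> F5
    sizes.append(h)
    return sizes
```

With h = 48 this yields [23, 10, 4, 2, 1], consistent with the 23 × 23, 10 × 10, 4 × 4 and 2 × 2 spatial sizes recited for the second feature extraction module (steps 4.4.3.1–4.4.3.4), which uses the same kernel and stride arithmetic.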
2. The knee joint positioning method based on the multitask two-stage convolution neural network as claimed in claim 1, wherein the method for preprocessing the original image in the step 2.1.1 is as follows:
2.1.1.1 randomly selecting M original images, namely X-ray medical images containing the left and right knees, from the OAI baseline public database;
2.1.1.2 uniformly transforms M original images into dark background and light leg images: firstly, selecting a double knee image with a bright background and dark legs from M original images, and then carrying out image pixel inversion, namely subtracting an original pixel value from 255 to uniformly convert the M original images into an image with a dark background and bright legs;
2.1.1.3 converting the pixels of the M images with dark background and bright legs to [0, 255], i.e. processing the dark-background, bright-leg images into uint8 images as follows: the pixel value of each pixel in the M images with dark background and bright legs is processed as Pnew = 255 × (P − Pmin)/(Pmax − Pmin), where P is the pixel value of any pixel in the image before processing, Pmax is the maximum pixel value in the image before processing, Pmin is the minimum pixel value in the image before processing, and Pnew is the pixel value of the pixel in the processed image;
2.1.1.4 converting the M uint8 images into 2M single knee images as follows: first, find the width W of each of the M uint8 images and half of the width; round half of the width to an integer; finally, cut each uint8 image at this width coordinate, dividing each uint8 picture into 2 single knee images and finally obtaining 2M single knee images;
2.1.1.5 histogram equalization is performed on each of the 2M single knee images to obtain 2M single knee images that have been histogram equalized.
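Steps 2.1.1.2–2.1.1.4 of claim 2 can be sketched as below; the min–max rescaling formula is the natural reading of step 2.1.1.3 (the exact formula is an image not legible in this copy), and histogram equalization (step 2.1.1.5) is omitted from the sketch:

```python
def preprocess(img, bright_background):
    # img: 2-D list of grey values (rows of pixels) for one double-knee X-ray
    if bright_background:
        # step 2.1.1.2: invert so the background is dark (255 - original pixel)
        img = [[255 - p for p in row] for row in img]
    flat = [p for row in img for p in row]
    p_min, p_max = min(flat), max(flat)
    # step 2.1.1.3: min-max rescale to the uint8 range [0, 255] (assumed formula)
    img = [[round(255 * (p - p_min) / (p_max - p_min)) for p in row] for row in img]
    # step 2.1.1.4: cut at half the width into 2 single knee images
    half = int(len(img[0]) / 2)
    left = [row[:half] for row in img]
    right = [row[half:] for row in img]
    return left, right
```

Each original image thus contributes two single knee images, giving the 2M images that step 2.1.1.5 then histogram-equalizes.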
3. The knee joint positioning method based on the multitask two-stage convolutional neural network of claim 1, wherein in steps 2.2.2.2 and 2.4.3.5 the width and height of the k-th randomly selected bounding box are taken as follows: let the width of the m-th single knee image be wm and its height hm; take half of the minimum of wm and hm and round it, i.e. int(min(wm, hm)/2), where int(min(wm, hm)/2) denotes min(wm, hm)/2 rounded; randomly take an integer in the value range [48, int(min(wm, hm)/2)] as the width and the height of the randomly selected bounding box.
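The random bounding box sampling of claim 3, combined with the IoU labeling of steps 2.2.2.2–2.2.2.4, can be sketched as below; the IoU thresholds are not legible in this copy, so the commonly used MTCNN-style values (0.65 / 0.4 / 0.3) are assumed purely for illustration:

```python
import random

def iou(a, b):
    # intersection-over-union of two boxes (x1, y1, x2, y2)
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if inter else 0.0

def random_box(w_m, h_m, rng=random):
    # claim 3: square side drawn from [48, int(min(w_m, h_m) / 2)]
    side = rng.randint(48, int(min(w_m, h_m) / 2))
    x1 = rng.randint(0, w_m - side)
    y1 = rng.randint(0, h_m - side)
    return (x1, y1, x1 + side, y1 + side)

def label_sample(box, annotated, pos=0.65, part=0.4, neg=0.3):
    # assumed MTCNN-style thresholds; the patent's exact values are not legible here
    v = iou(box, annotated)
    if v >= pos:
        return "positive"
    if v >= part:
        return "partial"
    if v < neg:
        return "negative"
    return None  # IoU between neg and part: sample unused
```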
4. The method of claim 1, wherein M satisfies M > 2000, K satisfies K > 50, PP satisfies 0 < PP < 100, s satisfies 0 < s ≤ 1, and N1 satisfies 0 ≤ N1 ≤ 100.
5. The knee joint positioning method based on the multitask two-stage convolutional neural network as claimed in claim 1, wherein the method for training the first-stage network in the 2.3 steps is:
2.3.1 initializing variable q to 1, setting values of model parameters, including setting learning rate to 0.001 and batch size to 500;
2.3.2 inputting the training samples of the first-level network into the first-level network and outputting the overall loss function Lq, calculated as follows; under the model parameters set in step 2.3.1, the parameters in the first-level network are updated by back-propagation of the Lq value, giving the first-level network NETq with parameters updated for the q-th time;
where N is the number of training samples of the first-level network, N = N11 + N12 + N13; the per-sample term denotes the loss of the i-th training sample on the j-th task, 1 ≤ j ≤ 3, there being 3 tasks in total: knee joint detection, knee joint bounding box localization and knee joint key point localization; αj denotes the importance coefficient of the j-th task, the importance coefficient of the knee joint detection task being α1 = 1, the importance coefficient of the knee joint bounding box localization task being α2 = 0.5, and the importance coefficient of the knee joint key point localization task being α3 = 0;
2.3.3 if 2 ≤ q + 1 ≤ Q, where Q is the number of network training iterations and Q = 10, let q = q + 1 and go to 2.3.4; otherwise training is complete, the trained first-level network, namely NETQ, is obtained, and the procedure ends;
2.3.4 inputting the training samples of the first-level network into NETq−1 and outputting the q-th overall loss function Lq; under the model parameters set in step 2.3.1, the parameters in NETq−1 are updated by back-propagation of the Lq value, giving the first-level network NETq with parameters updated for the q-th time; go to 2.3.3;
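The overall loss Lq of claim 5 can be sketched as below, assuming (as the variable definitions suggest) that Lq averages the importance-weighted task losses over the N training samples; the exact summation formula is an image not legible in this copy:

```python
def overall_loss(task_losses, alphas=(1.0, 0.5, 0.0)):
    # task_losses[j][i]: loss of the i-th training sample on the j-th task
    # (j = 0: knee detection, 1: bounding box localization, 2: key point localization)
    # alphas: importance coefficients alpha_1 = 1, alpha_2 = 0.5, alpha_3 = 0
    n = len(task_losses[0])
    total = 0.0
    for alpha_j, losses_j in zip(alphas, task_losses):
        total += alpha_j * sum(losses_j)
    return total / n
```

With α3 = 0, the first-level network's training ignores the key point localization task, which is handled by the second-level network.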
6. The knee joint positioning method based on the multitask two-stage convolutional neural network as claimed in claim 1, wherein in step 2.4.2.4 the preliminary target detection module applies the knee joint region detection method to the first image feature F5 to obtain the knee joint region in the single-knee image I, the knee joint region detection method being:
2.4.2.4.1 the 1 × 1 convolution layer in the preliminary target detection module performs a 1 × 1 convolution on the first image feature F5 of the p-th picture in the picture pyramid and outputs two groups of vectors: the knee joint probability vector A1p, consisting of a set of 1-dimensional vectors (the first dimension being the number of vectors, the second the dimension of each vector), each element of A1p storing the probability that the corresponding position is a knee joint, a position being judged to be a knee joint when its probability value is not lower than a first threshold α, α ∈ [0.5, 1]; and the knee joint bounding-box coordinate offset B1p, consisting of a set of 4-dimensional vectors (the first dimension being the number of vectors, the second the dimension of each vector), the 4 dimensions (x1, y1, x2, y2) of each vector representing the bounding-box offsets (upper-left horizontal coordinate offset, upper-left vertical coordinate offset, lower-right horizontal coordinate offset, lower-right vertical coordinate offset);
2.4.2.4.2 the knee joint bounding-box coordinate operation layer of the preliminary target detection module screens and operates on A1p and B1p to obtain the knee joint regions in the single-knee image I: find the N vector positions in A1p whose probability values are not lower than α, N ≥ 1; find the corresponding N vector positions in B1p and the 4-dimensional coordinate-offset vectors at those N positions, the n-th offset vector representing the coordinate offsets (upper-left horizontal coordinate offset, upper-left vertical coordinate offset, lower-right horizontal coordinate offset, lower-right vertical coordinate offset) of the n-th bounding box; compute the coordinates of the N knee joint region bounding boxes in the original single-knee image I and frame the N knee joint regions on the original image I according to the bounding-box coordinates, where the coordinates (upper-left horizontal coordinate, upper-left vertical coordinate, lower-right horizontal coordinate, lower-right vertical coordinate) of the n-th bounding box are calculated as follows:
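The claim's exact formula for mapping feature-map positions and predicted offsets back to original-image bounding-box coordinates appears only as a figure in this extract. The sketch below shows one common MTCNN-style mapping under assumed hyperparameters: `stride`, `cell_size`, and the pyramid `scale` are illustrative values, not taken from the patent.

```python
# Hedged sketch: map a feature-map cell plus 4 predicted offsets back to an
# (x1, y1, x2, y2) box in the original image. Each cell corresponds to a
# sliding window of side cell_size placed at stride pixels in the (scaled)
# pyramid image; dividing by scale returns to original-image coordinates.

def box_from_offsets(col, row, offsets, stride=2, cell_size=12, scale=1.0):
    dx1, dy1, dx2, dy2 = offsets  # offsets are fractions of the window size
    x1 = (col * stride + dx1 * cell_size) / scale
    y1 = (row * stride + dy1 * cell_size) / scale
    x2 = (col * stride + cell_size + dx2 * cell_size) / scale
    y2 = (row * stride + cell_size + dy2 * cell_size) / scale
    return (x1, y1, x2, y2)
```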
7. The method of claim 1, wherein the value of α in step 2.4.2.4.1 is 0.6.
8. The knee joint positioning method based on the multitask two-stage convolutional neural network as claimed in claim 1, wherein the method for training the second-stage network in step 2.5 is:
2.5.1 initialize the variable t = 1 and set the values of the model parameters, including a learning rate of 0.0001 and a batch size (batch_size) of 500;
2.5.2 input the training samples of the second-stage network into the second-stage network and compute the overall loss function Lt as Lt = (1/N2) Σi Σj αj · L(i,j), summing i from 1 to N2 and j from 1 to 3; under the model parameters set in step 2.5.1, back-propagate the Lt value to update the parameters of the second-stage network, obtaining the second-stage network NETt with parameters updated for the t-th time;
where N2 is the number of training samples of the second-stage network, N2 = N21 + N22 + N23; L(i,j) denotes the loss of the i-th training sample on the j-th task, 1 ≤ j ≤ 3, the three tasks being knee joint detection, knee joint bounding-box localization, and knee joint key-point localization; αj is the importance coefficient of the j-th task: for the knee joint detection task α1 = 0.8, for the knee joint bounding-box localization task α2 = 0.6, and for the knee joint key-point localization task α3 = 1.5;
2.5.3 let t = t + 1; if 2 ≤ t ≤ T, where T is the number of network training iterations and T = 10, go to 2.5.4; otherwise training is complete, the trained second-stage network NETT is obtained, and the procedure ends;
2.5.4 input the training samples of the second-stage network into NETt-1 and compute the t-th overall loss function Lt; under the model parameters set in step 2.5.1, back-propagate the Lt value to update the parameters of NETt-1, obtaining the second-stage network NETt with parameters updated for the t-th time; go to 2.5.3.
9. The knee joint positioning method based on the multitask two-stage convolutional neural network as claimed in claim 1, wherein the preprocessing of the double-knee X-ray film to be detected in step three is:
3.1 if the double-knee X-ray film to be detected has a bright background and dark legs, convert it into an image with a dark background and bright legs: subtract each original pixel value of the film from 255 and use the result as the new pixel value; then go to 3.2; if the double-knee X-ray film to be detected already has a dark background and bright legs, go directly to 3.2;
3.2 scale the pixels of the double-knee X-ray film to be detected to [0, 255], converting it into a uint8 double-knee image; the new value of each pixel in the uint8 double-knee image is 255 × (P − Pmin)/(Pmax − Pmin), where P is the pixel value of that pixel before processing, Pmax is the maximum pixel value of the image before processing, and Pmin is the minimum pixel value of the image before processing;
3.3 convert the uint8 double-knee image into single-knee images as follows: find the width W of the uint8 double-knee image and half of that width, W/2; round W/2 down to the integer int(W/2); finally, crop the double-knee image according to the width-coordinate ranges [0, int(W/2)] and [int(W/2)+1, W−1], splitting the uint8 double-knee image into 2 single-knee images;
3.4 perform histogram equalization on the 2 single-knee images; the 2 histogram-equalized single-knee images serve as the single-knee images to be detected.
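Steps 3.1-3.3 of this preprocessing can be sketched with NumPy as follows. The function name `preprocess` and the `bright_background` flag are assumptions for illustration; histogram equalization (step 3.4) would follow, e.g. with OpenCV's `equalizeHist`.

```python
import numpy as np

def preprocess(xray, bright_background):
    """Sketch of claim-9 steps 3.1-3.3: invert, min-max scale to uint8, split."""
    # 3.1: bright-background films are inverted so the legs become bright
    img = 255.0 - xray if bright_background else xray.astype(float)
    # 3.2: min-max scale to [0, 255] and convert to uint8
    p_min, p_max = img.min(), img.max()
    img = (255.0 * (img - p_min) / (p_max - p_min)).astype(np.uint8)
    # 3.3: split at half the width, columns [0, int(W/2)] and [int(W/2)+1, W-1]
    half = int(img.shape[1] / 2)
    left, right = img[:, :half + 1], img[:, half + 1:]
    return left, right
```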
10. The method of claim 1, wherein in step 2.4.2.6 the first non-maximum suppression screening layer filters all knee joint regions of the m-th single-knee image I with a non-maximum suppression algorithm, the filtering threshold being δ = 0.6; in step 4.3.5, the second non-maximum suppression screening layer filters all knee joint bounding boxes in the d-th single-knee image to be detected with a non-maximum suppression algorithm, the filtering threshold being δ = 0.7; and in step 4.4.4.4, when the second non-maximum suppression screening layer screens the knee joint bounding boxes in the d-th single-knee image to be detected with a non-maximum suppression algorithm to obtain the final knee joint bounding box and the final key points of the knee joint region, the filtering threshold is δ = 0.7.
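Greedy IoU-based non-maximum suppression is the standard algorithm behind screening layers such as these; the exact patented variant is not spelled out in the claim, so the sketch below is a generic implementation with the claim's filtering threshold δ passed as `delta`.

```python
# Generic greedy NMS sketch: boxes are (x1, y1, x2, y2); a box is suppressed
# when its IoU with an already-kept, higher-scoring box exceeds delta.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, delta=0.7):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= delta for j in keep):
            keep.append(i)
    return keep
```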
11. The method of claim 1, wherein the value of β in step 4.4.4.2 is 0.7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010097868.3A CN111340760B (en) | 2020-02-17 | 2020-02-17 | Knee joint positioning method based on multitask two-stage convolution neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111340760A true CN111340760A (en) | 2020-06-26 |
CN111340760B CN111340760B (en) | 2022-11-08 |
Family
ID=71186920
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010097868.3A Active CN111340760B (en) | 2020-02-17 | 2020-02-17 | Knee joint positioning method based on multitask two-stage convolution neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111340760B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106845430A (en) * | 2017-02-06 | 2017-06-13 | 东华大学 | Pedestrian detection and tracking based on acceleration region convolutional neural networks |
CN108564029A (en) * | 2018-04-12 | 2018-09-21 | 厦门大学 | Face character recognition methods based on cascade multi-task learning deep neural network |
CN109902806A (en) * | 2019-02-26 | 2019-06-18 | 清华大学 | Method is determined based on the noise image object boundary frame of convolutional neural networks |
Non-Patent Citations (1)
Title |
---|
MIAO Guang et al., "CT image pulmonary nodule detection method combining 2D and 3D convolutional neural networks", Laser & Optoelectronics Progress * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111767857A (en) * | 2020-06-30 | 2020-10-13 | 电子科技大学 | Pedestrian detection method based on lightweight two-stage neural network |
CN113076987A (en) * | 2021-03-29 | 2021-07-06 | 北京长木谷医疗科技有限公司 | Osteophyte identification method, device, electronic equipment and storage medium |
WO2022205928A1 (en) * | 2021-03-29 | 2022-10-06 | 北京长木谷医疗科技有限公司 | Osteophyte identification method and apparatus, and electronic device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111340760B (en) | 2022-11-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Saeedi et al. | Infrared and visible image fusion using fuzzy logic and population-based optimization | |
Lakshminarayanan et al. | Deep Learning-Based Hookworm Detection in Wireless Capsule Endoscopic Image Using AdaBoost Classifier. | |
Choi et al. | Boosting proximal dental caries detection via combination of variational methods and convolutional neural network | |
Lin et al. | Automatic retinal vessel segmentation via deeply supervised and smoothly regularized network | |
Liang et al. | United snakes | |
CN106940816A (en) | CT image pulmonary nodule detection system based on 3D fully connected convolutional neural network | |
Reddy et al. | A novel computer-aided diagnosis framework using deep learning for classification of fatty liver disease in ultrasound imaging | |
Rajalakshmi et al. | Deeply supervised u-net for mass segmentation in digital mammograms | |
Le et al. | Liver tumor segmentation from MR images using 3D fast marching algorithm and single hidden layer feedforward neural network | |
Chen et al. | Blood vessel enhancement via multi-dictionary and sparse coding: Application to retinal vessel enhancing | |
CN111709446B (en) | X-ray chest radiography classification device based on improved dense connection network | |
CN111340760B (en) | Knee joint positioning method based on multitask two-stage convolution neural network | |
Mehdizadeh et al. | Deep feature loss to denoise OCT images using deep neural networks | |
Bala | Intracardiac mass detection and classification using double convolutional neural network classifier | |
Wang et al. | Generative image deblurring based on multi-scaled residual adversary network driven by composed prior-posterior loss | |
Kora et al. | Automatic segmentation of polyps using U-net from colonoscopy images | |
Hasegawa et al. | Convolution neural-network-based detection of lung structures | |
CN107146202A (en) | Image blind deblurring method based on L0 regularization and blur kernel post-processing | |
Ramana | Alzheimer disease detection and classification on magnetic resonance imaging (MRI) brain images using improved expectation maximization (IEM) and convolutional neural network (CNN) | |
Antony | Automatic quantification of radiographic knee osteoarthritis severity and associated diagnostic features using deep convolutional neural networks | |
Suresh et al. | Improving the mammogram images by intelligibility mammogram enhancement method | |
Sivakumar et al. | A novel method on earlier detection of bone cancer using Markov random field segmentation | |
Hassanien et al. | Digital mammogram segmentation algorithm using pulse coupled neural networks | |
Cheng et al. | Spherical transformer for quality assessment of pediatric cortical surfaces | |
Sharma et al. | Solving image processing critical problems using machine learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||