Summary of the Invention
To overcome the shortcomings of existing methods for extracting the hand bone region of interest from X-ray images, namely low efficiency, large error, and low precision, the present invention proposes a deep-neural-network-based method for extracting the hand bone region of interest from X-ray images that is more efficient, has smaller error and higher precision, and can not only obtain the hand bone region in the X-ray automatically, but also automatically remove noise and adjust brightness.
The technical solution adopted by the present invention to solve the above technical problem is as follows:
A method for extracting the hand bone region of interest from an X-ray image based on a deep neural network, comprising the following steps:
Step 1: For each original hand bone X-ray grayscale image, remove the portions of the black background on both sides of the image in which text is embedded, so that most of the text is eliminated from the original image.
Step 2: Perform a brightening operation on the original hand bone X-ray grayscale images. In this step, the overall brightness of each image is assessed first, and the brightening operation is applied only to images with insufficient brightness; a denoising operation follows the brightening. The resulting image set is referred to as Output1.
Step 3: Sample and train model M1. This model is used to remove the text near and on the hand bones in the X-rays of Output1, yielding the text-free hand bone X-ray grayscale image set Output2.
Step 4: Normalize all images in Output2 to a uniform size. To keep the aspect ratio consistent, first pad both sides with a black background; when an image is wider than it is tall, crop both sides inward instead. Then scale each image to (512, 512). The new image set is referred to as Output3 (a minimal sketch of this step is given after the step list).
Step 5: Sample and train model M2. This model is used to judge 16*16 regions of the images in Output3 as one of three parts: hand bone, background, and regions where hand bone and background intersect.
Step 6: With model M2, judge the images in Output3 with a sliding window, adding the judged value to every pixel inside each window. Each pixel then determines its type from the counts of the different values it has accumulated: pixels belonging to the hand bone are set to 255 and pixels belonging to the background are set to 0, producing a binary hand bone mask set referred to as Output4.
Step 7: Using the binary hand bone masks of Output4 as a reference, images containing only the hand bone region are obtained on the basis of Output3. Since background impurities may still remain in these images, a largest-connected-region calculation is then performed to remove the impurities, yielding Output5.
Step 8: In Output5, the light shadow around the bottom of an image may be misjudged as hand bone tissue, and it may be connected to the largest connected region. Since the length of the shadow is far greater than the width of the hand bone, a difference comparison is performed on the bottom portion of every image in Output5 to remove the shadow, yielding the final hand bone region of interest.
This completes the description of the operating procedure.
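Although step 4 is not elaborated further below, a minimal Python sketch of its size normalization is given here for illustration (OpenCV is used for the final scaling; the helper name is an assumption):

```python
import cv2
import numpy as np

def normalize_to_512(img):
    """Sketch of step 4: crop both sides inward when the image is wider than
    tall, pad both sides with black when it is taller than wide, then scale
    to (512, 512)."""
    h, w = img.shape
    if w > h:                                     # wider than tall: crop inward
        margin = (w - h) // 2
        img = img[:, margin:margin + h]
    elif h > w:                                   # taller than wide: pad black
        pad = h - w
        img = np.pad(img, ((0, 0), (pad // 2, pad - pad // 2)),
                     constant_values=0)
    return cv2.resize(img, (512, 512))
```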
Further, in step 1, the method for removing the text segments embedded in the black background is as follows: first convert the image to a numeric array, then scan column by column from the leftmost and rightmost columns toward the middle. Owing to the distinctive contrast between the pure white text and the black background, if the number of non-black pixels in a column exceeds 10% of the whole column, it can be concluded that the text-embedded segment of the black background begins there, and the preceding portion is cut away in its entirety.
Further, in step 2, the overall brightness evaluation process is as follows: let the original image O have resolution M × N and let the value of each pixel be t_ij. The overall brightness Aug of the image is computed as the mean value of its bright pixels,

Aug = (1/|S|) Σ_{(i,j)∈S} t_ij, where S = {(i, j) : t_ij > 120},

that is, only pixels whose value exceeds 120 are considered. Different brightening parameters are then applied according to the value of Aug.
Further, in step 3, for the training of model M1, the sampling procedure is as follows: crop 100*100 samples containing letters from the original grayscale images as positive samples, and crop samples of the same size containing no letters at all as negative samples. The process of building the two-dimensional convolutional neural network is:
Step 3.1: The input image passes through a Conv2D convolutional layer that extracts local features; the input size is 100*100*1, followed by a relu activation layer and a Maxpooling pooling layer.
Step 3.2: Three further Conv2D convolutional layers extract features at different scales; their activation layers and pooling structure are consistent with 3.1.
Step 3.3: A Flatten layer connects the above four convolutional layers with the fully connected layers that follow.
Step 3.4: The above features pass through the first fully connected layer, whose internal sequence comprises a Dense layer, a relu activation layer, and a Dropout layer to prevent overfitting; this is followed by the second fully connected layer, whose internal sequence comprises a Dense layer and a sigmoid activation layer, which produces the output.
Letters in the image are located and judged by the SelectiveSearch (selective search) method. Owing to the distinctive shape of the letters, the letter L can be found by this method and precisely localized to a 100*100 region. Since the other letters are concentrated mainly in the upper-right corner of the image, and text in the upper-right corner does not affect the judgment of the hand bone region, after L is found and removed, the upper-right corner region is filled with the average of the surrounding background values.
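A sketch of the letter search, assuming the opencv-contrib implementation of selective search; the size and brightness filters on the region proposals are assumptions, and each surviving crop would then be judged by model M1:

```python
import cv2
import numpy as np

def find_letter_candidates(gray):
    """Run selective search over the grayscale X-ray and keep small, bright
    region proposals as letter candidates for model M1 to classify."""
    bgr = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(bgr)
    ss.switchToSelectiveSearchFast()
    candidates = []
    for (x, y, w, h) in ss.process():
        # letters fit inside 100*100 and are bright against the background
        if w <= 100 and h <= 100 and gray[y:y + h, x:x + w].mean() > 100:
            candidates.append((x, y, w, h))
    return candidates
```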
In step 5, the training process of model M2 is as follows: after the text interference has been eliminated by step 4, this model is used to distinguish background, bone, and bone/background coexisting regions. These three classes are sampled with a sliding window and defined as classes 0, 1, and 2, corresponding to background, bone/background coexisting region, and bone, respectively. The process of building the two-dimensional convolutional neural network model is:
Step 5.1: The input image passes through a two-dimensional convolutional layer that extracts local features; the input size is 16*16*1, followed by a relu activation layer and a Maxpooling pooling layer.
Step 5.2: One further Conv2D two-dimensional convolutional layer extracts features at a different scale; its activation layer and Maxpooling structure are consistent with 5.1.
Step 5.3: A Flatten layer connects the above convolutional layers with the fully connected layers that follow.
Step 5.4: The above features pass through the first fully connected layer, whose internal sequence comprises a Dense layer, a relu activation layer, and a Dropout layer to prevent overfitting; this is followed by the second fully connected layer, whose internal sequence comprises a Dense layer and a softmax activation layer (the output has three classes), which produces the output.
In step 6, the sliding-window judgment process is as follows:
Step 6.1: Slide a 16*16 window with step length 1 over the input image.
Step 6.2: Judge each 16*16 patch with model M2; the count of the resulting value x is increased by 1 for every pixel inside the 16*16 patch. Each pixel thus tallies how many times it received each value x; an array val[512][512][3] is defined to count, for each pixel, the number of occurrences of each value x.
Step 6.3: Define an output array result. For each point in result, only the counts of class 0 and class 2 in the corresponding entry of val are compared: if the count of class 2 is larger, the corresponding point of result is filled with 255 (white); otherwise it is 0 (black).
Because the sliding window adds a statistic to every covered pixel, the complexity is O(n × n × m × m). Since only single-point queries are needed, a tree-shaped array (binary indexed tree) is used here to reduce the complexity: the statistical complexity is reduced to O(n × n × log²(n)), and a single-point query costs log²(n).
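The tree-shaped array can be realized as a two-dimensional binary indexed tree over a difference array, so that stamping one window's vote costs four point updates and reading one pixel's tally costs one prefix-sum query. A minimal sketch, with one tree per class as an illustrative arrangement:

```python
import numpy as np

class BIT2D:
    """2D binary indexed (tree-shaped) array over a difference array:
    rectangle add and single-point query, each in O(log^2 n)."""

    def __init__(self, n):
        self.n = n
        self.t = np.zeros((n + 1, n + 1), dtype=np.int32)

    def _add(self, x, y, v):            # point update on the difference array
        i = x
        while i <= self.n:
            j = y
            while j <= self.n:
                self.t[i, j] += v
                j += j & -j
            i += i & -i

    def add_rect(self, x1, y1, x2, y2, v=1):
        """Add v to every cell of the 1-indexed, inclusive rectangle."""
        self._add(x1, y1, v)
        self._add(x1, y2 + 1, -v)
        self._add(x2 + 1, y1, -v)
        self._add(x2 + 1, y2 + 1, v)

    def point_query(self, x, y):        # prefix sum = current value at (x, y)
        s, i = 0, x
        while i > 0:
            j = y
            while j > 0:
                s += self.t[i, j]
                j -= j & -j
            i -= i & -i
        return s

# One tree per class; a 16*16 window at top-left (r, c) judged as class k
# votes on all of its pixels:  trees[k].add_rect(r, c, r + 15, c + 15)
# Step 6.3 then becomes, for each pixel (i, j):
#   255 if trees[2].point_query(i, j) > trees[0].point_query(i, j) else 0
trees = [BIT2D(512) for _ in range(3)]
```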
In step 8, the difference comparison process is as follows:
Step 8.1: First convert the image to a numeric array. Since the shadow lies at the bottom, only a cross-section (patch) of a certain height, taken from the bottom upward, is operated on; the height is determined by the average height of the shadow.
Step 8.2: Count the white pixels in each row of the patch, and keep the row with the most white pixels and the row with the fewest.
Step 8.3: Compare these two retained rows column by column; a column where one is white and the other is black counts as a difference. Record the number of differences t.
Step 8.4: When t is greater than the set difference threshold, a shadow is judged to be present, and the part between the two rows is discarded. If t does not exceed the threshold, the image is kept unchanged.
Step 8.5: Perform the largest-connected-region calculation again. Since the part between the shadow and the hand bone has been discarded, this time the hand bone region is retained while the shadow part is removed.
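A minimal sketch of steps 8.1 to 8.5 on a binary mask; the band height and difference threshold are assumptions (the description determines the height from the average shadow height):

```python
import numpy as np

def remove_bottom_shadow(mask, band_height=60, diff_thresh=40):
    """Compare the whitest and least-white rows of a bottom band of the
    binary mask (255 = hand bone candidate) and blank the rows between
    them when they differ in too many columns."""
    band = mask[-band_height:]                    # bottom cross-section (8.1)
    whites = (band == 255).sum(axis=1)            # white count per row (8.2)
    r_max, r_min = int(whites.argmax()), int(whites.argmin())
    t = int((band[r_max] != band[r_min]).sum())   # column differences (8.3)
    if t > diff_thresh:                           # shadow present (8.4)
        lo, hi = sorted((r_max, r_min))
        top = mask.shape[0] - band_height
        mask[top + lo:top + hi + 1] = 0           # give up the part in between
    return mask           # then redo the largest-connected-region step (8.5)
```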
The technical concept of the present invention is: use a deep neural network to detect the text information in the X-ray image, and use a deep neural network to classify the contents of the different regions of the X-ray image; the resulting models can rapidly and automatically extract the hand bone region of interest.
In the flow provided by the present invention, the first model M1 is trained on samples of the most conspicuous letters in the initial X-ray image, L (denoting the left hand) and R (denoting the right hand), together with some residual text information, and learns to distinguish them from other background, thereby accomplishing the task of removing letter information from the original image and laying an important foundation for the subsequent extraction of the hand bone region of interest. The purpose of the second model M2 is to obtain the hand bone label mapping: the model distinguishes the regions of interest from the regions of no interest in the input image by different labels, which in the present invention are marked white and black. Since the text has already been removed by model M1, at this stage the model is mainly used to distinguish three classes: background, bone and tissue, and background/skeletal-tissue coexisting regions. The input image is sampled with a sliding window, and the detection results are then tallied on each pixel to obtain the label mapping.
Compared with the traditional method of manually extracting the hand bone region of interest, the advantages of the present invention are: 1. the efficiency of obtaining the hand bone region of interest is greatly improved; 2. unified processing reduces the errors introduced by manual operation; 3. more accurate extraction results can be obtained than by manual work; 4. different stages handle different tasks, and this layered operating process reduces the risk that an extreme image paralyzes the whole pipeline, making the method easier to maintain and improve.
Embodiment
The invention is further described below with reference to the accompanying drawings.
Referring to Figs. 1 to 4: the sampling work for Output1 mainly collects, by sliding window, pictures carrying text information, especially the large letter L. The trained model M1 can detect whether a picture contains text information with almost 100% accuracy. When Output1 is fed into the model, selective search (SelectiveSearch) is used to detect the large letters as objects, while the other text located in the corners is handled with a unified covering strategy. This eliminates the time of sliding a window over the full-size original image while still removing the text information effectively.
After removing the text information, the present invention normalizes the images to a uniform size, which reduces the time complexity of the subsequent acquisition of the label mapping of the hand bone region of interest. The acquisition of the label mapping is described in detail here:
1. Model M2 classifies 16*16 samples into three classes: background, bone and tissue, and background/skeletal-tissue coexisting regions.
2. Sliding a window over Output3 with step length 1 gives about 500*500 samples to be detected.
3. The detection result of each sample is tallied on each of the sample's 16*16 pixels, so a 512*512*3 array is needed for the statistics, where 0, 1, and 2 correspond to background, skeletal-tissue/background coexisting region, and skeletal tissue, respectively. The statistics are necessary: suppose one sample is detected as background and the next sample as coexisting region; the pixels by which the two windows differ (one step length) will ultimately be counted as background, so that the boundary of the coexisting region is delineated. According to the per-pixel statistics of the three classes, a pixel where class 0 dominates is set to 0 (black), and otherwise to 255 (white). In the end the hand bone region (ROI) is white and the background noise is entirely black.
4. Because the time complexity of this algorithm is high, a tree-shaped array (binary indexed tree) is used for optimization, extending the conventional tree-shaped array to a two-dimensional one.
5. After the label mapping is obtained, some white noise inevitably remains; it is removed by computing and retaining the largest connected region. In addition, the hand bone region sometimes overlaps the shadow at the bottom of the image, and the shadow is extremely similar to hand bone tissue, so the model cannot distinguish them accurately. This is handled by a difference test on a bottom-up partial cross-section of the image. The specific practice is: first convert the target area to a data array, find the row with the most white points and the row with the fewest, then compare the two rows column by column; where both are white or both are black they agree, otherwise the column is counted as a difference. If the number of differences exceeds the threshold, a shadow is judged to be the cause, the region between the two rows is discarded, and the largest-connected-region calculation is run again, which removes the noise caused by the shadow.
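A minimal sketch of the largest-connected-region step, assuming OpenCV's connected-components routine; any equivalent labeling function serves:

```python
import cv2
import numpy as np

def keep_largest_component(mask):
    """Retain only the largest white connected region of a binary mask."""
    n, labels, stats, _ = cv2.connectedComponentsWithStats(
        (mask > 0).astype(np.uint8))
    if n <= 1:                                    # label 0 is the background
        return mask
    largest = 1 + int(stats[1:, cv2.CC_STAT_AREA].argmax())
    return np.where(labels == largest, 255, 0).astype(np.uint8)
```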
Example: the hand bone X-ray images used cover ages from 0 to 18 years, 944 samples in total. 632 of these samples are used as the training set to train models M1 and M2, and the remaining 312 samples are used as the test set. The validation set overlaps with the training set. The construction and testing of the two models are described separately below.
Model M1:
Step 1.1: Build the deep convolutional neural network; the concrete structure is shown in Fig. 2.
Step 1.1.1: This convolutional neural network consists of four convolution modules, one Flatten layer, and two fully connected layers.
Step 1.1.2: In the convolutional layers, the convolution kernel size is 1*3 and the input size is (1, 100, 100); the number of convolution kernels increases with network depth, being 6, 12, 24, and 48 in turn. Each convolution module also contains a relu activation layer and a Maxpooling pooling layer with pool_size (1, 2).
Step 1.1.3: Before the fully connected layers there is one Flatten layer, which flattens the output of the convolutional layers.
Step 1.1.4: Among the fully connected layers, the first has a Dropout layer with parameter 0.5 to prevent overfitting; the later fully connected layer produces its output through a sigmoid activation layer.
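A minimal Keras sketch of this structure; the embodiment lists channels-first shapes, while the sketch uses Keras's default channels-last layout, and the width of the first Dense layer is an assumption since the embodiment does not state it:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_m1():
    """Model M1 per steps 1.1.1-1.1.4: four convolution modules (6, 12, 24,
    48 kernels of size 1*3, each with relu and (1,2) max pooling), a Flatten
    layer, and two fully connected layers ending in sigmoid."""
    model = keras.Sequential([keras.Input(shape=(100, 100, 1))])
    for kernels in (6, 12, 24, 48):
        model.add(layers.Conv2D(kernels, (1, 3)))
        model.add(layers.Activation("relu"))
        model.add(layers.MaxPooling2D(pool_size=(1, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation="relu"))    # width assumed
    model.add(layers.Dropout(0.5))                    # parameter 0.5
    model.add(layers.Dense(1, activation="sigmoid"))  # letter / no letter
    return model
```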
Step 1.2: Data sampling and model training.
Step 1.2.1: The hand bone X-ray images are grayscale, with one channel. 500 images are sampled, taking from each image 99 random samples plus one sample of the letter L. All samples are then concatenated into a single numpy array whose first dimension is the sample count, giving a four-dimensional array of shape (50000, 1, 100, 100) as the training set; the test set is processed in the same way to (10000, 1, 100, 100). The validation set overlaps with the training set.
Step 1.2.2: The model is trained in batches; the training-set generator and validation-set generator each produce batches of 120 samples, and training runs for 200 rounds in total, optimized with a logarithmic loss function and the rmsprop algorithm. Only the model with the highest accuracy is retained.
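A sketch of this training regime on placeholder data (the generator-based pipeline is reduced to in-memory arrays, the array sizes are scaled down from the embodiment's (50000, 1, 100, 100), and the file name is an assumption):

```python
import numpy as np
from tensorflow import keras

# Random placeholder data standing in for the sampled 100*100 patches.
x_train = np.random.rand(1200, 100, 100, 1).astype("float32")
y_train = np.random.randint(0, 2, size=(1200, 1)).astype("float32")
x_val, y_val = x_train[:240], y_train[:240]      # validation overlaps training

model = build_m1()                               # from the sketch above
model.compile(optimizer="rmsprop",               # rmsprop + logarithmic loss
              loss="binary_crossentropy",
              metrics=["accuracy"])
checkpoint = keras.callbacks.ModelCheckpoint(    # keep only the best model
    "m1_best.h5", monitor="val_accuracy", mode="max", save_best_only=True)
model.fit(x_train, y_train, batch_size=120, epochs=200,
          validation_data=(x_val, y_val), callbacks=[checkpoint])
```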
Step 1.3: Model testing.
The testing of model M1 follows the same operating process as that of model M2 and is not repeated here.
Model M2:
Step 2.1: Build the deep convolutional neural network; the concrete structure is shown in Fig. 2.
Step 2.1.1: This convolutional neural network consists of two convolution modules, one Flatten layer, and two fully connected layers.
Step 2.1.2: In the convolutional layers, the convolution kernel size is 1*3 and the input size is (1, 16, 16); the number of convolution kernels increases with network depth, being 6 and 24 in turn. Each convolution module also contains a relu activation layer and a Maxpooling layer with pool_size (1, 2).
Step 2.1.3: Before the fully connected layers there is one Flatten layer, which flattens the output of the convolutional layers.
Step 2.1.4: Among the fully connected layers, the first has 96 inputs and 24 outputs and includes a Dropout layer with parameter 0.05 to prevent overfitting; the later fully connected layer produces its output through a softmax activation layer, and the output has three classes.
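A minimal Keras sketch of model M2 in the same channels-last convention as the M1 sketch; with the stated 1*3 kernels the flattened width does not come to the 96 inputs quoted above, so the Dense input width is left for Keras to infer:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_m2():
    """Model M2 per steps 2.1.1-2.1.4: two convolution modules (6 and 24
    kernels of size 1*3, each with relu and (1,2) max pooling), a Flatten
    layer, Dense(24) with Dropout 0.05, and a three-class softmax output."""
    model = keras.Sequential([keras.Input(shape=(16, 16, 1))])
    for kernels in (6, 24):
        model.add(layers.Conv2D(kernels, (1, 3)))
        model.add(layers.Activation("relu"))
        model.add(layers.MaxPooling2D(pool_size=(1, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(24, activation="relu"))
    model.add(layers.Dropout(0.05))                   # parameter 0.05
    model.add(layers.Dense(3, activation="softmax"))  # background / coexisting / bone
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",    # multi-class log loss
                  metrics=["accuracy"])
    return model
```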
Step 2.2: Data sampling and model training.
Step 2.2.1: The hand bone X-ray images are grayscale, with one channel. 500 images are sampled with a 16*16 sliding window and step length 32; for overly uniform background the step length is 8, and these samples are duplicated to double their weight in model training. This gives a four-dimensional array of shape (100000, 1, 16, 16) as the training set, whose first dimension is the sample count; the test set is processed in the same way to (20000, 1, 16, 16). The validation set overlaps with the training set.
Step 2.2.2: The model is trained in batches; the training-set generator and validation-set generator each produce batches of 120 samples, and training runs for 2000 rounds in total. The loss is the multi-class logarithmic loss function, and adam is chosen as the optimizer. Only the model with the highest accuracy is retained.
Step 2.3: Model testing.
By performing the above steps, the extraction of the hand bone region of interest from a hand bone X-ray image can be realized.
The specific description above further explains the purpose, technical solution, and beneficial effects of the invention. It should be understood that the foregoing is merely a specific embodiment of the present invention, intended to explain the present invention and not to limit its scope of protection; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.