Summary of the Invention
To overcome the shortcomings of existing methods for extracting the hand bone region of interest from X-ray images, namely low efficiency, large error, and low precision, the present invention proposes a deep-neural-network-based method for extracting the hand bone region of interest from X-ray images that is more efficient, has smaller error and higher precision, and can not only obtain the hand bone region in the X-ray automatically, but also automatically remove noise and adjust brightness.
The technical solution adopted by the present invention to solve the above technical problem is as follows:
A method for extracting the hand bone region of interest from an X-ray image based on a deep neural network, comprising the following steps:
Step 1: For each original hand bone X-ray grayscale image, remove the portions of the black background on both sides of the image in which text is embedded, so that most of the text is eliminated from the original image.
Step 2: Perform a brightening operation on the original hand bone X-ray grayscale images. In this step, the overall brightness of each image is assessed first, and the brightening operation is applied only to images with insufficient brightness; a denoising operation follows the brightening. The resulting image set is referred to as Output1.
Step 3: Sample and train model M1. This model is used to remove the text near and on the hand bones in the X-rays of Output1, yielding the text-free hand bone X-ray grayscale image set Output2.
Step 4: Normalize all images in Output2 to a uniform size. To keep the aspect ratio consistent, first pad both sides with a black background; when an image is wider than it is tall, crop both sides inward instead. Then scale each image to (512, 512). The new image set is referred to as Output3 (a minimal sketch of this step is given after the step list).
Step 5: Sample and train model M2. This model is used to judge 16*16 regions of the images in Output3 as one of three parts: hand bone, background, and regions where hand bone and background intersect.
Step 6: With model M2, judge the images in Output3 with a sliding window, adding the judged value to every pixel inside each window. Each pixel then determines its type from the counts of the different values it has accumulated: pixels belonging to the hand bone are set to 255 and pixels belonging to the background are set to 0, producing a binary hand bone mask set referred to as Output4.
Step 7: Using the binary hand bone masks of Output4 as a reference, images containing only the hand bone region are obtained on the basis of Output3. Since background impurities may still remain in these images, a largest-connected-region calculation is then performed to remove the impurities, yielding Output5.
Step 8: In Output5, the light shadow around the bottom of an image may be misjudged as hand bone tissue, and it may be connected to the largest connected region. Since the length of the shadow is far greater than the width of the hand bone, a difference comparison is performed on the bottom portion of every image in Output5 to remove the shadow, yielding the final hand bone region of interest.
This completes the description of the operating procedure.
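Although step 4 is not elaborated further below, a minimal Python sketch of its size normalization is given here for illustration (OpenCV is used for the final scaling; the helper name is an assumption):

```python
import cv2
import numpy as np

def normalize_to_512(img):
    """Sketch of step 4: crop both sides inward when the image is wider than
    tall, pad both sides with black when it is taller than wide, then scale
    to (512, 512)."""
    h, w = img.shape
    if w > h:                                     # wider than tall: crop inward
        margin = (w - h) // 2
        img = img[:, margin:margin + h]
    elif h > w:                                   # taller than wide: pad black
        pad = h - w
        img = np.pad(img, ((0, 0), (pad // 2, pad - pad // 2)),
                     constant_values=0)
    return cv2.resize(img, (512, 512))
```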
Further, in step 1, the method for removing the text segments embedded in the black background is as follows: first convert the image to a numeric array, then scan column by column from the leftmost and rightmost columns toward the middle. Owing to the distinctive contrast between the pure white text and the black background, if the number of non-black pixels in a column exceeds 10% of the whole column, it can be concluded that the text-embedded segment of the black background begins there, and the preceding portion is cut away in its entirety.
Further, in step 2, the overall brightness evaluation process is as follows: let the original image O have resolution M × N and let the value of each pixel be t_ij. The overall brightness Aug of the image is computed as the mean value of its bright pixels,

Aug = (1/|S|) Σ_{(i,j)∈S} t_ij, where S = {(i, j) : t_ij > 120},

that is, only pixels whose value exceeds 120 are considered. Different brightening parameters are then applied according to the value of Aug.
Further, in step 3, for the training of model M1, the sampling procedure is as follows: crop 100*100 samples containing letters from the original grayscale images as positive samples, and crop samples of the same size containing no letters at all as negative samples. The process of building the two-dimensional convolutional neural network is:
Step 3.1: The input image passes through a Conv2D convolutional layer that extracts local features; the input size is 100*100*1, followed by a relu activation layer and a Maxpooling pooling layer.
Step 3.2: Three further Conv2D convolutional layers extract features at different scales; their activation layers and pooling structure are consistent with 3.1.
Step 3.3: A Flatten layer connects the above four convolutional layers with the fully connected layers that follow.
Step 3.4: The above features pass through the first fully connected layer, whose internal sequence comprises a Dense layer, a relu activation layer, and a Dropout layer to prevent overfitting; this is followed by the second fully connected layer, whose internal sequence comprises a Dense layer and a sigmoid activation layer, which produces the output.
Letters in the image are located and judged by the SelectiveSearch (selective search) method. Owing to the distinctive shape of the letters, the letter L can be found by this method and precisely localized to a 100*100 region. Since the other letters are concentrated mainly in the upper-right corner of the image, and text in the upper-right corner does not affect the judgment of the hand bone region, after L is found and removed, the upper-right corner region is filled with the average of the surrounding background values.
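A sketch of the letter search, assuming the opencv-contrib implementation of selective search; the size and brightness filters on the region proposals are assumptions, and each surviving crop would then be judged by model M1:

```python
import cv2
import numpy as np

def find_letter_candidates(gray):
    """Run selective search over the grayscale X-ray and keep small, bright
    region proposals as letter candidates for model M1 to classify."""
    bgr = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(bgr)
    ss.switchToSelectiveSearchFast()
    candidates = []
    for (x, y, w, h) in ss.process():
        # letters fit inside 100*100 and are bright against the background
        if w <= 100 and h <= 100 and gray[y:y + h, x:x + w].mean() > 100:
            candidates.append((x, y, w, h))
    return candidates
```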
In step 5, the training process of model M2 is as follows: after the text interference has been eliminated by step 4, this model is used to distinguish background, bone, and bone/background coexisting regions. These three classes are sampled with a sliding window and defined as classes 0, 1, and 2, corresponding to background, bone/background coexisting region, and bone, respectively. The process of building the two-dimensional convolutional neural network model is:
Step 5.1: The input image passes through a two-dimensional convolutional layer that extracts local features; the input size is 16*16*1, followed by a relu activation layer and a Maxpooling pooling layer.
Step 5.2: One further Conv2D two-dimensional convolutional layer extracts features at a different scale; its activation layer and Maxpooling structure are consistent with 5.1.
Step 5.3: A Flatten layer connects the above convolutional layers with the fully connected layers that follow.
Step 5.4: The above features pass through the first fully connected layer, whose internal sequence comprises a Dense layer, a relu activation layer, and a Dropout layer to prevent overfitting; this is followed by the second fully connected layer, whose internal sequence comprises a Dense layer and a softmax activation layer (the output has three classes), which produces the output.
In step 6, the sliding-window judgment process is as follows:
Step 6.1: Slide a 16*16 window with step length 1 over the input image.
Step 6.2: Judge each 16*16 patch with model M2; the count of the resulting value x is increased by 1 for every pixel inside the 16*16 patch. Each pixel thus tallies how many times it received each value x; an array val[512][512][3] is defined to count, for each pixel, the number of occurrences of each value x.
Step 6.3: Define an output array result. For each point in result, only the counts of class 0 and class 2 in the corresponding entry of val are compared: if the count of class 2 is larger, the corresponding point of result is filled with 255 (white); otherwise it is 0 (black).
Because the sliding window adds a statistic to every covered pixel, the complexity is O(n × n × m × m). Since only single-point queries are needed, a tree-shaped array (binary indexed tree) is used here to reduce the complexity: the statistical complexity is reduced to O(n × n × log²(n)), and a single-point query costs log²(n).
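The tree-shaped array can be realized as a two-dimensional binary indexed tree over a difference array, so that stamping one window's vote costs four point updates and reading one pixel's tally costs one prefix-sum query. A minimal sketch, with one tree per class as an illustrative arrangement:

```python
import numpy as np

class BIT2D:
    """2D binary indexed (tree-shaped) array over a difference array:
    rectangle add and single-point query, each in O(log^2 n)."""

    def __init__(self, n):
        self.n = n
        self.t = np.zeros((n + 1, n + 1), dtype=np.int32)

    def _add(self, x, y, v):            # point update on the difference array
        i = x
        while i <= self.n:
            j = y
            while j <= self.n:
                self.t[i, j] += v
                j += j & -j
            i += i & -i

    def add_rect(self, x1, y1, x2, y2, v=1):
        """Add v to every cell of the 1-indexed, inclusive rectangle."""
        self._add(x1, y1, v)
        self._add(x1, y2 + 1, -v)
        self._add(x2 + 1, y1, -v)
        self._add(x2 + 1, y2 + 1, v)

    def point_query(self, x, y):        # prefix sum = current value at (x, y)
        s, i = 0, x
        while i > 0:
            j = y
            while j > 0:
                s += self.t[i, j]
                j -= j & -j
            i -= i & -i
        return s

# One tree per class; a 16*16 window at top-left (r, c) judged as class k
# votes on all of its pixels:  trees[k].add_rect(r, c, r + 15, c + 15)
# Step 6.3 then becomes, for each pixel (i, j):
#   255 if trees[2].point_query(i, j) > trees[0].point_query(i, j) else 0
trees = [BIT2D(512) for _ in range(3)]
```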
In step 8, the difference comparison process is as follows:
Step 8.1: First convert the image to a numeric array. Since the shadow lies at the bottom, only a cross-section (patch) of a certain height, taken from the bottom upward, is operated on; the height is determined by the average height of the shadow.
Step 8.2: Count the white pixels in each row of the patch, and keep the row with the most white pixels and the row with the fewest.
Step 8.3: Compare these two retained rows column by column; a column where one is white and the other is black counts as a difference. Record the number of differences t.
Step 8.4: When t is greater than the set difference threshold, a shadow is judged to be present, and the part between the two rows is discarded. If t does not exceed the threshold, the image is kept unchanged.
Step 8.5: Perform the largest-connected-region calculation again. Since the part between the shadow and the hand bone has been discarded, this time the hand bone region is retained while the shadow part is removed.
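A minimal sketch of steps 8.1 to 8.5 on a binary mask; the band height and difference threshold are assumptions (the description determines the height from the average shadow height):

```python
import numpy as np

def remove_bottom_shadow(mask, band_height=60, diff_thresh=40):
    """Compare the whitest and least-white rows of a bottom band of the
    binary mask (255 = hand bone candidate) and blank the rows between
    them when they differ in too many columns."""
    band = mask[-band_height:]                    # bottom cross-section (8.1)
    whites = (band == 255).sum(axis=1)            # white count per row (8.2)
    r_max, r_min = int(whites.argmax()), int(whites.argmin())
    t = int((band[r_max] != band[r_min]).sum())   # column differences (8.3)
    if t > diff_thresh:                           # shadow present (8.4)
        lo, hi = sorted((r_max, r_min))
        top = mask.shape[0] - band_height
        mask[top + lo:top + hi + 1] = 0           # give up the part in between
    return mask           # then redo the largest-connected-region step (8.5)
```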
The technical concept of the present invention is: use a deep neural network to detect the text information in the X-ray image, and use a deep neural network to classify the contents of the different regions of the X-ray image; the resulting models can rapidly and automatically extract the hand bone region of interest.
In the flow provided by the present invention, the first model M1 is trained on samples of the most conspicuous letters in the initial X-ray image, L (denoting the left hand) and R (denoting the right hand), together with some residual text information, and learns to distinguish them from other background, thereby accomplishing the task of removing letter information from the original image and laying an important foundation for the subsequent extraction of the hand bone region of interest. The purpose of the second model M2 is to obtain the hand bone label mapping: the model distinguishes the regions of interest from the regions of no interest in the input image by different labels, which in the present invention are marked white and black. Since the text has already been removed by model M1, at this stage the model is mainly used to distinguish three classes: background, bone and tissue, and background/skeletal-tissue coexisting regions. The input image is sampled with a sliding window, and the detection results are then tallied on each pixel to obtain the label mapping.
Compared with the traditional method of manually extracting the hand bone region of interest, the advantages of the present invention are: 1. the efficiency of obtaining the hand bone region of interest is greatly improved; 2. unified processing reduces the errors introduced by manual operation; 3. more accurate extraction results can be obtained than by manual work; 4. different stages handle different tasks, and this layered operating process reduces the risk that an extreme image paralyzes the whole pipeline, making the method easier to maintain and improve.
Embodiment
The invention is further described below with reference to the accompanying drawings.
Referring to Figs. 1 to 4: the sampling work for Output1 mainly collects, by sliding window, pictures carrying text information, especially the large letter L. The trained model M1 can detect whether a picture contains text information with almost 100% accuracy. When Output1 is fed into the model, selective search (SelectiveSearch) is used to detect the large letters as objects, while the other text located in the corners is handled with a unified covering strategy. This eliminates the time of sliding a window over the full-size original image while still removing the text information effectively.
After removing the text information, the present invention normalizes the images to a uniform size, which reduces the time complexity of the subsequent acquisition of the label mapping of the hand bone region of interest. The acquisition of the label mapping is described in detail here:
1. Model M2 classifies 16*16 samples into three classes: background, bone and tissue, and background/skeletal-tissue coexisting regions.
2. Sliding a window over Output3 with step length 1 gives about 500*500 samples to be detected.
3. The detection result of each sample is tallied on each of the sample's 16*16 pixels, so a 512*512*3 array is needed for the statistics, where 0, 1, and 2 correspond to background, skeletal-tissue/background coexisting region, and skeletal tissue, respectively. The statistics are necessary: suppose one sample is detected as background and the next sample as coexisting region; the pixels by which the two windows differ (one step length) will ultimately be counted as background, so that the boundary of the coexisting region is delineated. According to the per-pixel statistics of the three classes, a pixel where class 0 dominates is set to 0 (black), and otherwise to 255 (white). In the end the hand bone region (ROI) is white and the background noise is entirely black.
4. Because the time complexity of this algorithm is high, a tree-shaped array (binary indexed tree) is used for optimization, extending the conventional tree-shaped array to a two-dimensional one.
5. After the label mapping is obtained, some white noise inevitably remains; it is removed by computing and retaining the largest connected region. In addition, the hand bone region sometimes overlaps the shadow at the bottom of the image, and the shadow is extremely similar to hand bone tissue, so the model cannot distinguish them accurately. This is handled by a difference test on a bottom-up partial cross-section of the image. The specific practice is: first convert the target area to a data array, find the row with the most white points and the row with the fewest, then compare the two rows column by column; where both are white or both are black they agree, otherwise the column is counted as a difference. If the number of differences exceeds the threshold, a shadow is judged to be the cause, the region between the two rows is discarded, and the largest-connected-region calculation is run again, which removes the noise caused by the shadow.
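A minimal sketch of the largest-connected-region step, assuming OpenCV's connected-components routine; any equivalent labeling function serves:

```python
import cv2
import numpy as np

def keep_largest_component(mask):
    """Retain only the largest white connected region of a binary mask."""
    n, labels, stats, _ = cv2.connectedComponentsWithStats(
        (mask > 0).astype(np.uint8))
    if n <= 1:                                    # label 0 is the background
        return mask
    largest = 1 + int(stats[1:, cv2.CC_STAT_AREA].argmax())
    return np.where(labels == largest, 255, 0).astype(np.uint8)
```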
Example: the hand bone X-ray images used cover ages from 0 to 18 years, 944 samples in total. 632 of these samples are used as the training set to train models M1 and M2, and the remaining 312 samples are used as the test set. The validation set overlaps with the training set. The construction and testing of the two models are described separately below.
Model M1:
Step 1.1: Build the deep convolutional neural network; the concrete structure is shown in Fig. 2.
Step 1.1.1: This convolutional neural network consists of four convolution modules, one Flatten layer, and two fully connected layers.
Step 1.1.2: In the convolutional layers, the convolution kernel size is 1*3 and the input size is (1, 100, 100); the number of convolution kernels increases with network depth, being 6, 12, 24, and 48 in turn. Each convolution module also contains a relu activation layer and a Maxpooling pooling layer with pool_size (1, 2).
Step 1.1.3: Before the fully connected layers there is one Flatten layer, which flattens the output of the convolutional layers.
Step 1.1.4: Among the fully connected layers, the first has a Dropout layer with parameter 0.5 to prevent overfitting; the later fully connected layer produces its output through a sigmoid activation layer.
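A minimal Keras sketch of this structure; the embodiment lists channels-first shapes, while the sketch uses Keras's default channels-last layout, and the width of the first Dense layer is an assumption since the embodiment does not state it:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_m1():
    """Model M1 per steps 1.1.1-1.1.4: four convolution modules (6, 12, 24,
    48 kernels of size 1*3, each with relu and (1,2) max pooling), a Flatten
    layer, and two fully connected layers ending in sigmoid."""
    model = keras.Sequential([keras.Input(shape=(100, 100, 1))])
    for kernels in (6, 12, 24, 48):
        model.add(layers.Conv2D(kernels, (1, 3)))
        model.add(layers.Activation("relu"))
        model.add(layers.MaxPooling2D(pool_size=(1, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation="relu"))    # width assumed
    model.add(layers.Dropout(0.5))                    # parameter 0.5
    model.add(layers.Dense(1, activation="sigmoid"))  # letter / no letter
    return model
```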
Step 1.2: Data sampling and model training.
Step 1.2.1: The hand bone X-ray images are grayscale, with one channel. 500 images are sampled, taking from each image 99 random samples plus one sample of the letter L. All samples are then concatenated into a single numpy array whose first dimension is the sample count, giving a four-dimensional array of shape (50000, 1, 100, 100) as the training set; the test set is processed in the same way to (10000, 1, 100, 100). The validation set overlaps with the training set.
Step 1.2.2: The model is trained in batches; the training-set generator and validation-set generator each produce batches of 120 samples, and training runs for 200 rounds in total, optimized with a logarithmic loss function and the rmsprop algorithm. Only the model with the highest accuracy is retained.
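A sketch of this training regime on placeholder data (the generator-based pipeline is reduced to in-memory arrays, the array sizes are scaled down from the embodiment's (50000, 1, 100, 100), and the file name is an assumption):

```python
import numpy as np
from tensorflow import keras

# Random placeholder data standing in for the sampled 100*100 patches.
x_train = np.random.rand(1200, 100, 100, 1).astype("float32")
y_train = np.random.randint(0, 2, size=(1200, 1)).astype("float32")
x_val, y_val = x_train[:240], y_train[:240]      # validation overlaps training

model = build_m1()                               # from the sketch above
model.compile(optimizer="rmsprop",               # rmsprop + logarithmic loss
              loss="binary_crossentropy",
              metrics=["accuracy"])
checkpoint = keras.callbacks.ModelCheckpoint(    # keep only the best model
    "m1_best.h5", monitor="val_accuracy", mode="max", save_best_only=True)
model.fit(x_train, y_train, batch_size=120, epochs=200,
          validation_data=(x_val, y_val), callbacks=[checkpoint])
```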
Step 1.3: Model testing.
The testing of model M1 follows the same operating process as that of model M2 and is not repeated here.
Model M2:
Step 2.1: Build the deep convolutional neural network; the concrete structure is shown in Fig. 2.
Step 2.1.1: This convolutional neural network consists of two convolution modules, one Flatten layer, and two fully connected layers.
Step 2.1.2: In the convolutional layers, the convolution kernel size is 1*3 and the input size is (1, 16, 16); the number of convolution kernels increases with network depth, being 6 and 24 in turn. Each convolution module also contains a relu activation layer and a Maxpooling layer with pool_size (1, 2).
Step 2.1.3: Before the fully connected layers there is one Flatten layer, which flattens the output of the convolutional layers.
Step 2.1.4: Among the fully connected layers, the first has 96 inputs and 24 outputs and includes a Dropout layer with parameter 0.05 to prevent overfitting; the later fully connected layer produces its output through a softmax activation layer, and the output has three classes.
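A minimal Keras sketch of model M2 in the same channels-last convention as the M1 sketch; with the stated 1*3 kernels the flattened width does not come to the 96 inputs quoted above, so the Dense input width is left for Keras to infer:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_m2():
    """Model M2 per steps 2.1.1-2.1.4: two convolution modules (6 and 24
    kernels of size 1*3, each with relu and (1,2) max pooling), a Flatten
    layer, Dense(24) with Dropout 0.05, and a three-class softmax output."""
    model = keras.Sequential([keras.Input(shape=(16, 16, 1))])
    for kernels in (6, 24):
        model.add(layers.Conv2D(kernels, (1, 3)))
        model.add(layers.Activation("relu"))
        model.add(layers.MaxPooling2D(pool_size=(1, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(24, activation="relu"))
    model.add(layers.Dropout(0.05))                   # parameter 0.05
    model.add(layers.Dense(3, activation="softmax"))  # background / coexisting / bone
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",    # multi-class log loss
                  metrics=["accuracy"])
    return model
```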
Step 2.2: Data sampling and model training.
Step 2.2.1: The hand bone X-ray images are grayscale, with one channel. 500 images are sampled with a 16*16 sliding window and step length 32; for overly uniform background the step length is 8, and these samples are duplicated to double their weight in model training. This gives a four-dimensional array of shape (100000, 1, 16, 16) as the training set, whose first dimension is the sample count; the test set is processed in the same way to (20000, 1, 16, 16). The validation set overlaps with the training set.
Step 2.2.2: The model is trained in batches; the training-set generator and validation-set generator each produce batches of 120 samples, and training runs for 2000 rounds in total. The loss is the multi-class logarithmic loss function, and adam is chosen as the optimizer. Only the model with the highest accuracy is retained.
Step 2.3: Model testing.
By performing the above steps, the extraction of the hand bone region of interest from a hand bone X-ray image can be realized.
The specific description above further explains the purpose, technical solution, and beneficial effects of the invention. It should be understood that the foregoing is merely a specific embodiment of the present invention, intended to explain the present invention and not to limit its scope of protection; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.