Summary of the Invention
Among existing deep-learning-based methods for skin feature segmentation, much effort has been devoted to designing new fully convolutional neural network architectures with task-specific loss functions to obtain better predictions. However, few attempts have been made to encode clinically valuable prior knowledge into a deep learning system to achieve accurate segmentation of skin features. To this end, the present invention aims to develop a general framework in which contextual background information is modeled and integrated into a deep fully convolutional neural network to realize fully automatic skin feature segmentation. The fully convolutional neural network fuses information from intermediate convolutional layers into the output through skip connections and deconvolution layers, so that both low-level appearance information and high-level semantic information are taken into account. In addition to the data-driven features learned by the fully convolutional neural network, clinical prior knowledge of skin lesions, namely low-level edge and texture features, is also considered. These low-level edge and texture features are derived from predefined filter kernels specific to skin features and are built into the fully convolutional network workflow through convolution operations in a novel shallow convolutional network. The proposed network architecture is trained in an end-to-end manner. In this way, domain-specific texture features can complement the other hierarchical and semantic features learned by the fully convolutional neural network, enhancing the fine details of skin lesions for more accurate segmentation.
According to an aspect of the present invention, there is provided a dermoscopy image processing method combining texture features with deep neural network information, comprising:
processing an input image by a deep neural network to generate a feature map;
processing the input image by a texture-based shallow network to generate a texture map; and
fusing the feature map and the texture map by convolutional layers to generate a comprehensive score map.
In one embodiment of the invention, the dermoscopy image processing method combining texture features with deep neural network information further comprises: inputting the comprehensive score map into a softmax function with a loss layer, forming an end-to-end trainable network.
In one embodiment of the invention, the network is trained using mini-batch stochastic gradient descent (SGD) with momentum.
In one embodiment of the invention, the deep neural network comprises:
a plurality of stacks, each stack comprising one or more convolutional layers and a pooling layer; and
a plurality of transform convolutional layers and upsampling layers arranged after the plurality of stacks, which perform a convolution operation and a first upsampling operation on the output predictions of the plurality of stacks to generate an activation map.
In one embodiment of the invention, the plurality of stacks are five stacks. The first stack comprises one or more first convolutional layers and a first pooling layer; the input image is processed by the one or more first convolutional layers and the first pooling layer to generate a first pooling layer prediction. The second stack comprises one or more second convolutional layers and a second pooling layer; the first pooling layer prediction is processed by the one or more second convolutional layers and the second pooling layer to generate a second pooling layer prediction. The third stack comprises one or more third convolutional layers and a third pooling layer; the second pooling layer prediction is processed by the one or more third convolutional layers and the third pooling layer to generate a third pooling layer prediction. The fourth stack comprises one or more fourth convolutional layers and a fourth pooling layer; the third pooling layer prediction is processed by the one or more fourth convolutional layers and the fourth pooling layer to generate a fourth pooling layer prediction. The fifth stack comprises one or more fifth convolutional layers and a fifth pooling layer; the fourth pooling layer prediction is processed by the one or more fifth convolutional layers and the fifth pooling layer to generate a fifth pooling layer prediction.
In one embodiment of the invention, the dermoscopy image processing method combining texture features with deep neural network information further comprises:
fusing the fourth pooling layer prediction with the activation map output by the upsampling layer, and then performing a second upsampling operation to generate a second upsampling prediction; and
fusing the third pooling layer prediction with the second upsampling prediction, and then performing a third upsampling operation to generate a third upsampling prediction.
In one embodiment of the invention, processing the input image by the texture-based shallow network comprises:
generating a texton (texture primitive) dictionary;
convolving the input image with the filters in a filter bank to generate filter responses;
assigning each pixel of the input image to a texton label in the texton dictionary (D), based on the minimum distance at that pixel between the textons and the filter response, to generate a texton label map;
converting the texton label map into an intensity map; and
replacing the label indices in the texton label map with the corresponding mean intensities to form the texture map.
In one embodiment of the invention, generating the texton dictionary comprises:
preparing two classes of patches using ground-truth labels;
applying two-dimensional second-order partial derivatives of a Gaussian with 12 orientations to each patch in each group;
clustering the filter responses generated by all patches in the same group to generate the textons of one class; and
storing all trained textons in the texton dictionary.
In one embodiment of the invention, the feature map and the texture map are fused in a complementary fashion by dual-block convolutional layers.
In one embodiment of the invention, during training, finer details from the texture feature map are added to the feature maps of the fully convolutional neural network, while the influence of non-lesion edges in the texture feature map is suppressed by the fully convolutional neural network feature maps.
The present invention proposes a method that combines prior knowledge encoded by a shallow network with a deep fully convolutional neural network for the segmentation of skin features in dermoscopy images. Existing convolutional neural networks require deep multi-layer architectures to guarantee model effectiveness. In the proposed method, the prior knowledge is first extracted as texture features, using spatial filters that simulate the function of simplified cell receptive fields in the primary visual cortex (V1). The prior knowledge encoded by the shallow network is then coupled with the hierarchical data-driven features learned by the fully convolutional network (FCN) through an effective integration strategy of skip connections and convolution operators, yielding a detailed segmentation of skin features. The present invention is the first to apply such a network to the detailed segmentation of skin features. Compared with existing deep learning models, the proposed method offers better stability; at the same time it requires neither an ultra-deep neural network architecture nor data augmentation or exhaustive parameter tuning, and the experimental results show an effective improvement over other state-of-the-art methods.
Detailed Description of the Embodiments
In the following description, the present invention is described with reference to various embodiments. Those skilled in the art will recognize, however, that the embodiments may be practiced without one or more of the specific details, or with other alternative and/or additional methods, materials, or components. In other instances, well-known structures, materials, or operations are not shown or described in detail so as not to obscure aspects of the embodiments of the invention. Similarly, for purposes of explanation, specific quantities, materials, and configurations are set forth in order to provide a thorough understanding of the embodiments of the invention. The invention may nevertheless be practiced without these specific details. Furthermore, it should be understood that the embodiments shown in the accompanying drawings are illustrative representations and are not necessarily drawn to scale.
In this specification, reference to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in this specification are not necessarily all referring to the same embodiment.
The architecture of the dermoscopy image processing method disclosed by the invention comprises two networks: a fully convolutional neural network and a texture-based shallow network. The two networks are complementary, achieving a more accurate segmentation of skin features. Fig. 1 shows a flowchart of the dermoscopy image processing method combining texture features with deep neural network information according to an embodiment of the present invention. Both data-driven and hand-crafted features are taken into account for segmentation. More specifically, hierarchical semantic features are learned by the fully convolutional neural network, while contextual information related to skin feature boundaries is modeled by the texture features of the shallow network. The two networks are then integrated by fusing the feature maps generated by each network with convolution operators. The interaction of the two networks during the learning stage yields a more detailed segmentation.
In our method, the feature learning processes of both networks are based on convolution operations. For the deep network, the convolutional filter kernels are represented by weights learned automatically from raw image data, whereas the texton-based shallow network learns primitive elements through filter kernels hand-designed from domain-specific knowledge (such as skin lesion boundaries and texture). Combining the two networks has two benefits: on the one hand, important cues derived from clinical prior knowledge or contextual information are highlighted during the learning process; on the other hand, the automatic weight learning scheme of the fully convolutional neural network helps to refine the hand-crafted feature maps.
First, in step 110, the input image is processed by the deep neural network to generate a feature map. The fully convolutional neural network architecture is the state of the art for semantic segmentation, in which the segmentation is obtained by pixel-wise prediction. The fully convolutional neural network is trained in an end-to-end supervised manner by mapping the input to the ground truth of its labels. In the present invention, an existing architecture can be used to train the fully convolutional neural network to learn hierarchical features; for brevity, the training procedure is not described in detail in this specification.
Fig. 2 shows a fully convolutional neural network architecture based on VGG16 (a deep classification network) according to an embodiment of the present invention. The VGG16 architecture can be used as the base network of the fully convolutional neural network, with the following advantages: 1) more representative features can be learned by stacking two or three convolutional layers with small filters (3 × 3), since this increases the complexity of the nonlinear function; 2) the problem of limited training data can be addressed by transfer learning, i.e., a pre-trained model learned from abundant natural images can be used to train the fully convolutional neural network for skin lesion segmentation; and 3) the deep 16-layer architecture with a large number of weights can encode more complex high-level semantic features.
The VGG16 network comprises five stacks and three transform convolutional layers, each stack comprising several convolutional layers and a pooling layer. A convolutional layer can be regarded as a filter-based feature extractor. The filter response is generated by discrete convolution over the local receptive field of the previous layer's feature map, defined as:
F^l(x, y) = (W^l ∗ F^(l-1))(x, y) + b^l    [1]
where W^l is the filter kernel containing the weights in the current layer l, the input map F^(l-1) is the feature map in layer (l − 1), and b^l is the bias in layer l. The result of the locally weighted sum is passed to an activation function. Compared with the sigmoid, the rectified linear unit (ReLU) achieves better performance in terms of learning efficiency, leading to faster training of deep neural networks, so ReLU is used as the activation function in each convolutional layer. The pooling layer provides a mechanism for keeping features spatially invariant, but it reduces the resolution of the feature maps. Typical pooling operations include sub-sampling and max pooling. In the present invention, max pooling can be used, as studies have found that max pooling substantially outperforms sub-sampling.
Referring to the specific example shown in Fig. 2, the first stack comprises convolutional layer 1, convolutional layer 2, and pooling layer 1. Convolutional layer 1 performs feature extraction on the input image with 64 filters, yielding 64 feature maps, which are stacked together to form the output feature map of convolutional layer 1. Convolutional layer 2 operates like convolutional layer 1, except that convolutional layer 1 convolves the original image input whereas convolutional layer 2 convolves the output of convolutional layer 1, generating the feature map of convolutional layer 2. The feature map of convolutional layer 2 passes through pooling layer 1 to generate the pooling layer 1 prediction, which enters the second stack. The second through fifth stacks are processed like the first stack, the only difference being that the input of the first stack is the original image, while the input of each subsequent stack is the prediction of the previous stack.
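As an illustration, the first stack just described can be sketched in PyTorch as follows. This is a minimal sketch, not the actual implementation; the 224 × 224 input size and the variable names are assumptions made here for illustration.

```python
import torch
import torch.nn as nn

# Stack 1 of the VGG16-style encoder: conv1 -> ReLU -> conv2 -> ReLU -> pool1.
# Each convolution implements equation [1], F^l = W^l * F^(l-1) + b^l,
# followed by the ReLU activation discussed above.
stack1 = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),   # conv layer 1: 64 filters on the RGB input
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),  # conv layer 2: convolves conv1's output
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),        # pool layer 1: max pooling halves resolution
)

x = torch.randn(1, 3, 224, 224)   # a dermoscopy image tensor (batch, channels, H, W)
pool1_prediction = stack1(x)      # -> (1, 64, 112, 112), the "pooling layer 1 prediction"
```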
A conventional convolutional neural network consists of: convolution — activation — convolution — activation — pooling — … — pooling — fully connected — classification regression. In the present invention, the fully convolutional neural network is realized by converting the last fully connected layer of a conventional convolutional neural network into a convolutional layer, and then adding upsampling and deconvolution layers to the converted network. Replacing the fully connected layer enables the network to accept image inputs of arbitrary size. The upsampling and deconvolution layers generate the output activation map and make the feature map the same size as the input image, realizing pixel-wise prediction. As described above, the pooling layers reduce the resolution of the feature maps, which may cause coarse predictions; in this case, skip connections are introduced to combine the coarse predictions of deep layers with the fine-scale predictions of shallow layers, thereby improving segmentation detail. More specifically, a skip connection fuses the 2× upsampled prediction computed on the last layer at stride 32 with the prediction of pooling layer 4 at stride 16. That is, the 2× upsampled prediction and the pooling layer 4 prediction are superimposed, and the sum of the two is then upsampled at stride 16 back to the image size. Likewise, a better prediction (the stride-8 prediction) is obtained by fusing the prediction of the shallower layer (pooling layer 3) with the 2× upsampling of the sum of the two predictions from pooling layer 4 and the last layer.
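The skip-connection fusion just described can be sketched as follows. Bilinear interpolation stands in here for the learned deconvolution layers, and the score-map shapes assume a 224 × 224 input and two classes, so this is an illustrative sketch rather than the exact implementation.

```python
import torch
import torch.nn.functional as F

# Hypothetical per-class score maps from the network (2 classes: lesion / non-lesion).
score32 = torch.randn(1, 2, 7, 7)        # coarse prediction from the last layer (stride 32)
pool4_score = torch.randn(1, 2, 14, 14)  # scores derived from pooling layer 4 (stride 16)
pool3_score = torch.randn(1, 2, 28, 28)  # scores derived from pooling layer 3 (stride 8)

# Skip connection 1: 2x-upsample the stride-32 prediction, add the pool4 prediction.
fuse16 = F.interpolate(score32, scale_factor=2, mode='bilinear',
                       align_corners=False) + pool4_score

# Skip connection 2: 2x-upsample the fused stride-16 sum, add the pool3 prediction.
fuse8 = F.interpolate(fuse16, scale_factor=2, mode='bilinear',
                      align_corners=False) + pool3_score

# Final upsampling returns the stride-8 prediction to the input image size.
output = F.interpolate(fuse8, scale_factor=8, mode='bilinear', align_corners=False)
```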
Various terms are used in this specification to denote the feature images that the input image generates after operations at different stages, for example: feature map, mapping, activation map, and filter response. These terms are sometimes used interchangeably.
Although skip connections can be used to mitigate the coarse segmentation problem, some details are still missing from the feature maps recovered from the shallow layers through the deconvolution layers. In addition, the lack of spatial regularization in the fully convolutional neural network may degrade the spatial consistency of the segmentation. Local dependencies are not fully taken into account in the fully convolutional neural network either, which may also lead to inconsistent predictions among pixels within the same structure. In the following, the present invention addresses this problem by introducing a shallow network that integrates texture-based spatial information into the fully convolutional neural network framework.
Returning to Fig. 1, in step 120, the input image is processed by the texture-based shallow network to generate a texture map. Textons are one of the representative forms of spatial information that provide discriminative features for pattern recognition tasks. Textons have shown their advantage in encoding texture information, and are represented by the responses to a set of filter kernels (W1, W2, …, Wn):
R = [W1 ∗ I(x, y), W2 ∗ I(x, y), …, Wn ∗ I(x, y)]    [2]
where ∗ denotes the convolution operation, n is the number of filter kernels (W), and the textons are defined as the set of feature vectors generated by clustering the filter responses in R.
Designing a suitable filter bank is the key step in extracting the specific texture features. Based on the clinical prior knowledge that edge information provides important cues for skin segmentation, we use second-derivative Gaussian filters. The two-dimensional Gaussian function is defined as:
G(x, y) = (1 / (2πσ^2)) · exp(−(x^2 + y^2) / (2σ^2))    [3]
Since edges can appear in any direction, the filter is made anisotropic by introducing an orientation parameter. The second-order partial derivative of equation (3) along the rotated y′ direction is:
∂^2G/∂y′^2 = ((y′^2 − σ^2) / σ^4) · G(x, y)
x′ = x cos θ − y sin θ
y′ = x sin θ + y cos θ    [4]
The filter is designed as a specific edge detector, with σ = 1.5 and θ = [0°, 15°, 30°, 45°, 60°, 75°, 90°, 105°, 120°, 135°, 150°, 165°]. In addition, in the present invention, non-edge structures are extracted using a standard Gaussian with scale σ = 1, which also applies a simple smoothness constraint to the feature representation. To reduce computational redundancy and enrich the feature representation, for each anisotropic filter kernel only the maximum response across all orientations is considered, while the responses of the isotropic filter are recorded directly.
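As an illustration, the filter bank described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the 15 × 15 kernel support and a grayscale input are choices made here for illustration, and the helper name filter_responses is hypothetical rather than from the original disclosure.

```python
import numpy as np
from scipy.ndimage import convolve

def second_deriv_gaussian_kernel(sigma=1.5, theta_deg=0.0, size=15):
    """Oriented second-order Gaussian derivative kernel per equations [3]-[4]."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    theta = np.deg2rad(theta_deg)
    y_rot = x * np.sin(theta) + y * np.cos(theta)       # rotated y' from equation [4]
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return (y_rot**2 - sigma**2) / sigma**4 * g          # d^2 G / d y'^2

def gaussian_kernel(sigma=1.0, size=15):
    """Isotropic Gaussian of equation [3], used for non-edge (blob) structures."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def filter_responses(image):
    """Per-pixel response vector R: max oriented edge response + smoothing response."""
    thetas = np.arange(0, 180, 15)   # 12 orientations: 0, 15, ..., 165 degrees
    edge = np.stack([convolve(image, second_deriv_gaussian_kernel(1.5, t))
                     for t in thetas])
    max_edge = edge.max(axis=0)                       # max response across orientations
    smooth = convolve(image, gaussian_kernel(1.0))    # isotropic response, kept directly
    return np.stack([max_edge, smooth], axis=-1)      # (H, W, 2) feature vectors
```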
The textons are computed by k-means clustering of the responses of the second-order Gaussian derivative filters and the standard Gaussian filter. Pixels with similar filter responses are grouped together, and each group is assigned a tag ID, i.e., a texton label. To create textons (primitive elements) for lesion and non-lesion, two classes of patches (lesion and non-lesion) are prepared using ground-truth labeling. In the training stage, the two-dimensional second-order partial derivatives of the Gaussian with 12 orientations (equation 4) are applied to each patch in each group, and the filter responses generated by all patches in the same group are clustered to generate the textons of one class. As a result, there are k·c textons, where k (for example, k = 8) is the number of centroids in k-means and c is the number of classes (for example, c = 2, i.e., lesion and non-lesion). All trained textons are stored in a dictionary (D), which is used to compute the texture map. The bottom flowchart in Fig. 3 illustrates this process.
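A minimal sketch of the dictionary construction under the same assumptions follows; scikit-learn's KMeans stands in for the k-means step, and build_texton_dictionary, together with the reuse of the hypothetical filter_responses helper above, is illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_texton_dictionary(lesion_patches, non_lesion_patches, k=8):
    """Cluster filter responses per class and stack the centroids into dictionary D.

    Each patch (a 2-D grayscale array) is passed through filter_responses (above);
    responses from all patches of one class are pooled and clustered into k textons.
    """
    dictionary = []
    for patches in (lesion_patches, non_lesion_patches):   # c = 2 classes
        responses = np.concatenate(
            [filter_responses(p).reshape(-1, 2) for p in patches], axis=0)
        km = KMeans(n_clusters=k, n_init=10).fit(responses)
        dictionary.append(km.cluster_centers_)             # k textons per class
    return np.concatenate(dictionary, axis=0)              # D: (k*c, 2) textons
```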
Once the texton dictionary has been generated in the training stage, each input image is mapped to a texture map using a similar procedure. More specifically, given a skin image, it is first convolved with the filters in the filter bank to generate the filter responses; then, based on the minimum distance at each pixel between the textons and the filter response, each pixel in the image is assigned a texton label from the texton dictionary (D), i.e., L(x, y) = argmin_i ||R(x, y) − D_i||. Through this process a texton label map is generated, which is further converted into an intensity map. That is, for the pixels sharing the same texton label, the mean intensity of those corresponding pixels in the input image is computed, and the label indices in the texton label map are then replaced with the corresponding mean intensities. We call this map the texture map; the process is shown in the top flowchart of Fig. 3.
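The label assignment and intensity replacement just described can be sketched as follows. Again this is illustrative: texture_map is a hypothetical helper, a grayscale image is assumed, and the nearest-texton rule is the argmin given above.

```python
import numpy as np

def texture_map(image, D):
    """Map an image to its texture map using texton dictionary D.

    Each pixel's filter response is assigned the nearest texton label, then
    every label index is replaced by the mean intensity of its pixels.
    """
    R = filter_responses(image)                          # (H, W, 2) responses
    flat = R.reshape(-1, R.shape[-1])
    # Nearest-texton assignment: argmin_i ||R(x, y) - D_i||.
    dists = np.linalg.norm(flat[:, None, :] - D[None, :, :], axis=2)
    labels = dists.argmin(axis=1).reshape(image.shape)   # texton label map
    # Replace each label index with the mean intensity of the pixels carrying it.
    tmap = np.zeros_like(image, dtype=float)
    for lab in np.unique(labels):
        tmap[labels == lab] = image[labels == lab].mean()
    return tmap
```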
This shallow network uses convolutions with filters hand-designed from domain-specific knowledge to encode global and local spatial information; the filters decompose high-order visual features or structures into primitive elements (here edges, dots, and blobs). With the designed filter bank, each image can be represented by the different distributions of these elements. Here, each skin image is represented by the edges extracted with the second-order partial derivatives of the Gaussian and the blobs extracted with the standard Gaussian. Furthermore, depending on the value of k, edges with different gradient magnitudes can be distinguished; the strong boundary of a lesion can therefore be separated from weak edges inside the lesion. In addition, instead of the pooling-like operations used in the fully convolutional neural network, which may reduce resolution, clustering in the shallow network is carried out on filter response images of the same size as the input image. Our shallow network may therefore retain more edge details. Notably, due to the shallow nature of the network (for example, the absence of nonlinear transformations), some edges in non-lesion regions may also be present. However, these non-lesion edges can be suppressed by fusing the fully convolutional neural network feature maps.
Returning to Fig. 1, in step 130, the feature map and the texture map are fused by convolutional layers to generate the comprehensive score map.
The feature map derived from the fully convolutional neural network and the texture map derived from the shallow network are fused in a single network by an integration block. Formally, let the desired mapping function be M(x), denoting the nonlinear transformation from input x1 to output x(l+1). We assume that this function can be learned more effectively by introducing a prior-knowledge model into the deep network. Instead of fitting M(x) directly with the deep neural network, we set the mapping function F(x) of the fully convolutional neural network to F(x) = M(x) − T(x). The original mapping function can therefore be expressed as:
M(x) = F(x) + T(x)    [5]
where T(x) is the texture mapping function. Regarding the mapping function T(x), the output of the transformation is constant, because the weights of the filter bank are predefined and fixed. In this case, adding T(x) affects forward propagation without affecting the backpropagation to F(x). This is because in backpropagation the gradient is computed as (local gradient × upstream gradient), and since the local gradient of the addition is 1, the upstream gradient derived from the loss is passed directly to the weight updates of F(x), while the weights of the filter bank (the prior knowledge encoded by T(x)) remain unchanged.
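A minimal PyTorch sketch of the additive formulation of equation [5] follows. The module and attribute names are hypothetical; the substantive point mirrored here is that the texture branch's predefined weights are frozen while gradients flow to the FCN branch unchanged.

```python
import torch.nn as nn

class AdditiveFusion(nn.Module):
    """M(x) = F(x) + T(x), per equation [5].

    `fcn` is the trainable FCN branch; `texture_branch` holds the predefined
    filter bank, so its parameters are frozen and contribute only forward passes.
    """
    def __init__(self, fcn, texture_branch):
        super().__init__()
        self.fcn = fcn
        self.texture_branch = texture_branch
        for p in self.texture_branch.parameters():
            p.requires_grad = False   # prior-knowledge weights stay fixed

    def forward(self, x):
        # The local gradient of the sum is 1 with respect to each operand, so
        # the upstream gradient reaches the FCN weights unchanged, while the
        # texture branch affects only forward propagation.
        return self.fcn(x) + self.texture_branch(x)
```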
To fuse the two maps in a complementary fashion, we increase the functional complexity by introducing dual-block convolutional layers. Equation (5) can then be expressed as:
M(x) = C(F(x), {Wi}) + C(T(x), {Wi})
C(μ) = W2 ∗ λ(W1 ∗ μ)    [6]
where {Wi} denotes the set of weights in a convolution block C (i is the layer index), μ is the block input, and λ is the ReLU activation function. Each map (i.e., the feature map output by the fully convolutional neural network and the texture map) is input into a separate block comprising two convolutional layers. The trained weights serve as the filters of the convolutional layers and can learn the function that correctly fuses the two feature maps. That is, during training, finer details from the texture feature map can be added to the feature maps of the fully convolutional neural network, while the influence of non-lesion edges in the texture feature map is suppressed by the fully convolutional neural network feature maps. In our experiments, the number of filters and the kernel sizes are defined empirically as follows: for each block, the first convolutional layer has two inputs and four outputs with a kernel size of 5 × 5 pixels; the second convolutional layer has four inputs and two outputs with the same kernel size. Finally, the sum of the filter responses from the two blocks is fed into the rest of the network. That is, the filter responses of the two blocks are superimposed, and the network is then trained by minimizing the softmax cross-entropy loss.
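The dual-block integration of equation [6] can be sketched as follows, using the stated layer sizes (5 × 5 kernels, 2 → 4 → 2 channels per block); the tensor shapes and variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn

def make_fusion_block():
    """One integration block C: two 5x5 conv layers (2 -> 4 -> 2 channels)."""
    return nn.Sequential(
        nn.Conv2d(2, 4, kernel_size=5, padding=2),  # first conv: 2 inputs, 4 outputs
        nn.ReLU(inplace=True),                      # λ in equation [6]
        nn.Conv2d(4, 2, kernel_size=5, padding=2),  # second conv: 4 inputs, 2 outputs
    )

fcn_block, texture_block = make_fusion_block(), make_fusion_block()
criterion = nn.CrossEntropyLoss()   # softmax cross-entropy over the summed responses

# Hypothetical 2-channel maps from the two branches, plus ground-truth labels.
fcn_scores = torch.randn(1, 2, 224, 224)
texture_scores = torch.randn(1, 2, 224, 224)
labels = torch.randint(0, 2, (1, 224, 224))

# Equation [6]: the two block responses are superimposed into the score map.
comprehensive_score = fcn_block(fcn_scores) + texture_block(texture_scores)
loss = criterion(comprehensive_score, labels)
```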
Fig. 4 shows the effect of fusing the feature maps from the two networks according to an embodiment of the invention. In Fig. 4, (a) is a sample image with extremely low contrast, particularly in the local region shown in the magnified box; (b) is the comprehensive score map generated by the network of the invention; and (c) is the score map using only the fully convolutional neural network.
The improvement brought by the network of the invention can be observed intuitively in Fig. 4, where (a) is the input image (top) with the region in the red box enlarged below it, in which the original details can be clearly observed. Sub-figure (b) illustrates the comprehensive score map obtained by the network according to the invention (upper left of b) and its surface plot (lower right of b). Sub-figure (c) is the score map using only the fully convolutional neural network. As can be seen, the map in (b) has finer details than the map in (c), and the region in the red box of (a) is predicted by the proposed network in (b) with a very high lesion probability, while it is missed in (c). On closer inspection, more local detail can be seen on the surface plot of (b) than on the surface plot shown in (c).
Returning to Fig. 1, the network is trained in step 140. Once the texton dictionary has been generated, the texture map of each image can be computed based on the minimum distance between the textons in the dictionary and the filter responses at each pixel of the image. This in turn allows the network of the invention to be trained in an end-to-end manner. More specifically, the input image passes in parallel through the fully convolutional neural network and the shallow network to generate two feature maps, which are further fused using the two block convolutional layers (as shown by box 210 in Fig. 2). The final score map is input into a softmax with a loss layer, forming a new fully trainable network.
We train our network using mini-batch stochastic gradient descent (SGD) with momentum. In our experiments, we set the batch size to 20. The learning rate of the fully convolutional neural network is set to 0.0001 with a momentum of 0.9, and the learning rate of the final integration layers is set to 0.001. We initialize the network weights with a VGG16 model pre-trained for the fully convolutional neural network, and initialize the integration layers. A dropout layer with a rate of 0.5 is used after the 6th and 7th convolutional layers in Fig. 2 to reduce overfitting.
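A minimal sketch of the training configuration described above (mini-batch SGD with momentum 0.9, batch size 20, learning rate 1e-4 for the FCN layers and 1e-3 for the integration layers) follows; the stand-in modules and shapes are illustrative, not the actual architecture.

```python
import torch
import torch.nn as nn

# Stand-ins for the fused network's two parameter groups (in practice these would
# be the pretrained-VGG16 FCN branch and the final integration layers).
fcn_branch = nn.Conv2d(3, 2, kernel_size=3, padding=1)
integration = nn.Conv2d(2, 2, kernel_size=5, padding=2)
model = nn.Sequential(fcn_branch, integration)
criterion = nn.CrossEntropyLoss()   # softmax with loss layer

# Mini-batch SGD with momentum, with the two per-group learning rates stated above.
optimizer = torch.optim.SGD(
    [
        {"params": fcn_branch.parameters(), "lr": 1e-4},
        {"params": integration.parameters(), "lr": 1e-3},
    ],
    momentum=0.9,
)

images = torch.randn(20, 3, 64, 64)          # one mini-batch of 20 images
labels = torch.randint(0, 2, (20, 64, 64))   # pixel-wise lesion / non-lesion labels

optimizer.zero_grad()
loss = criterion(model(images), labels)      # softmax cross-entropy loss
loss.backward()                              # gradients flow only to trainable weights
optimizer.step()
```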
The present invention proposes a method that combines prior knowledge encoded by a shallow network with a deep fully convolutional neural network for the segmentation of skin features in dermoscopy images. Existing convolutional neural networks require deep multi-layer architectures to guarantee model effectiveness. In the proposed method, the prior knowledge is first extracted as texture features, using spatial filters that simulate the function of simplified cell receptive fields in the primary visual cortex (V1). The prior knowledge encoded by the shallow network is then coupled with the hierarchical data-driven features learned by the fully convolutional network (FCN) through an effective integration strategy of skip connections and convolution operators, yielding a detailed segmentation of skin features. The present invention is the first to apply such a network to the detailed segmentation of skin features. Compared with existing deep learning models, the proposed method offers better stability; at the same time it requires neither an ultra-deep neural network architecture nor data augmentation or exhaustive parameter tuning, and the experimental results show an effective improvement over other state-of-the-art methods.
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to persons skilled in the relevant art that various combinations, modifications, and changes can be made thereto without departing from the spirit and scope of the invention. Therefore, the breadth and scope of the invention disclosed herein should not be limited by the above-described exemplary embodiments, but should be defined only in accordance with the appended claims and their equivalents.