CN109685801B - Skin mirror image processing method combining texture features and deep neural network information - Google Patents

Skin mirror image processing method combining texture features and deep neural network information Download PDF

Info

Publication number
CN109685801B
CN109685801B (application CN201811502946.2A)
Authority
CN
China
Prior art keywords
texture
neural network
map
layers
pooling layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811502946.2A
Other languages
Chinese (zh)
Other versions
CN109685801A (en)
Inventor
杨光
叶旭冏
董豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DIGONG (HANGZHOU) SCIENCE AND TECHNOLOGY INDUSTRY Co.,Ltd.
HANGZHOU DISHI TECHNOLOGY Co.,Ltd.
Original Assignee
Digong Hangzhou Science And Technology Industry Co ltd
Hangzhou Dishi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Digong Hangzhou Science And Technology Industry Co ltd, Hangzhou Dishi Technology Co ltd filed Critical Digong Hangzhou Science And Technology Industry Co ltd
Priority to CN201811502946.2A priority Critical patent/CN109685801B/en
Publication of CN109685801A publication Critical patent/CN109685801A/en
Application granted granted Critical
Publication of CN109685801B publication Critical patent/CN109685801B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06T 5/77
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30088 Skin; Dermal

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a dermatoscope image processing method combining texture features and deep neural network information, which comprises the following steps: processing the input image through a deep neural network to generate a feature map; processing the input image through a shallow texture-based network to generate a texture map; and fusing the feature map and the texture map by a convolutional layer to generate a composite score map.

Description

Skin mirror image processing method combining texture features and deep neural network information
Technical Field
The invention relates to the technical field of image processing. In particular, the invention relates to a dermatoscope image processing method combining texture features and deep neural network information.
Background
Existing methods for dermoscopic image segmentation can be broadly classified into traditional histogram-based thresholding, clustering, edge-based detection, region-based detection, morphological detection, model-based methods, active contours (the snake method and its variants), and supervised-learning-based methods. For example, M. E. Celebi, Q. Wen, S. Hwang, H. Iyatomi and G. Schaefer segment specific skin features in dermoscopic images using thresholding methods in "Lesion Border Detection in Dermoscopy Images Using Ensembles of Thresholding Methods," Skin Research and Technology, vol. 19, pp. e252-e258, Feb 2013. The result of threshold fusion shows a good skin lesion segmentation effect. However, this method may not process the image well when a large number of artifacts (such as hairs or bubbles) are present, because these factors can significantly change the histogram and bias the threshold calculation. C. A. Z. Barcelos and V. B. Pires, in "An automatic based nonlinear diffusion equations scheme for skin lesion segmentation," Applied Mathematics and Computation, vol. 215, pp. 251-261, Sep 1, 2009, apply an anisotropic diffusion filter before an edge detector to segment lesion edges. The results show that most of the unwanted edges are removed; however, some skin lesion areas are missed.
Recent advances in machine learning, particularly in the field of deep learning such as convolutional neural networks (CNNs), have greatly improved the state of the art for identifying, classifying, and quantifying patterns in medical images. In particular, at the heart of this progress is the use of hidden hierarchical features that are learned directly from the data. Advances in applying deep learning to medical image registration, segmentation, computer-aided disease diagnosis, and prognosis have been reported. In recent years, deep-learning-based methods have also made great progress in the automatic segmentation of skin lesions.
Dermoscopic images can be automatically segmented using fully convolutional neural networks. To address the coarse segmentation boundaries produced by the original fully convolutional neural network due to the lack of label refinement and consistency (especially for skin lesions with blurred boundaries and/or low texture variation between foreground and background), multi-stage fully convolutional neural networks have been investigated to learn complementary visual features of different skin lesions. The earlier stages of the fully convolutional neural network learn coarse appearance and localization information, while the later stages learn the subtle features of the lesion boundary. The complementary information from the various segmentation stages is then combined to obtain the final segmentation result. This approach has shown very promising results on the ISBI 2016 skin lesion challenge dataset.
Disclosure of Invention
In existing deep-learning-based skin feature segmentation methods, much effort has been devoted to designing new fully convolutional neural network architectures with specific loss functions to achieve better prediction. However, few attempts have been made to encode clinically valuable prior knowledge into a deep learning framework to achieve accurate segmentation of skin features. To this end, the present invention aims to develop a generic framework that allows background information to be modeled and integrated into a deep fully convolutional neural network to achieve fully automatic skin feature segmentation, where the fully convolutional neural network fuses information from intermediate convolutional layers into the output through skip connections and deconvolution layers, so that both low-level appearance information and high-level semantic information can be considered. In addition to data-driven features from the fully convolutional neural network, clinical prior knowledge of the low-level edge and texture features of skin lesions is also considered. These low-level edge and texture features are derived from predefined skin-feature-specific filter kernels using a new shallow convolutional network, and are then built into the fully convolutional neural network workflow using convolution operations. The proposed new network architecture is trained in an end-to-end manner. In this way, the domain-specific texture features can complement the other hierarchical and semantic features learned by the fully convolutional neural network to enhance the fine details of the skin lesion for more accurate segmentation.
According to one aspect of the invention, a dermatoscope image processing method combining texture features and deep neural network information is provided, which comprises the following steps:
processing the input image through a deep neural network to generate a feature map;
processing the input image through a shallow texture-based network to generate a texture map; and
fusing the feature map and the texture map by a convolutional layer to generate a composite score map.
In one embodiment of the present invention, the method for processing a dermatoscope image combining texture features and deep neural network information further comprises: inputting the composite score map into a softmax function with a loss layer to form an end-to-end trainable network.
In one embodiment of the invention, the network is trained using mini-batch stochastic gradient descent (SGD) with momentum.
In one embodiment of the invention, the deep neural network comprises:
a plurality of stacks, each stack comprising one or more convolutional layers and pooling layers; and
a plurality of converted convolutional layers and upsampling layers provided after the plurality of stacks, which perform a convolution operation and a first upsampling operation on the predictions output by the plurality of stacks to generate an activation map.
In one embodiment of the invention, the plurality of stacks is 5 stacks. The first stack comprises one or more first convolutional layers and a first pooling layer, and the input image is processed by the one or more first convolutional layers and the first pooling layer to generate a first pooling layer prediction; the second stack comprises one or more second convolutional layers and a second pooling layer, and the first pooling layer prediction is processed by the one or more second convolutional layers and the second pooling layer to generate a second pooling layer prediction; the third stack comprises one or more third convolutional layers and a third pooling layer, and the second pooling layer prediction is processed by the one or more third convolutional layers and the third pooling layer to generate a third pooling layer prediction; the fourth stack comprises one or more fourth convolutional layers and a fourth pooling layer, and the third pooling layer prediction is processed by the one or more fourth convolutional layers and the fourth pooling layer to generate a fourth pooling layer prediction; the fifth stack comprises one or more fifth convolutional layers and a fifth pooling layer, and the fourth pooling layer prediction is processed by the one or more fifth convolutional layers and the fifth pooling layer to generate a fifth pooling layer prediction.
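As an illustrative sketch only (not the patented implementation), the five-stack structure described above could be written as follows; the channel widths (64, 128, 256, 512, 512) follow the standard VGG16 configuration and are an assumption here, as are the helper name conv_stack, the dummy input size, and the use of PyTorch.

import torch
import torch.nn as nn

def conv_stack(in_ch, out_ch, n_convs):
    """One stack: n_convs 3x3 convolution + ReLU layers followed by a 2x2 max-pooling layer."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

backbone = nn.ModuleList([
    conv_stack(3,   64, 2),   # stack 1 -> pooling layer 1 prediction
    conv_stack(64, 128, 2),   # stack 2 -> pooling layer 2 prediction
    conv_stack(128, 256, 3),  # stack 3 -> pooling layer 3 prediction
    conv_stack(256, 512, 3),  # stack 4 -> pooling layer 4 prediction
    conv_stack(512, 512, 3),  # stack 5 -> pooling layer 5 prediction
])

x = torch.randn(1, 3, 512, 512)   # dummy dermoscopic image
pool_preds = []
for stack in backbone:
    x = stack(x)
    pool_preds.append(x)          # keep each pooling-layer prediction for later skip connections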
In one embodiment of the present invention, the method for processing a dermatoscope image combining texture features and deep neural network information further comprises:
fusing the fourth pooling layer prediction with the activation map output by the upsampling layer, and then performing a second upsampling operation to generate a second upsampling prediction;
and fusing the third pooling layer prediction with the second upsampling prediction, and then performing a third upsampling operation to generate a third upsampling prediction.
In one embodiment of the invention, processing the input image through the shallow texture-based network comprises:
generating a texture dictionary;
convolving the input image with filters in a filter bank to produce a filter response;
assigning each pixel of the input image to a texture label in the texture dictionary (D) based on the minimum distance between the textures and the filter response at that pixel, thereby generating a texture label map;
converting the texture label map into an intensity map;
and replacing the label indices in the texture label map with the corresponding average intensities to form the texture map.
In one embodiment of the present invention, the generating the texture dictionary includes:
preparing two types of patches using ground-truth labels;
applying a two-dimensional second-order partial derivative of Gaussian with 12 orientations to each patch in each group;
clustering the filter responses generated from all patches in the same group to generate a class of textures; and
storing all the trained textures in a texture dictionary.
In one embodiment of the invention, the feature map and texture map are fused in a complementary manner by a two-block convolutional layer.
In one embodiment of the invention, during the training process, more detail from the texture map can be supplemented to the feature map of the fully convolutional neural network, while the effect of non-lesion edges in the texture map is suppressed by the fully convolutional neural network feature map.
The invention provides a method for combining prior knowledge encoded by a shallow network with a deep fully convolutional neural network for segmenting skin features in dermoscopic images. Existing convolutional neural networks need a deep multi-layer architecture to ensure the effectiveness of the model. We propose a method that combines prior knowledge encoded by a shallow network with a deep fully convolutional neural network. First, the prior knowledge is captured by extracting texture features, where a spatial filter model simulates the function of the cell receptive fields in the primary visual cortex (V1). The prior knowledge encoded by the shallow network is then coupled with the hierarchical data-driven features learned by the fully convolutional neural network (FCN) for detailed segmentation of skin features, using an efficient fusion strategy based on skip connections and convolution operators. The invention applies this new neural network for the first time to the detailed segmentation of skin features. Compared with existing deep learning models, the proposed method has better stability; at the same time, it does not require an ultra-deep neural network architecture, data augmentation, or comprehensive parameter tuning. Experimental results show that, compared with other state-of-the-art methods, the proposed method generalizes effectively.
Drawings
To further clarify the above and other advantages and features of embodiments of the present invention, a more particular description of embodiments of the invention will be rendered by reference to the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. In the drawings, the same or corresponding parts will be denoted by the same or similar reference numerals for clarity.
FIG. 1 shows a flow diagram of a dermoscopic image processing method incorporating texture features and deep neural network information, according to one embodiment of the present invention.
FIG. 2 illustrates a fully convolutional neural network architecture based on the VGG16 classification network according to one embodiment of the present invention.
FIG. 3 illustrates a framework for a shallow network in which the texture dictionary generation process is shown in the bottom flowchart and the top flowchart shows the process of generating a texture map using a learned dictionary, according to one embodiment of the present invention.
FIG. 4 illustrates the effect of fusing feature maps from two networks according to one embodiment of the invention.
Detailed Description
In the following description, the invention is described with reference to various embodiments. One skilled in the relevant art will recognize, however, that the embodiments may be practiced without one or more of the specific details, or with other alternative and/or additional methods, materials, or components. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of embodiments of the invention. Similarly, for purposes of explanation, specific numbers, materials and configurations are set forth in order to provide a thorough understanding of the embodiments of the invention. However, the invention may be practiced without specific details. Further, it should be understood that the embodiments shown in the figures are illustrative representations and are not necessarily drawn to scale.
Reference in the specification to "one embodiment" or "the embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment.
The architecture of the disclosed dermatoscope image processing method comprises two networks: a fully convolutional neural network and a shallow texture-based network. The two networks complement each other to achieve more accurate skin feature segmentation. FIG. 1 shows a flow diagram of a dermoscopic image processing method incorporating texture features and deep neural network information, according to one embodiment of the present invention. Both data-driven and hand-crafted features are considered for segmentation. More specifically, hierarchical semantic features are learned by the fully convolutional neural network, while contextual information related to skin feature boundaries is modeled by texture features derived from the shallow network. The two networks are then integrated by fusing the feature maps generated by each network using the convolution operator. The two networks interact during the learning phase to achieve a more detailed segmentation.
In our method, the feature learning process of both networks is based on convolution operations. For the deep network, the convolution filter kernels are represented by weights that are automatically learned from the raw image data, while the texton-based shallow network learns the primitive elements through filter kernels manually designed from domain-specific knowledge (such as skin lesion boundaries and textures). The combination of these two networks has two benefits: on one hand, some important cues obtained from clinical prior knowledge or contextual information are emphasized during the learning process; on the other hand, the automatic weight-learning scheme designed into the fully convolutional neural network helps to optimize the hand-crafted texton map.
First, at step 110, the input image is processed by a deep neural network to generate a feature map.
Fully convolutional neural network architectures represent the state of the art in semantic segmentation, where the segmentation is obtained by pixel-wise prediction. A fully convolutional neural network is trained by mapping inputs to their labeled ground truth in an end-to-end supervised learning manner. In the present invention, the fully convolutional neural network can be trained to learn hierarchical features using an existing architecture; the training procedure is not described in detail in order to simplify the present specification.
FIG. 2 illustrates a fully convolutional neural network architecture based on the VGG16 classification network according to one embodiment of the present invention. The VGG16 architecture can be used as the base network in the fully convolutional neural network, with the following advantages: 1) more representative features can be learned by stacking two or three convolutional layers with small filters (3 x 3), because this increases the complexity of the nonlinear function; 2) the problem of limited training data can be addressed by transfer learning, i.e., a pre-trained model learned from abundant natural images can be used to train the fully convolutional neural network for skin lesion segmentation; 3) a deep 16-layer architecture with a large number of weights can encode more complex high-level semantic features.
The VGG16 network includes 5 stacks and 3 converted convolutional layers, each stack containing several convolutional and pooling layers. A convolutional layer can be viewed as a filter-based feature extractor. The filter response is generated by a discrete convolution operation over the local receptive field of the feature map of the previous layer, and is defined as:
F_l = W_l * F_(l-1) + b_l    [1]
where W_l is the filter kernel containing the weights of the current layer l, the input map F_(l-1) is the feature map in layer (l-1), and b_l is the bias in the l-th layer. The result of this local weighted sum is passed to the activation function. Compared with the Sigmoid function, rectified linear units (ReLUs) achieve better performance in terms of learning efficiency and lead to faster training of deep neural networks, so ReLUs are used as the activation function in each convolutional layer. The pooling layer provides a mechanism for preserving spatial invariance of the features; however, it reduces the resolution of the feature map. Typical pooling operations include sub-sampling and max pooling. In the invention, max pooling can be adopted, which has been found to be clearly superior to the sub-sampling operation.
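For illustration only, equation (1) can be reproduced numerically as below; the kernel values, bias, and feature-map size are arbitrary placeholders rather than learned weights, and the 2 x 2 max pooling at the end corresponds to the max pooling operation adopted above.

import numpy as np
from scipy.signal import convolve2d

F_prev = np.random.rand(8, 8)                        # F_(l-1): feature map of the previous layer
W = np.array([[0., 1., 0.],
              [1., -4., 1.],
              [0., 1., 0.]])                         # W_l: a 3x3 filter kernel (placeholder values)
b = 0.1                                              # b_l: bias of the l-th layer

response = convolve2d(F_prev, W, mode='same') + b    # local weighted sum of equation (1)
F_l = np.maximum(response, 0.0)                      # ReLU activation
F_pooled = F_l.reshape(4, 2, 4, 2).max(axis=(1, 3))  # 2x2 max pooling halves the resolution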
Referring to the specific example shown in FIG. 2, the first stack includes convolutional layer 1, convolutional layer 2, and pooling layer 1. Convolutional layer 1 performs feature extraction on the input image using 64 filters to obtain 64 feature maps, and the output feature map of convolutional layer 1 is obtained by stacking these 64 feature maps. The operation of convolutional layer 2 is similar to that of convolutional layer 1, except that convolutional layer 1 performs the convolution operation on the original image input, whereas convolutional layer 2 performs the convolution operation on the output of convolutional layer 1 to generate the feature map of convolutional layer 2. The feature map of convolutional layer 2 is processed by pooling layer 1 to generate the pooling layer 1 prediction, which then enters the second stack. The processing of the second through fifth stacks is similar to that of the first stack, except that the input to the first stack is the original image, whereas the input to each subsequent stack is the prediction of the previous stack.
A conventional convolutional neural network comprises: convolution - activation - pooling - fully connected layers - classification or regression. In the present invention, a fully convolutional neural network is obtained by converting the last fully connected layers of a conventional convolutional neural network into convolutional layers, and then adding upsampling and deconvolution layers to the converted network. Replacing the fully connected layers enables the network to accept image input of any size. The upsampling and deconvolution layers generate the output activation map and make the size of the feature map consistent with the size of the input image, thereby enabling pixel-wise prediction. As mentioned above, pooling layers reduce the resolution of the feature map, which may lead to coarse prediction; in this case, skip connections are introduced to combine the coarse predictions at deep layers with the fine-scale predictions at shallow layers to improve segmentation detail.
More specifically, the skip connection fuses the 2x upsampled prediction computed on the last layer (stride 32) with the prediction from pooling layer 4 (stride 16). That is, the 2x upsampled prediction is added to the pooling layer 4 prediction, and the sum is then upsampled back to the image at stride 16. Likewise, a still finer prediction (the stride-8 prediction) is obtained by fusing the shallower prediction (pooling layer 3) with the 2x upsampled sum of the predictions from pooling layer 4 and the last layer.
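A hedged sketch of this skip-connection fusion is given below; the channel counts, the 1 x 1 score layers, and the use of bilinear upsampling (in place of learned deconvolution) are assumptions made to keep the example short, and the tensor sizes assume a 512 x 512 input.

import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 2
score_pool3 = nn.Conv2d(256, num_classes, kernel_size=1)   # scores the stride-8 feature map
score_pool4 = nn.Conv2d(512, num_classes, kernel_size=1)   # scores the stride-16 feature map
score_final = nn.Conv2d(512, num_classes, kernel_size=1)   # scores the stride-32 feature map

def fuse(pool3, pool4, last):
    s32 = score_final(last)                                 # coarse stride-32 prediction
    s16 = score_pool4(pool4) + F.interpolate(s32, scale_factor=2,
                                             mode='bilinear', align_corners=False)
    s8 = score_pool3(pool3) + F.interpolate(s16, scale_factor=2,
                                            mode='bilinear', align_corners=False)
    return F.interpolate(s8, scale_factor=8, mode='bilinear', align_corners=False)

pool3 = torch.randn(1, 256, 64, 64)   # stride-8 feature map (pooling layer 3)
pool4 = torch.randn(1, 512, 32, 32)   # stride-16 feature map (pooling layer 4)
last  = torch.randn(1, 512, 16, 16)   # stride-32 feature map (last layer)
score_map = fuse(pool3, pool4, last)  # upsampled back to 512 x 512 for pixel-wise prediction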
Various terms are used in this specification to refer to the feature images generated by subjecting an input image to different stages of operation, such as feature maps, activation maps, and filter responses. These terms are sometimes used interchangeably.
Although the coarse segmentation problem can be mitigated using skip connections, some details are still missing in the feature maps recovered from the shallow layers by the deconvolution layers. Furthermore, the lack of spatial regularization in a fully convolutional neural network may reduce the spatial consistency of the segmentation. Local dependencies are not adequately accounted for in a fully convolutional neural network, which may also result in prediction inconsistencies between pixels within the same structure. The present invention addresses this problem by introducing a shallow network that integrates texture-based spatial information into the fully convolutional neural network architecture.
Returning to FIG. 1, at step 120, the input image is processed through the shallow texture-based network to generate a texture map. Texture is one of the representative types of spatial information that provide identifying features for pattern recognition tasks. Textons have shown their advantage in encoding texture information and are represented by the responses to a set of filter kernels (W1, W2, ..., Wn):
R = [W1*I(x,y), W2*I(x,y), ..., Wn*I(x,y)]    [2]
where * denotes the convolution operation, n is the number of filter kernels (W), and the textures are defined as the set of feature vectors generated by clustering the filter responses in R.
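As a minimal sketch of equation (2), the per-pixel response R to a bank of n filter kernels can be computed as follows; the random kernels are placeholders only, since the actual bank used in this method is built from the Gaussian-derivative filters described next.

import numpy as np
from scipy.ndimage import convolve

def filter_responses(image, kernels):
    """Return an (H, W, n) array: R = [W1*I, W2*I, ..., Wn*I]."""
    return np.stack([convolve(image, k, mode='reflect') for k in kernels], axis=-1)

I = np.random.rand(128, 128)                        # grayscale dermoscopic image patch
kernels = [np.random.randn(15, 15) for _ in range(3)]
R = filter_responses(I, kernels)                    # one feature vector per pixel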
Designing a suitable filter bank is a key step in extracting specific texture features. We use second-derivative-of-Gaussian filters based on edge information, since edges provide clinically important prior cues for skin segmentation. The two-dimensional Gaussian function is defined as:
G(x, y) = (1/(2πσ²)) exp(−(x² + y²)/(2σ²))    [3]
the filter implements an anisotropic filter by introducing directional parameters, since edges can be present in arbitrary directions. The second partial derivative of equation (3) in the y-axis direction is:
Figure BDA0001898668580000082
x′=x cosθ-y sinθ
y′=x sinθ+y cosθ [4]
This filter is designed as a dedicated edge detector, with σ = 1.5 and θ ∈ {0°, 15°, 30°, 45°, 60°, 75°, 90°, 105°, 120°, 135°, 150°, 165°}. Furthermore, in the present invention, a standard Gaussian with scale σ = 1 is employed to extract non-edge structures, which also imposes a simple smoothing constraint on the feature representation. To reduce computational redundancy and enhance the feature representation, for the anisotropic filters the maximum response over all orientations is kept, while the response of the isotropic filter is recorded directly.
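A hedged sketch of this filter bank is given below: the oriented second derivative of a Gaussian (sigma = 1.5) is evaluated at the 12 orientations listed above, the maximum response over orientations is kept, and the isotropic Gaussian (sigma = 1) response is recorded directly. The analytic derivative expression is a reconstruction from equations (3) and (4), and the 15 x 15 kernel size is an assumption.

import numpy as np
from scipy.ndimage import convolve

def second_deriv_gaussian(size=15, sigma=1.5, theta_deg=0.0):
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    theta = np.deg2rad(theta_deg)
    y_rot = x * np.sin(theta) + y * np.cos(theta)            # y' in equation (4)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return (y_rot**2 - sigma**2) / sigma**4 * g              # second derivative along y'

def gaussian(size=15, sigma=1.0):
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def shallow_responses(image):
    thetas = np.arange(0, 180, 15)                           # 12 orientations, 0 to 165 degrees
    edge = np.stack([convolve(image, second_deriv_gaussian(theta_deg=t), mode='reflect')
                     for t in thetas], axis=-1)
    max_edge = edge.max(axis=-1)                             # maximum response over directions
    blob = convolve(image, gaussian(), mode='reflect')       # isotropic (non-edge) response
    return np.stack([max_edge, blob], axis=-1)               # per-pixel feature vector

responses = shallow_responses(np.random.rand(64, 64))        # example call on a dummy image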
Textures are computed by k-means clustering of the filter responses of the second-order partial derivative of Gaussian and the standard Gaussian. Pixels with similar filter responses are grouped into the same cluster, and each cluster is assigned a label ID, i.e., a texture label. To create the textures (primitive elements) of both lesions and non-lesions, two types of patches (lesion and non-lesion) are prepared using the ground-truth labels. In the training phase, the two-dimensional second-order partial derivative of Gaussian with 12 orientations (equation 4) is applied to each patch in each group, and the filter responses generated from all patches in the same group are clustered to generate a class of textures. As a result, there are k x c textures, where k (e.g., k = 8) is the number of centroids in the k-means and c is the number of classes (e.g., c = 2, i.e., lesion and non-lesion). All the trained textures are stored in a dictionary (D), which is used to compute the texture map. This process is illustrated in the bottom flowchart in FIG. 3.
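A hedged sketch of this dictionary-building step is shown below; the use of scikit-learn's KMeans and the dummy per-pixel response arrays are assumptions, while k = 8 and c = 2 follow the values given above. In practice the rows would be the filter responses of the lesion and non-lesion patches produced by the filter bank sketched earlier.

import numpy as np
from sklearn.cluster import KMeans

def build_texton_dictionary(lesion_feats, nonlesion_feats, k=8):
    """Cluster per-pixel filter responses of each class and stack the k*c centroids."""
    dictionary = []
    for feats in (lesion_feats, nonlesion_feats):            # c = 2 classes
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(feats)
        dictionary.append(km.cluster_centers_)               # k trained textures for this class
    return np.concatenate(dictionary)                        # D: (k*c, n_filters) textures

lesion_feats = np.random.rand(5000, 2)                       # dummy responses (rows = pixels)
nonlesion_feats = np.random.rand(5000, 2)
D = build_texton_dictionary(lesion_feats, nonlesion_feats)   # 16 textures for k = 8, c = 2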
Once the texture dictionary has been generated in the training phase, each input image is mapped to a texture map using a similar procedure. More specifically, given a skin image, it is first convolved with the filters in the filter bank to produce the filter responses, and then each pixel in the image is assigned to the texture label in the texture dictionary (D) that minimizes the distance between the texture and the filter response at that pixel:
label(x, y) = argmin_j ||R(x, y) − D_j||
Through this process, a texture label map is generated, which is further converted into an intensity map. That is, for the pixels sharing the same texture label, the average intensity of those corresponding pixels in the input image is calculated, and the label indices in the texture label map are then replaced with the corresponding average intensities. We refer to this map as the texture map; the process is shown in the top flowchart of FIG. 3.
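The texture-map computation just described can be sketched as follows; the Euclidean distance used for the nearest-texture assignment is an assumption consistent with the minimum-distance rule stated above, and the dummy image, responses, and dictionary stand in for the outputs of the earlier examples.

import numpy as np

def texture_map(image, responses, dictionary):
    """Assign each pixel to its nearest texture in D and replace labels by mean intensities."""
    H, W, d = responses.shape
    dists = np.linalg.norm(responses.reshape(-1, 1, d) - dictionary[None, :, :], axis=-1)
    labels = dists.argmin(axis=1).reshape(H, W)              # texture label map
    tmap = np.zeros(image.shape, dtype=float)
    for lab in np.unique(labels):
        tmap[labels == lab] = image[labels == lab].mean()    # average intensity per label
    return tmap                                              # the intensity (texture) map

image = np.random.rand(64, 64)                               # dummy grayscale image
responses = np.random.rand(64, 64, 2)                        # e.g. from the filter bank above
D = np.random.rand(16, 2)                                    # k*c trained textures
tmap = texture_map(image, responses, D)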
This shallow network encodes global and local spatial information using convolution with filters manually designed from domain-specific knowledge, which can decompose higher-order visual features or structures into primitive elements (here edges, dots, and blobs). Each image can be represented by a different distribution of these elements, depending on the designed filter bank. Here, each skin image is represented by the edges extracted with the second-order partial derivative of Gaussian and the blobs extracted with the standard Gaussian. Furthermore, edges with different gradient magnitudes can be distinguished depending on the value of k; thus, strong lesion boundaries and weak edges within the lesion can be told apart. In addition, instead of using pooling-like operations that reduce resolution as in a fully convolutional neural network, clustering is performed in the shallow network on filter response images that have the same size as the input image. Therefore, our shallow network can retain more edge details. Notably, some edges in non-lesion regions may also be present due to the shallowness of the network (e.g., the absence of non-linear transformations). However, these non-lesion edges can be suppressed by fusing with the fully convolutional neural network feature maps.
Returning to FIG. 1, at step 130, the feature map and the texture map are fused by the convolutional layer to generate a composite score map.
The feature map derived from the fully convolutional neural network and the texture map derived from the shallow network are fused in a single network by an integration block. Formally, let the required mapping function be M(x), which represents the non-linear transformation from the input x_l to the output x_(l+1). It is assumed that this function can be learned more efficiently by introducing a prior-knowledge model into the deep network. Rather than fitting M(x) directly with the deep neural network, we set the fully convolutional neural network mapping function F(x) to F(x) = M(x) − T(x). Thus, the original mapping function can be expressed as:
M(x)=F(x)+T(x) [5]
where T(x) is the mapping function of the shallow texture network. Regarding T(x), the output of the transform is constant, since the weights of its filter bank are predefined and fixed. In this case, adding T(x) affects the forward propagation without affecting the backward propagation of F(x). This is because, in the backward propagation, the gradients are computed as (local gradient x upstream gradient), and since the local gradient of the sum in equation (5) with respect to F(x) equals 1, the upstream gradient derived from the loss is passed directly to F(x) to update its weights, while the weights of the filter bank (which encode the prior knowledge through T(x)) remain unchanged.
To fuse the two mappings in a complementary way, we increase the functional complexity by introducing a two-block convolutional layer. Equation (5) can be expressed as:
M(x)=C(F(x),{Wi})+C(T(x),{Wi})
C=W2λ(μ,W1) [6]
where {Wi} denotes the set of weights in the convolution block C (i is the layer index), μ is the input, and λ is the ReLU activation function.
Each map (i.e., the feature map output by the fully convolutional neural network and the texture map) is fed into a separate block that contains two convolutional layers. The trainable weights in these convolutional layers act as filters that can learn functions allowing the two maps to be fused properly. That is, during training, more detail from the texture map can be supplemented to the feature map of the fully convolutional neural network, while the effect of non-lesion edges in the texture map is suppressed by the fully convolutional neural network feature map. In our experiments, the number of filters and the kernel size are defined empirically as follows: for each block, the first convolutional layer has two inputs and four outputs with a kernel size of 5 by 5 pixels, and the second convolutional layer has four inputs and two outputs with the same kernel size. Finally, the sum of the filter responses from the two blocks is fed to the rest of the network; that is, the filter responses of the two blocks are added together, and the whole network is then trained by minimizing the softmax cross-entropy loss.
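A hedged sketch of this fusion step is given below; the two-channel inputs follow the stated filter counts (two inputs, four outputs, then four inputs, two outputs, with 5 x 5 kernels), while the placement of the ReLU between the two layers and the dummy map sizes are assumptions made for illustration.

import torch
import torch.nn as nn

def fusion_block():
    return nn.Sequential(
        nn.Conv2d(2, 4, kernel_size=5, padding=2),   # first convolutional layer: 2 in, 4 out
        nn.ReLU(inplace=True),                       # activation between the two layers
        nn.Conv2d(4, 2, kernel_size=5, padding=2),   # second convolutional layer: 4 in, 2 out
    )

block_F, block_T = fusion_block(), fusion_block()

fcn_map = torch.randn(1, 2, 512, 512)                # F(x): FCN score map (2 classes)
tex_map = torch.randn(1, 2, 512, 512)                # T(x): texture map branch
score = block_F(fcn_map) + block_T(tex_map)          # sum of the two block responses: M(x)

target = torch.randint(0, 2, (1, 512, 512))          # dummy ground-truth labels
loss = nn.CrossEntropyLoss()(score, target)          # softmax cross-entropy loss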
FIG. 4 illustrates the effect of fusing the feature maps from the two networks according to one embodiment of the invention. In FIG. 4, (a) is a sample image with very low contrast, in particular in the local area displayed in the magnified box; (b) is the composite score map generated by the network of the present invention; and (c) is the score map obtained using only the fully convolutional neural network.
The improvement brought by the network of the present invention can be observed visually in FIG. 4, where (a) is the input image (top) and the red-boxed area is shown magnified (bottom), in which the original details can be clearly observed. Sub-figure (b) shows the composite score map derived from the network of the invention (upper left of b) and its surface (lower right of b). Sub-figure (c) is the score map obtained using only the fully convolutional neural network. As can be seen, the map in (b) has finer detail than the map in (c); the red-boxed region in (a) is predicted with a high lesion probability in (b) by the proposed network but is missed in (c). Looking further, more local detail can be seen on the surface in (b) than on the surface shown in (c).
Returning to FIG. 1, at step 140, the network is trained. Once the texture dictionary has been generated, the texture map of each image can be calculated based on the minimum distance between the textures in the dictionary and the filter response at each pixel of the image. This in turn enables the network of the present invention to be trained in an end-to-end fashion. More specifically, the input image is passed in parallel through the fully convolutional neural network and the shallow network to produce two maps, which are further fused using the two blocks of convolutional layers (as shown in block 210 in FIG. 2). The final score map is input to a softmax with loss layer, forming a new, fully trainable network.
We train our network with mini-batch stochastic gradient descent (SGD) with momentum. In our experiments, we set the batch size to 20. The learning rate of the fully convolutional neural network was set to 0.0001 and the momentum to 0.9. The learning rate of the last integration layers was set to 0.001. We initialize the network weights using a pre-trained VGG16 model for the fully convolutional neural network and initialize the integration layers. We use dropout layers with a rate of 0.5 after the sixth and seventh convolutional layers in FIG. 2 to reduce overfitting.
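The training setup described above can be sketched as follows; the two single convolutions are only stand-ins for the fully convolutional branch and the final integration layers, the dummy batch is used in place of a real data loader, and the hyperparameter values follow those stated in this paragraph (dropout layers with rate 0.5 would additionally be inserted after the sixth and seventh converted convolutional layers of the full model).

import torch
import torch.nn as nn

fcn = nn.Conv2d(3, 2, kernel_size=3, padding=1)          # stand-in for the FCN part
fusion = nn.Conv2d(2, 2, kernel_size=5, padding=2)       # stand-in for the last integration layers

optimizer = torch.optim.SGD(
    [
        {"params": fcn.parameters()},                    # uses the default learning rate 0.0001
        {"params": fusion.parameters(), "lr": 1e-3},     # integration layers: learning rate 0.001
    ],
    lr=1e-4, momentum=0.9,
)
criterion = nn.CrossEntropyLoss()                        # softmax with loss layer

images = torch.randn(20, 3, 128, 128)                    # one mini-batch of size 20
masks = torch.randint(0, 2, (20, 128, 128))              # dummy ground-truth masks

optimizer.zero_grad()
loss = criterion(fusion(fcn(images)), masks)             # forward pass through the stand-ins
loss.backward()
optimizer.step()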
The invention provides a method for combining prior knowledge encoded by a shallow network with a deep fully convolutional neural network for segmenting skin features in dermoscopic images. Existing convolutional neural networks need a deep multi-layer architecture to ensure the effectiveness of the model. We propose a method that combines prior knowledge encoded by a shallow network with a deep fully convolutional neural network. First, the prior knowledge is captured by extracting texture features, where a spatial filter model simulates the function of the cell receptive fields in the primary visual cortex (V1). The prior knowledge encoded by the shallow network is then coupled with the hierarchical data-driven features learned by the fully convolutional neural network (FCN) for detailed segmentation of skin features, using an efficient fusion strategy based on skip connections and convolution operators. The invention applies this new neural network for the first time to the detailed segmentation of skin features. Compared with existing deep learning models, the proposed method has better stability; at the same time, it does not require an ultra-deep neural network architecture, data augmentation, or comprehensive parameter tuning. Experimental results show that, compared with other state-of-the-art methods, the proposed method generalizes effectively.
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to persons skilled in the relevant art that various combinations, modifications, and changes can be made thereto without departing from the spirit and scope of the invention. Thus, the breadth and scope of the present invention disclosed herein should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims (7)

1. A dermatoscope image processing method combining texture features and deep neural network information comprises the following steps:
processing the input image through a deep neural network to generate a feature map;
processing the input image through a shallow texture-based network to generate a texture map, the shallow texture-based network employing second-derivative-of-Gaussian filters, wherein processing the input image through the shallow texture-based network comprises: generating a texture dictionary; convolving the input image with filters in a filter bank to produce filter responses; assigning each pixel of the input image to a texture label in the texture dictionary based on the minimum between the textures and the filter response at each pixel, generating a texture label map; converting the texture label map into an intensity map; replacing the label indices in the texture label map with the corresponding average intensities to form the texture map, wherein generating the texture dictionary comprises: preparing two types of patches using ground-truth labels; applying a two-dimensional second-order partial derivative of Gaussian with 12 orientations to each patch in each group; clustering the filter responses generated from all patches in the same group to generate a class of textures; and storing all the trained textures in a texture dictionary; and
fusing the feature map and the texture map by a convolutional layer to generate a composite score map,
wherein the feature map and the texture map are fused in a complementary manner by a two-block convolutional layer, the mapping function M(x) used represents the non-linear transformation from the input x_l to the output x_(l+1), and the formula is:
M(x)=C(F(x),{Wi})+C(T(x),{Wi})
C=W2λ(μ,W1)
where F(x) is the fully convolutional neural network mapping function, T(x) is the mapping function of the shallow texture-based network, {Wi} denotes the set of weights in the convolution block C, i is the layer index, μ is the input, and λ is the ReLU activation function.
2. The dermatoscopic image processing method combining textural features and deep neural network information of claim 1, further comprising:
and inputting the composite score map into a softmax function with a loss layer to form an end-to-end trainable network.
3. The method of dermoscopic image processing in conjunction with texture feature and deep neural network information of claim 2, wherein the network is trained using mini-batch stochastic gradient descent (SGD) with momentum.
4. The method of dermoscopic image processing in conjunction with textural features and deep neural network information of claim 1 wherein the deep neural network comprises:
a plurality of stacks, each stack comprising one or more convolutional layers and pooling layers; and
and a plurality of transform convolution layers and upsampling layers provided after the plurality of stacks, and performing convolution operation and first upsampling operation on predictions output by the plurality of stacks to generate an activation map.
5. The method of dermoscopic image processing in conjunction with textural features and deep neural network information of claim 4, wherein the plurality of stacks is 5 stacks, the first stack includes one or more first convolutional layers and a first pooling layer, and the input image is processed by the one or more first convolutional layers and the first pooling layer to generate a first pooling layer prediction; the second stack includes one or more second convolutional layers and a second pooling layer, and the first pooling layer prediction is processed by the one or more second convolutional layers and the second pooling layer to generate a second pooling layer prediction; the third stack includes one or more third convolutional layers and a third pooling layer, and the second pooling layer prediction is processed by the one or more third convolutional layers and the third pooling layer to generate a third pooling layer prediction; the fourth stack includes one or more fourth convolutional layers and a fourth pooling layer, and the third pooling layer prediction is processed by the one or more fourth convolutional layers and the fourth pooling layer to generate a fourth pooling layer prediction; the fifth stack includes one or more fifth convolutional layers and a fifth pooling layer, and the fourth pooling layer prediction is processed by the one or more fifth convolutional layers and the fifth pooling layer to generate a fifth pooling layer prediction.
6. The dermatoscopic image processing method combining textural features and deep neural network information of claim 5, further comprising:
fusing the fourth pooling layer prediction with the activation map output by the upsampling layer, and then performing a second upsampling operation to generate a second upsampling prediction;
and fusing the third pooling layer prediction with the second upsampling prediction, and then performing a third upsampling operation to generate a third upsampling prediction.
7. The method of dermoscopic image processing in combination with texture features and deep neural network information of claim 1 wherein during training, more details from the texture map are supplemented to the feature map of the fully convolutional neural network while suppressing the effects of non-lesion edges in the texture map by the fully convolutional neural network feature map.
CN201811502946.2A 2018-12-10 2018-12-10 Skin mirror image processing method combining texture features and deep neural network information Active CN109685801B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811502946.2A CN109685801B (en) 2018-12-10 2018-12-10 Skin mirror image processing method combining texture features and deep neural network information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811502946.2A CN109685801B (en) 2018-12-10 2018-12-10 Skin mirror image processing method combining texture features and deep neural network information

Publications (2)

Publication Number Publication Date
CN109685801A CN109685801A (en) 2019-04-26
CN109685801B true CN109685801B (en) 2021-03-26

Family

ID=66187420

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811502946.2A Active CN109685801B (en) 2018-12-10 2018-12-10 Skin mirror image processing method combining texture features and deep neural network information

Country Status (1)

Country Link
CN (1) CN109685801B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110363720A (en) * 2019-07-04 2019-10-22 北京奇艺世纪科技有限公司 A kind of color enhancement method, apparatus, equipment and the storage medium of image
CN110633662B (en) * 2019-09-03 2022-03-25 云南白药集团健康产品有限公司 Image processing method, device and system
CN110619598B (en) * 2019-09-03 2023-04-18 云南白药集团健康产品有限公司 Image processing method, device and system
CN111062296B (en) * 2019-12-11 2023-07-18 武汉兰丁智能医学股份有限公司 Automatic white blood cell identification and classification method based on computer
CN111583256B (en) * 2020-05-21 2022-11-04 北京航空航天大学 Dermatoscope image classification method based on rotating mean value operation
KR20220013071A (en) * 2020-07-24 2022-02-04 에스케이하이닉스 주식회사 Device for generating a depth map
CN112784806A (en) * 2021-02-04 2021-05-11 中国地质科学院矿产资源研究所 Lithium-containing pegmatite vein extraction method based on full convolution neural network
KR102495889B1 (en) * 2022-07-13 2023-02-06 주식회사 룰루랩 Method for detecting facial wrinkles using deep learning-based wrinkle detection model trained according to semi-automatic labeling and apparatus for the same

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9881234B2 (en) * 2015-11-25 2018-01-30 Baidu Usa Llc. Systems and methods for end-to-end object detection
CN108629326A (en) * 2018-05-14 2018-10-09 中国科学院自动化研究所 The action behavior recognition methods of objective body and device
CN108664922A (en) * 2018-05-10 2018-10-16 东华大学 A kind of infrared video Human bodys' response method based on personal safety

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5426684A (en) * 1993-11-15 1995-06-20 Eastman Kodak Company Technique for finding the histogram region of interest for improved tone scale reproduction of digital radiographic images
US10753881B2 (en) * 2016-05-27 2020-08-25 Purdue Research Foundation Methods and systems for crack detection
CN106650830A (en) * 2017-01-06 2017-05-10 西北工业大学 Deep model and shallow model decision fusion-based pulmonary nodule CT image automatic classification method
CN108734719A (en) * 2017-04-14 2018-11-02 浙江工商大学 Background automatic division method before a kind of lepidopterous insects image based on full convolutional neural networks
CN107203999B (en) * 2017-04-28 2020-01-24 北京航空航天大学 Dermatoscope image automatic segmentation method based on full convolution neural network
CN107341265B (en) * 2017-07-20 2020-08-14 东北大学 Mammary gland image retrieval system and method fusing depth features
CN108010049A (en) * 2017-11-09 2018-05-08 华南理工大学 Split the method in human hand region in stop-motion animation using full convolutional neural networks
CN108510504B (en) * 2018-03-22 2020-09-22 北京航空航天大学 Image segmentation method and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9881234B2 (en) * 2015-11-25 2018-01-30 Baidu Usa Llc. Systems and methods for end-to-end object detection
CN108664922A (en) * 2018-05-10 2018-10-16 东华大学 A kind of infrared video Human bodys' response method based on personal safety
CN108629326A (en) * 2018-05-14 2018-10-09 中国科学院自动化研究所 The action behavior recognition methods of objective body and device

Also Published As

Publication number Publication date
CN109685801A (en) 2019-04-26

Similar Documents

Publication Publication Date Title
CN109685801B (en) Skin mirror image processing method combining texture features and deep neural network information
Adegun et al. Deep learning techniques for skin lesion analysis and melanoma cancer detection: a survey of state-of-the-art
Nandhini Abirami et al. Deep CNN and deep GAN in computational visual perception-driven image analysis
JP6395158B2 (en) How to semantically label acquired images of a scene
Lin et al. Network in network
Huang et al. Deep hyperspectral image fusion network with iterative spatio-spectral regularization
Liu et al. Watershed merge tree classification for electron microscopy image segmentation
Elmannai et al. Deep learning models combining for breast cancer histopathology image classification
Guan et al. Multistage dual-attention guided fusion network for hyperspectral pansharpening
Kar et al. A review on progress in semantic image segmentation and its application to medical images
CN109615614B (en) Method for extracting blood vessels in fundus image based on multi-feature fusion and electronic equipment
CN112560831A (en) Pedestrian attribute identification method based on multi-scale space correction
Banerjee et al. Novel volumetric sub-region segmentation in brain tumors
CN115546466A (en) Weak supervision image target positioning method based on multi-scale significant feature fusion
DE102023104829A1 (en) Object class inpainting - or manipulation in digital images using class-specific inpainting neural networks
Aalan Babu et al. Water‐body segmentation from satellite images using Kapur's entropy‐based thresholding method
Wang et al. Multi-focus image fusion framework based on transformer and feedback mechanism
Pham et al. Toward Deep-Learning-Based Methods in Image Forgery Detection: A Survey
Wu An iterative convolutional neural network algorithm improves electron microscopy image segmentation
Choudhary et al. Mathematical modeling and simulation of multi-focus image fusion techniques using the effect of image enhancement criteria: A systematic review and performance evaluation
Lu et al. Efficient object detection for high resolution images
Gupta et al. A robust and efficient image de-fencing approach using conditional generative adversarial networks
CN116030308A (en) Multi-mode medical image classification method and system based on graph convolution neural network
Chiu et al. Real-time monocular depth estimation with extremely light-weight neural network
Riad et al. An industrial portrait background removal solution based on knowledge infusion

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20200714

Address after: Room 1001-1008, 15 Block 57, Baiyang Street Science Park Road, Hangzhou Economic and Technological Development Zone, Zhejiang Province

Applicant after: HANGZHOU DISHI TECHNOLOGY Co.,Ltd.

Applicant after: DIGONG (HANGZHOU) SCIENCE AND TECHNOLOGY INDUSTRY Co.,Ltd.

Address before: Room 1001-1008, 15 Block 57, Baiyang Street Science Park Road, Hangzhou Economic and Technological Development Zone, Zhejiang Province

Applicant before: HANGZHOU DISHI TECHNOLOGY Co.,Ltd.

GR01 Patent grant
GR01 Patent grant