CN117094980A - Ultrasonic breast nodule image interpretation method based on deep learning


Info

Publication number
CN117094980A
Authority
CN
China
Prior art keywords
network
training
image
model
nodules
Prior art date
Legal status
Pending
Application number
CN202311124040.2A
Other languages
Chinese (zh)
Inventor
田传耕
封波
李凡甲
安媛
马崇杰
封新闻
唐璐
Current Assignee
Xuzhou University of Technology
Original Assignee
Xuzhou University of Technology
Priority date
Application filed by Xuzhou University of Technology
Priority to CN202311124040.2A
Publication of CN117094980A


Classifications

    • G06T7/0012 Biomedical image inspection
    • G06N3/045 Combinations of networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods
    • G06N3/096 Transfer learning
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region
    • G06V10/764 Recognition or understanding using classification, e.g. of video objects
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/809 Fusion of classification results, e.g. where the classifiers operate on the same input data
    • G06V10/817 Fusion of classification results by voting
    • G06V10/82 Recognition or understanding using neural networks
    • G06T2207/10132 Ultrasound image
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30068 Mammography; Breast
    • G06T2207/30096 Tumor; Lesion
    • G06V2201/032 Recognition of patterns in medical or anatomical images of protuberances, polyps, nodules, etc.
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation


Abstract

The invention provides an ultrasonic breast nodule image interpretation method based on deep learning. A Deeplabv3+ network trained on a dataset annotated by professional physicians serves as the segmentation network model to accurately locate and segment breast nodules; transfer learning is then performed with a pre-trained AlexNet network model to classify breast nodules as benign or malignant. A non-attribution method is applied first: following the nodule features a physician attends to when reading images, the nodule boundary and kernel are segmented and cropped, and ensemble learning over the sub-images yields a comprehensively discriminated, interpretable diagnosis. An attribution method is applied next: various visualization tools convert the model's network behavior into output the user can interpret, improving interpretability. Finally, human-computer interaction software is designed with Matlab App Designer. The invention realizes the interpretability of deep learning models in the field of breast nodule image recognition, reduces the psychological burden on patients, and improves the diagnostic accuracy of benign/malignant breast cancer recognition.

Description

Ultrasonic breast nodule image interpretation method based on deep learning
Technical Field
The invention relates to the technical field of breast nodule image recognition, and in particular to an ultrasonic breast nodule image interpretability method based on deep learning.
Background
Breast cancer shows no special typical symptoms or signs at onset and is easily neglected by patients; diagnosis is slow and the diagnostic process insufficiently intelligent, the ultrasound profession is severely understaffed, physicians carry a heavy diagnostic workload with low efficiency, and patients bear a heavy medical burden. Current diagnostic approaches mainly comprise palpation, ultrasound examination, mammary molybdenum-target radiography, breast magnetic resonance imaging and breast cancer imaging research, among which ultrasound examination is the least invasive, most popular and most widely used. The pathological features of breast nodules mainly comprise boundary clarity and regularity and kernel echo calcification; the best way to improve the cure rate of early breast cancer is regular screening of women of appropriate age to avoid further deterioration, so accurate and rapid screening of breast nodules is particularly important.
At present, traditional image processing methods rely on manual feature extraction and then recognize images with support vector machines, random forests and the like built on those features; they often face many difficulties, including poor generalization, high complexity and low recognition accuracy. Deep learning methods have strong automatic feature extraction capability and do not depend on manually extracted features, but most deep learning processes have little or no interpretability.
In addition, AI is being applied to intelligent auxiliary diagnosis of breast nodules. A breast cancer screening AI system has been developed in China that locates lesions and judges benign versus malignant tumors from molybdenum-target images obtained by X-ray mammography; its sensitivity for detecting breast calcification and malignant tumors reaches 99% and 90.2%, and its sensitivity and specificity for distinguishing benign from malignant breast nodules reach 87% and 96%, assisting physicians and reducing the difficulty of visual interpretation. However, no breast ultrasound image recognition system has yet been proposed, and the diagnostic process is not interpretable.
In summary: the incidence of breast nodules is high; diagnosis suffers from misdiagnosis and missed diagnosis, low intelligence and low speed; medical image data are unevenly distributed and insufficient in quantity; feature extraction is difficult for traditional image processing methods and their recognition accuracy is low; and the recognition results of deep learning methods are poorly interpretable.
Disclosure of Invention
To solve the above problems, the invention discloses an ultrasonic breast nodule image interpretability method based on deep learning, which realizes the interpretability of deep learning models in the field of breast nodule image recognition, reduces the psychological burden on patients, and improves the diagnostic accuracy of benign/malignant breast cancer recognition; researching and realizing the judgment of nodule benignity or malignancy from interpretable analysis results has important theoretical significance and practical value.
First, a Deeplabv3+ network trained on a gold-standard dataset annotated by professional physicians is used as the segmentation network model to accurately locate and segment breast nodules; comparison shows that using lightweight MobileNetv2 as the backbone improves software performance, with a tested mIoU of 83.58%. Then transfer learning is performed with a pre-trained AlexNet network model to classify breast nodules as benign or malignant, with test-set accuracy reaching 88.61%. Because the pathological features of breast nodule ultrasound images are expressed as boundary clarity and regularity and kernel echo calcification, and to further improve model interpretability, a non-attribution method is used first: following the nodule features a physician attends to when reading images, the nodule boundary and kernel are segmented and cropped, and ensemble learning over the sub-images yields a comprehensively discriminated, interpretable diagnosis. An attribution method is then used: various visualization tools convert the model's network behavior into output the user can interpret, improving interpretability. Finally, human-computer interaction software is designed with Matlab App Designer; a physician can import an image with one click to obtain nodule segmentation and an interpretable diagnosis report.
The specific scheme is as follows:
An ultrasound breast nodule image interpretability method based on deep learning, comprising the steps of:
(1) Acquiring a breast nodule dataset annotated by professional physicians, and preprocessing the ultrasound image pictures, including labeling the pictures, resizing them to fit the network model, and classical simple preprocessing such as histogram equalization and wavelet denoising;
(2) Building a pre-trained segmentation model: comparing several backbone networks including Xception, ResNet-50 and MobileNetv2, tuning the training hyperparameters, training the model and evaluating it, the evaluation including MeanIoU, WeightedIoU, MeanBFScore and the like, as well as comparing the prediction on a single picture against the ground truth;
(3) Building a pre-trained classification model, tuning the training hyperparameters, training the model and verifying accuracy on a test set;
(4) Performing interpretability analysis on the model with attribution and non-attribution methods, including comparing several visualization methods that make the model training process transparent, and designing an interpretability algorithm with the non-attribution method to improve the interpretability of the model;
(5) Designing a human-computer interaction intelligent APP with Matlab's App Designer, whose functions include, after one-click input of an ultrasound image, automatically analyzing whether the patient has breast nodules, the risk probability of malignancy, and the corresponding medical record report.
As a further improvement of the present invention, step (2) builds the pre-trained segmentation model, comprising dataset preprocessing and segmentation network design. The dataset obtained in preprocessing comprises three types of breast ultrasound images, covering benign nodules, malignant nodules and no nodules, together with physician-annotated mask images. Training requires both raw data and label data, so the mask images must be preprocessed into the final label images: the label image data format is a two-dimensional array in which nodule pixels are "1" and all others are "0". The image thus contains only the two values 0 and 1, which are merely pixel labels denoting background and nodule, respectively; a segmentation network is in fact also a classification network, except that the object classified is each individual pixel.
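For illustration, this label preprocessing can be sketched in Matlab as follows (the folder names, file pattern and class names "Back"/"Tubercle" are assumptions for the sketch, not the patent's own code):

    % Sketch: convert physician-annotated masks into 0/1 pixel-label images
    imds = imageDatastore('ultrasound/images');            % raw B-mode images
    maskFiles = dir(fullfile('ultrasound/masks','*.png'));
    outDir = 'ultrasound/labels'; mkdir(outDir);
    for k = 1:numel(maskFiles)
        m = imread(fullfile(maskFiles(k).folder, maskFiles(k).name));
        imwrite(uint8(m > 0), fullfile(outDir, maskFiles(k).name)); % nodule=1, other=0
    end
    classNames = ["Back" "Tubercle"];                       % background / nodule
    pxds = pixelLabelDatastore(outDir, classNames, [0 1]);  % pixel IDs 0 and 1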
As a further improvement of the invention, the segmentation network is a Deeplabv3+ network designed with MobileNetv2 as the backbone network.
Deeplabv3+ can take different pre-trained networks as its backbone for feature extraction as required, but their performance differs and the choice depends on the actual situation, which is parameter tuning at another level. To verify the influence of the backbone on the performance of the Deeplabv3+ network model, comparison experiments were run on the validation set with Deeplabv3+ configured with different backbones; the evaluation covers model parameter size, mean intersection over union (mIoU) and validation time, with the results shown in Table 1.
TABLE 1 Deeplabv3+ model performance comparison for different backbone networks
As the table shows, with MobileNetv2 as the backbone the mIoU is 83.58%, the validation time 89 ms and the model parameter size 2.18 MB: the mIoU is almost unchanged, but the parameter count drops markedly and the validation time shortens greatly, reducing the complexity of the model and speeding up its execution; the model is lighter and easier to deploy, and diagnosis speed is an important index for auxiliary diagnosis.
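A minimal sketch of building such a network with Matlab's Computer Vision Toolbox (the input size below is an assumption):

    % Sketch: Deeplabv3+ with a MobileNetv2 backbone
    imageSize  = [227 227 3];     % assumed network input size
    numClasses = 2;               % Back / Tubercle
    lgraph = deeplabv3plusLayers(imageSize, numClasses, "mobilenetv2");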
Furthermore, in the ideal case all classes have an equal number of observations, but in ultrasound images the nodule usually occupies only a small proportion of the pixels. Because learning is biased toward the dominant part of the image, this imbalance, if not handled properly, can greatly harm the learning process. Pre-computing the pixel ratio of the image nodules and weighting the classification layer parameters with class weights improves network performance.
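A sketch of this class weighting, following standard Matlab practice for imbalanced semantic segmentation (variable names continue the sketches above):

    % Sketch: inverse-frequency class weights on the pixel classification layer
    tbl = countEachLabel(pxds);                       % pixel counts per class
    imageFreq = tbl.PixelCount ./ tbl.ImagePixelCount;
    classWeights = median(imageFreq) ./ imageFreq;    % rarer class -> larger weight
    pxLayer = pixelClassificationLayer('Name','labels', ...
        'Classes', tbl.Name, 'ClassWeights', classWeights);
    lgraph = replaceLayer(lgraph, 'classification', pxLayer);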
As a further improvement of the invention, to improve network accuracy the original dataset is randomly transformed during training, i.e., data augmentation; data augmentation adds more variety to the training data without increasing the number of labeled training samples. In Matlab the same random transform is applied to the image and to the pixel label data; this example flips the images along the X and Y axes.
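One way to apply the identical random flip to image and label, sketched with an assumed helper function:

    % Sketch: same random X/Y flip applied to image and pixel labels
    dsTrain = transform(combine(imds, pxds), @flipImageAndLabel);

    function out = flipImageAndLabel(data)
    % data = {image, pixelLabel}; apply one random reflection to both
    tform = randomAffine2d('XReflection', true, 'YReflection', true);
    rout  = affineOutputView(size(data{1}), tform);
    out{1} = imwarp(data{1}, tform, 'OutputView', rout);
    out{2} = imwarp(data{2}, tform, 'OutputView', rout);
    end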
As a further improvement of the invention, part of the model training parameters are shown in Table 2. An Adam optimizer is used: among the many algorithms applied to deep learning model training, Adam updates the neural network weights from the training data and is a first-order optimization algorithm that can replace the traditional stochastic gradient descent procedure. The initial learning rate is set to 0.0001 and multiplied by 0.3 every 10 epochs, with a maximum of 50 training epochs; this lets the network learn quickly at the higher initial rate while the decayed rate allows a solution close to the local optimum to be found. Considering that GPU memory and batch size must both be moderate, the batch size is reduced appropriately to lower memory usage and is set to 8. After the training parameters are set, training is run with trainNetwork.
TABLE 2 Partial model training parameters
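The options described above correspond to a trainingOptions call of roughly this form (the validation datastore dsVal is an assumption):

    % Sketch: Adam, lr 1e-4 dropping by 0.3 every 10 epochs, 50 epochs, batch 8
    opts = trainingOptions('adam', ...
        'InitialLearnRate', 1e-4, ...
        'LearnRateSchedule', 'piecewise', ...
        'LearnRateDropFactor', 0.3, ...
        'LearnRateDropPeriod', 10, ...
        'MaxEpochs', 50, ...
        'MiniBatchSize', 8, ...
        'ValidationData', dsVal, ...
        'Plots', 'training-progress');
    net = trainNetwork(dsTrain, lgraph, opts);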
As a further improvement of the invention, mIoU is one of the most used evaluation indexes for semantic segmentation; in addition, various other evaluation indexes are computed, including an overlay comparison on a single validation image to test the network and an inspection of the influence of each class on overall performance.
The various metrics for the dataset, the individual classes, and each test image are returned by the corresponding function (evaluateSemanticSegmentation); metrics.DataSetMetrics gives the results shown in Table 3.
TABLE 3 Overall dataset evaluation index
Testing on the whole validation set gives an mIoU of 83.58%.
In addition, the influence of the nodule (Tubercle) and background (Back) classes on overall performance can be shown with metrics.ClassMetrics.
TABLE 4 Class evaluation indexes of a random single image
Although the overall dataset performance is quite high, the class indexes show that IoU on some single pictures is not high enough; increasing the number of dataset samples would help improve the results.
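A sketch of this evaluation step (imdsVal and pxdsVal denote assumed validation datastores):

    % Sketch: segment the validation set and compute the metrics cited above
    pxdsResults = semanticseg(imdsVal, net, 'MiniBatchSize', 8, ...
        'WriteLocation', tempdir);
    metrics = evaluateSemanticSegmentation(pxdsResults, pxdsVal);
    metrics.DataSetMetrics    % MeanIoU, WeightedIoU, MeanBFScore, ...
    metrics.ClassMetrics      % per-class Accuracy, IoU, MeanBFScore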
As a further improvement of the present invention, step (3) builds the pre-trained classification model, comprising dataset preprocessing and classification network design. The preprocessing yields three types of ultrasound image samples, covering benign nodules, malignant nodules and no nodules, 2826 images in total. The dataset is cleaned and given category labels: "benign" for benign nodules, "malignant" for malignant nodules and "normal" for normal images without nodules; the images are uniformly resized to the input size required by the network input layer (227×227×3). After preprocessing, the dataset is divided into training, validation and test sets in the ratio 6:2:2.
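A sketch of the labeling and the 6:2:2 split (the folder layout is an assumption):

    % Sketch: label by folder name, resize to AlexNet's 227x227x3 input, split 6:2:2
    imdsC = imageDatastore('ultrasound/classified', ...
        'IncludeSubfolders', true, 'LabelSource', 'foldernames');
    [imdsTrain, imdsValid, imdsTest] = splitEachLabel(imdsC, 0.6, 0.2, 0.2, 'randomized');
    augTrain = augmentedImageDatastore([227 227 3], imdsTrain);
    augValid = augmentedImageDatastore([227 227 3], imdsValid);
    augTest  = augmentedImageDatastore([227 227 3], imdsTest);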
As a further improvement of the invention, an AlexNet convolutional neural network is used as the classification network, whose construction comprises transfer learning and network training. Transfer learning is widely used in a great many deep learning training processes and adapts a pre-trained network: after a pre-trained network (e.g., AlexNet) is fed new data containing previously unknown classes, some improvements are made to the network so that it performs the new task, for example classifying only benign nodules, malignant nodules and no nodules rather than 1000 different objects. A further advantage is that much less data is needed, thousands of pictures rather than millions, so computation time drops to hours or minutes.
Transfer learning requires interfaces into the interior of the existing network so that it can be precisely modified and enhanced for the new task. Only nodules need to be classified, while AlexNet is trained to recognize 1000 classes, so it must be modified to recognize only three classes; to this end the last fully connected layer and the output layer are replaced, the image input size is 227×227×3, and the classification output has 3 classes.
As a further improvement of the invention, network training uses the sgdm optimizer, i.e., stochastic gradient descent with momentum: the overall samples are divided into mini-batches for training, and the network model parameters are adjusted by gradient descent. Otherwise, as for the segmentation network, the initial learning rate is 0.0001, the mini-batch size is set to 8, the validation frequency to 100, and data augmentation is applied to the data.
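The layer replacement and training settings can be sketched as follows (the layer indices follow the standard 25-layer AlexNet; the datastores continue the sketch above):

    % Sketch: swap AlexNet's 1000-way head for a 3-way head and train with sgdm
    net0 = alexnet;
    layers = net0.Layers;
    layers(23) = fullyConnectedLayer(3, 'Name', 'fc8_3');   % last fully connected layer
    layers(25) = classificationLayer('Name', 'output');     % new output layer
    optsC = trainingOptions('sgdm', ...
        'InitialLearnRate', 1e-4, 'MiniBatchSize', 8, ...
        'ValidationData', augValid, 'ValidationFrequency', 100, ...
        'Plots', 'training-progress');
    netC = trainNetwork(augTrain, layers, optsC);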
As a further improvement of the invention, a confusion matrix is computed for the test set, in which the rows represent the true classes and the columns the predicted classes; for example, the entry in the first row and second column indicates that 12 benign samples in the test set were predicted as malignant.
From the confusion matrix on the test set, various performance indexes can be obtained, including accuracy, recall, sensitivity and specificity; the per-class classification performance indexes computed from it are shown in Table 5, and the test set accuracy is calculated as 88.92%.
TABLE 5 Performance evaluation index
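A sketch of how these indexes follow from the confusion matrix (one-vs-rest per class):

    % Sketch: confusion matrix and per-class sensitivity/specificity on the test set
    predC = classify(netC, augTest);
    trueC = imdsTest.Labels;
    cm = confusionmat(trueC, predC);           % rows = true, columns = predicted
    acc = sum(diag(cm)) / sum(cm(:));          % overall accuracy
    for c = 1:size(cm, 1)
        TP = cm(c,c); FN = sum(cm(c,:)) - TP; FP = sum(cm(:,c)) - TP;
        TN = sum(cm(:)) - TP - FN - FP;
        fprintf('class %d: sensitivity %.3f, specificity %.3f\n', ...
            c, TP/(TP+FN), TN/(TN+FP));
    end
    confusionchart(trueC, predC);              % Fig. 10-style plot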
As a further improvement of the present invention, in the attribution method the output neuron of the correct class of the classification problem is defined as the target neuron, and the goal of the attribution method is to determine the contribution of the input features to that target neuron. One way is to arrange the attributions of all input features in the shape of the input sample to form a heat map, called an attribution map (Attribution Map), which can be observed during model training. Matlab provides many attribution-class interpretability methods, such as activation maps, Grad-CAM, occlusion sensitivity, LIME and Deep Dream, which convert network behavior into output that the user can interpret; this interpretable output can then answer questions about the network's predictions.
The activation map is a visualization of the activations of each layer. Most convolutional neural networks learn simple features such as color and edges in the first convolutional layer; in deeper convolutional layers the network learns more complex features. The image is passed through the network and the output activations of the layers are examined; specifically for this example, an activation map is made for the 96 channels of the first convolutional layer.
It can be seen that not all channels learn features at first, and that the network is learning low-level elements, the outline of the nodule's boundary. The most strongly activated channel can be computed with Matlab functions.
A deeper activation map can also be observed: changing conv1 to conv3 and computing the most strongly activated channel shows that the network has learned deeper features.
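A sketch of this activation-map inspection (the sample file name is an assumption; the reshape-and-tile pattern is the usual Matlab idiom):

    % Sketch: tile the 96 conv1 activation channels and find the strongest one
    im  = imresize(imread('sample_nodule.png'), [227 227]);   % assumed sample image
    act = activations(netC, im, 'conv1');                     % H x W x 96
    act = reshape(act, size(act,1), size(act,2), 1, []);      % to multiframe form
    imshow(imtile(mat2gray(act), 'GridSize', [8 12]))         % Fig. 11-style grid
    [~, maxCh] = max(max(act, [], [1 2]));                    % strongest channel
    figure, imshowpair(im, imresize(mat2gray(act(:,:,1,maxCh)), [227 227]), 'montage')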
The heat-map methods interpret the model with Grad-CAM, occlusion sensitivity and LIME. LIME compares which features the benign and the malignant images are respectively based on during prediction: under LIME, whether benign or malignant, the main basis is the nodule region, with other lighter colors distributed around the nodule, which can be understood as the nodule boundary also being a major basis, consistent with Grad-CAM. For occlusion sensitivity, the occlusion results at low and high resolution are compared: at low resolution the heat map is nearly consistent with Grad-CAM, while at high resolution the region of interest moves to the edge.
Whichever method is used, what it finally highlights is nothing other than the nodule kernel and boundary, which matches the rule that a physician reading an image attends first to the nodule edge and to the echo level of the kernel; the model can thus be well explained.
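The three heat-map interpretations correspond to built-in Deep Learning Toolbox calls; a sketch:

    % Sketch: Grad-CAM, LIME and occlusion-sensitivity maps for one image
    label = classify(netC, im);
    mapG  = gradCAM(netC, im, label);                 % Grad-CAM
    mapL  = imageLIME(netC, im, label);               % LIME
    mapO  = occlusionSensitivity(netC, im, label);    % occlusion sensitivity
    figure, imshow(im), hold on
    imagesc(mapG, 'AlphaData', 0.5), colormap jet     % overlay one of the maps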
As a further improvement of the present invention, the non-attribution method constructs interpretability for the specific problem rather than analyzing in isolation as the attribution method does. Non-attribution methods include attention maps, concept vectors, similar images, textual justification, expert knowledge and intrinsic interpretability; among them there are mainly two expert-knowledge-based non-attribution approaches, which use various means to relate expert knowledge to the features of the model and to formulate prediction and interpretation using domain-specific expert knowledge.
It is not hard to see that the features finally learned by the model in the attribution method agree with the expert's; that is, using expert knowledge of the medical field as the interpretation rule is itself a non-attribution method. In addition, a method that correlates model features with expert knowledge is discussed herein.
The features generally attended to when judging whether a breast nodule is benign or malignant are the uniformity of the internal echo and the regularity and clarity of the boundary. To apply the non-attribution method, i.e., to correlate the model features with expert knowledge, the nodule is cropped into kernel and boundary sub-images so as to simulate the physician's viewing angle when reading the image;
The mask image pixels are located by formulas 1 and 2:
X₀ = (1/Area(A)) · Σᵢ Σⱼ i·A(i,j)   (1)
Y₀ = (1/Area(A)) · Σᵢ Σⱼ j·A(i,j)   (2)
wherein A(i,j) ranges over all pixel points of the mask image; Area(A) is the area function of the entire connected region; (X₀, Y₀) is the midpoint of the entire connected region;
from the midpoint (X₀, Y₀), the edge points h(X₀, Y₀) are located; the calculation is given in formula 3:
the extracted edge points are closed by an edge-closing technique to obtain a segmentation map of the nodule edge;
the edge detection operator is calculated as shown in formula 4:
First the boundary pixel coordinates of the mask image are located and mapped onto the original image, which is cropped to obtain a sub-image dataset containing boundary features; then, in a similar way, the pixel coordinates off the boundary, i.e., the positions of the kernel pixels, are located, and the original image is cropped to obtain a sub-image dataset containing kernel features.
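A possible sketch of this cropping with standard morphological functions (the patch half-size w is an assumption; the patent's formulas 1 to 4 define the exact localization):

    % Sketch: crop one boundary patch and one kernel patch guided by the mask
    mask = imread('mask.png') > 0;
    img  = imread('image.png');
    stats   = regionprops(mask, 'Centroid');          % (X0, Y0) of the region
    edgePix = bwperim(imfill(mask, 'holes'));         % closed nodule edge
    [er, ec] = find(edgePix);
    w = 32;                                           % assumed patch half-size
    boundaryPatch = imcrop(img, [ec(1)-w, er(1)-w, 2*w, 2*w]);  % edge sub-image
    c = round(stats(1).Centroid);
    kernelPatch   = imcrop(img, [c(1)-w, c(2)-w, 2*w, 2*w]);    % kernel sub-image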
The classification model yields benign/malignant predictions for several sub-images rather than a single diagnosis, so ensemble learning is adopted to obtain the final comprehensive recognition result, the algorithm using a Soft Voting mechanism. Soft voting works on the principle that the minority obeys the majority: the mean of the predicted probabilities of each class over the benign/malignant results of all sub-images is taken as the prediction criterion, and the class with the highest mean probability is the final prediction result.
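The Soft Voting step reduces to averaging the per-sub-image class probabilities; a sketch (augSubImages denotes an assumed datastore of the sub-images):

    % Sketch: Soft Voting over the sub-image predictions
    [~, probs] = classify(netC, augSubImages);   % one probability row per sub-image
    meanProb = mean(probs, 1);                   % average the class probabilities
    [~, k] = max(meanProb);                      % highest mean probability wins
    finalClass = netC.Layers(end).Classes(k);    % comprehensive diagnosis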
The invention has the beneficial effects that:
(1) Design of the segmentation and classification networks, including the MobileNetv2 network, an optimization of the Deeplabv3+ network, and a transfer learning method for the AlexNet network;
(2) Evaluation of the networks: the segmentation model reaches mIoU = 83.58% and the classification model reaches 88.61% accuracy on the test set; in addition, the segmentation and classification models are evaluated in multiple dimensions, including overlay comparison on single images and the confusion matrix of the classification test results, yielding various evaluation indexes;
(3) Interpretation of the model with the attribution method, including comparison of several interpretability methods against the physician's viewing angle, showing that what the model focuses on during training is consistent with the physician, so the model has a certain interpretability;
(4) Interpretation of the model with the non-attribution method: following the nodule features in the physician's view, the nodule boundary and kernel are segmented and cropped, and ensemble learning over the sub-images yields a comprehensively discriminated, interpretable diagnosis.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a dataset image, where (a) is an original image and (b) is a mask image.
Fig. 3 is an image nodule pixel ratio.
Fig. 4 is the Deeplabv3+ model training progress plot.
Fig. 5 is a segmentation test result, wherein the left graph is an original image, and the right graph is a segmentation result.
Fig. 6 is the overlay comparison with the ground truth.
Fig. 7 is a view of an ultrasound image sample in which (a) is benign nodules, (b) is malignant nodules, and (c) is non-nodules.
Fig. 8 is a diagram of the AlexNet transfer learning network.
Fig. 9 is the AlexNet model training progress plot.
Fig. 10 is a classification confusion matrix.
Fig. 11 is a conv1 activation diagram.
Fig. 12 compares the most strongly activated conv1 channel with the original image.
Fig. 13 is the most strongly activated channel of conv3.
Fig. 14 is a comparison of various interpretability methods.
Fig. 15 is a nodule boundary diagram.
Fig. 16 is a diagram of the nodule nuclei.
Fig. 17 is a structural diagram of the voting algorithm.
FIG. 18 is a load and recognition result interface.
Detailed Description
The present invention is further illustrated in the following drawings and detailed description, which are to be understood as being merely illustrative of the invention and not limiting the scope of the invention.
As shown in fig. 1, the present invention provides an ultrasonic breast nodule image interpretability method based on deep learning, comprising the following steps:
(1) Acquiring a breast nodule dataset annotated by professional physicians, and preprocessing the ultrasound image pictures, including labeling the pictures, resizing them to fit the network model, and classical simple preprocessing such as histogram equalization and wavelet denoising;
(2) Building a pre-trained segmentation model: comparing several backbone networks including Xception, ResNet-50 and MobileNetv2, tuning the training hyperparameters, training the model and evaluating it, the evaluation including MeanIoU, WeightedIoU, MeanBFScore and the like, as well as comparing the prediction on a single picture against the ground truth;
(3) Building a pre-trained classification model, tuning the training hyperparameters, training the model and verifying accuracy on a test set;
(4) Performing interpretability analysis on the model with attribution and non-attribution methods, including comparing several visualization methods that make the model training process transparent, and designing an interpretability algorithm with the non-attribution method to improve the interpretability of the model;
(5) Designing a human-computer interaction intelligent APP with Matlab's App Designer, whose functions include, after one-click input of an ultrasound image, automatically analyzing whether the patient has breast nodules, the risk probability of malignancy, and the corresponding medical record report.
In this embodiment, step (2) builds the pre-trained segmentation model, comprising dataset preprocessing and segmentation network design. The dataset obtained in preprocessing comprises three types of breast ultrasound images, covering benign nodules, malignant nodules and no nodules, together with physician-annotated mask images, as shown in fig. 2. Training requires both raw data and label data, so the mask images must be preprocessed into the final label images: the label image data format is a two-dimensional array in which nodule pixels are "1" and all others are "0". The image thus contains only the two values 0 and 1, which are merely pixel labels denoting background and nodule, respectively; a segmentation network is in fact also a classification network, except that the object classified is each individual pixel.
In this embodiment, the segmentation network is a Deeplabv3+ network designed with MobileNetv2 as the backbone network.
Deeplabv3+ can take different pre-trained networks as its backbone for feature extraction as required, but their performance differs and the choice depends on the actual situation, which is parameter tuning at another level. To verify the influence of the backbone on the performance of the Deeplabv3+ network model, comparison experiments were run on the validation set with Deeplabv3+ configured with different backbones; the evaluation covers model parameter size, mean intersection over union (mIoU) and validation time, with the results shown in Table 1.
TABLE 1 Deeplabv3+ model performance comparison for different backbone networks
As the table shows, with MobileNetv2 as the backbone the mIoU is 83.58%, the validation time 89 ms and the model parameter size 2.18 MB: the mIoU is almost unchanged, but the parameter count drops markedly and the validation time shortens greatly, reducing the complexity of the model and speeding up its execution; the model is lighter and easier to deploy, and diagnosis speed is an important index for auxiliary diagnosis.
Furthermore, in the ideal case all classes have an equal number of observations, but in ultrasound images the nodule usually occupies only a small proportion of the pixels; the nodule (Tubercle) pixels in the dataset are few compared with the other background (Back) pixels, as shown in fig. 3. Because learning is biased toward the dominant part of the image, this imbalance, if not handled properly, can greatly harm the learning process. Pre-computing the pixel ratio of the image nodules and weighting the classification layer parameters with class weights improves network performance.
In this embodiment, to improve network accuracy the original dataset is randomly transformed during training, i.e., data augmentation; data augmentation adds more variety to the training data without increasing the number of labeled training samples. In Matlab the same random transform is applied to the image and to the pixel label data; this example flips the images along the X and Y axes.
In this embodiment, part of the model training parameters are shown in Table 2. An Adam optimizer is used: among the many algorithms applied to deep learning model training, Adam updates the neural network weights from the training data and is a first-order optimization algorithm that can replace the traditional stochastic gradient descent procedure. The initial learning rate is set to 0.0001 and multiplied by 0.3 every 10 epochs, with a maximum of 50 training epochs; this lets the network learn quickly at the higher initial rate while the decayed rate allows a solution close to the local optimum to be found. Considering that GPU memory and batch size must both be moderate, the batch size is reduced appropriately to lower memory usage and is set to 8. After the training parameters are set, training is run with trainNetwork; the training-set accuracy and loss-function curves are shown in fig. 4.
TABLE 2 Partial model training parameters
In this embodiment, mIoU is one of the most used evaluation indexes for semantic segmentation; in addition, various other evaluation indexes are computed, including an overlay comparison on a single validation image to test the network and an inspection of the influence of each class on overall performance.
The various metrics for the dataset, the individual classes, and each test image are returned by the corresponding function (evaluateSemanticSegmentation); metrics.DataSetMetrics gives the results shown in Table 3.
TABLE 3 Overall dataset evaluation index
Testing on the whole validation set gives an mIoU of 83.58%.
In addition, the influence of the nodule (Tubercle) and background (Back) classes on overall performance can be shown with metrics.ClassMetrics.
TABLE 4 Class evaluation indexes of a random single image
Although the overall dataset performance is quite high, the class indexes show that IoU on some single pictures is not high enough; increasing the number of dataset samples would help improve the results.
The network was tested on a single test image; the original image (left) and the segmentation result (right) are shown in fig. 5.
The overlay comparison of the test network against the ground truth is shown in fig. 6; only the areas of differing color deviate from the ground truth, so the nodule segmentation effect is good.
In this embodiment, step (3) builds the pre-trained classification model, comprising dataset preprocessing and classification network design. The preprocessing yields three types of ultrasound image samples, shown in fig. 7, covering benign nodules, malignant nodules and no nodules, 2826 images in total. The dataset is cleaned and given category labels: "benign" for benign nodules, "malignant" for malignant nodules and "normal" for normal images without nodules; the images are uniformly resized to the input size required by the network input layer (227×227×3). After preprocessing, the dataset is divided into training, validation and test sets in the ratio 6:2:2.
In this embodiment, an AlexNet convolutional neural network is used as the classification network, whose construction comprises transfer learning and network training. Transfer learning is widely used in a great many deep learning training processes and adapts a pre-trained network: after a pre-trained network (e.g., AlexNet) is fed new data containing previously unknown classes, some improvements are made to the network so that it performs the new task, for example classifying only benign nodules, malignant nodules and no nodules rather than 1000 different objects. A further advantage is that much less data is needed, thousands of pictures rather than millions, so computation time drops to hours or minutes.
Transfer learning requires interfaces into the interior of the existing network so that it can be precisely modified and enhanced for the new task. Only nodules need to be classified, while AlexNet is trained to recognize 1000 classes, so it must be modified to recognize only three classes; to this end the last fully connected layer and the output layer are replaced, the image input size is 227×227×3, the classification output has 3 classes, and the transfer learning structure is shown in fig. 8.
In this embodiment, network training uses the sgdm optimizer, i.e., stochastic gradient descent with momentum: the overall samples are divided into mini-batches for training, and the network model parameters are adjusted by gradient descent. Otherwise, as for the segmentation network, the initial learning rate is 0.0001, the mini-batch size is set to 8, the validation frequency to 100, and data augmentation is applied to the data. The training progress is shown in fig. 9: the curves converge normally and smoothly, the validation accuracy curve rises quickly and converges to about 88.61%, and the loss curve falls to about 0.
In this embodiment, a confusion matrix is computed for the test set as shown in fig. 10, in which the rows represent the true classes and the columns the predicted classes; for example, the entry in the first row and second column indicates that 12 benign samples in the test set were predicted as malignant.
From the confusion matrix on the test set, various performance indexes can be obtained, including accuracy, recall, sensitivity and specificity; the per-class classification performance indexes computed from it are shown in Table 5, and the test set accuracy is calculated as 88.92%.
TABLE 5 Performance evaluation index
In this embodiment, in the attribution method the output neuron of the correct class of the classification problem is defined as the target neuron, and the goal of the attribution method is to determine the contribution of the input features to that target neuron. One way is to arrange the attributions of all input features in the shape of the input sample to form a heat map, called an attribution map (Attribution Map), which can be observed during model training. Matlab provides many attribution-class interpretability methods, such as activation maps, Grad-CAM, occlusion sensitivity, LIME and Deep Dream, which convert network behavior into output that the user can interpret; this interpretable output can then answer questions about the network's predictions.
The activation map is a visualization of the activations of each layer. Most convolutional neural networks learn simple features such as color and edges in the first convolutional layer; in deeper convolutional layers the network learns more complex features. The image is passed through the network and the output activations of the layers are examined; specifically for this example, the activation map for the 96 channels of the first convolutional layer is shown in fig. 11.
It can be seen that not all channels learn features at first, and that the network is learning low-level elements, the outline of the nodule's boundary. The most strongly activated channel can be computed with Matlab functions, as shown in fig. 12.
A deeper activation map can also be observed: changing conv1 to conv3, the most strongly activated channel is computed as shown in fig. 13, and it can be seen that the network has learned deeper features.
The heat maps in fig. 14 interpret the model with Grad-CAM, occlusion sensitivity and LIME. LIME compares which features the benign and the malignant images are respectively based on during prediction: under LIME, whether benign or malignant, the main basis is the nodule region, with other lighter colors distributed around the nodule, which can be understood as the nodule boundary also being a major basis, consistent with Grad-CAM. For occlusion sensitivity, the occlusion results at low and high resolution are compared: at low resolution the heat map is nearly consistent with Grad-CAM, while at high resolution the region of interest moves to the edge.
Whichever method is used, what it finally highlights is nothing other than the nodule kernel and boundary, which matches the rule that a physician reading an image attends first to the nodule edge and to the echo level of the kernel; the model can thus be well explained.
In this embodiment, the non-attribution method constructs interpretability for the specific problem rather than analyzing in isolation as the attribution method does. Non-attribution methods include attention maps, concept vectors, similar images, textual justification, expert knowledge and intrinsic interpretability; among them there are mainly two expert-knowledge-based non-attribution approaches, which use various means to relate expert knowledge to the features of the model and to formulate prediction and interpretation using domain-specific expert knowledge.
It is not hard to see that the features finally learned by the model in the attribution method agree with the expert's; that is, using expert knowledge of the medical field as the interpretation rule is itself a non-attribution method. In addition, a method that correlates model features with expert knowledge is discussed herein.
The features generally attended to when judging whether a breast nodule is benign or malignant are the uniformity of the internal echo and the regularity and clarity of the boundary. To apply the non-attribution method, i.e., to correlate the model features with expert knowledge, the nodule is cropped into kernel and boundary sub-images so as to simulate the physician's viewing angle when reading the image;
the mask image pixels are located by formulas 1 and 2:
X₀ = (1/Area(A)) · Σᵢ Σⱼ i·A(i,j)   (1)
Y₀ = (1/Area(A)) · Σᵢ Σⱼ j·A(i,j)   (2)
wherein A(i,j) ranges over all pixel points of the mask image; Area(A) is the area function of the entire connected region; (X₀, Y₀) is the midpoint of the entire connected region;
from the midpoint (X₀, Y₀), the edge points h(X₀, Y₀) are located; the calculation is given in formula 3:
the extracted edge points are closed by an edge-closing technique to obtain a segmentation map of the nodule edge;
the edge detection operator is calculated as shown in formula 4:
First the boundary pixel coordinates of the mask image are located and mapped onto the original image, which is cropped to obtain a sub-image dataset containing boundary features, as shown in fig. 15; then, in a similar way, the pixel coordinates off the boundary, i.e., the positions of the kernel pixels, are located, and the original image is cropped to obtain a sub-image dataset containing kernel features, as shown in fig. 16.
The classification model yields benign/malignant predictions for several sub-images rather than a single diagnosis, so ensemble learning is adopted to obtain the final comprehensive recognition result, the algorithm using a Soft Voting mechanism whose structure is shown in fig. 17. Soft voting works on the principle that the minority obeys the majority: the mean of the predicted probabilities of each class over the benign/malignant results of all sub-images is taken as the prediction criterion, and the class with the highest mean probability is the final prediction result.
After the segmentation and classification networks have been designed and evaluated, the App Designer tool in the Matlab toolbox can be used to package and release the APP. App Designer allows an APP to be built without a professional software developer: various controls are placed in the design view to design the user interface, and the application behavior is programmed in the code view, forming software with good human-computer interaction. App Designer integrates the two main tasks of application building, namely laying out the visual components of the graphical user interface and programming the application's behavior.
First the user interface is designed: controls from the component library are dragged into the design view for layout, and App Designer generates the corresponding code in the code view. Then the application behavior is defined with the Matlab editor; specifically for this project, a button is programmed that, after an image is imported, automatically calls the segmentation and classification models to perform segmentation and classification and gives the judgment accuracy and a diagnosis report. The software is then tested; after packaging it can be released, deployed into Matlab's APP toolbox, and opened from the APP option in the toolbox.
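The button behavior described here can be sketched as an App Designer callback (the component and model property names are assumptions):

    % Sketch: import-button callback - segment, classify, report
    function ImportButtonPushed(app, event)
        [f, p] = uigetfile({'*.png;*.jpg', 'Ultrasound images'});
        if isequal(f, 0), return, end
        img = imresize(imread(fullfile(p, f)), [227 227]);     % assumes RGB input
        seg = semanticseg(img, app.SegNet);                    % nodule mask
        app.ResultImage.ImageSource = labeloverlay(img, seg);  % show segmentation
        [label, scores] = classify(app.ClsNet, img);
        app.ReportLabel.Text = sprintf('%s (%.1f%%)', string(label), 100*max(scores));
    end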
Clicking the import image button jumps to a folder to select the picture to be diagnosed; the loading interface then appears, and the software completes autonomous analysis within 10 seconds, identifying the regularity and clarity of the nodule boundary and the echo calcification degree of the kernel, making a comprehensive diagnosis from these features, displaying the segmentation result in the image frame on the right, and showing the accuracy and the interpretable diagnosis result at the lower right. As shown in fig. 18, after the import image button is clicked and a benign nodule image is selected for testing, the nodule is segmented, extracted and color-marked within 10 seconds; from the several features identified in the image it is shown that the nodule boundary is clear and its morphology regular, and a benign nodule is preliminarily judged. Extensive repeated testing shows that the software interface is friendly, operation is convenient and quick, and no anomalies were found.
The technical means disclosed in the scheme of the invention are not limited to those disclosed in the above embodiment, but also include technical schemes formed by any combination of the above technical features. It should be noted that several modifications and adaptations may occur to those skilled in the art without departing from the principles of the invention, and these are also considered to fall within the scope of the invention.

Claims (10)

1. An ultrasonic breast nodule image interpretability method based on deep learning, characterized by comprising the following steps:
(1) Acquiring a breast nodule data set and marking, and preprocessing an ultrasonic image picture, wherein the preprocessing comprises labeling the picture, marking a label, and adjusting the size to adapt to a network model and preprocessing of histogram equalization and wavelet denoising;
(2) Building a pre-training segmentation model, comparing and using a plurality of backbone networks including Xreception, resNet-50 and MobileNetv2, adjusting training super parameters, training the model and evaluating, wherein the evaluation mode comprises MeanIoU, weighteIoU, meanBFScore and comparing a prediction result on a single picture with a group trunk;
(3) Building a pre-training classification model, adjusting the training hyperparameters, training the model and verifying its accuracy on a test set;
(4) Performing interpretability analysis on the model using attribution and non-attribution methods, comprising comparing a plurality of visualization methods that render the model training process transparent, and designing an interpretability algorithm with the non-attribution method to improve the interpretability of the model;
(5) Designing a human-machine interaction intelligent APP by utilizing the APP Designer of Matlab, whose functions comprise, after an ultrasonic image is imported with one key, automatically analyzing whether the patient has breast nodules and the risk probability of malignant tumors, and generating a corresponding medical record report.
2. The method of claim 1, wherein building the pre-training segmentation model in step (2) comprises data set preprocessing and segmentation network design, wherein the data set obtained in the data set preprocessing comprises three types of breast ultrasound images, namely benign nodules, malignant nodules and non-nodules, together with mask images annotated by doctors; since training requires both raw data and label data, the mask images are preprocessed to generate the final label images, whose data format is a two-dimensional array in which nodule pixels are 1 and all other pixels are 0; the image thus contains only the two values 0 and 1, which are merely pixel labels representing the background and the nodule respectively; the segmentation network is in fact also a classification network, except that the object classified is each individual pixel.
3. The deep learning based ultrasound breast nodule image interpretability method of claim 2, wherein the segmentation network design builds a DeepLabv3+ network with MobileNetv2 as the backbone network and pre-calculates the pixel ratio of image nodules, modifying the input layer parameters with class weights to improve network performance.
4. The deep learning based ultrasound breast nodule image interpretability method of claim 3, wherein, to improve network accuracy, the original data set is randomly transformed during training, i.e., data enhancement; by using data enhancement, more variety is added to the training data without increasing the number of labeled training samples; the same random transform is applied to the image and the pixel label data in Matlab, the images being flipped along the X-axis and the Y-axis (an illustrative preprocessing and enhancement sketch follows the claims).
5. The deep learning based ultrasound breast nodule image interpretability method of claim 3, wherein the model training section uses the Adam optimizer, which updates the weights of the neural network from the training data in place of the traditional stochastic gradient descent procedure; the initial learning rate is set to 0.0001 and is multiplied by a factor of 0.3 every 10 epochs, with a maximum of 50 training epochs, so that the network learns quickly at the higher initial learning rate while a solution close to the local optimum can still be found as the learning rate decreases; considering that the video memory and the batch size should both be moderate, the batch size is set to 8; after the training parameters are set, training is carried out with trainNetwork (see the training sketch following the claims).
6. The deep learning based ultrasound breast nodule image interpretability method of claim 1, wherein building the pre-training classification model in step (3) comprises data set preprocessing and classification network design, wherein three types of ultrasonic image samples are obtained in the data set preprocessing, comprising benign nodules, malignant nodules and non-nodules; the data set is cleaned and marked with category labels, "benign" corresponding to benign nodules, "malignant" to malignant nodules and "normal" to normal images without nodules, and the sizes are unified to the input pixel size required by the network input layer; after preprocessing, the data set is divided into a training set, a validation set and a test set in the ratio 6:2:2.
7. The deep learning based ultrasound breast nodule image interpretability method of claim 6, wherein the classification network design uses the AlexNet convolutional neural network as the classification network, the construction of which comprises transfer learning and network training; transfer learning, widely applied in deep learning training, adjusts a pre-trained network: new data containing previously unknown classes are input into the pre-trained network, and after some modification the network performs the new task, namely classifying benign nodules, malignant nodules and non-nodules; specifically, the last fully connected layer and the output layer are modified, the input image size is 227×227×3 pixels, and the classification output has 3 classes.
8. The deep learning based ultrasound breast nodule image interpretability method of claim 7, wherein in the network training the overall samples are trained in small batches using the sgdm optimizer (stochastic gradient descent with momentum), the network model parameters being adjusted by the gradient descent method; the initial learning rate is 0.0001, the mini-batch size is set to 8, the validation frequency is set to 100, and data enhancement is applied to the data.
9. The deep learning based ultrasound breast nodule image interpretability method of claim 1, wherein the attribution method defines the output neuron of the correct class of the classification problem as the target neuron and takes determining the contribution of the input features to that target neuron as its goal, comprising activation maps, Grad-CAM, occlusion sensitivity, LIME and Deep Dream, which convert network behavior into user-interpretable output that can then answer questions about the network predictions; the non-attribution method constructs an interpretation for a particular problem, comprising attention maps, concept vectors, similar images, text justification, expert knowledge and intrinsic interpretability; there are mainly two non-attribution approaches based on expert knowledge: relating expert knowledge to the characteristics of a model by various methods, and formulating ways of using domain-specific expert knowledge for prediction and interpretation.
10. The deep learning based ultrasound breast nodule image interpretability method of claim 9, wherein the features of breast nodules attended to when judging benignity and malignancy are the uniformity of the internal echo and the regularity and definition of the boundary, and in order to use the non-attribution method, namely to associate the model features with expert knowledge, the nodule is cut into sub-images of two parts, an inner core and a boundary, so as to simulate the doctor's viewing angle;
locating the mask image pixels by calculation formulas 1 and 2:

X₀ = (1 / Area(A)) · Σ_{(i,j)∈A} i    (1)
Y₀ = (1 / Area(A)) · Σ_{(i,j)∈A} j    (2)

wherein A(i, j) denotes all pixel points in the mask image; Area(A) is the area function of the entire connected region; (X₀, Y₀) is the midpoint of the entire connected region;
from the midpoint (X₀, Y₀), the edge points h(X₀, Y₀) are located, the calculation being given by formula 3;
the extracted edge points are closed by an edge closing technique to obtain a segmentation map of the nodule edge;
the calculation formula of the edge detection operator is given by formula 4;
firstly, the boundary pixel coordinates of the mask image are located, mapped onto the original image, and the original image is cropped to obtain a sub-image data set containing boundary features; the pixel coordinates away from the boundary, namely the positions of the core pixels, are then located in a similar way and the original image is cropped to obtain a sub-image data set with core features; the prediction result finally obtained from the classification model is the benign or malignant category of a number of sub-images rather than a single diagnosis, so a final comprehensive recognition result is obtained by ensemble learning, the algorithm adopting a Soft Voting mechanism (a corresponding sketch follows the claims).
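As a non-limiting illustration of the preprocessing and data enhancement recited in claims 1 and 4, the following Matlab sketch applies histogram equalization, wavelet denoising, resizing and axis flips; the file name, the target size and the datastore variable imdsTrain are assumptions for the example.

```matlab
% Illustrative preprocessing of one ultrasound picture (claims 1 and 4).
I = imread('example_ultrasound.png');           % hypothetical file name
if size(I, 3) == 3
    I = rgb2gray(I);                            % work on the grayscale image
end
I = histeq(I);                                  % histogram equalization
I = im2uint8(wdenoise2(im2double(I)));          % wavelet denoising
I = imresize(I, [227 227]);                     % adapt to the network input size

% Data enhancement: random flips along the X- and Y-axis; for segmentation the
% same random transform must be applied to the image and its pixel labels.
augmenter = imageDataAugmenter('RandXReflection', true, 'RandYReflection', true);
augTrain  = augmentedImageDatastore([227 227], imdsTrain, ...
    'DataAugmentation', augmenter);             % imdsTrain: training datastore
```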
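Similarly, the training configurations recited in claims 3, 5, 7 and 8 map directly onto Matlab's trainingOptions and trainNetwork; in the sketch below the datastores pxds, dsTrain, imdsTrain and imdsVal are assumed to have been prepared as described in claims 2 and 6, and the segmentation input size is illustrative.

```matlab
% Segmentation (claims 3 and 5): DeepLabv3+ with a MobileNetv2 backbone.
lgraph = deeplabv3plusLayers([256 256 3], 2, 'mobilenetv2');  % nodule / other

% Class weights from the pre-calculated nodule pixel ratio; in Matlab such
% weights are usually attached to the pixel classification layer.
tbl  = countEachLabel(pxds);                    % pxds: pixelLabelDatastore
freq = tbl.PixelCount ./ tbl.ImagePixelCount;
pxLayer = pixelClassificationLayer('Name', 'labels', ...
    'Classes', tbl.Name, 'ClassWeights', median(freq) ./ freq);
lgraph = replaceLayer(lgraph, 'classification', pxLayer);

optsSeg = trainingOptions('adam', ...           % Adam optimizer (claim 5)
    'InitialLearnRate', 1e-4, ...
    'LearnRateSchedule', 'piecewise', ...
    'LearnRateDropFactor', 0.3, ...             % multiply by 0.3 ...
    'LearnRateDropPeriod', 10, ...              % ... every 10 epochs
    'MaxEpochs', 50, ...
    'MiniBatchSize', 8);                        % moderate batch for GPU memory
segNet = trainNetwork(dsTrain, lgraph, optsSeg);

% Classification (claims 7 and 8): AlexNet transfer learning trained with sgdm.
net = alexnet;                                  % pretrained, 227x227x3 input
layers = [net.Layers(1:end-3)                   % drop the last 3 layers
    fullyConnectedLayer(3)                      % benign / malignant / normal
    softmaxLayer
    classificationLayer];
optsCls = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-4, ...
    'MiniBatchSize', 8, ...
    'ValidationData', imdsVal, ...
    'ValidationFrequency', 100);
clsNet = trainNetwork(imdsTrain, layers, optsCls);
```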
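Finally, a minimal sketch of the sub-image extraction and Soft Voting of claim 10, under the assumption that the midpoint of formulas 1 and 2 and the nodule edge are obtained with standard Matlab region and boundary functions; the helper name, the patch size and the number of sampled edge points are illustrative.

```matlab
% Hypothetical helper: crops boundary and core sub-images around the mask
% centroid (X0, Y0) and fuses the per-sub-image posteriors by soft voting.
function [label, meanScores] = softVoteDiagnosis(net, I, mask, patchSize)
    stats   = regionprops(mask, 'Centroid');    % centroid of the nodule region
    c       = round(stats(1).Centroid);         % (X0, Y0) as [x y]
    B       = bwboundaries(mask);               % closed edge of the nodule
    edgePts = B{1};                             % [row col] boundary coordinates
    idx     = round(linspace(1, size(edgePts, 1), 8));
    centres = [edgePts(idx, [2 1]); c];         % 8 boundary patches + the core
    half    = floor(patchSize / 2);
    allScores = [];
    for k = 1:size(centres, 1)
        rect = [centres(k, 1) - half, centres(k, 2) - half, ...
                patchSize - 1, patchSize - 1];
        sub = imresize(imcrop(I, rect), [227 227]);  % AlexNet input size
        if size(sub, 3) == 1
            sub = repmat(sub, [1 1 3]);         % AlexNet expects 3 channels
        end
        [~, s] = classify(net, sub);            % class posteriors per sub-image
        allScores = [allScores; s];             %#ok<AGROW>
    end
    meanScores = mean(allScores, 1);            % soft voting: average posteriors
    [~, best]  = max(meanScores);
    label = net.Layers(end).Classes(best);
end
```

Averaging the posteriors rather than counting hard votes lets confident sub-image predictions outweigh uncertain ones, which is the usual motivation for soft over hard voting.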
CN202311124040.2A 2023-09-01 2023-09-01 Ultrasonic breast nodule image interpretation method based on deep learning Pending CN117094980A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311124040.2A CN117094980A (en) 2023-09-01 2023-09-01 Ultrasonic breast nodule image interpretation method based on deep learning


Publications (1)

Publication Number Publication Date
CN117094980A true CN117094980A (en) 2023-11-21

Family

ID=88782740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311124040.2A Pending CN117094980A (en) 2023-09-01 2023-09-01 Ultrasonic breast nodule image interpretation method based on deep learning

Country Status (1)

Country Link
CN (1) CN117094980A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118014947A (en) * 2024-01-30 2024-05-10 瑄立(无锡)智能科技有限公司 Rapid diagnostic system for identifying morphology of acute promyelocytic leukemia
CN117763361A (en) * 2024-02-22 2024-03-26 泰山学院 Student score prediction method and system based on artificial intelligence
CN117763361B (en) * 2024-02-22 2024-04-30 泰山学院 Student score prediction method and system based on artificial intelligence
CN118016283A (en) * 2024-04-09 2024-05-10 北京科技大学 Interpreted breast cancer new auxiliary chemotherapy pCR prediction method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination