CN111476292B - Small sample element learning training method for medical image classification processing artificial intelligence - Google Patents


Info

Publication number
CN111476292B
CN111476292B (application CN202010262936.7A)
Authority
CN
China
Prior art keywords
layer
learning
sample
scale
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010262936.7A
Other languages
Chinese (zh)
Other versions
CN111476292A (en
Inventor
李功杰
马潞娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Quanjingdekang Medical Imaging Diagnosis Center Co ltd
Original Assignee
Beijing Quanjingdekang Medical Imaging Diagnosis Center Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Quanjingdekang Medical Imaging Diagnosis Center Co ltd filed Critical Beijing Quanjingdekang Medical Imaging Diagnosis Center Co ltd
Priority to CN202010262936.7A priority Critical patent/CN111476292B/en
Publication of CN111476292A publication Critical patent/CN111476292A/en
Application granted granted Critical
Publication of CN111476292B publication Critical patent/CN111476292B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4038Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Abstract

The few-shot meta-learning training method for medical image classification artificial intelligence builds three meta-learners, namely a multi-scale CNN feature extractor, a metric learner and a classification discriminator, and designs a metric on top of them. On each task of the training set, the target set is learned through support-set distance measurement, so that a metric is finally obtained; for a new task from the test set, the target set can then be classified quickly and correctly from only a small number of support-set samples. By adopting meta-learning, the method improves the practicability and reliability of automatic artificial-intelligence detection in the field of difficult and rare diseases, overcomes the scarcity and dispersion of single-disease data, completes the classification and few-shot learning training of an intelligent medical image processing system, markedly improves filtering accuracy, effectively enhances the accuracy of medical image classification, greatly improves productivity, and supports the upgrade of intelligent-diagnosis deep-learning technology in the medical industry.

Description

Small sample element learning training method for medical image classification processing artificial intelligence
Technical Field
The invention belongs to the technical fields of machine learning, pattern recognition and medical image processing, under IPC class G06T7/11 (region-based segmentation) or G06K9/00 (reading or recognizing printed or written characters, or recognizing patterns), and relates in particular to a few-shot meta-learning training method for medical image classification artificial intelligence.
Background
At present, medical images, including X-ray, CT, magnetic resonance, ultrasound and nuclear-medicine imaging, let doctors observe changes in the internal morphology, function and metabolism of a patient's body without contact or dissection, and are important for diagnosing causes and conditions. Medical imaging plays an extremely important role in clinical diagnosis, and modern medicine cannot do without it. Medical image-assisted diagnosis systems developed by companies such as iFlytek, Tencent and Ali Health can assist doctors in specific medical fields such as lung CT diagnosis and early esophageal-cancer screening by means of high-quality, high-resolution images. By learning from large quantities of images and diagnostic data, a computer can automatically segment and classify medical images, magnify and repeatedly examine suspicious tissue at different scales, extract key information, and finally output a suggestion as to whether a lesion is present to support the radiologist's decision. This greatly improves diagnostic efficiency, effectively reduces missed diagnoses and misdiagnoses, and strongly promotes the future development of medicine.
With the development of Internet technology and rising levels of informatization, the volume of image data has grown explosively. Efficiently retrieving image data from huge image databases poses real challenges for image classification technology. In recent years deep learning has developed rapidly, and thanks to its higher accuracy and recognition efficiency, deep-learning-based image classification has gradually replaced classification by manually labeled features; however, deep learning often suffers from parameters that are hard to tune during training, a large demand for training samples, and overly long training times.
To address these problems, it is worthwhile to study how to use deep learning efficiently for image recognition when only a small number of samples is available, and to strengthen the adaptability of deep-learning models under different sample conditions.
One key feature of human intelligence is versatility: the ability to accomplish many different tasks. Current AI systems excel at mastering a single skill, such as Go, the Jeopardy! quiz game, or even aerobatic helicopter flight. But ask an AI system to do a variety of seemingly simple things and it struggles: the program that won Jeopardy! cannot hold a conversation, and a professional aerobatic helicopter controller cannot navigate a completely new, simple environment, such as locating a fire, flying to it, and extinguishing it. Humans, in contrast, cope flexibly and adapt spontaneously in the face of new situations, and we would like artificial intelligence to achieve the same versatility.
Since the rise of deep learning, intelligent devices of all kinds have developed rapidly, but intelligent programs require tens of thousands or even millions of examples for training. Take the famous AlphaGo: it plays Go as a computer, yet reaching that level of play required enormous amounts of data to support its training. This is a weak point of deep learning: it needs too much training data, and for some tasks that much training data simply cannot be found. In such cases, training a deep learner on little data leads to overfitting, which is harmful in applications; for tasks with few samples, a learner is therefore needed that can be trained well from a small amount of data and still achieve the required functionality.
It has been reported that some laboratories have recently experimented with model-agnostic meta-learning (MAML). When a person encounters a new object for the first time, they can quickly learn about and become familiar with it by exploring it, a learning ability that current machines lack. If a machine could also acquire this ability and learn quickly when the sample size is small, we would call it meta-learning. For example, the 2015 cover article of the journal Science, "Human-level concept learning through probabilistic program induction", proposed a Bayesian Program Learning (BPL) framework to implement one-shot learning, with datasets of handwritten characters from the world's alphabets. It found that few-shot classification tasks can be achieved well through meta-learning.
In recent years, few-shot learning and classification have developed rapidly: faced with many classification tasks, a single trained model can meet the task requirements. Among the many meta-learning methods, the one with the greatest applicability relies on the generality of tasks: rather than building a different model for each task, the same learning algorithm is used to solve a variety of tasks. A model with a learnable parameter θ is defined, and different tasks are solved by changing the value of θ. The value of θ is learned by the meta-learner: facing different tasks, θ is continually updated by gradient descent according to a loss function, steadily bringing the model closer to one that solves the task; when θ finally converges, the meta-learner has learned a good parameter θ that lets the model adaptively solve the corresponding tasks. The algorithm builds no extra structure and introduces no extra parameters for the learner, and the strategy for training the learner uses known optimization procedures such as gradient descent.
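The θ-update loop described above can be sketched on a toy problem. This is an illustrative assumption, not the patent's algorithm: each task asks for the minimum of a different quadratic, one inner gradient step adapts θ to the task, and an outer step updates θ so that one-step adaptation works across tasks (the gradients are computed analytically for this toy loss).

```python
# Minimal meta-learning sketch (hypothetical toy example, not the patent's method).
# Task a: minimise L_a(theta) = (theta - a)^2; gradient is 2*(theta - a).

def task_loss_grad(theta, a):
    return 2.0 * (theta - a)

def meta_train(tasks, theta=5.0, alpha=0.1, beta=0.05, epochs=200):
    for _ in range(epochs):
        for a in tasks:
            # Inner loop: one task-specific gradient-descent adaptation step.
            theta_task = theta - alpha * task_loss_grad(theta, a)
            # Outer loop: gradient of the post-adaptation loss w.r.t. theta,
            # back-propagated through the inner step (chain-rule factor 1 - 2*alpha).
            meta_grad = 2.0 * (theta_task - a) * (1.0 - 2.0 * alpha)
            theta -= beta * meta_grad
    return theta

theta = meta_train(tasks=[-1.0, 1.0])
# theta converges near 0, the initialisation from which either task is
# reached fastest by a single gradient step.
```

With tasks at a = −1 and a = +1, the learned θ settles near 0, the point equidistant from both task optima, which is the meta-learning behavior the text describes.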
Aiming at the problems of few-shot recognition, such as the high risk of overfitting and poor model generalization, a few-shot classification and recognition model based on deep learning has been proposed through experimental research, optimized mainly along two lines: enhancement of the number of image samples and optimization of the recognition model. For sample-number enhancement, a technique combining generative models with image preprocessing has been proposed. A fully connected generative model is first used for sample enhancement, and a convolutional neural network replaces the fully connected network for image training, addressing the excessive parameter count of fully connected networks. Because generated sample images are random, samples are generated with a conditional generative model, so that the generated sample set carries labels and can be applied well to subsequent supervised classification learning. For the blurriness of samples produced by the conditional generative model, an image edge-detection technique based on wavelet transformation and adaptive mathematical morphology has been proposed, which overcomes edge blurring well; in addition, since generated samples often carry noise, an optimized denoising model based on an image-denoising technique combining empirical mode decomposition with sparse representation removes the noise while preserving the edge details of the image. Sample-number enhancement with this fusion model can therefore substantially enlarge the samples available to the classification model in the next stage.
In the few-shot classification model, classification is performed using transfer learning, which consistently achieves higher recognition efficiency on small training sample sets. The generative model is combined with transfer learning, and the fusion model is used for few-shot image recognition. The transfer learning is trained with an Inception-V3 model; compared with the classification training of a plain convolutional neural network, adding transfer learning improves the generalization ability of the model and greatly improves training efficiency, and when samples are missing, recognition accuracy also improves greatly because the transfer-learning-based classification model only needs partial retraining. Combined with the sample-enhancement technique, when samples are missing, model fusion and classification are performed by combining simulated samples with transfer learning; experimental comparison on a standard dataset and photographed leaf images shows improved accuracy over using transfer learning or a convolutional neural network alone, which is a useful reference for deep-learning classification in experimental settings with insufficient samples.
In the medical industry, the greatest contribution of computers is to help doctors filter large numbers of medical images; in diagnosis they serve at most as an auxiliary tool. Machine learning has now reached the era of deep learning, and tasks once considered impossible are being solved by deep-learning algorithms.
Few related patent documents have been published.
In Chinese patent application 201910374698.6, Nanjing University and Jiangsu Wanweieisi Network Intelligent Industry Innovation Center Co., Ltd. disclose a point-interactive medical image segmentation method based on a deep neural network, proposing a point-interactive deep-learning segmentation algorithm specifically for renal-tumor segmentation in medical images. The algorithm consists of a point-interaction preprocessing module, a bidirectional ConvRNN unit and a core deep segmentation network. Starting from a tumor center position provided by an expert, 16 image blocks of size 32×32 are densely collected from inside to outside at a 4-pixel step length along 16 uniformly spaced directions to form an image-block sequence; a deep segmentation network with sequence learning learns the inside-to-outside change trend of the target, determines the target edge, and segments the renal tumor. The method can overcome the low contrast of medical images, variable target positions and blurred target edges, and is suitable for organ-segmentation and tumor-segmentation tasks. The technology has the following characteristics: 1) the interaction mode is simple and convenient; 2) the Sequential Patch Learning concept is put forward, capturing long-range semantic relations with sequential image blocks, so that a large receptive field is obtained even with a shallow network; 3) a brand-new ConvRNN unit is proposed that learns the inside-and-outside change trend of a target, has strong interpretability, matches the actual working mode of a doctor, and yields a final model of high precision and strong applicability.
Nanjing University of Aeronautics and Astronautics, in Chinese patent application 201610654316.1, discloses a color-image saliency detection method based on background and foreground information, comprising the following steps: over-segment the input color image to obtain a series of superpixel blocks; select background seeds and obtain a coarse saliency by comparing the characteristics of each superpixel block with the background seeds; define a background weight for each superpixel block based on the feature distribution of the background seeds, and refine the coarse saliency through the background weight to obtain saliency based on background information; segment the saliency map formed in the previous step, select a compact foreground region from all segmentation results, extract foreground-region features, and obtain saliency based on foreground information through feature comparison; and integrate the saliencies based on background and foreground information obtained in the first two steps and apply a smoothing operation to obtain the optimized saliency of all superpixel blocks. The foreground object in the image is highlighted more consistently, and background noise in the image is suppressed well.
Microsoft Technology Licensing, LLC, in Chinese patent application 201510362657.7, proposes a method involving a machine-vision processing system. Techniques and constructs may determine an albedo map and a shading map from a digital image. The albedo and shading maps may be determined based at least in part on a color-difference threshold. A color-shading map can be determined based at least in part on the albedo map, and an illumination coefficient can be determined based on the color-shading map. The digital image may be adjusted based at least in part on the illumination coefficient. In some examples, respective shading maps for respective color channels of the digital image may be generated, and the color-shading map may be generated based at least in part on those shading maps. In some examples, multiple regions of a digital image may be determined, along with proximity relationships between the regions; the albedo map may then be determined based at least in part on the proximity relationships.
Tianjin University, in Chinese patent application 201910770542.X, discloses an image classification method with two-way channel attention meta-learning. The relationship between the support set and the query set in meta-learning is learned through an attention-model mechanism so that the support set and the query set attend to each other. The assumption is that when the visual features of an image sample in the support set are mapped to category features, both the attention of support-set samples to the salient regions of other support-set samples and their attention to the salient regions of query-set samples are considered. This improves the network's attention to the salient and detailed regions of image features, accelerates convergence, and further improves the performance of meta-learning-based few-shot image classification.
Sun Yat-sen University, in Chinese patent application 201910143448.1, proposes a method for classifying few-shot and zero-shot images based on metric learning and meta-learning, relating to the fields of computer-vision recognition and transfer learning, comprising the following steps: construct a training dataset and a target-task dataset; select a support set and a test set from the training dataset; input the samples of the test set and the support set into a feature-extraction network to obtain feature vectors; feed the feature vectors of the test set and the support set sequentially into a feature-attention module and a distance-measurement module, calculate the class similarity of test-set and support-set samples, and update the parameters of each module with a loss function; repeat until the network parameters of each module converge, finishing the training; and pass the picture to be tested and the training pictures in the target-task dataset sequentially through the feature-extraction network, the feature-attention module and the distance-measurement module, outputting the class label with the highest class similarity as the classification result of the picture under test.
Haqing encyclopedia of technology, Inc. in Chinese patent application 201910684593.0, proposes an ensemble learning based image classification system comprising a plurality of Cellular Neural Network (CNN) based Integrated Circuits (ICs) operatively coupled together as a set of basic learners for the image classification task. Each CNN-based IC is configured with at least one distinct deep learning model in the form of filter coefficients. The ensemble learning based image classification system further includes: a controller configured as the integrated meta learner; and a memory-based data buffer for holding various data used by the controller and the CNN-based IC in the integration. The various data may include input image data to be classified. The various data may also include extracted feature vectors or image classification outputs from the set of base learners. The extracted feature vectors or image classification outputs are then used by the meta learner to further perform the image classification task.
A survey of the extensive literature applying deep learning to medical image processing shows that most work is based on 2D images. In fact, in medical imaging both CT and MRI are 2D transformations of 3D data, and 3D reconstruction is likewise a very important class of algorithms in medical image processing; however, existing 3D-based algorithms are time-consuming and offer little improvement over 2D-based ones. More traditional algorithms treat the CNN as a feature-extraction operator that yields descriptive features of the image, whereas the newest methods use the CNN output directly as the final result. The latter runs into the black-box limitation and has poor interpretability; the former is better suited to feature interpretation and is more interpretable because rule-based classes can be added in subsequent post-processing.
Artificial-intelligence technology that assists diagnosis from medical images relies mainly on massive data for training, but most medical image data originates in hospitals. Tertiary hospitals hold most of the image data, yet in practical application the data volume for any single disease is insufficient, and both the data and the disease types are dispersed, so artificial-intelligence medical auxiliary diagnosis systems lack enough training data and their accuracy cannot be guaranteed. Although the purpose of few-shot learning and meta-learning is to learn quickly, and the training-sample volumes adopted in all recent research are small, large samples are still required to support a machine's autonomous learning.
Disclosure of Invention
The invention aims to provide a few-shot meta-learning training method for medical image classification artificial intelligence that addresses the scarcity of data in the medical field, solves the medical small-sample problem, realizes few-shot classification with a meta-learning method, improves the practicability and reliability of automatic artificial-intelligence detection in the field of difficult and rare diseases, and effectively enhances the accuracy of medical image classification.
The aim of the invention is achieved by the following technical measures: three meta-learners are constructed, namely a multi-scale CNN feature extractor, a metric learner and a classification discriminator, and a metric is designed on top of them. On each task of the training set, the target set is learned through support-set distance measurement and a metric is finally obtained; then, for a new task from the test set, the target set can be classified quickly and correctly from only a small number of support-set samples. The method specifically comprises the following steps:
step 1, reading the data set: the training data set is a medical image data set labeled for 1000 cases and comprises a training set, a verification set and a test set;
step 2, image preprocessing: perform primary processing on the data in the training data set, specify the range and size of the data, and remove irrelevant data;
step 3, data augmentation: normalize the medical image data and apply image rotation, noise addition and amplification;
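A minimal sketch of step 3, assuming grid-preserving 90-degree rotations, additive Gaussian noise and nearest-neighbour 2x amplification; the function name, noise level and zoom factor are illustrative assumptions, not the patent's specification.

```python
import numpy as np

def augment(image, rng):
    """Hypothetical augmentation: normalise, rotate, add noise, amplify."""
    # Normalise intensities to [0, 1].
    img = (image - image.min()) / (image.max() - image.min() + 1e-8)
    views = [img]
    # Rotations by 90/180/270 degrees preserve the pixel grid exactly.
    for k in (1, 2, 3):
        views.append(np.rot90(img, k))
    # Additive Gaussian noise, clipped back to the valid intensity range.
    views.append(np.clip(img + rng.normal(0.0, 0.05, img.shape), 0.0, 1.0))
    # 2x nearest-neighbour amplification via Kronecker upsampling.
    views.append(np.kron(img, np.ones((2, 2))))
    return views

rng = np.random.default_rng(0)
views = augment(rng.random((8, 8)), rng)  # 6 augmented views of one image
```

Each input image thus yields several label-preserving variants, which is the point of step 3: enlarging the effective sample count before meta-training.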
step 4, building a multi-scale feature extractor for algorithm model construction, wherein the feature extractor is a CNN feature extractor for extracting multi-scale super-pixel features of the image;
step 5, constructing a metric learning device and calculating the distance between image features;
step 6, setting up the classification discriminator: design the network layer structure and train the network parameters;
step 7, model training: the multi-scale relation network involved is an end-to-end differentiable structure; the parameter values of the model are estimated and adjusted using the back-propagation algorithm and adaptive moment estimation (Adam);
step 8, output: according to the relation features, a relation score is obtained after learning through the metric learner; this score is the result of one few-shot image classification task.
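The steps above operate on tasks (episodes) drawn from the labeled training set. A hedged sketch of how one N-way k-shot episode with a support set and a target set could be sampled; the function and variable names are illustrative, not from the patent.

```python
import numpy as np

def sample_episode(labels, n_way, k_shot, b_target, rng):
    """Hypothetical episode sampler: pick n_way classes, then k_shot support
    and b_target target samples per class, without overlap."""
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, target = [], []
    for cls in classes:
        idx = rng.permutation(np.where(labels == cls)[0])
        support.append(idx[:k_shot])                    # k support samples
        target.append(idx[k_shot:k_shot + b_target])    # b target samples
    return classes, np.concatenate(support), np.concatenate(target)

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 20)   # toy dataset: 10 classes, 20 samples each
classes, s_idx, t_idx = sample_episode(labels, n_way=5, k_shot=1, b_target=3, rng=rng)
```

Training iterates over many such episodes so that the learned metric generalizes to new tasks whose classes were never seen during training.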
In particular, a superpixel is a small region consisting of a series of adjacent pixels with similar color, brightness or texture characteristics. Most such small regions retain effective information for further image segmentation and generally do not destroy the boundary information of objects in the image; superpixels partition a pixel-level graph into a region-level graph and are an abstraction of basic information elements. For each superpixel, several image patches of different sizes centered on the superpixel's center are extracted as input to the multi-scale CNN model, and the relation-network-based multi-scale feature extractor performs feature extraction on the patches corresponding to each superpixel with the multi-scale CNN model.
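The patch-extraction step can be sketched as follows. The patch sizes and the reflect-padding used to handle superpixels near the image border are illustrative assumptions, not values given in the patent.

```python
import numpy as np

def multiscale_patches(image, center, sizes=(8, 16, 32)):
    """Hypothetical sketch: crop patches of several sizes centred on a
    superpixel centre, for input to the multi-scale CNN feature extractor."""
    pad = max(sizes) // 2
    padded = np.pad(image, pad, mode="reflect")   # avoid clipping at borders
    cy, cx = center[0] + pad, center[1] + pad
    patches = []
    for s in sizes:
        h = s // 2
        patches.append(padded[cy - h:cy + h, cx - h:cx + h])
    return patches

# A centre close to the image edge still yields full-size patches.
patches = multiscale_patches(np.zeros((64, 64)), center=(3, 60))
```

Each superpixel thus contributes one patch per scale, so the extractor sees the same region at several context sizes.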
In particular, the specific network layer structure consists of four convolution modules. The first two modules each comprise a convolution layer, a batch-normalization layer, a rectified-linear-unit layer and a max-pooling layer; each convolution layer consists of 64 filters of size 3x3 with no zero padding, the number of input channels of the first convolution layer being determined by the number of channels of the input picture, and each pooling layer has size 2x2 with no zero padding. The third and fourth convolution modules each comprise a convolution layer, a batch-normalization layer and a rectified-linear-unit layer, where each convolution layer consists of 64 zero-padded filters of size 3x3. Feeding a picture into the feature extractor yields second-, third- and fourth-layer feature maps of size (M, M, 64), where M denotes the width and height of the feature map and 64 its depth; the second-, third- and fourth-layer feature maps are then concatenated along the depth direction to obtain the multi-scale feature of size (M, M, 128).
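The spatial size M can be derived from the layer description above. A small sketch (the function name and the 84x84 example input are assumptions): an unpadded 3x3 convolution shrinks each side by 2, a 2x2 max pool halves it, and the two zero-padded modules leave it unchanged.

```python
def feature_map_size(s):
    """Hypothetical helper: spatial size after the extractor described above."""
    for _ in range(2):        # modules 1-2: 3x3 conv (no padding) + 2x2 pool
        s = (s - 2) // 2
    return s                  # modules 3-4: zero-padded 3x3 convs, size unchanged

M = feature_map_size(84)      # e.g. an 84x84 input patch
# The module-2/3/4 maps, each M x M x 64, are then concatenated along depth
# to form the multi-scale feature.
```

For an 84x84 patch this gives M = 19; the same formula predicts M for any input size, which is useful when choosing patch sizes in the preprocessing steps.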
In particular, the metric learner consists of two convolution modules and two fully connected layers. The first and second convolution modules each comprise a convolution layer, a batch-normalization layer, a rectified-linear-unit layer and a max-pooling layer; each convolution layer consists of 64 filters of size 3x3 or 2x2, and the number of input channels of the first convolution layer is determined by the number of channels of the relation feature, here 128, with filter size 2x2 and zero padding. Passing the relation feature through these modules yields a feature map of size (N, N, 64), which is flattened and fed into the fully connected layers; the first fully connected layer uses a ReLU activation and the second a Sigmoid activation. Feeding a relation feature into the metric learner thus yields a value in [0, 1] that represents the similarity between two picture features, called the relation score. Under a single N-way k-shot few-sample image classification task setting, N²·b similarity values are obtained; for each sample in the target set, the class whose support-set mean (prototype) achieves the maximum relation score is taken as the prediction label, finally yielding the classification labels of the N·b target-set samples. The multi-scale relation network is formulated as follows:
$$F_i^n = f_\varphi\left(x_i^n\right),\qquad F_j = f_\varphi\left(x_j\right)\tag{1-1}$$

$$P^n = \frac{1}{k}\sum_{i=1}^{k} F_i^n\tag{1-2}$$

$$C_{j,n} = \left|\,P^n \ominus F_j\,\right|\tag{1-3}$$

$$r_{j,n} = g_\phi\left(C_{j,n}\right)\tag{1-4}$$

$$\hat{y}_j = \underset{m=1,\dots,N}{\arg\max}\; r_{j,m}\tag{1-5}$$

wherein i = 1, …, k; j = 1, …, b; n = 1, …, N; m = 1, …, N; $f_\varphi$ denotes the feature extractor; $g_\phi$ denotes the metric learner; $x_i^n$ and $x_j$ denote the original pixel features of the support-set and target-set sample pictures; $F_i^n$ and $F_j$ denote the multi-scale features of the support-set and target-set samples; $P^n$ denotes the prototype of each class of support-set samples; $\left|\,\cdot \ominus \cdot\,\right|$ denotes reconnecting the three-dimensional vectors, subtracting them and taking the absolute value; $r_{j,n}$ denotes the relation score; and $\hat{y}_j$ denotes the prediction label of a target-set sample.
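A minimal NumPy sketch of formulas (1-1) through (1-5) follows. The feature extractor and metric learner are stand-in functions (a flatten and a bounded squashing function, respectively) — the real CNN and conv+FC networks of the patent are assumed, and `f_phi`, `g_phi`, and all shapes here are illustrative names and sizes.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_phi(x):
    """Stand-in feature extractor: flattens the picture into a feature
    vector (the patent's multi-scale CNN is assumed here)."""
    return x.reshape(-1)

def g_phi(relation_feature):
    """Stand-in metric learner: squashes the relation feature to a
    similarity in (0, 1] (the patent's conv + FC network is assumed)."""
    return 1.0 / (1.0 + relation_feature.mean())

N, k, b = 3, 5, 4                       # N-way, k-shot, b targets per class
support = rng.random((N, k, 8, 8))      # support-set pictures x_i^n
targets = rng.random((N * b, 8, 8))     # target-set pictures x_j

# (1-1) features of support and target samples
F_support = np.array([[f_phi(x) for x in cls] for cls in support])
F_targets = np.array([f_phi(x) for x in targets])

# (1-2) class prototypes: mean of each class's support features
P = F_support.mean(axis=1)              # shape (N, D)

# (1-3)/(1-4) relation scores from |P^n - F_j| fed to the metric learner
scores = np.array([[g_phi(np.abs(P[n] - F_j)) for n in range(N)]
                   for F_j in F_targets])   # shape (N*b, N)

# (1-5) prediction: class with the maximum relation score
y_hat = scores.argmax(axis=1)
print(scores.shape, y_hat.shape)  # (12, 3) (12,)
```

The N × b target samples each receive N relation scores, matching the N² × b pairwise comparisons described above (b targets per class, N classes, N prototypes).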
In particular, the network parameter design comprises the loss function design, the optimizer selection and the optimization algorithm design, with the following process: for image classification tasks, cross entropy is generally selected to calculate the distance between the predicted probability distribution and the expected probability distribution; in the multi-scale relation network, however, the metric learner characterizes the similarity between two images, and the last network layer of the metric learner has only one output node; the designed loss function is therefore as follows:
$$\varphi,\,\phi \leftarrow \underset{\varphi,\phi}{\arg\min}\ \sum_{j=1}^{N\times b}\sum_{n=1}^{N}\left(r_{j,n} - \mathbf{1}\left(y_j = c_n\right)\right)^2\tag{1-6}$$

wherein $\varphi$ represents the feature extractor parameters and $\phi$ represents the metric learner parameters; argmin means taking the values of $\varphi$ and $\phi$ at which the formula reaches its minimum; $c_n$ is the label of the n-th support-set category and $y_j$ is the label of the j-th target-set picture sample; $\mathbf{1}\left(y_j = c_n\right)$ takes the value 1 for a matched (same-class) pair and 0 for a mismatched pair, since the similarity of the same matching pair is 1 and the similarity of different matching pairs is 0; $r_{j,n}$ is calculated by the formula (1-4).
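The loss of formula (1-6) regresses each relation score toward its 0/1 match indicator. A minimal NumPy sketch, with illustrative score values (a real implementation would backpropagate through the networks producing the scores):

```python
import numpy as np

def relation_mse_loss(scores, target_labels):
    """MSE loss of formula (1-6): the relation score r_{j,n} is
    regressed to 1 for a matched (same-class) pair, 0 otherwise.
    scores: (num_targets, N) relation scores in [0, 1]
    target_labels: (num_targets,) true class indices in [0, N)."""
    num_targets, N = scores.shape
    one_hot = np.eye(N)[target_labels]   # the indicator 1(y_j == c_n)
    return float(np.sum((scores - one_hot) ** 2))

scores = np.array([[0.9, 0.1],    # target 0, true class 0
                   [0.2, 0.7]])   # target 1, true class 1
labels = np.array([0, 1])
print(relation_mse_loss(scores, labels))  # ≈ 0.15
```

Treating similarity as a regression target is why a mean-squared error is used here in place of the cross entropy that an ordinary classifier would choose.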
In particular, during the training process, after an epoch is finished, namely after training on the epoch's different tasks is completed, the accuracy on the verification set is calculated and the highest verification-set accuracy so far is recorded; when several consecutive epochs fail to reach a new optimum, the accuracy can be considered to no longer improve; when the accuracy no longer improves, or gradually decreases, iteration can be stopped, the model corresponding to the highest accuracy is output, and this model is then used for testing; the calculation formula of the accuracy is as follows:
$$\mathrm{accuracy} = \frac{1}{episode}\sum_{e=1}^{episode}\left(\frac{1}{n_T}\sum_{j=1}^{n_T}\mathbf{1}\left(\hat{y}_j = y_j\right)\right)\tag{1-7}$$

wherein episode represents the number of tasks, one task being one few-sample image classification task; $n_T$ represents the total number of samples in the target set, $n_T = N \times b$; $\hat{y}_j$ represents the predicted label value of a target-set sample and $y_j$ represents its true label value; b represents the number of pictures of each class of the target set, namely the batch number; and N represents the number of sample categories in one few-sample image classification task.
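The per-task accuracy of formula (1-7) and the stopping rule described above can be sketched as follows. The patience value of 5 epochs is an illustrative assumption — the patent says only "multiple times", not a specific count.

```python
import numpy as np

def episode_accuracy(y_pred, y_true):
    """Per-task accuracy: fraction of the n_T = N*b target samples
    whose predicted label equals the true label."""
    return float(np.mean(y_pred == y_true))

def should_stop(val_accuracies, patience=5):
    """Stop when the best validation accuracy has not improved for
    `patience` consecutive epochs (patience value is illustrative)."""
    if len(val_accuracies) <= patience:
        return False
    best_epoch = int(np.argmax(val_accuracies))
    return len(val_accuracies) - 1 - best_epoch >= patience

acc = episode_accuracy(np.array([0, 1, 1, 2]), np.array([0, 1, 2, 2]))
print(acc)  # 0.75
history = [0.60, 0.70, 0.71, 0.70, 0.69, 0.68, 0.67, 0.66]
print(should_stop(history))  # True: no new best for 5 epochs after epoch 2
```

The model snapshot taken at the best-accuracy epoch is the one kept for testing, as described above.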
Particularly, the overall output process comprises inputting the support set and the target set into the feature extractor, combining the extracted support-set multi-scale features and target-set multi-scale features to obtain the relation feature, and finally obtaining the relation score through the metric learner.
The invention has the following advantages and effects: by utilizing meta-learning, the drawback that the data amount of a single disease is small and dispersed is overcome, the few-sample classification learning training of the medical image processing intelligent system is completed, the classification accuracy is obviously improved, production efficiency is greatly increased, and the upgrading of intelligent-diagnosis deep learning technology in the medical industry is facilitated, providing feasibility support for the development and application of medical imaging equipment.
Drawings
FIG. 1 is a diagram of a network topology used in the present invention;
FIG. 2 is a functional diagram of a multi-scale CNN feature extractor employed in the present invention;
FIG. 3 is a diagram of a multi-scale CNN feature extractor employed in the present invention;
FIG. 4 is a diagram of a metric learner used in accordance with the present invention;
FIG. 5 is a flow chart of the overall training process employed by the present invention;
Detailed Description
The principle of the invention is that the task of few-sample learning based on meta-learning is to establish a model with excellent generalization performance; the few-sample learning problem of the deep learning method is thereby handled by meta-learning. For new category picture tasks not seen during training, previous experience knowledge can be well utilized, so that the deep learning method can adjust itself to the new task; the network's ability to learn how to learn is enhanced, learning and generalization are fast, and correct classification can be carried out with only a small number of samples of each new category.
In the present invention, a metric expresses the correlation between two samples in some way, such as the Euclidean distance in a certain embedding space; the embedding is generally completed by a neural network with parameters, and samples at a closer distance are more similar, that is, they may be classified into the same category. There are many metric methods; in recent years the cosine distance and the Euclidean distance have been used most, and metric-based methods are used to realize small-sample learning. One key point of metric learning is how to project samples into a space in which samples that are close together belong to the same class. On the Omniglot handwritten character data set, distance can be used to express similarity because, as in KNN-style handwritten digit recognition, the matrix obtained by projecting a picture's feature matrix into the space still preserves the shape of the handwritten character. In tasks with small sample sizes based on optimization meta-learning, some common learners show an overfitting phenomenon because of the few samples; meanwhile, although the learning effect during training is good, such learners generally need to iterate millions or even tens of millions of times before converging.
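The nearest-prototype idea behind metric learning can be illustrated with both distances mentioned above. This is a toy example in a 2-D embedding space with made-up prototype coordinates, not the embedding the invention actually learns:

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return float(np.linalg.norm(a - b))

def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_class(query, prototypes, metric):
    """Assign the query embedding to the class whose prototype is
    closest under the chosen metric."""
    dists = [metric(query, p) for p in prototypes]
    return int(np.argmin(dists))

prototypes = np.array([[1.0, 0.0],   # class 0 prototype (illustrative)
                       [0.0, 1.0]])  # class 1 prototype (illustrative)
query = np.array([0.9, 0.2])
print(nearest_class(query, prototypes, euclidean))        # 0
print(nearest_class(query, prototypes, cosine_distance))  # 0
```

If the embedding network succeeds in placing same-class samples close together, such a simple nearest-prototype rule suffices for classification — which is the premise the metric learner builds on.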
The invention adds an LSTM network in the meta-learner and uses a small number of labeled samples in the task; the meta-learner optimizes and updates the initialization parameters of the learner, so that when facing the corresponding task the learner needs only a small number of samples and at the same time converges faster, thereby achieving the purpose of rapidly learning to solve the task.
The invention realizes picture analysis and target positioning based on deep learning. Through the combination of computer vision, big data and deep learning technology, accumulated medical data are converted into a usable algorithm model; images are analyzed with image processing technology and machine learning algorithms, and the position of a focus in the medical images is detected. AI technology overcomes the subjective differences between operators, lightens the workload of manual processing, helps doctors improve diagnosis efficiency in both accuracy and speed, and solves the problem of low accuracy caused by insufficient artificial intelligence training data in the development of a medical auxiliary diagnosis system.
As shown in fig. 1, the present invention specifically comprises the following steps:
step 1, reading a data set, namely a training data set, which is a medical image data set labeled for 1000 cases, wherein the training data set comprises a training set, a verification set and a test set;
step 2, carrying out an image preprocessing operation: performing primary processing on the data in the training data set, specifying the range and size of the data, and removing irrelevant data;
step 3, data augmentation, namely normalizing the medical image data, and performing image rotation, noise addition and amplification treatment;
step 4, building a multi-scale feature extractor for algorithm model construction, wherein the feature extractor is a CNN feature extractor for extracting multi-scale super-pixel features of the image;
step 5, constructing a metric learning device and calculating the distance between image features;
step 6, setting up a classification judger, and designing a network layer structure and training network parameters;
step 7, entering the model training process, wherein the multi-scale relation network involved is an end-to-end differentiable structure; the values of the parameters in the model are estimated and adjusted by a back-propagation algorithm and adaptive moment estimation.
Step 8, finishing output; and obtaining a relation score after learning through a metric learning device according to the relation characteristics, wherein the relation score is a result of the image classification task of the few samples at one time.
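Step 3 above (normalization, rotation, noise addition) can be sketched with NumPy. The min-max normalization, the rotation angles and the Gaussian noise level (0.05) are illustrative choices, not parameters stated in the patent:

```python
import numpy as np

rng = np.random.default_rng(42)

def normalize(img):
    """Scale pixel intensities to [0, 1] (min-max normalization)."""
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)

def augment(img):
    """Yield rotated and noise-added copies of a normalized image.
    The Gaussian noise level (0.05) is an illustrative choice."""
    yield np.rot90(img, k=1)                    # 90-degree rotation
    yield np.rot90(img, k=2)                    # 180-degree rotation
    noisy = img + rng.normal(0.0, 0.05, img.shape)
    yield np.clip(noisy, 0.0, 1.0)              # keep valid intensity range

img = normalize(rng.random((16, 16)) * 255.0)
augmented = list(augment(img))
print(len(augmented), augmented[0].shape)  # 3 (16, 16)
```

Each augmented copy keeps the label of its source image, multiplying the effective number of training samples per class.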
In the invention, three meta-learners, namely a multi-scale CNN feature extractor, a metric learner and a classification discriminator are set up, and simultaneously, the metric standard on the meta-learners is designed; on each task of the training set, a target set is learned through support set distance measurement, a measurement standard is finally obtained through learning, and then for a new task of the testing set, the target set can be rapidly and correctly classified only by means of a small number of samples of the support set. The similarity measurement is learned from a wide task space, so that the experience extracted from the past learning task is used for guiding the learning of a new task, and the purpose of learning how to learn is achieved.
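Episodic training over "each task of the training set" with a support set and a target set can be sketched as follows. The dataset layout (a dict of per-class image arrays), the image shapes, and the N/k/b values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

def sample_episode(dataset, N=5, k=1, b=4):
    """Sample one N-way k-shot task: k support and b target images
    per class from N randomly chosen classes. `dataset` maps a class
    id to an array of that class's images."""
    classes = rng.choice(list(dataset), size=N, replace=False)
    support, target, target_labels = [], [], []
    for n, cls in enumerate(classes):
        idx = rng.permutation(len(dataset[cls]))[:k + b]
        imgs = dataset[cls][idx]
        support.append(imgs[:k])        # k shots for the support set
        target.extend(imgs[k:])         # b queries for the target set
        target_labels.extend([n] * b)   # episode-local class index
    return np.array(support), np.array(target), np.array(target_labels)

dataset = {c: rng.random((20, 8, 8)) for c in range(10)}  # 10 toy classes
S, T, y = sample_episode(dataset, N=5, k=1, b=4)
print(S.shape, T.shape, y.shape)  # (5, 1, 8, 8) (20, 8, 8) (20,)
```

Drawing a fresh task each episode is what lets the similarity metric generalize across tasks instead of memorizing any fixed class set.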
The invention has wide application prospect in the fields of computer-aided diagnosis systems of future medical diseases and the like.
The invention is further illustrated by the following figures and examples.
Example 1: the meta learning is applied to the learning problem of few samples in the field of medical image classification processing, a training network is built, and network parameters are set. The training network supports a small amount of medical data samples to realize rapid and high-precision classification;
in the foregoing, a superpixel is a small region composed of a series of pixel points adjacent in position and similar in color, brightness or texture characteristics; most of the small areas keep effective information for further image segmentation, and the boundary information of objects in the images is generally not damaged; superpixels are graphs obtained by dividing a picture element level (pixel-level) into a region level (discrete-level) and are abstractions of basic information elements. As shown in fig. 2, for each super pixel, a plurality of image patches with different sizes are extracted with the center of the super pixel as the center to be used as the input of the multi-scale CNN model, and the multi-scale feature extractor performs feature extraction operation on the image patches corresponding to each super pixel by using the multi-scale CNN model based on the relational network; as shown in fig. 3, the specific network layer structure is composed of two convolution modules, including convolution layer, batch normalization layer, modified linear unit layer and max pooling layer; wherein the convolutional layers are composed of 64 filters with the size of 3x3, zero padding is not carried out, and the number of channels of the input signal of the first convolutional layer is determined by the number of channels of the input picture; the size of the pooling layer is 2x2, no zero padding is performed; the third and fourth convolution modules consist of convolutional layers, which consist of 64 filters of size 3x3, zero-padded, batch normalization layers, and modified linear cell layers. The picture is input to a feature extractor, resulting in a size of (W, M, 64). Second, third, and fourth layer feature maps, where M represents the width and height of the feature map and 64 represents the depth of the feature map. 
And then, splicing the second layer characteristic diagram, the third layer characteristic diagram and the fourth layer characteristic diagram in the depth direction to obtain the multi-scale characteristic with the size (M, M, 128).
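Extracting multi-scale image patches centered on a superpixel center (the fig. 2 operation) can be sketched as follows. The patch sizes (8, 16, 32) and the edge-padding policy at image borders are illustrative assumptions:

```python
import numpy as np

def extract_multiscale_patches(image, center, sizes=(8, 16, 32)):
    """Extract square patches of several sizes centered on a superpixel
    center (row, col); the image is edge-padded so patches near the
    border keep their full size. Patch sizes are illustrative."""
    r, c = center
    patches = []
    for s in sizes:
        half = s // 2
        padded = np.pad(image, half, mode="edge")
        pr, pc = r + half, c + half            # center in padded coords
        patches.append(padded[pr - half:pr + half, pc - half:pc + half])
    return patches

image = np.arange(64 * 64, dtype=float).reshape(64, 64)
patches = extract_multiscale_patches(image, center=(5, 5))
print([p.shape for p in patches])  # [(8, 8), (16, 16), (32, 32)]
```

Each patch scale is then fed to its branch of the multi-scale CNN; the larger patches carry more surrounding context for the same superpixel.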
In the foregoing, the specific structure of the metric learner is shown in fig. 4; it consists of two convolution modules and two fully connected layers in total. The first and second convolution modules consist of a convolutional layer, a batch normalization layer, a rectified linear unit layer and a max-pooling layer. The convolutional layers consist of 64 filters of size 3x3 or 2x2; the number of input channels of the first convolutional layer is determined by the number of channels of the relation feature, here 128, its filter size is 2x2, and zero padding is performed. The relation feature is passed through the convolution modules to obtain a feature map of size (N, N, 64), which is flattened and fed into the fully connected layers; the first fully connected layer uses a ReLU activation function and the second uses a Sigmoid activation function. Inputting a relation feature into the metric learner yields a value in [0, 1] that represents the similarity between two picture features; this similarity is called the relation score. Under a one-time N-way k-shot few-sample image classification task setting, N² × b similarity values are obtained. For each sample in the target set, the class whose support-set sample mean yields the maximum relation score is taken as the prediction label; finally, we obtain the classification labels of the N × b target-set samples. The multi-scale relation network is formulated as follows:
$$F_i^n = f_\varphi\left(x_i^n\right),\qquad F_j = f_\varphi\left(x_j\right)\tag{1-1}$$

$$P^n = \frac{1}{k}\sum_{i=1}^{k} F_i^n\tag{1-2}$$

$$C_{j,n} = \left|\,P^n \ominus F_j\,\right|\tag{1-3}$$

$$r_{j,n} = g_\phi\left(C_{j,n}\right)\tag{1-4}$$

$$\hat{y}_j = \underset{m=1,\dots,N}{\arg\max}\; r_{j,m}\tag{1-5}$$

wherein i = 1, …, k; j = 1, …, b; n = 1, …, N; m = 1, …, N; $f_\varphi$ denotes the feature extractor; $g_\phi$ denotes the metric learner; $x_i^n$ and $x_j$ denote the original pixel features of the support-set and target-set sample pictures; $F_i^n$ and $F_j$ denote the multi-scale features of the support-set and target-set samples; $P^n$ denotes the prototype of each class of support-set samples; $\left|\,\cdot \ominus \cdot\,\right|$ denotes reconnecting the three-dimensional vectors, subtracting them and taking the absolute value; $r_{j,n}$ denotes the relation score; and $\hat{y}_j$ denotes the prediction label of a target-set sample.
In the foregoing, the network parameter design includes a loss function design, an optimizer selection, and an optimization algorithm design, and the process includes:
for image classification tasks, the cross entropy is typically chosen to compute the distance between the predicted probability distribution and the desired probability distribution. However, in the multi-scale relationship network, the similarity between two images is characterized by the metric learner, and the last layer of the network of the metric learner has only one output node. The designed loss function has the following specific formula:
$$\varphi,\,\phi \leftarrow \underset{\varphi,\phi}{\arg\min}\ \sum_{j=1}^{N\times b}\sum_{n=1}^{N}\left(r_{j,n} - \mathbf{1}\left(y_j = c_n\right)\right)^2\tag{1-6}$$

wherein $\varphi$ represents the feature extractor parameters and $\phi$ represents the metric learner parameters; argmin means taking the values of $\varphi$ and $\phi$ at which the formula reaches its minimum; $c_n$ is the label of the n-th support-set category and $y_j$ is the label of the j-th target-set picture sample; $\mathbf{1}\left(y_j = c_n\right)$ takes the value 1 for a matched (same-class) pair and 0 for a mismatched pair, since the similarity of the same matching pair is 1 and the similarity of different matching pairs is 0; $r_{j,n}$ is obtained by calculation of formula (1-4).
In the training process, after an epoch is finished, that is, after training on the epoch's different tasks is completed, the accuracy on the verification set is calculated and the highest verification-set accuracy so far is recorded. When several consecutive epochs fail to reach a new optimum, the accuracy is considered to no longer improve. When the accuracy no longer improves, or gradually decreases, iteration can be stopped, the model corresponding to the highest accuracy is output, and this model is then used for testing. The calculation formula of the accuracy is as follows:
$$\mathrm{accuracy} = \frac{1}{episode}\sum_{e=1}^{episode}\left(\frac{1}{n_T}\sum_{j=1}^{n_T}\mathbf{1}\left(\hat{y}_j = y_j\right)\right)\tag{1-7}$$

wherein episode represents the number of tasks, one task being one few-sample image classification task; $n_T$ represents the total number of samples in the target set, $n_T = N \times b$; $\hat{y}_j$ represents the predicted label value of a target-set sample and $y_j$ represents its true label value; b represents the number of pictures of each class of the target set, namely the batch number; and N represents the number of sample categories in one few-sample image classification task.
In the foregoing, as shown in fig. 5, the overall output process comprises feeding the support set and the target set to the feature extractor, combining the extracted support-set multi-scale features and target-set multi-scale features to obtain the relation feature, and obtaining the relation score through the metric learner.
In the embodiment of the invention, the experimental conditions are an Intel Core i7-3770 CPU, 16 GBytes of memory, an NVIDIA GeForce GTX 0686 GB accelerator, and the PyTorch 0.4.0 deep learning framework.
In the embodiment of the invention, the similarity measurement is learned from a wide task space, so that the experience extracted from the past learning task is used for guiding the learning of a new task, and the purpose of learning how to learn is achieved.
In the embodiment of the invention, the larger the image patch scale is, the more context relations can be acquired, and the more context information is included in the extracted features, which is more beneficial to improving the function of the meta-learner.

Claims (3)

1. A small sample element learning training method for medical image classification processing artificial intelligence, characterized in that three meta-learners, namely a multi-scale CNN feature extractor, a metric learner and a classification discriminator, are set up, and at the same time the metric standard on the meta-learners is designed; on each task of the training set, the target set is learned through support-set distance measurement and a metric standard is finally obtained through learning; then, for a new task of the test set, the target set can be rapidly and correctly classified with only a small number of support-set samples; the method specifically comprises the following steps:
step 1, reading a data set, namely a training data set, which is a medical image data set labeled for 1000 cases, wherein the training data set comprises a training set, a verification set and a test set;
step 2, carrying out an image preprocessing operation: performing primary processing on the data in the training data set, specifying the range and size of the data, and removing irrelevant data;
step 3, data augmentation, namely normalizing the medical image data, and performing image rotation, noise addition and amplification treatment;
step 4, building a multi-scale feature extractor for algorithm model construction, wherein the feature extractor is a CNN feature extractor for extracting multi-scale super-pixel features of the image;
step 5, constructing a metric learning device and calculating the distance between image features;
step 6, setting up a classification judger, and designing a network layer structure and training network parameters;
step 7, entering a model training process, wherein a multi-scale relation network related to the model training process is an end-to-end differentiable structure, and estimating and adjusting the values of parameters in the model by adopting a back propagation algorithm and an adaptive moment;
step 8, finishing output; obtaining a relation score after learning through a metric learning device according to the relation characteristics, wherein the relation score is a result of a one-time few-sample image classification task;
the super-pixel is a small area formed by a series of pixel points which are adjacent in position and similar in color, brightness or texture characteristics; most of the small areas keep effective information for further image segmentation, and the boundary information of objects in the images is generally not damaged; superpixels are graphs which divide a pixel-level graph into area-level graphs, and are abstractions of basic information elements; for each super pixel, extracting a plurality of image patches with different sizes by taking the center of the super pixel as the center to serve as the input of a multi-scale CNN model, and performing feature extraction operation on the image patches corresponding to each super pixel by the multi-scale CNN model by the multi-scale feature extractor based on a relation network;
the first two convolution modules of the specific network layer structure consist of a convolutional layer, a batch normalization layer, a rectified linear unit layer and a max-pooling layer; the convolutional layers consist of 64 filters of size 3x3 without zero padding, and the number of input channels of the first convolutional layer is determined by the number of channels of the input picture; the pooling layers have size 2x2, without zero padding; the third and fourth convolution modules consist of convolutional layers, batch normalization layers and rectified linear unit layers, wherein each convolutional layer consists of 64 zero-padded filters of size 3x3; inputting a picture into the feature extractor yields second-, third- and fourth-layer feature maps of size (M, M, 64), wherein M represents the width and height of a feature map and 64 represents its depth; the second-, third- and fourth-layer feature maps are then spliced in the depth direction to obtain the multi-scale feature of size (M, M, 128); the specific structure of the metric learner consists of two convolution modules and two fully connected layers; the first and second convolution modules consist of a convolutional layer, a batch normalization layer, a rectified linear unit layer and a max-pooling layer; the convolutional layers consist of 64 filters of size 3x3 or 2x2; the number of input channels of the first convolutional layer is determined by the number of channels of the relation feature, here 128, its filter size is 2x2, and zero padding is performed; the relation feature is passed through the convolution modules to obtain a feature map of size (N, N, 64), which is flattened and fed into the fully connected layers; the first fully connected layer uses a ReLU activation function and the second uses a Sigmoid activation function; inputting a relation feature into the metric learner yields a value in [0, 1] that represents the similarity between two picture features, this similarity being referred to as the relation score; under a one-time N-way k-shot few-sample image classification task setting, N² × b similarity values are obtained; for each sample in the target set, the class whose support-set sample mean yields the maximum relation score is taken as the prediction label; finally, the classification labels of the N × b target-set samples are obtained; the multi-scale relation network is formulated as follows:
$$F_i^n = f_\varphi\left(x_i^n\right),\qquad F_j = f_\varphi\left(x_j\right)\tag{1-1}$$

$$P^n = \frac{1}{k}\sum_{i=1}^{k} F_i^n\tag{1-2}$$

$$C_{j,n} = \left|\,P^n \ominus F_j\,\right|\tag{1-3}$$

$$r_{j,n} = g_\phi\left(C_{j,n}\right)\tag{1-4}$$

$$\hat{y}_j = \underset{m=1,\dots,N}{\arg\max}\; r_{j,m}\tag{1-5}$$

wherein i = 1, …, k; j = 1, …, b; n = 1, …, N; m = 1, …, N; $f_\varphi$ denotes the feature extractor; $g_\phi$ denotes the metric learner; $x_i^n$ and $x_j$ denote the original pixel features of the support-set and target-set sample pictures; $F_i^n$ and $F_j$ denote the multi-scale superpixel features of the support-set and target-set samples; $P^n$ denotes the mean of each class of support-set samples; $\left|\,\cdot \ominus \cdot\,\right|$ denotes reconnecting the three-dimensional vectors, subtracting them and taking the absolute value; $r_{j,n}$ denotes the relation score; $\hat{y}_j$ denotes the prediction label of a target-set sample;
the network parameter design comprises the loss function design, the optimizer selection and the optimization algorithm design, with the following process: for image classification tasks, cross entropy is generally selected to calculate the distance between a predicted probability distribution and an expected probability distribution; in the multi-scale relation network, the metric learner describes the similarity between two images, and the last network layer of the metric learner has only one output node; the specific formula of the designed loss function is as follows:
$$\varphi,\,\phi \leftarrow \underset{\varphi,\phi}{\arg\min}\ \sum_{j=1}^{N\times b}\sum_{n=1}^{N}\left(r_{j,n} - \mathbf{1}\left(y_j = c_n\right)\right)^2\tag{1-6}$$

wherein $\varphi$ represents the feature extractor parameter and $\phi$ represents the metric learner parameter; argmin means taking the values of $\varphi$ and $\phi$ at which the formula reaches its minimum; N represents the number of sample categories in one few-sample image classification task; b represents the number of pictures of each class of the target set, namely the batch number; $c_n$ is the label of each category of the support set and $y_j$ is the label of the target-set picture sample; $\mathbf{1}\left(y_j = c_n\right)$ takes the value 1 for a matched pair and 0 for a mismatched pair, since the similarity of the same matching pair is 1 and the similarity of different matching pairs is 0; $r_{j,n}$ is calculated by the formula (1-4).
2. The method for learning and training the small sample elements of the medical image classification processing artificial intelligence as claimed in claim 1, wherein in the training process, after an epoch is finished, namely after training on the epoch's different tasks is completed, the accuracy on the verification set is calculated and the highest verification-set accuracy so far is recorded; when several consecutive epochs fail to reach a new optimum, the accuracy can be considered to no longer improve; when the accuracy no longer improves, or gradually decreases, iteration can be stopped, the model corresponding to the highest accuracy is output, and this model is then used for testing; the calculation formula of the accuracy is as follows:
$$\mathrm{accuracy} = \frac{1}{episode}\sum_{e=1}^{episode}\left(\frac{1}{n_T}\sum_{j=1}^{n_T}\mathbf{1}\left(\hat{y}_j = y_j\right)\right)\tag{1-7}$$

wherein episode represents the number of tasks, one task being one few-sample image classification task; $n_T$ represents the total number of samples in the target set; b represents the number of pictures of each class of the target set, namely the batch number; N represents the number of sample categories in one few-sample image classification task; $y_j$ is the true label of a target-set picture sample and $\hat{y}_j$ is its predicted label; $\mathbf{1}\left(\hat{y}_j = y_j\right)$ takes the value 1 when the prediction matches the true label and 0 otherwise.
3. The method as claimed in claim 1, wherein the process of outputting the whole includes inputting a support set and a target set into a feature extractor, obtaining a relationship feature by obtaining extracted multi-scale features of the support set and multi-scale features of the target set, and finally obtaining a relationship score by a metric learner.
CN202010262936.7A 2020-04-03 2020-04-03 Small sample element learning training method for medical image classification processing artificial intelligence Active CN111476292B (en)

Publications (2)

Publication Number Publication Date
CN111476292A CN111476292A (en) 2020-07-31
CN111476292B true CN111476292B (en) 2021-02-19


CN112861995B (en) * 2021-03-15 2023-03-31 中山大学 Unsupervised few-sample image classification method and system based on model independent meta learning and storage medium
CN112949740B (en) * 2021-03-17 2022-11-25 重庆邮电大学 Small sample image classification method based on multilevel measurement
CN113096080B (en) * 2021-03-30 2024-01-16 四川大学华西第二医院 Image analysis method and system
CN112750074B (en) * 2021-04-06 2021-07-02 南京智莲森信息技术有限公司 Small sample image feature enhancement method and system and image classification method and system
CN113221110B (en) * 2021-04-08 2022-06-28 浙江工业大学 Remote access Trojan intelligent analysis method based on meta-learning
CN112949770B (en) * 2021-04-08 2023-12-26 深圳市医诺智能科技发展有限公司 Medical image identification and classification method and terminal
CN113269734B (en) * 2021-05-14 2023-04-07 成都市第三人民医院 Tumor image detection method and device based on meta-learning feature fusion strategy
CN113255701B (en) * 2021-06-24 2021-10-22 军事科学院系统工程研究院网络信息研究所 Small sample learning method and system based on absolute-relative learning framework
CN113505799B (en) * 2021-06-30 2022-12-23 深圳市慧鲤科技有限公司 Significance detection method and training method, device, equipment and medium of model thereof
CN113627434A (en) * 2021-07-07 2021-11-09 中国科学院自动化研究所 Method and device for building processing model applied to natural image
CN113535953B (en) * 2021-07-15 2022-05-27 湖南大学 Meta learning-based few-sample classification method
CN113642465B (en) * 2021-08-13 2022-07-08 石家庄铁道大学 Bearing health assessment method based on relational network
CN113505861B (en) * 2021-09-07 2021-12-24 广东众聚人工智能科技有限公司 Image classification method and system based on meta-learning and memory network
CN114386454B (en) * 2021-12-09 2023-02-03 首都医科大学附属北京友谊医院 Medical time sequence signal data processing method based on signal mixing strategy
CN114266977B (en) * 2021-12-27 2023-04-07 青岛澎湃海洋探索技术有限公司 Multi-AUV underwater target identification method based on super-resolution selectable network
CN114491039B (en) * 2022-01-27 2023-10-03 四川大学 Primitive learning few-sample text classification method based on gradient improvement
CN114693662A (en) * 2022-04-12 2022-07-01 吉林大学 Histopathological section subtype classification method and device, medium and terminal
CN114723998B (en) * 2022-05-05 2023-06-20 兰州理工大学 Small sample image classification method and device based on large-boundary Bayesian prototype learning
CN115082770B (en) * 2022-07-04 2024-02-23 青岛科技大学 Image center line structure extraction method based on machine learning
CN115393338A (en) * 2022-09-02 2022-11-25 复旦大学附属中山医院 Biological tissue identification model construction method and device and electronic equipment
CN115527269B (en) * 2022-10-10 2023-05-16 动自由(北京)科技有限公司 Intelligent human body posture image recognition method and system
CN115619810B (en) * 2022-12-19 2023-10-03 中国医学科学院北京协和医院 Prostate partition segmentation method, system and equipment

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220506A (en) * 2017-06-05 2017-09-29 东华大学 Breast cancer risk assessment analysis system based on depth convolutional neural networks
CN107507189A (en) * 2017-07-04 2017-12-22 西北大学 Mouse CT image kidney dividing methods based on random forest and statistical model
JP7213241B2 (en) * 2017-11-14 2023-01-26 マジック リープ, インコーポレイテッド Meta-learning for Multitask Learning on Neural Networks
CN108876797B (en) * 2018-06-08 2021-09-24 长安大学 Image segmentation system and method based on Spiking-SOM neural network clustering
CN109492711B (en) * 2018-12-07 2020-08-25 杭州电子科技大学 Malignant melanoma and non-malignant melanoma classification system based on deep learning
CN109753997B (en) * 2018-12-19 2022-11-22 湖南科技大学 Automatic accurate robust segmentation method for liver tumor in CT image
CN109685135B (en) * 2018-12-21 2022-03-25 电子科技大学 Few-sample image classification method based on improved metric learning
CN110020682B (en) * 2019-03-29 2021-02-26 北京工商大学 Attention mechanism relation comparison network model method based on small sample learning

Also Published As

Publication number Publication date
CN111476292A (en) 2020-07-31

Similar Documents

Publication Publication Date Title
CN111476292B (en) Small sample element learning training method for medical image classification processing artificial intelligence
WO2020215984A1 (en) Medical image detection method based on deep learning, and related device
CN108921851B (en) Medical CT image segmentation method based on 3D countermeasure network
CN106203432B (en) Positioning system of region of interest based on convolutional neural network significance map
WO2019200747A1 (en) Method and device for segmenting proximal femur, computer apparatus, and storage medium
CN107862694A (en) A kind of hand-foot-and-mouth disease detecting system based on deep learning
CN107169974A (en) It is a kind of based on the image partition method for supervising full convolutional neural networks more
CN108257135A (en) The assistant diagnosis system of medical image features is understood based on deep learning method
CN109544518B (en) Method and system applied to bone maturity assessment
CN112150428A (en) Medical image segmentation method based on deep learning
CN110689025A (en) Image recognition method, device and system, and endoscope image recognition method and device
CN112102266A (en) Attention mechanism-based cerebral infarction medical image classification model training method
Pan et al. Cell detection in pathology and microscopy images with multi-scale fully convolutional neural networks
CN110059656B (en) Method and system for classifying white blood cells based on convolution countermeasure generation neural network
CN108549912A (en) A kind of medical image pulmonary nodule detection method based on machine learning
CN110853070A (en) Underwater sea cucumber image segmentation method based on significance and Grabcut
CN114897914A (en) Semi-supervised CT image segmentation method based on confrontation training
CN113421228A (en) Thyroid nodule identification model training method and system based on parameter migration
Tsai et al. Machine learning based common radiologist-level pneumonia detection on chest X-rays
Mamdouh et al. A New Model for Image Segmentation Based on Deep Learning.
CN117237351B (en) Ultrasonic image analysis method and related device
CN116630964A (en) Food image segmentation method based on discrete wavelet attention network
Qian et al. Multi-scale context UNet-like network with redesigned skip connections for medical image segmentation
CN110008902A (en) A kind of finger vein identification method and system merging essential characteristic and deformation characteristics
CN113936006A (en) Segmentation method and device for processing high-noise low-quality medical image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant