CN114937288B

CN114937288B - Atypical data set balancing method, atypical data set balancing device and atypical data set balancing medium

Info

Publication number: CN114937288B
Application number: CN202210704826.0A
Authority: CN
Inventors: 林江莉; 韩霖; 彭建伟; 林江宇
Original assignee: Haihong Zhixiang Medical Science And Technology Tianjin Co ltd; Sichuan University
Current assignee: Haihong Zhixiang Medical Science And Technology Tianjin Co ltd; Sichuan University
Priority date: 2022-06-21
Filing date: 2022-06-21
Publication date: 2023-05-26
Anticipated expiration: 2042-06-21
Also published as: CN114937288A

Abstract

The invention relates to an atypical class data set balancing method, device and medium. The method comprises: preprocessing a data set; amplifying (augmenting) the unbalanced data sets of the different classes with conventional methods to obtain a balanced data set; feeding the balanced data set into a configured network model for training to obtain a typical class data set and an atypical class data set; amplifying the atypical class data set to obtain an atypical class balanced data set; and finally inputting the typical class data set and the atypical class balanced data set into the network model for training to obtain a trained network model and a classification result. Amplifying the atypical class data set offers a new approach to classification on unbalanced data and greatly improves the specificity and accuracy of the classification model. In addition, weights pre-trained on natural images are used for transfer learning, which mitigates vanishing and exploding gradients and helps the network converge quickly, improving model performance.

Description

Atypical data set balancing method, atypical data set balancing device and atypical data set balancing medium

Technical Field

The invention relates to deep learning, in particular to deep learning on unbalanced data sets, and more specifically to a technique for classifying disease images using an atypical class data balancing method.

Background

The application of deep learning to medical image classification has greatly facilitated disease diagnosis by doctors. Before deep learning, research in this field was based mainly on traditional machine learning methods, which extract features using texture features, the LBP operator and similar techniques, and then classify the images with conventional classifiers such as the K-nearest neighbor algorithm or SVM. Many scholars have obtained good classification results with these methods on diseases such as breast cancer and thyroid nodules, and have developed corresponding computer-aided diagnosis systems. Despite these achievements, feature selection in traditional machine learning is complex, the features are difficult to extract accurately, and the demands on researchers are high. Moreover, current medical image data sets usually contain more than ten thousand images, and traditional machine learning methods perform unsatisfactorily on such large data sets.

The sensitivity, specificity and accuracy of deep learning on skin cancer classification tasks have exceeded those of human experts, and many researchers have introduced a range of methods into the field of dermoscopic image classification in pursuit of higher accuracy and classification efficiency. Skin cancer data sets mostly exhibit an unbalanced data distribution and are typical unbalanced data sets. For traditional machine learning, many scholars use sampling methods to address data imbalance; common approaches include oversampling (for example SMOTE), undersampling (for example NCL), and mixed sampling (for example SMOTE + Tomek Links).

In the prior art, the conventional ways of handling data imbalance in deep learning are to balance the data during preprocessing, to optimize the loss function, or to modify part of the network structure so that the model adapts to the unbalanced data set during training. For example, for imbalance in natural language data, a dynamic K-means clustering method has been added to the data preprocessing, which improves the classification accuracy on unbalanced text data sets; in myocardial infarction signal processing, a CNN model trained with the Focal Loss function has been used to address the imbalance of myocardial infarction signals; for skin diseases, an auxiliary decoder has been used to process the original image and increase the number of samples to balance the data set: the original image passes through a CNN and then a decoder network to produce a new image, the new image and the original image are fed to their respective CNN classifiers, and the training losses of the two images and the loss of the decoder are combined with different weights into the total loss of the classification model. However, skin cancer data sets are usually unbalanced, and existing approaches that address the imbalance through loss balancing or model ensembling perform poorly, so the accuracy of the network or model is not high enough. A method for handling unbalanced skin disease data sets that improves the classification performance and accuracy of a network or model is therefore of great importance.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention provides an atypical class data set balancing method, device and medium. The invention combines a conventional way of handling unbalanced data with an atypical class data set balancing approach and uses skin disease images to classify and identify different types of diseases. Specifically, the method comprises: preprocessing a data set; applying conventional amplification to the unbalanced skin disease data sets of the different classes to obtain a training set in which the classes are balanced overall; on this basis, feeding the training set into a selected CNN for training; after the trained weights are obtained, classifying the training set with those weights to obtain two groups, correctly predicted samples and mispredicted samples, the mispredicted group being called the atypical class; and then amplifying the atypical class extensively to obtain a new training set, namely the atypical class data balanced data set. The new training set is fed into the network model for training to obtain the final trained network model, and different skin diseases are predicted and classified based on this network model.

In a first aspect, the present invention proposes a method for balancing atypical class data sets, said method comprising the steps of:

preprocessing a data set;

amplifying unbalanced data sets of different categories to obtain balanced data sets;

sending the balance data set into a set network model for training to obtain an atypical class data set;

performing atypical class data amplification on the atypical class data set to obtain an atypical class balanced data set;

inputting the typical class data set and the atypical class equalization data set into a network model for training to obtain a trained network model and a trained classification result.

Further, preprocessing the data set specifically includes: cropping the images in the data set and applying random image enhancement.

Preferably, the configuration of the network model includes:

the network model is EfficientNetB0, which specifically comprises MBConvBlock, MBConv, SepConv, depthwise separable convolution (DWConv) and SE modules;

the network model is initialized with weights pre-trained on natural images, and transfer learning is performed to obtain the initial model weights of the EfficientNetB0 network;

preferably, the network model adopts a Focal Loss function, derived from the two-class (binary) cross-entropy loss function of equation 1, in the training process:

$$\mathrm{CE}(p, y) = \begin{cases} -\log(p), & y = 1 \\ -\log(1 - p), & \text{otherwise} \end{cases} \tag{1}$$

wherein y takes the value 1 or -1, 1 denoting a positive sample and -1 a negative sample, and the probability p ranges from 0 to 1;

replacing p with $p_t$ gives equation 2:

$$p_t = \begin{cases} p, & y = 1 \\ 1 - p, & \text{otherwise} \end{cases} \tag{2}$$

equation 1 can then be rewritten as equation 3:

$$\mathrm{CE}(p, y) = \mathrm{CE}(p_t) = -\log(p_t) \tag{3}$$

in order to control the weights of positive and negative samples and of hard-to-classify and easy-to-classify samples, Focal Loss adds a modulation factor in front of equation 3 to obtain equation 4:

$$\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t) \tag{4}$$

wherein $\alpha_t$ is used to reduce the weight of negative samples: $\alpha_t$ equals $\alpha$ when the label equals 1 and equals $1 - \alpha$ otherwise, and $\alpha$ ranges from 0 to 1.

Specifically, data amplification of the data set includes one or more of histogram equalization, horizontal flipping, rotation by 30, 90, 150 or 180 degrees, and random erasure.

Specifically, the atypical class data set balancing method of the present invention further comprises the step of: obtaining a target image, and performing prediction classification on the target image based on the trained network model.

In a second aspect, the present invention proposes an atypical data set balancing apparatus, which specifically includes:

the preprocessing module is used for preprocessing the data set;

the first amplification module is used for amplifying unbalanced data sets of different categories to obtain balanced data sets;

the first training module is used for sending the balance data set into the set network model for training to obtain an atypical class data set;

the second amplification module is used for carrying out atypical data amplification on the atypical data set to obtain an atypical balanced data set;

and the second training module is used for inputting the typical class data set and the atypical class balance data set into the network model for training to obtain a trained network model and a trained classification result.

In a third aspect, the invention proposes an electronic device characterized in that it comprises a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, said one or more programs comprising steps for performing said atypical-type dataset balancing method.

In a fourth aspect, the present invention proposes a computer readable storage medium, on which a computer program is stored, the computer program implementing the steps of the atypical class data set balancing method according to any one of the above-mentioned methods when being executed by a processor.

The atypical data set balancing method, device and medium based on the invention realize the following beneficial technical effects:

(1) In the embodiment of the invention, for an unbalanced data set, the data set is first preprocessed; the unbalanced data sets of the different classes are amplified with conventional methods to obtain a balanced data set; the balanced data set is then fed into the configured network model for training to obtain an atypical class data set; the atypical class data set is amplified to obtain an atypical class balanced data set; finally, the typical class data set and the atypical class balanced data set are input into the network model for training to obtain a trained network model and a classification result. Amplifying the atypical class data set offers a new approach to classification on unbalanced data, and specificity and accuracy are greatly improved compared with existing classification methods.

(2) In the embodiment of the invention, the network model (EfficientNetB0) is initialized with weights that EfficientNetB0 learned on natural images, and transfer learning is then performed. This mitigates, to a certain extent, the network becoming trapped in a poor local optimum and the vanishing and exploding of gradients; transfer learning also helps the network converge quickly and improves the performance of the model.

(3) In the embodiment of the invention, Focal Loss, an improved binary cross-entropy loss function, is used to verify the loss balancing strategy. Focal Loss addresses sample imbalance and hard-to-classify samples in object detection networks and controls the weight of hard samples well, so a better training result is obtained. When the method is applied to skin cancer, a good intelligent skin cancer recognition and classification model is obtained.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings may be obtained according to the drawings without inventive effort to those skilled in the art.

FIG. 1 is a schematic diagram of general technical steps of an atypical class data set balancing method according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of an ISIC2019 training set label according to an embodiment of the present invention.

Fig. 3 is a schematic diagram of the data set before and after clipping according to an embodiment of the present invention.

FIG. 4 is a distribution diagram of the data set after conventional balancing according to an embodiment of the present invention.

FIG. 5 is a schematic diagram of the architecture of the EfficientNet network provided by an embodiment of the present invention.

FIG. 6 is a schematic diagram of the data set distributions for the initial data set, the conventionally balanced data set and the atypical class data balanced data set provided by an embodiment of the present invention.

Detailed Description

The embodiments of the present invention are described below with reference to specific examples, and other advantages and effects of the invention will become apparent to those skilled in the art from this disclosure. The invention may also be practiced or applied in other embodiments, and the details of this description may be modified or varied without departing from the spirit and scope of the invention. It should be noted that the following embodiments and the features in the embodiments may be combined with each other where no conflict arises.

In view of the shortcomings of the prior art, the invention provides an atypical class data set balancing method, device and medium. Its main purpose is to classify and identify various diseases efficiently and accurately when the data set for automatic disease classification is unbalanced, by combining a conventional way of handling unbalanced data with an atypical class data set balancing approach. In the embodiments of the invention, skin diseases are taken as an example, and different types of skin disease images are used to classify and identify different types of diseases. Compared with the prior art, the atypical class data set balancing method achieves accurate prediction and classification of skin diseases and improves classification efficiency.

Specifically, the atypical class data set balancing method comprises: preprocessing a data set; applying conventional amplification to the unbalanced skin disease data sets of the different classes to obtain a training set in which the classes are balanced overall; on this basis, as shown in FIG. 1, feeding the training set into a selected CNN for training; after the trained weights are obtained, classifying the training set with those weights to obtain two groups, correctly predicted samples and mispredicted samples, the mispredicted group being called the atypical class; and then amplifying the atypical class to obtain a new training set, namely the atypical class data balanced data set. The new training set is fed into the network model for training to obtain the final trained network model and classification result, and different skin diseases are predicted and classified based on this network model.

Illustratively, the invention uses the ISIC2019 dermoscopic image classification data set, whose data sources mainly include HAM10000. The data set is divided into training data and test data. The training data, 25331 dermoscopic images in total, come with disease labels stored in an Excel file in a format similar to One-Hot encoding. As shown in Fig. 2, the first row gives the disease classes, the first column gives the picture names, and the column containing the number 1 indicates the label of the picture. The training data were collected by professional dermatologists. The data set used in the experiment is described in Table 1; the counts and proportions in the table show the data imbalance, with melanocytic nevi the most numerous and dermatofibromas the fewest, and the atypical class data balancing method provided by the invention aims to address this problem. In use, part of the pictures are reduced in size, and the competition training set is then divided into two parts: one fifth of the pictures are set aside and stored separately as a test set to avoid data contamination, and the remaining four fifths are used as the training set. In the whole training data set there are 8473 malignant skin cancer pictures and 16858 benign pictures, so the malignant pictures number only about half of the benign ones. Among the malignant pictures melanoma is the most numerous, with 4522 in total; among the benign pictures nevi are the most numerous, with 12875 in total.

Table 1. Details of the ISIC2019 data set
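By way of illustration only, the One-Hot style label layout and the one-fifth hold-out described above can be sketched as follows. The sketch assumes the label sheet has been exported to CSV text; the column names, picture names and the helper names `read_one_hot_labels` and `hold_out_one_fifth` are hypothetical and are not part of the disclosure.

```python
import csv
import io
import random


def read_one_hot_labels(csv_text):
    """Map each picture name to the class whose column holds the value 1."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)          # first row: name column, then class names
    classes = header[1:]
    labels = {}
    for row in reader:
        name = row[0]              # first column: picture name
        flags = [float(value) for value in row[1:]]
        labels[name] = classes[flags.index(1.0)]
    return labels


def hold_out_one_fifth(names, seed=0):
    """Shuffle the names and reserve one fifth as a separate test set."""
    names = sorted(names)
    random.Random(seed).shuffle(names)
    n_test = len(names) // 5
    return names[n_test:], names[:n_test]   # (training names, test names)


# Tiny hypothetical label sheet with three classes and five pictures.
sample = (
    "image,MEL,NV,BCC\n"
    "img_001,0.0,1.0,0.0\n"
    "img_002,1.0,0.0,0.0\n"
    "img_003,0.0,0.0,1.0\n"
    "img_004,0.0,1.0,0.0\n"
    "img_005,0.0,1.0,0.0\n"
)
labels = read_one_hot_labels(sample)
train_names, test_names = hold_out_one_fifth(labels)
```

Storing the held-out names separately before any amplification is one way to keep amplified copies of a test picture from leaking into the training set.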

In the following embodiments of the present invention, the specific technical solutions are given only as examples, and implementations of the technical solutions are not limited thereto.

Example I

In one embodiment, an atypical class data set balancing method of the present invention comprises the steps of:

Step S100, preprocessing the data set.

Specifically, when the ISIC2019 data were preprocessed, it was found that hair removal and segmentation before classification bring only a marginal improvement of no more than 1%. To highlight the effect of atypical class data balancing, these preprocessing methods are not used, and only simple cropping is performed. The data set contains a large number of pictures with irrelevant black borders, as shown in panels a and d of Fig. 3. Taking skin cancer as an example, classification only needs to attend to the skin lesion in the middle of the picture, so part of the pictures are cropped. The procedure is as follows: first, all training data are binarized; the binarized pictures are shown in panels b and e of Fig. 3, where the black borders of the pictures to be processed can be seen. A threshold is then set to screen out the pictures that need cropping, and these pictures are cropped so that the aspect ratio of the cropped picture is the same as that of the initial image. Comparing panels a and c, and panels d and f, of Fig. 3 shows that the cropped pictures retain the skin lesion well and remove the black borders. In addition, the preprocessing step of the invention includes random image enhancement before the images are input into the network, to avoid overfitting. This data preprocessing method improves the accuracy of the classification result.
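By way of illustration only, the binarize, screen and crop procedure described above can be sketched with NumPy as follows. The function name `crop_black_border`, the brightness threshold of 20 and the 5% screening fraction are assumptions chosen for the sketch; the description does not state the values actually used.

```python
import numpy as np


def crop_black_border(image, threshold=20, min_border_fraction=0.05):
    """Crop dark borders from an H x W x 3 uint8 image, keeping its aspect ratio."""
    gray = image.mean(axis=2)
    mask = gray > threshold                    # binarization: True = content
    if not mask.any():
        return image                           # fully dark picture: leave as is
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    top, bottom = rows[0], rows[-1] + 1
    left, right = cols[0], cols[-1] + 1
    h, w = gray.shape
    # Screening: only crop pictures whose dark border is large enough.
    border_fraction = 1.0 - ((bottom - top) * (right - left)) / (h * w)
    if border_fraction < min_border_fraction:
        return image
    # Use one scale factor for both axes so the aspect ratio is preserved.
    scale = max((bottom - top) / h, (right - left) / w)
    new_h, new_w = int(round(scale * h)), int(round(scale * w))
    # Center the crop window on the content and clamp it to the picture.
    cy, cx = (top + bottom) // 2, (left + right) // 2
    y0 = min(max(cy - new_h // 2, 0), h - new_h)
    x0 = min(max(cx - new_w // 2, 0), w - new_w)
    return image[y0:y0 + new_h, x0:x0 + new_w]


# Synthetic picture: a bright 50 x 100 region inside a black 100 x 200 frame.
picture = np.zeros((100, 200, 3), dtype=np.uint8)
picture[25:75, 50:150] = 200
cropped = crop_black_border(picture)
```

Because a single scale factor is applied to both axes, the crop window may keep a thin strip of border on one axis when the content region has a different aspect ratio from the picture.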

Step S200, amplifying unbalanced data sets of different categories to obtain balanced data sets.

Specifically, taking skin diseases as an example, the experimental data show that some classes of skin disease have very few pictures, and the largest and smallest classes differ in size by a factor of several tens. To balance this difference, a conventional amplification balancing strategy is adopted first. Illustratively, the classes to be amplified are expanded using histogram equalization, horizontal flipping, rotation by 30, 90, 150 or 180 degrees, random erasure (cutout) and similar methods. After amplification, the number of pictures in each class is approximately one half or one third of the number of melanocytic nevus pictures (the largest class), depending on the initial number of pictures in the class. The data distribution after conventional balancing is shown in FIG. 4; compared with the initial distribution it is more balanced, and the increase in the amount of data for small classes helps to avoid overfitting during training.
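By way of illustration only, the amount of conventional amplification needed per class can be planned as in the following sketch. The counts for nevi and melanoma are those quoted in the description; the third class and its count, the function name `plan_conventional_balance` and the fixed target fraction are hypothetical.

```python
import math


def plan_conventional_balance(class_counts, target_fraction=0.5):
    """For each class, compute how many amplified pictures to add so that the
    class reaches about target_fraction of the largest class."""
    largest = max(class_counts.values())
    target = int(largest * target_fraction)
    plan = {}
    for name, count in class_counts.items():
        extra = max(target - count, 0)
        # Number of amplified variants needed per original picture.
        copies_per_image = math.ceil(extra / count) if extra else 0
        plan[name] = {"extra": extra, "copies_per_image": copies_per_image}
    return plan


counts = {
    "nevus": 12875,       # from the description
    "melanoma": 4522,     # from the description
    "rare_class": 250,    # hypothetical small class
}
plan = plan_conventional_balance(counts, target_fraction=0.5)
```

In this sketch the target fraction is a single parameter; the description instead chooses one half or one third per class depending on its initial size, which could be modelled by passing a different fraction for each class.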

Step S300, sending the balance data set into a set network model for training to obtain an atypical data set.

Specifically, after the balanced data set is sent to the set network model for training and classifying, the output result comprises the typical class data set and the atypical class data set, namely, the data sets with correct classification and incorrect classification are obtained through the preliminary network model.
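By way of illustration only, separating the training set into a typical class data set and an atypical class data set from the predictions of the preliminary model can be sketched as follows. The function name `split_typical_atypical` and the sample names and labels are hypothetical.

```python
def split_typical_atypical(names, true_labels, predicted_labels):
    """Return (typical, atypical): correctly and incorrectly classified samples."""
    typical, atypical = [], []
    for name, truth, prediction in zip(names, true_labels, predicted_labels):
        if prediction == truth:
            typical.append(name)
        else:
            atypical.append(name)      # mispredicted samples form the atypical class
    return typical, atypical


names = ["img_a", "img_b", "img_c", "img_d"]
truth = ["MEL", "NV", "MEL", "BCC"]
predicted = ["MEL", "NV", "NV", "MEL"]
typical, atypical = split_typical_atypical(names, truth, predicted)
```

The predictions here are assumed to come from classifying the balanced training set with the weights obtained in this step.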

Specifically, in one embodiment, the configured network model may be EfficientNetB0, VGG, ResNet or GoogLeNet. Preferably, the network model adopted by the invention is EfficientNetB0. All input pictures in the balanced data set are resized to 224 × 224, and random horizontal flipping and random rotation are applied as data enhancement to avoid overfitting during training; finally, the data are normalized and converted into vectors that are input into the network model. When a deep learning network is trained, its weights are updated through backpropagation. If the weights are randomly initialized at the start of training, exploding and vanishing gradients may occur, so the weights of the network model should be well initialized before training begins. In one embodiment, the initial weights of the network model may be set from prior knowledge or generated randomly. Preferably, the EfficientNetB0 network model of the invention uses weights pre-trained on natural images and performs transfer learning to obtain the initial model weights of the EfficientNetB0 network. Pre-trained weights help the network avoid becoming trapped in a poor local optimum and mitigate vanishing and exploding gradients to a certain extent; transfer learning also helps the network converge quickly and improves the performance of the model.

Specifically, the main modules of EfficientNetB0 include MBConvBlock, MBConv and SepConv, and structures such as depthwise separable convolution (DWConv) and the SE module are used; the specific structure is shown in FIG. 5. Network model training is divided into two parts, transfer learning and weight training after transfer learning, and the network model is configured as follows: the number of epochs is uniformly set to 100, the batch size is set to 64, the learning rate for transfer learning is 0.01, the learning rate after transfer learning is 0.001, and an SGD optimizer with momentum is used in both training stages, with the momentum set to 0.9. The number of disease classes can be set according to the actual identification requirements; in one embodiment, eight classes are used. Because the network model is transferred from ImageNet, the final fully connected layer of the network is modified so that the network output matches the number of classes in the current experiment. Finally, the invention monitors the accuracy on the validation set during training: when the validation accuracy has not improved for 10 epochs, training is stopped early, and the weights with the highest validation accuracy are saved.
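By way of illustration only, the training settings and the early stopping rule described above can be sketched in plain Python as follows. The numeric values are those quoted in the description; the dictionary keys, the class name `EarlyStopping` and the helper `simulate` are hypothetical, and the sketch replays a list of validation accuracies rather than training a network.

```python
# Hyperparameters quoted in the description; the key names are illustrative.
TRAINING_CONFIG = {
    "max_epochs": 100,
    "batch_size": 64,
    "lr_transfer": 0.01,         # learning rate during transfer learning
    "lr_after_transfer": 0.001,  # learning rate after transfer learning
    "momentum": 0.9,             # SGD with momentum, both stages
    "num_classes": 8,
    "patience": 10,              # epochs without validation improvement
}


class EarlyStopping:
    """Track the best validation accuracy and signal when to stop."""

    def __init__(self, patience):
        self.patience = patience
        self.best_accuracy = float("-inf")
        self.best_epoch = None
        self.epochs_without_improvement = 0

    def update(self, epoch, val_accuracy):
        """Record one epoch; return True when training should stop."""
        if val_accuracy > self.best_accuracy:
            self.best_accuracy = val_accuracy
            self.best_epoch = epoch          # the caller would save weights here
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def simulate(val_accuracies, patience):
    """Replay validation accuracies; return (stop_epoch, best_epoch)."""
    stopper = EarlyStopping(patience)
    for epoch, accuracy in enumerate(val_accuracies):
        if stopper.update(epoch, accuracy):
            return epoch, stopper.best_epoch
    return None, stopper.best_epoch
```

In a real training loop, `update` would be called once per epoch with the measured validation accuracy, and the model weights would be saved whenever a new best value is recorded.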

Step S400, carrying out atypical class data amplification on the atypical class data set to obtain an atypical class balanced data set.

Specifically, data amplification of the atypical class data set includes expansion using histogram equalization, horizontal flipping, rotation by 30, 90, 150 or 180 degrees, random erasure (cutout) and similar methods, and the expanded data set is used as the atypical class balanced data set. In one embodiment, taking skin diseases as an example, some classes have no atypical samples, but this does not affect the atypical class balancing of the data as a whole, and each malignant skin cancer class usually does have atypical samples. After the atypical class is obtained, the data amplification methods are used to amplify it by a factor of twenty, fifty or more, increasing the number of atypical samples in the data set so that the model pays more attention to them and the false negative rate of the model is reduced. The data set distributions for the initial data set, the conventionally balanced data set and the atypical class data balanced data set are shown in FIG. 6; the data distribution gradually becomes more balanced. The invention first tests the method with EfficientNet and then applies the proposed atypical class data balancing method to classical networks in the classification field to verify its ability to improve training results.
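By way of illustration only, amplifying atypical samples by a fixed factor with some of the listed operations can be sketched with NumPy as follows. The function names, the random choice of one operation followed by random erasure, and the joint equalization over all channels are simplifying assumptions; rotations by 30 and 150 degrees require interpolation and are omitted from the sketch.

```python
import numpy as np


def equalize_histogram(image):
    """Histogram-equalize a uint8 array (all channels treated jointly)."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    denom = max(cdf[-1] - cdf_min, 1)
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255), 0, 255).astype(np.uint8)
    return lut[image]


def random_erase(image, rng, max_fraction=0.3):
    """Zero out one random rectangle (cutout) in a copy of the image."""
    h, w = image.shape[:2]
    eh = rng.integers(1, max(2, int(h * max_fraction) + 1))
    ew = rng.integers(1, max(2, int(w * max_fraction) + 1))
    y = rng.integers(0, h - eh + 1)
    x = rng.integers(0, w - ew + 1)
    out = image.copy()
    out[y:y + eh, x:x + ew] = 0
    return out


def amplify(images, factor, seed=0):
    """Return `factor` amplified variants of every image (originals excluded)."""
    rng = np.random.default_rng(seed)
    ops = [
        lambda im: im[:, ::-1],        # horizontal flip
        lambda im: np.rot90(im, 1),    # rotation by 90 degrees
        lambda im: np.rot90(im, 2),    # rotation by 180 degrees
        equalize_histogram,
    ]
    variants = []
    for image in images:
        for _ in range(factor):
            op = ops[rng.integers(len(ops))]
            # Random erasure on top makes repeated variants differ from each other.
            variants.append(random_erase(op(image), rng))
    return variants


atypical_images = [np.full((8, 8, 3), 100, dtype=np.uint8) for _ in range(2)]
amplified = amplify(atypical_images, factor=20)
```

Under these assumptions, the new training set would consist of the typical samples together with the atypical samples and their amplified variants.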

Step S500, inputting the typical class data set and the atypical class balance data set into a network model for training to obtain a trained network model and a trained classification result.

Specifically, the typical class data set and the atypical class data set are obtained from the initial training of the network model. After the atypical class balanced data set is obtained through amplification, the data distribution becomes more balanced. The typical class data set and the atypical class balanced data set together form a new training set, which is input into the network model for training to obtain the final trained network model; the classification result on the training data set is obtained at the same time.

In step S500, the loss function used by the network model may be a conventional loss function. In one embodiment, the network model preferably adopts Focal Loss during training to address sample imbalance and hard-to-classify samples, which considerably improves both the training speed and the training accuracy of the network. Focal Loss is a modification of the basic binary cross-entropy loss function, which is shown in equation 1:

$$\mathrm{CE}(p, y) = \begin{cases} -\log(p), & y = 1 \\ -\log(1 - p), & \text{otherwise} \end{cases} \tag{1}$$

wherein y takes the value 1 or -1, 1 denoting a positive sample and -1 a negative sample, and the probability p ranges from 0 to 1. Replacing p with $p_t$ gives equation 2:

$$p_t = \begin{cases} p, & y = 1 \\ 1 - p, & \text{otherwise} \end{cases} \tag{2}$$

Equation 1 can then be rewritten as equation 3:

$$\mathrm{CE}(p, y) = \mathrm{CE}(p_t) = -\log(p_t) \tag{3}$$

Adding a modulation factor in front of equation 3 gives the Focal Loss of equation 4:

$$\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t) \tag{4}$$

wherein $\alpha_t$ is used to reduce the weight of negative samples: $\alpha_t$ equals $\alpha$ when the label equals 1 and equals $1 - \alpha$ otherwise, and $\alpha$ ranges from 0 to 1. Thus, the contribution of positive and negative samples to the loss can be controlled by setting the value of $\alpha$. The factor $(1 - p_t)^{\gamma}$ controls the weight of hard-to-classify samples: if $p_t$ is large, that is, the probability of belonging to the given class is high, then $1 - p_t$ is small, and vice versa, so $(1 - p_t)^{\gamma}$ controls the contributions of hard-to-classify and easy-to-classify samples to the loss. $(1 - p_t)^{\gamma}$ is called the modulation factor of Focal Loss, and $\alpha_t$ is a coefficient commonly used to control the weights of positive and negative samples. When $\gamma$ is 0, Focal Loss reduces to the ordinary binary cross-entropy loss function weighted by $\alpha_t$, and as $\gamma$ increases the relative weight of hard-to-classify samples gradually increases, so suitable $\alpha$ and $\gamma$ values can be selected in use. In one embodiment, an empirically chosen value of 0.25 is used, which controls the weight of hard-to-classify samples well and yields a better training result.
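By way of illustration only, equations 1 to 4 can be evaluated for a single sample as in the following sketch. The default `alpha=0.25` assumes that the value 0.25 mentioned above refers to α, and `gamma=2.0` is purely an illustrative choice; neither assignment is stated in the description.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy of equations 1 to 3; y is 1 or -1, 0 < p < 1."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal Loss of equation 4 for one sample (alpha and gamma are assumed defaults)."""
    p_t = p if y == 1 else 1.0 - p                 # equation 2
    alpha_t = alpha if y == 1 else 1.0 - alpha     # class weighting
    modulation = (1.0 - p_t) ** gamma              # down-weights easy samples
    return -alpha_t * modulation * math.log(p_t)


easy_positive = focal_loss(0.9, 1)    # confident, correct prediction
hard_positive = focal_loss(0.1, 1)    # positive sample predicted with low probability
```

With these assumed defaults, the easy sample is scaled by 0.25 × (1 - 0.9)² relative to its cross-entropy, whereas the hard sample is scaled by 0.25 × (1 - 0.1)², so hard samples dominate the loss.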

Further, in an atypical class data set balancing method of the present invention, the method further comprises:

and obtaining a target image, and carrying out prediction classification on the target image based on the trained network model.

Specifically, a target image is obtained from the test set and input into the trained network model, which performs prediction classification on it. The method can be applied to various data sets that require prediction classification, in particular to the prediction classification of images of different disease types, which facilitates applying the prediction network model to the classification of various target images. When applied to skin cancer, it helps to obtain an effective intelligent skin cancer prediction model for prediction, recognition and classification.

According to the invention, the effect of atypical class data balancing was verified with different classifier models on 20000 dermoscopic images from ISIC2019. With atypical class data balancing, the sensitivity, F1 score, precision, specificity and accuracy of the models improve considerably compared with the models without it; the F1 score of GoogLeNet improves by 12.7%, and the average accuracy improves by about 5%. In the multi-class task covering eight kinds of skin lesions such as melanoma and squamous cell carcinoma, the atypical class data balancing method combined with the EfficientNet model reaches an accuracy of 82.4%, about 20% higher than the champion model of the latest ISIC2019 competition. This demonstrates the effectiveness of the proposed intelligent skin disease recognition and classification strategy, and the atypical class data balancing strategy also offers a new approach to classification on unbalanced data.
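By way of illustration only, the evaluation measures named above can be computed for one class against all others as in the following sketch. The function name `one_vs_rest_metrics` and the sample labels are hypothetical, and the description does not state how the per-class values were averaged.

```python
def one_vs_rest_metrics(true_labels, predicted_labels, positive):
    """Sensitivity, specificity, precision, F1 and accuracy for one class."""
    tp = fp = tn = fn = 0
    for truth, prediction in zip(true_labels, predicted_labels):
        if truth == positive and prediction == positive:
            tp += 1
        elif truth != positive and prediction == positive:
            fp += 1
        elif truth == positive:
            fn += 1                     # missed positive (false negative)
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


truth = ["mel", "mel", "nv", "nv", "nv"]
predicted = ["mel", "nv", "nv", "nv", "mel"]
metrics = one_vs_rest_metrics(truth, predicted, positive="mel")
```

A lower false negative rate, which the atypical class amplification aims at, appears here as a higher sensitivity for the malignant class.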

Example II

The invention also provides another embodiment, namely an atypical class data set balancing device, which comprises the following components:

the preprocessing module is used for preprocessing the data set;

It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus and modules described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.

Example III

The invention also provides another implementation mode, and the invention provides electronic equipment, which comprises: a processor 1 and a memory 2.

The memory 2 is used for storing a computer program.

The memory 2 includes various media capable of storing program code, such as a ROM, a RAM, a magnetic disk, a USB flash drive, a memory card, or an optical disk.

The processor 1 is connected to the memory 2 and is configured to execute the computer program stored in the memory 2, so that the electronic device performs the above-mentioned atypical class data set balancing method.

Preferably, the processor 1 may be a central processing unit (CPU) or an application-specific integrated circuit (ASIC).

Example IV

The present invention also provides another embodiment, namely, a computer-readable storage medium storing a computer program executable by at least one processor to cause the at least one processor to perform the steps of the atypical class data set balancing method as described above.

The foregoing embodiments are each described with a different emphasis; for parts not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of the other embodiments.

Those of ordinary skill in the art will appreciate that the modules, units, and/or method steps of the various embodiments described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple modules or components may be combined or integrated into another device or system, or some features may be omitted or not performed.

In addition, each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated units may be implemented in hardware or in software functional units.

The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method of atypical class data set balancing, said method comprising the steps of:

S1, preprocessing a data set, wherein the data set comprises an acquired medical image set;

the preprocessing comprises: cropping the images of the image set according to their initial aspect ratio while retaining the lesion parts in the images;

S2, amplifying the unbalanced data sets of the different categories to obtain a balanced data set;

S3, feeding the balanced data set into a set network model for training to obtain an atypical class data set; the set network model specifically comprises the following components: MBConvBlock, MBConv, sepConv, depthwise separable convolution (DWConv), and an SE module;

S4, performing atypical data amplification on the atypical class data set to obtain an atypical class balanced data set;

S5, feeding the typical class data set and the atypical class balanced data set obtained in step S4 back into the network model for training, to obtain a trained network model and a classification result.
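The flow of steps S2 to S5 can be sketched on toy data as follows. This is a hedged illustration, not the claimed implementation: the claims do not state here how a sample is judged atypical, so the sketch assumes that a sample is atypical when the first-stage model misclassifies it. The names `oversample_to_balance`, `split_typical_atypical`, `jitter`, and `threshold_predict` are hypothetical, and a fixed threshold rule stands in for the trained network.

```python
import random
from collections import Counter, defaultdict

def oversample_to_balance(samples, augment, rng):
    """Amplify every class up to the size of the largest class (S2 and S4)."""
    if not samples:
        return []
    by_class = defaultdict(list)
    for x, y in samples:
        by_class[y].append((x, y))
    target = max(len(items) for items in by_class.values())
    balanced = []
    for y, items in by_class.items():
        balanced.extend(items)
        for _ in range(target - len(items)):
            x, _label = rng.choice(items)
            balanced.append((augment(x, rng), y))  # augmented copy
    return balanced

def split_typical_atypical(samples, predict):
    """Assumed S3 criterion: misclassified samples are treated as atypical."""
    typical, atypical = [], []
    for x, y in samples:
        (typical if predict(x) == y else atypical).append((x, y))
    return typical, atypical

def jitter(x, rng):
    """Toy augmentation: perturb a scalar feature slightly."""
    return x + rng.uniform(-0.01, 0.01)

def threshold_predict(x):
    """Stand-in for the first-stage trained model."""
    return "A" if x < 0.5 else "B"

rng = random.Random(0)
data = [(0.1, "A"), (0.2, "A"), (0.3, "A"), (0.45, "A"), (0.6, "A"),
        (0.9, "B"), (0.4, "B")]

balanced = oversample_to_balance(data, jitter, rng)                        # S2
typical, atypical = split_typical_atypical(balanced, threshold_predict)    # S3
atypical_balanced = oversample_to_balance(atypical, jitter, rng)           # S4
final_training_set = typical + atypical_balanced                           # input to S5

print(Counter(y for _, y in balanced))
print(Counter(y for _, y in atypical_balanced))
```

The second training pass of S5 would then be run on `final_training_set`; it is omitted here because no real model is trained in this sketch.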

2. The atypical class data set balancing method of claim 1, wherein preprocessing the data set further comprises: random image enhancement processing.
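As an illustration of the cropping in the preprocessing step of claim 1, the sketch below computes a crop window that keeps the image's original aspect ratio and still contains the lesion. It assumes that a lesion bounding box is already available; how the lesion is located is not specified here, and the function name `crop_box_keep_aspect` is hypothetical.

```python
def crop_box_keep_aspect(img_w, img_h, lesion_box):
    """Return (left, top, right, bottom) of the smallest crop window that has
    the image's aspect ratio and contains lesion_box, kept inside the image."""
    x0, y0, x1, y1 = lesion_box
    aspect = img_w / img_h
    box_w, box_h = x1 - x0, y1 - y0
    # Grow the shorter side so that width / height matches the image.
    if box_w / box_h < aspect:
        crop_h = box_h
        crop_w = box_h * aspect
    else:
        crop_w = box_w
        crop_h = box_w / aspect
    # Never exceed the image itself.
    crop_w = min(crop_w, img_w)
    crop_h = min(crop_h, img_h)
    # Centre the window on the lesion, then shift it back inside the image.
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    left = min(max(cx - crop_w / 2, 0), img_w - crop_w)
    top = min(max(cy - crop_h / 2, 0), img_h - crop_h)
    return (left, top, left + crop_w, top + crop_h)

# Hypothetical 1024 x 768 image with a lesion box given in pixel coordinates.
box = crop_box_keep_aspect(1024, 768, (400, 300, 600, 500))
print(box)
```

The returned window can be passed to any image library's crop routine after rounding to integer pixel coordinates.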

3. The atypical class data set balancing method of claim 1, wherein the set network model specifically comprises:

the network model is EfficientNetB0;

and the initial model weights of the EfficientNetB0 network are obtained, in the initial setting of the network model, by transfer learning from weights pre-trained on natural images.

4. The atypical class data set balancing method of claim 3, wherein

the network model adopts Focal Loss and binary cross-entropy loss functions in the training process:

substituting $p_t$ for $p$ gives equation 2:

equation 1 can be rewritten as equation 3:

$$\mathrm{CE}(p, y) = \mathrm{CE}(p_t) = -\log(p_t) \tag{3}$$

$$\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t) \tag{4}$$
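Equations 3 and 4 can be checked numerically with the short sketch below. It assumes the usual focal loss convention, which is presumed to be what equations 1 and 2 denote: p is the predicted probability of the positive class, p_t equals p when y = 1 and 1 - p otherwise, and alpha_t is defined analogously from alpha. The default values alpha = 0.25 and gamma = 2.0 are assumptions for illustration and are not stated in the claim.

```python
import math

def cross_entropy(p, y, eps=1e-12):
    """Binary cross entropy for one sample, as in equation 3: -log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0)  # avoid log(0)
    return -math.log(p_t)

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Focal loss for one sample, as in equation 4.

    alpha and gamma defaults are assumed values, not taken from the claim.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A well-classified sample (p_t = 0.9) is down-weighted far more strongly
# than a harder one (p_t = 0.6), relative to plain cross entropy.
for p in (0.9, 0.6):
    print(p, cross_entropy(p, 1), focal_loss(p, 1))
```

With gamma = 0 the modulating factor (1 - p_t)^gamma equals 1, so the focal loss reduces to cross entropy scaled by alpha_t.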

5. The atypical class data set balancing method of claim 1, wherein the data set amplification specifically comprises:

data amplification of the data set uses one or more of: histogram equalization, horizontal flipping, rotation by 30, 90, 150, or 180 degrees, and random erasure.
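Several of the amplification operations listed in claim 5 can be sketched on a small grayscale image stored as a nested list. This is an illustration under assumptions: the helper names are hypothetical, pixel values are assumed to be integers in 0..255, and rotations by 30 and 150 degrees are left out because they require interpolation that a pure-Python sketch does not cover.

```python
import random

def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def rot180(img):
    """Rotate the image 180 degrees."""
    return [row[::-1] for row in img[::-1]]

def random_erase(img, rng, max_frac=0.5, fill=0):
    """Overwrite a randomly placed rectangle with a constant value."""
    h, w = len(img), len(img[0])
    eh = rng.randint(1, max(1, int(h * max_frac)))
    ew = rng.randint(1, max(1, int(w * max_frac)))
    top, left = rng.randint(0, h - eh), rng.randint(0, w - ew)
    out = [row[:] for row in img]
    for r in range(top, top + eh):
        for c in range(left, left + ew):
            out[r][c] = fill
    return out

def equalize_hist(img, levels=256):
    """Histogram equalization via the cumulative distribution of pixel values."""
    flat = [v for row in img for v in row]
    hist = [0] * levels
    for v in flat:
        hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:  # constant image: nothing to spread out
        return [row[:] for row in img]
    lut = [max(0, round((c - cdf_min) / (n - cdf_min) * (levels - 1))) for c in cdf]
    return [[lut[v] for v in row] for row in img]

image = [[50, 50], [100, 150]]
rng = random.Random(0)
print(hflip(image), rot90(image), rot180(image))
print(random_erase(image, rng), equalize_hist(image))
```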

6. The atypical class data set balancing method of claim 1, further comprising the step of: obtaining a target image, and performing prediction classification on the target image based on the trained network model.

7. An atypical class data set balancing apparatus, comprising in particular:

the preprocessing module is used for preprocessing the data set; wherein the dataset comprises an acquired medical image set;

the first training module is used for feeding the balanced data set into the set network model for training to obtain an atypical class data set; the set network model specifically comprises the following components: MBConvBlock, MBConv, sepConv, depthwise separable convolution (DWConv), and an SE module;

and the second training module is used for feeding the typical class data set and the atypical class balance data set back to the network model for training, and obtaining a trained network model and a trained classification result.

8. An electronic device comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for performing the steps of the atypical class data set balancing method of any one of claims 1-6.

9. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the atypical class data set balancing method according to any one of claims 1-6.