CN112232448B - Image classification method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112232448B
CN112232448B (application number CN202011462359.2A)
Authority
CN
China
Prior art keywords
convolution
feature map
feature
generate
image set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011462359.2A
Other languages
Chinese (zh)
Other versions
CN112232448A (en)
Inventor
李卫超
赵雷
唐轶
李博超
钟利伟
金蒙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Daheng Prust Medical Technology Co ltd
Original Assignee
Beijing Daheng Prust Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Daheng Prust Medical Technology Co ltd filed Critical Beijing Daheng Prust Medical Technology Co ltd
Priority to CN202011462359.2A priority Critical patent/CN112232448B/en
Publication of CN112232448A publication Critical patent/CN112232448A/en
Application granted granted Critical
Publication of CN112232448B publication Critical patent/CN112232448B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image classification method, an image classification apparatus, an electronic device, and a storage medium. The method includes: obtaining an original image set containing a plurality of original images; cleaning, cropping, augmenting, and normalizing the original image set to generate a preprocessed image set; inputting the preprocessed image set into a backbone network for training to generate a feature map; and inputting the extracted feature map into a fully connected layer, classifying, and outputting a classification result. A new network is designed that fuses image features at different granularities and different scales, which can further improve the detection performance.

Description

Image classification method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image recognition, and in particular, to an image classification method, apparatus, electronic device, and storage medium.
Background
Diabetic macular edema is a vision-threatening form of diabetic retinopathy, and optical coherence tomography (OCT) can be used to diagnose and guide the treatment of age-related macular degeneration and diabetic macular edema. Existing automatic detection methods fall into two categories: traditional image classification methods and deep-learning-based methods. Traditional image classification methods require manual feature extraction, which is time-consuming and labor-intensive. Most current deep-learning detection schemes rely on transfer learning: a model is initialized with weights pre-trained on the ImageNet competition data and training continues from there. However, ImageNet targets natural images, whose semantics differ greatly from those of medical images. Fine-tuning ImageNet pre-trained weights is appropriate when medical image data is scarce; when sufficient medical image data is available, training from scratch is preferable to transfer learning.
Disclosure of Invention
An object of the embodiments of the present application is to provide an image classification method, an image classification apparatus, an electronic device, and a non-transitory electronic-device-readable storage medium, so as to solve the technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides an image classification method, including: obtaining an original image set containing a plurality of original images; cleaning, cropping, augmenting, and normalizing the original image set to generate a preprocessed image set; inputting the preprocessed image set into a backbone network for training to generate a feature map; and inputting the extracted feature map into a fully connected layer, classifying, and outputting a classification result.
In one embodiment, cleaning, cropping, augmenting, and normalizing the original image set to generate a preprocessed image set includes: applying data augmentation and data balancing to the target region to obtain augmented data in which every class contains the same number of images; and normalizing the augmented data to generate the preprocessed image set.
In one embodiment, inputting the preprocessed image set into a backbone network for training to generate a feature map includes: building a plurality of convolution blocks, where convolution kernels of different sizes are used inside each block to extract features at different scales, and the feature maps output by convolutions with different kernel sizes are concatenated and summed, increasing the number of channels describing the input features and enriching the semantic information carried by each channel; and merging the feature maps output by the convolution blocks to generate the final feature map, where the feature maps output by adjacent convolution blocks are successively halved in size and doubled in channel count.
In an embodiment, inputting the extracted feature map into the fully connected layer, classifying, and outputting a classification result includes: inputting the extracted feature map into the fully connected layer to generate a probability value for each preset category; and outputting the classification result corresponding to the maximum probability value.
In a second aspect, an embodiment of the present invention provides an image classification apparatus, including: a first acquisition module configured to obtain an original image set containing a plurality of original images; a first generation module configured to clean, crop, augment, and normalize the original image set to generate a preprocessed image set; a second generation module configured to input the preprocessed image set into a backbone network for training to generate a feature map; and a first output module configured to input the extracted feature map into a fully connected layer, classify it, and output a classification result.
In an embodiment, the first generation module is further configured to: apply data augmentation and data balancing to the target region to obtain augmented data in which every class contains the same number of images; and normalize the augmented data to generate the preprocessed image set.
In an embodiment, the second generation module is further configured to: build a plurality of convolution blocks, where convolution kernels of different sizes are used inside each block to extract features at different scales, and the feature maps output by convolutions with different kernel sizes are concatenated and summed, increasing the number of channels describing the input features and enriching the semantic information carried by each channel; and merge the feature maps output by the convolution blocks to generate the final feature map, where the feature maps output by adjacent convolution blocks are successively halved in size and doubled in channel count.
In an embodiment, the first output module is further configured to: input the extracted feature map into the fully connected layer to generate a probability value for each preset category; and output the classification result corresponding to the maximum probability value.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a memory to store a computer program; a processor configured to perform the method of any of the preceding embodiments.
In a fourth aspect, an embodiment of the present invention provides a non-transitory electronic device readable storage medium, including: a program which, when run by an electronic device, causes the electronic device to perform the method of any of the preceding embodiments.
The embodiments of the image classification method, the image classification apparatus, the electronic device, and the non-transitory electronic-device-readable storage medium provided by the application design a new network. The network is implemented with a single branch, without additionally constructing an image pyramid; moreover, for feature fusion, features of the image at different granularities and different scales are fused simultaneously, which can further improve the detection performance.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered limiting of the scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure;
fig. 2 is a schematic view of an application scenario of an image classification method according to an embodiment of the present application;
fig. 3 is a flowchart of an image classification method according to an embodiment of the present application;
FIG. 4 is a flowchart of another image classification method provided in the embodiments of the present application;
fig. 5 is a structural diagram of an image classification apparatus according to an embodiment of the present application.
Icon: the system comprises an electronic device 1, a bus 10, a processor 11, a memory 12, a user terminal 100, a server 200, an image classification device 500, a first acquisition module 501, a first generation module 502, a second generation module 503 and a first output module 504.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
As shown in fig. 1, the present embodiment provides an electronic device 1 including at least one processor 11 and a memory 12; fig. 1 takes one processor as an example. The processor 11 and the memory 12 are connected by a bus 10; the memory 12 stores instructions executable by the processor 11, and the instructions are executed by the processor 11.
In an embodiment, the electronic device 1 may be a mobile phone, a tablet computer, or a personal computer. The electronic device 1 may receive an externally transmitted image; preprocess the received image through data cleaning, image cropping, data balancing, and image normalization; scale the preprocessed image; perform convolution, normalization, linear rectification, and downsampling; merge channels to output a feature map; and finally classify the feature map through a convolution layer, a fully connected layer, and Dropout and output a classification result.
The memory 12 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The present application also provides a computer-readable storage medium storing a computer program executable by the processor 11 to perform the image classification method provided by the present application.
Fig. 2 is a schematic diagram of an application scenario of the image classification method according to an embodiment of the present application. As shown in fig. 2, the scenario includes a user terminal 100 and a server 200. Image information can be exchanged between the user terminal 100 and the server 200 over a wired connection, or over wireless links such as WIFI, 2.4 GHz, 433 MHz, or GPRS (General Packet Radio Service).
The user terminal 100 may be a Personal Computer (PC) having an application installed therein, a tablet PC, a smart phone, a Personal Digital Assistant (PDA), or the like. The server 200 may be a server, a server cluster, or a cloud computing center. The user terminal 100 and the server 200 are connected through a wired or wireless network.
Please refer to fig. 3, a flowchart of an image classification method according to an embodiment of the present application; the method can be executed by the electronic device 1 shown in fig. 1 and used in the interactive scenario shown in fig. 2. The method comprises the following steps:
step 301: an original image set is acquired.
In this step, the original image set contains several original images, which may be OCT images that have already been annotated; the OCT images are used to examine diabetic macular edema and age-related macular degeneration. Annotation can be performed by an ophthalmologist. Because OCT works by detecting light reflected from the refractive tissues of the eye through to the retina, acquiring tissue thickness and distance information from reflections at different tissue interfaces in the eye, and reconstructing that information into images and data, any occlusion or obstruction in the optical path (intraocular floaters, corneal or lens opacity, intraocular fillers, and the like) can interfere with reception of the optical signal and reduce signal strength and image quality. Consequently, not every manually annotated original image can be used for subsequent learning.
Step 302: and cleaning, cutting, enhancing data and normalizing the original image set to generate a preprocessed image set.
In this step, because each original image is labeled manually, labeling errors can occur. The images therefore need to be screened according to their label information, filtering out mislabeled images and images used for measurement.
In an embodiment, the image sizes may be inconsistent, and the imaged region may shift even when the same area is photographed, so the original image needs to be cropped to extract the target region. For example, an original OCT image is roughly divided into three parts: the left side is an en-face OCT image that generally shows the position of the cross-section, the right side is an OCT B-scan image showing the cross-sectional information, and the bottom contains basic information. Here, the target region may be the OCT B-scan image.
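As an illustrative sketch of this cropping step, the following numpy snippet keeps only the right-hand B-scan area of a composite page and drops the bottom information strip; the page layout and all pixel coordinates are invented for the example and would need to match the actual scanner output.

```python
import numpy as np

# hypothetical composite OCT page: en-face image on the left, B-scan on the
# right, basic-information strip along the bottom (coordinates are made up)
oct_page = np.zeros((600, 1000), dtype=np.uint8)

def crop_b_scan(img, left=500, bottom=550):
    # keep only the right-hand OCT B-scan area, dropping the info strip
    return img[:bottom, left:]

b_scan = crop_b_scan(oct_page)
```

In a real pipeline the crop boundaries would be fixed per scanner model or detected automatically rather than hard-coded.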
In one embodiment, classification accuracy is further improved by data augmentation, which includes random rotation of the image, horizontal flipping, and brightness changes. In addition, during model training, an imbalance among the drusen, CNV (choroidal neovascularization), DME (diabetic macular edema), and normal samples would bias the model's predictions toward the most frequent class and degrade its performance, so in this application data balancing is performed after augmentation to ensure that every class contains the same number of images.
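A minimal numpy-only sketch of the augmentation and balancing described above; the flip probability, rotation, and brightness ranges are assumptions, `np.rot90` stands in for arbitrary-angle rotation (which a real pipeline would do with an image library), and the tiny class dictionary is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, rng):
    # random horizontal flip, random 90-degree-step rotation (a stand-in for
    # arbitrary-angle rotation), and a random brightness change
    if rng.random() < 0.5:
        img = img[:, ::-1]
    img = np.rot90(img, k=int(rng.integers(0, 4)))
    return np.clip(img * rng.uniform(0.8, 1.2), 0, 255)

def balance(classes):
    # oversample every class up to the size of the largest one
    target = max(len(v) for v in classes.values())
    return {k: v + [v[i % len(v)] for i in range(target - len(v))]
            for k, v in classes.items()}

aug = augment(np.full((8, 8), 100.0), rng)
data = {"CNV": [1, 2, 3], "DME": [4], "Drusen": [5, 6], "Normal": [7, 8, 9]}
balanced = balance(data)
```

After balancing, every class holds as many samples as the largest class, which is the condition the text requires before training.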
In an embodiment, the target image after augmentation and balancing may be normalized; the normalization method may be mean-variance normalization.
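Mean-variance (z-score) normalization can be sketched as follows; the small epsilon guarding against a zero standard deviation is an implementation detail added here, and the sample image is invented.

```python
import numpy as np

def mean_var_normalize(img):
    # mean-variance (z-score) normalization of a single image
    return (img - img.mean()) / (img.std() + 1e-8)

img = np.array([[0.0, 50.0], [100.0, 150.0]])
out = mean_var_normalize(img)
```

The result has (approximately) zero mean and unit standard deviation, which stabilizes subsequent network training.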
Step 303: and inputting the preprocessed image set into a backbone network for training to generate a characteristic diagram.
In this step, the preprocessed images are passed through the neural network to learn features of different sizes at the current granularity. Batch normalization (BN) is used during training to accelerate convergence, followed by ReLU (Rectified Linear Unit) activation. Convolution kernels of different sizes are applied to the same feature map, and the resulting feature maps are merged, which increases the number of channels describing the features and enriches the semantic information of the image.
In an embodiment, the backbone network may use convolution kernels 1, 2, and 3, with sizes 3x3, 5x5, and 1x1 respectively, to perform convolution operations denoted convolution operations 1, 2, and 3. The features output by operations 1 and 2 are merged to obtain feature map 1; feature map 1 and the feature map 2 output by operation 3 are each adjusted by a 1x1 convolution to feature maps 3 and 4, which have the same number of channels; finally, feature maps 3 and 4 are summed, and the resulting feature map is the output of the convolution block.
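The multi-kernel convolution block can be sketched in plain numpy as follows; the naive `conv2d` loop, the random weights, and the channel counts are illustrative stand-ins for a trained deep-learning-framework implementation, and BN, ReLU placement, and the stride-2 downsampling mentioned later are simplified for brevity.

```python
import numpy as np

def conv2d(x, w):
    # naive "same"-padded convolution; x: (C_in, H, W), w: (C_out, C_in, k, k)
    c_out, c_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1], x.shape[2]
    out = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            out[:, i, j] = np.tensordot(w, xp[:, i:i + k, j:j + k], axes=3)
    return out

def conv_block(x, c_out, rng):
    # kernels 1-3 (3x3, 5x5, 1x1) -> operations 1-3; concat ops 1+2 into
    # feature map 1, op 3 gives feature map 2; 1x1 convs adjust both to c_out
    # channels (feature maps 3 and 4), which are summed and passed through ReLU
    c_in, half = x.shape[0], c_out // 2
    f1 = np.concatenate([conv2d(x, rng.standard_normal((half, c_in, 3, 3)) * 0.1),
                         conv2d(x, rng.standard_normal((half, c_in, 5, 5)) * 0.1)],
                        axis=0)
    f2 = conv2d(x, rng.standard_normal((half, c_in, 1, 1)) * 0.1)
    f3 = conv2d(f1, rng.standard_normal((c_out, f1.shape[0], 1, 1)) * 0.1)
    f4 = conv2d(f2, rng.standard_normal((c_out, f2.shape[0], 1, 1)) * 0.1)
    return np.maximum(f3 + f4, 0.0)

rng = np.random.default_rng(0)
block_out = conv_block(rng.standard_normal((4, 12, 12)), 16, rng)
```

The key point the sketch demonstrates is the channel arithmetic: concatenation adds the channel counts of the 3x3 and 5x5 branches, and the 1x1 adjustments make the two paths summable.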
Step 304: and inputting the extracted feature map into a full connection layer, classifying and outputting a classification result.
In this step, integration produces a new feature map that already contains features at every scale and every granularity. The features can then be classified through the fully connected layer and/or Dropout, and a classification result is output. For example, the extracted feature map is flattened into one dimension and passed through a fully connected layer whose output is the probability that the input image belongs to each of the 4 classes.
Please refer to fig. 4, a flowchart of another image classification method according to an embodiment of the present application; the method can be executed by the electronic device 1 shown in fig. 1 and used in the interactive scenario shown in fig. 2. The method comprises the following steps:
step 401: an original image set is acquired. Please refer to the description of step 301 in the above embodiments.
Step 402: and performing data enhancement and data equalization on the target area by adopting a data enhancement method to obtain data enhancement data.
In this step, the sizes of the original images may differ, and even when the same region is photographed, the region finally shown in different images may be shifted. The original image therefore needs to be cropped to keep only the required portion; in an embodiment, the OCT B-scan region is cropped out.
Step 403: and normalizing the data enhancement data to generate a preprocessing image set.
To further improve accuracy, data augmentation is applied to the target-region image; augmentation includes random rotation of the image, horizontal flipping, and brightness changes. In addition, during model training, an imbalance among the drusen, CNV (choroidal neovascularization), DME (diabetic macular edema), and normal samples would bias the model's predictions toward the most frequent class and degrade its performance, so in this application data balancing is performed on top of augmentation to ensure that every class contains the same number of images.
In an embodiment, the target-region image after augmentation and balancing may be further normalized; the normalization method may be mean-variance normalization.
Step 404: and establishing a plurality of volume blocks and extracting different scale features.
In this step, the convolution blocks convolve the images in the preprocessed image set and extract feature vectors. The feature maps output by adjacent convolution blocks are successively halved in size, and their channel counts are successively doubled.
In an embodiment, the convolution block performs convolution operations with different kernel sizes, extracting features of different scales at the current granularity.
In this step, convolution kernels of different sizes are used inside the convolution block to extract features at different scales under the current granularity. The feature maps output by convolutions with different kernel sizes are concatenated and summed, increasing the number of channels describing the input features and enriching the semantic information carried by each channel. In an embodiment, since kernels of different sizes extract features at different scales, the emphasis is on convolving with several kernel sizes; and the most direct way to halve the feature map output by a convolution is to set the convolution stride to 2.
Specifically: convolution kernels 1, 2, and 3, with sizes 3x3, 5x5, and 1x1 respectively, perform convolution operations denoted convolution operations 1, 2, and 3. The features output by operations 1 and 2 are merged to obtain feature map 1. Feature map 1 and the feature map 2 output by operation 3 are each adjusted by a 1x1 convolution to feature maps 3 and 4, which have the same number of channels. Finally, feature maps 3 and 4 are summed, and the resulting feature map is the output of the convolution block.
Step 405: and merging the channels of the feature maps extracted from the volume blocks.
In an embodiment, the number of convolution blocks in the above step may be 4: a first, second, third, and fourth convolution block with identical structure. Each block convolves its input features with kernels of different sizes to extract features at different scales under the current granularity, applies BN and ReLU after each convolution, and channel-merges the extracted features. The feature maps output by the 4 convolution blocks are halved in size step by step, while their channel counts increase by multiples.
In an embodiment, channel merging may combine the feature results of adjacent convolution blocks: the feature result of the first convolution block is downsampled by a factor of 2 and channel-merged with the feature result of the second convolution block to generate a first merged result; the first merged result is downsampled by a factor of 2 and merged with the feature result of the third convolution block to generate a second merged result; and the second merged result is downsampled by a factor of 2 and merged with the fourth convolution block to generate the final merged result. At this point, every feature map already contains features at every scale and every granularity, and the final merged result is passed through one convolution layer, BN, and ReLU to generate the final feature map.
In an embodiment, if the feature map size is halved step by step, the output of the first convolution block must be downsampled by a factor of 8, the output of the second by a factor of 4, and the output of the third by a factor of 2 before channel-merging with the feature map of the fourth convolution block.
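A shape-level numpy sketch of this merging schedule, assuming 4 blocks whose outputs halve in spatial size and double in channel count from block to block; the 16-channel, 64x64 starting point is invented, and average pooling stands in for whatever downsampling operation the network actually uses.

```python
import numpy as np

def avg_pool(x, factor=2):
    # downsample a (C, H, W) feature map by an integer factor via average pooling
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

# hypothetical block outputs: sizes halve and channel counts double block to block
block_outs = [np.random.rand(16 * 2 ** i, 64 // 2 ** i, 64 // 2 ** i) for i in range(4)]

merged = block_outs[0]
for feat in block_outs[1:]:
    # downsample the running result 2x, then channel-merge with the next block
    merged = np.concatenate([avg_pool(merged), feat], axis=0)
```

Chaining the three 2x downsamplings reproduces the cumulative factors in the text: block 1 ends up downsampled 8x, block 2 by 4x, and block 3 by 2x relative to the fourth block's feature map.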
Step 406: and inputting the extracted characteristic diagram into the full-connection layer to generate a probability value corresponding to a preset category.
In this step, the final feature map is flattened into one dimension and then classified through the fully connected layer and Dropout, and the classification result is output. In an embodiment, there may be 4 classes: for each image, the network finally outputs 4 values corresponding to the probabilities that the image belongs to the 4 classes, the class with the highest probability is selected as the final classification result, and softmax is used as the activation function.
Step 407: and outputting the classification result corresponding to the maximum probability value according to the probability value.
Please refer to fig. 5, a structural diagram of an image classification apparatus 500 provided in an embodiment of the present application. The apparatus is implemented by the electronic device 1 shown in fig. 1 and used in the interactive scene shown in fig. 2 to receive externally transmitted images; preprocess them through data cleaning, image cropping, data balancing, and image normalization; scale the preprocessed images; perform convolution, normalization, linear rectification, and downsampling; merge channels to output a feature map; and classify the features through a fully connected layer and Dropout, outputting a classification result. The image classification apparatus 500 includes a first acquisition module 501, a first generation module 502, a second generation module 503, and a first output module 504, whose relationships are as follows:
a first obtaining module 501 is configured to obtain an original image set, where the original image set includes a plurality of original images. Please refer to the description of step 301 in the above embodiments.
The first generation module 502 is configured to clean, crop, augment, and normalize the original image set to generate a preprocessed image set. Please refer to the description of step 302 in the above embodiment.
In an embodiment, the first generation module 502 is further configured to: apply data augmentation and data balancing to the target region to obtain augmented data in which every class contains the same number of images; and normalize the augmented data to generate the preprocessed image set. Please refer to the description of steps 402-403 in the above embodiment.
The second generation module 503 is configured to input the preprocessed image set into the backbone network for training to generate a feature map. Please refer to the description of step 303 in the above embodiments.
In an embodiment, the second generation module 503 is further configured to: build a plurality of convolution blocks, where convolution kernels of different sizes are used inside each block to extract features at different scales, and the feature maps output by convolutions with different kernel sizes are concatenated and summed, increasing the number of channels describing the input features and enriching the semantic information carried by each channel; and merge the feature maps output by the convolution blocks to generate the final feature map, where the feature maps output by adjacent convolution blocks are successively halved in size and doubled in channel count. Please refer to the description of steps 404 and 405 in the above embodiments.
The first output module 504 is configured to input the extracted feature map into the fully connected layer, perform classification, and output a classification result. Please refer to the description of step 304 in the above embodiment.
In one embodiment, the first output module 504 is further configured to: input the extracted feature map into the fully connected layer to generate a probability value for each preset category; and output the classification result corresponding to the maximum probability value. Please refer to the description of steps 406 and 407 in the above embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of units is merely a division by logical function, and other divisions are possible in actual implementation; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection of devices or units through communication interfaces, and may be electrical, mechanical, or in another form.
In addition, units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiments.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
It should be noted that the functions, if implemented in the form of software functional modules and sold or used as independent products, may be stored in a computer-readable storage medium. Based on such understanding, the part of the technical solution of the present application that in essence contributes to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above embodiments are merely examples of the present application and are not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (8)

1. An image classification method, comprising:
obtaining an original image set, wherein the original image set comprises a plurality of original images;
cleaning, cutting, data enhancing and normalizing the original image set to generate a preprocessed image set;
inputting the preprocessed image set into a backbone network for training to generate a feature map;
inputting the extracted feature map into a full connection layer, classifying and outputting a classification result;
inputting the preprocessed image set into a backbone network for training to generate a feature map, wherein the feature map comprises:
establishing a plurality of convolution blocks, wherein convolution kernels of different sizes are adopted within each convolution block to extract features at different scales, merging and adding operations are performed on the feature maps output by the convolution operations with different kernel sizes, and both the number of channels describing the features input to the convolution block and the semantic information carried by a single channel are increased;
merging the feature maps output by each convolution block to generate the feature map; wherein,
the sizes of the feature maps output by adjacent convolution blocks are halved in sequence, and the numbers of channels of the feature maps are doubled in sequence;
wherein adopting convolution kernels of different sizes in the convolution block to extract features at different scales, and merging and adding the feature maps output by the convolution operations with different kernel sizes, comprises:
performing convolution operations with a first convolution kernel, a second convolution kernel and a third convolution kernel of sizes 3x3, 5x5 and 1x1, respectively, denoted as a first convolution operation, a second convolution operation and a third convolution operation;
combining the features output by the first convolution operation and the second convolution operation to obtain a first feature map;
adjusting, through 1x1 convolutions, the first feature map and the second feature map output by the third convolution operation into a third feature map and a fourth feature map having the same number of channels;
adding the third feature map and the fourth feature map to generate a feature map as the output of the convolution block;
wherein the number of the convolution blocks is four, and merging the feature maps output by the convolution blocks comprises:
down-sampling the feature result of the first convolution block by a factor of two, and performing channel combination with the feature result of the second convolution block to generate a first combination result;
down-sampling the first combination result by a factor of two, and combining it with the feature result of the third convolution block to generate a second combination result;
down-sampling the second combination result by a factor of two, and combining it with the feature result of the fourth convolution block to generate a final combination result;
and passing the final combination result through a convolution layer, batch normalization (BN) and ReLU to generate the feature map.
2. The method of claim 1, wherein the cleaning, cropping, data enhancement, and normalization of the original image set to generate a pre-processed image set comprises:
performing data enhancement and data equalization on a target area by a data enhancement method to obtain enhanced data, wherein every class in the enhanced data contains the same number of pictures;
and normalizing the enhanced data to generate the preprocessed image set.
3. The method according to claim 1, wherein the inputting the extracted feature map into a full connection layer, classifying and outputting a classification result comprises:
inputting the extracted feature map into a full connection layer to generate a probability value corresponding to a preset category;
and outputting the classification result corresponding to the maximum probability value according to the probability value.
4. An image classification apparatus, comprising:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring an original image set, and the original image set comprises a plurality of original images;
the first generation module is used for generating a preprocessing image set after cleaning, cutting, data enhancement and normalization processing are carried out on an original image set;
the second generation module is used for inputting the preprocessed image set into a backbone network for training to generate a feature map;
the first output module is used for inputting the extracted feature map into a full connection layer, classifying and outputting a classification result;
the second generation module is further to:
establishing a plurality of convolution blocks, wherein convolution kernels of different sizes are adopted within each convolution block to extract features at different scales, merging and adding operations are performed on the feature maps output by the convolution operations with different kernel sizes, and both the number of channels describing the features input to the convolution block and the semantic information carried by a single channel are increased;
merging the feature maps output by each convolution block to generate the feature map; wherein,
the sizes of the feature maps output by adjacent convolution blocks are halved in sequence, and the numbers of channels of the feature maps are doubled in sequence;
wherein adopting convolution kernels of different sizes in the convolution block to extract features at different scales, and merging and adding the feature maps output by the convolution operations with different kernel sizes, comprises:
performing convolution operations with a first convolution kernel, a second convolution kernel and a third convolution kernel of sizes 3x3, 5x5 and 1x1, respectively, denoted as a first convolution operation, a second convolution operation and a third convolution operation;
combining the features output by the first convolution operation and the second convolution operation to obtain a first feature map;
adjusting, through 1x1 convolutions, the first feature map and the second feature map output by the third convolution operation into a third feature map and a fourth feature map having the same number of channels;
adding the third feature map and the fourth feature map to generate a feature map as the output of the convolution block;
wherein the number of the convolution blocks is four, and merging the feature maps output by the convolution blocks comprises:
down-sampling the feature result of the first convolution block by a factor of two, and performing channel combination with the feature result of the second convolution block to generate a first combination result;
down-sampling the first combination result by a factor of two, and combining it with the feature result of the third convolution block to generate a second combination result;
down-sampling the second combination result by a factor of two, and combining it with the feature result of the fourth convolution block to generate a final combination result;
and passing the final combination result through a convolution layer, batch normalization (BN) and ReLU to generate the feature map.
5. The apparatus of claim 4, wherein the first generating module is further configured to:
performing data enhancement and data equalization on a target area by a data enhancement method to obtain enhanced data, wherein every class in the enhanced data contains the same number of pictures;
and normalizing the enhanced data to generate the preprocessed image set.
6. The apparatus of claim 4, wherein the first output module is further configured to:
inputting the extracted feature map into a full connection layer to generate a probability value corresponding to a preset category;
and outputting the classification result corresponding to the maximum probability value according to the probability value.
7. An electronic device, comprising:
a memory to store a computer program;
a processor to perform the method of any one of claims 1 to 3.
8. A non-transitory electronic device readable storage medium, comprising: program which, when run by an electronic device, causes the electronic device to perform the method of any one of claims 1 to 3.
CN202011462359.2A 2020-12-14 2020-12-14 Image classification method and device, electronic equipment and storage medium Active CN112232448B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011462359.2A CN112232448B (en) 2020-12-14 2020-12-14 Image classification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011462359.2A CN112232448B (en) 2020-12-14 2020-12-14 Image classification method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112232448A CN112232448A (en) 2021-01-15
CN112232448B true CN112232448B (en) 2021-04-23

Family

ID=74124610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011462359.2A Active CN112232448B (en) 2020-12-14 2020-12-14 Image classification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112232448B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113435454B (en) * 2021-05-21 2023-07-25 厦门紫光展锐科技有限公司 Data processing method, device and equipment
WO2023044612A1 (en) * 2021-09-22 2023-03-30 深圳先进技术研究院 Image classification method and apparatus
CN114140637B (en) * 2021-10-21 2023-09-12 阿里巴巴达摩院(杭州)科技有限公司 Image classification method, storage medium and electronic device

Citations (1)

Publication number Priority date Publication date Assignee Title
CN106530295A (en) * 2016-11-07 2017-03-22 首都医科大学 Fundus image classification method and device of retinopathy

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN110728224B (en) * 2019-10-08 2022-03-11 西安电子科技大学 Remote sensing image classification method based on attention mechanism depth Contourlet network

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN106530295A (en) * 2016-11-07 2017-03-22 首都医科大学 Fundus image classification method and device of retinopathy

Non-Patent Citations (1)

Title
Deep neural network classification method for diabetic retinopathy images; Ding Pengli et al.; Journal of Computer Applications (《计算机应用》); 2017-03-10; Vol. 37, No. 3; pp. 699-704 *

Also Published As

Publication number Publication date
CN112232448A (en) 2021-01-15

Similar Documents

Publication Publication Date Title
CN112232448B (en) Image classification method and device, electronic equipment and storage medium
CN110662484B (en) System and method for whole body measurement extraction
US20220076420A1 (en) Retinopathy recognition system
CN109670532B (en) Method, device and system for identifying abnormality of biological organ tissue image
CN107437092B (en) The classification method of retina OCT image based on Three dimensional convolution neural network
Bilal et al. Diabetic retinopathy detection and classification using mixed models for a disease grading database
US9779492B1 (en) Retinal image quality assessment, error identification and automatic quality correction
CN110211087B (en) Sharable semiautomatic marking method for diabetic fundus lesions
KR20200004841A (en) System and method for guiding a user to take a selfie
JP2022521844A (en) Systems and methods for measuring weight from user photos using deep learning networks
US11967181B2 (en) Method and device for retinal image recognition, electronic equipment, and storage medium
CN112017185B (en) Focus segmentation method, device and storage medium
CN111860169B (en) Skin analysis method, device, storage medium and electronic equipment
CN111402217B (en) Image grading method, device, equipment and storage medium
CN109344864B (en) Image processing method and device for dense object
US20220254134A1 (en) Region recognition method, apparatus and device, and readable storage medium
CN110197474B (en) Image processing method and device and training method of neural network model
CN107958453A (en) Detection method, device and the computer-readable storage medium of galactophore image lesion region
CN113011450B (en) Training method, training device, recognition method and recognition system for glaucoma recognition
US20240112329A1 (en) Distinguishing a Disease State from a Non-Disease State in an Image
CN113763348A (en) Image quality determination method and device, electronic equipment and storage medium
Jana et al. A semi-supervised approach for automatic detection and segmentation of optic disc from retinal fundus image
Perez et al. A new method for online retinal optic-disc detection based on cascade classifiers
Yang et al. Blood vessel segmentation of fundus images via cross-modality dictionary learning
Mohammedhasan et al. A new deeply convolutional neural network architecture for retinal blood vessel segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant