CN112396103A - Image classification method, device and storage medium - Google Patents


Info

Publication number
CN112396103A
CN112396103A
Authority
CN
China
Prior art keywords
image
training
convolutional layer
image classification
classification model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011284218.6A
Other languages
Chinese (zh)
Inventor
陈超
郭岑
黄凌云
刘玉宇
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202011284218.6A
Publication of CN112396103A
Legal status: Pending

Classifications

    • G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/24 — Pattern recognition; classification techniques
    • G06N3/045 — Neural networks; architecture; combinations of networks
    • G06N3/084 — Neural networks; learning methods; backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of intelligent decision making, and discloses an image classification method comprising the following steps: acquiring a training image and preprocessing it to generate a first image set; training a convolutional neural network on the first image set to obtain an image classification model, wherein the image classification model comprises a first convolutional layer and a second convolutional layer, each formed by multiplying a residual module with an attention module; acquiring an image to be classified and preprocessing it to generate a second image set; and classifying the images in the second image set with the image classification model to obtain image categories, the image categories comprising malignant images and benign images. The image classification method optimizes the image classification model and improves the accuracy with which the model classifies ultrasound images.

Description

Image classification method, device and storage medium
Technical Field
The present invention relates to the field of intelligent decision making technologies, and in particular, to an image classification method, an electronic device, and a computer-readable storage medium.
Background
A residual network (ResNet) is a convolutional neural network that can be built much deeper than a plain network because its skip connections pass features forward and improve gradient flow to the shallow layers. Although neural networks can be optimized by means of such high-dimensional non-linear mappings, it remains difficult to achieve good image classification results on small image datasets, in particular medical image datasets. In a deep learning task with too little data, a residual network cannot use channels of very high dimension, and the number of layers of the network must also be controlled carefully, because training a large neural network on a small amount of data causes overfitting and reduces image classification accuracy.
Disclosure of Invention
In view of the above, it is necessary to provide an image classification method for accurately classifying ultrasound images.
The image classification method provided by the invention comprises the following steps:
acquiring a training image, and preprocessing the training image to generate a first image set;
training a convolutional neural network according to the first image set to obtain an image classification model, wherein the image classification model comprises a first convolutional layer and a second convolutional layer, and the first convolutional layer and the second convolutional layer are convolutional layers formed by multiplying a residual module with an attention module;
acquiring an image to be classified, and preprocessing the image to be classified to generate a second image set;
and classifying the images in the second image set according to the image classification model to obtain image categories, wherein the image categories comprise malignant images and benign images.
Optionally, the training image and the image to be classified are a single thyroid ultrasound nodule lesion image, and the preprocessing includes:
identifying the positions of the nodules in the training image or the image to be classified;
acquiring a circumscribed rectangle of the nodule, and scaling the training image or the image to be classified to a preset size according to the length of the longest side of the circumscribed rectangle to obtain a scaled image, wherein one side of the scaled image coincides with the longest side of the circumscribed rectangle.
Optionally, the image classification model further includes a third convolutional layer, and the third convolutional layer is a convolutional network composed of residual modules.
Optionally, the residual module is formed by a 1×1 convolution of n channels, followed by a 3×3 convolution of n channels, followed by a 1×1 convolution of 4n channels, where n is the number of channels.
Optionally, the attention module adds 0.5 to all elements of the mask before overlaying the image input to the attention module with the mask.
Optionally, the attention module in the image classification model is activated using a sigmoid activation function, and the other modules are activated using the PReLU function.
Optionally, the image classification model further includes a classification layer, and the classification layer classifies the images processed by other layers of the image classification model by using a logistic regression function.
In addition, to achieve the above object, the present invention also provides an electronic device including a memory and a processor, the memory storing an image classification program executable on the processor; when executed by the processor, the image classification program implements the steps of the image classification method described above.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon an image classification program executable by one or more processors to implement the steps of the image classification method described above.
Further, to achieve the above object, the present invention also provides an image classification apparatus including:
the first image module is used for acquiring a training image, and preprocessing the training image to generate a first image set;
the model training module is used for training the convolutional neural network according to the first image set to obtain an image classification model, the image classification model comprising a first convolutional layer and a second convolutional layer, each formed by multiplying a residual module with an attention module;
the second image module is used for acquiring an image to be classified, and preprocessing the image to be classified to generate a second image set;
and the model classification module is used for classifying the images in the second image set according to the image classification model to obtain image categories, wherein the image categories comprise malignant images and benign images.
Drawings
FIG. 1 is a flowchart of an embodiment of an image classification method according to the present invention;
FIG. 2 is a diagram of an electronic device according to an embodiment of the invention;
fig. 3 is a block diagram of an image classification apparatus according to an embodiment of the present invention.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the invention and are not intended to limit it. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.
It should be noted that descriptions involving "first", "second", etc. in the present invention are for descriptive purposes only and are not to be construed as indicating or implying relative importance or the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, but only insofar as a person skilled in the art can realize the combination; when technical solutions are contradictory or a combination cannot be realized, such a combination should be considered not to exist and falls outside the protection scope of the present invention.
Referring to FIG. 1, a flowchart of an embodiment of an image classification method according to the invention is shown, which includes steps S1-S4.
And S1, acquiring a training image, and preprocessing the training image to generate a first image set.
In one embodiment, the training image is an ultrasound image with an annotated image category. The first image set includes a plurality of training images that have been preprocessed. Specifically, the ultrasound image is a single thyroid ultrasound nodule lesion image. The preprocessing step comprises: after the ultrasound image is obtained, identifying the position of the nodule in the ultrasound image, taking the circumscribed rectangle of the nodule, and scaling the training image or the image to be classified to a preset size (for example, 128×128) according to the length of the longest side of the circumscribed rectangle to obtain a scaled image, one side of which coincides with the longest side of the circumscribed rectangle. If the longest sides of the circumscribed rectangle are the top and bottom sides, the top side of the scaled image coincides with the top side of the rectangle; if the longest sides are the left and right sides, the left side of the scaled image coincides with the left side of the rectangle.
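The preprocessing described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the bounding-box format `(top, left, height, width)` and nearest-neighbour interpolation are assumptions, since the text does not specify them.

```python
import numpy as np

def preprocess(image: np.ndarray, box: tuple, out_size: int = 128) -> np.ndarray:
    """Take a square crop whose side equals the longest side of the nodule's
    circumscribed rectangle, anchored so one crop edge coincides with that
    side, then resize the crop to out_size x out_size.

    `box` is (top, left, height, width) of the circumscribed rectangle.
    Nearest-neighbour sampling is used here as an assumption.
    """
    top, left, h, w = box
    side = max(h, w)
    crop = image[top:top + side, left:left + side]
    # Pad with edge values if the square crop runs past the image border.
    crop = np.pad(crop, ((0, side - crop.shape[0]), (0, side - crop.shape[1])),
                  mode="edge")
    # Nearest-neighbour resize to out_size x out_size.
    idx = (np.arange(out_size) * side // out_size).astype(int)
    return crop[np.ix_(idx, idx)]
```

Both training images and images to be classified would pass through the same routine, so the model always sees nodules at a fixed scale and alignment.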
S2, training the convolutional neural network according to the first image set to obtain an image classification model, wherein the image classification model comprises a first convolutional layer and a second convolutional layer, and the first convolutional layer and the second convolutional layer are convolutional layers formed by multiplying a residual error module and an attention module.
In one embodiment, the first convolutional layer and the second convolutional layer are convolutional networks formed by multiplying a residual module with an attention module. The residual module deepens the network so as to improve model performance, and the attention module removes the influence of noise in the image. After the residual module extracts deep semantic information from the images in the first image set, the attention module multiplied with it screens that information, yielding the key deep semantic information required for image classification.
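A sketch of such an attention-weighted convolutional layer in PyTorch follows. The trunk and mask sub-networks are reduced here to single convolutions for brevity; the real encoder-decoder attention branch is described later in the text, and the `+ 0.5` shift anticipates the mask adjustment the patent specifies.

```python
import torch
from torch import nn

class AttentionResidualLayer(nn.Module):
    """Residual trunk output multiplied elementwise by a sigmoid attention
    mask (shifted by +0.5, per the described design). The sub-network
    structures are simplified placeholders, not the patent's exact layers."""

    def __init__(self, ch: int):
        super().__init__()
        # Trunk: stands in for the residual module's feature extraction.
        self.trunk = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.PReLU())
        # Mask: stands in for the down/up-sampling attention branch.
        self.mask = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        t = self.trunk(x)
        # Elementwise screening: each position is scaled by its attention
        # weight, shifted so the multiplier lies in (0.5, 1.5).
        return t * (self.mask(x) + 0.5)
```

The multiplication keeps the spatial layout of the feature map while down-weighting positions the mask deems irrelevant.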
In an embodiment, the image classification model further comprises a third convolutional layer, which is a convolutional network composed of residual modules. The third convolutional layer only adopts a residual module and does not adopt an attention module so as to prevent the model from generating an overfitting phenomenon.
In an embodiment, the image classification model further comprises a classification layer, which uses a softmax (logistic regression) function to classify the images processed by the other layers of the model.
Specifically, the residual module is formed by a 1×1 convolution of n channels, followed by a 3×3 convolution of n channels, followed by a 1×1 convolution of 4n channels, where n is the number of channels. The numbers of channels of the first, second, and third convolutional layers are 64, 128, and 256, respectively.
The 64 channels of the first convolutional layer extract 64-dimensional features, i.e., relatively simple point and line structures of varying thickness and orientation. The 128 channels of the second convolutional layer combine the 64-dimensional features extracted by the first layer into richer 128-dimensional features. The third convolutional layer further decomposes and recombines these into 256-dimensional features with better general expressive power.
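The bottleneck residual module described above can be sketched in PyTorch as follows. The projection shortcut (so the skip connection matches the 4n output channels) and the placement of PReLU activations are assumptions where the text is silent.

```python
import torch
from torch import nn

class Bottleneck(nn.Module):
    """Residual module per the description: 1x1 conv (n channels) ->
    3x3 conv (n channels) -> 1x1 conv (4n channels), plus a skip
    connection. PReLU placement and the 1x1 projection shortcut are
    assumptions."""

    def __init__(self, in_ch: int, n: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, n, kernel_size=1),
            nn.PReLU(),
            nn.Conv2d(n, n, kernel_size=3, padding=1),
            nn.PReLU(),
            nn.Conv2d(n, 4 * n, kernel_size=1),
        )
        # 1x1 projection so the skip connection matches 4n channels.
        self.proj = nn.Conv2d(in_ch, 4 * n, kernel_size=1)
        self.act = nn.PReLU()

    def forward(self, x):
        return self.act(self.body(x) + self.proj(x))
```

With n = 64 for the first convolutional layer, a 3-channel input thus yields a 256-channel feature map, matching the 4n expansion.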
For example, for the benign/malignant classification of nodule lesions in a thyroid ultrasound image, more obvious point structures such as calcifications in the nodules, or line structures such as nodule edges, are extracted at the first convolutional layer, while features carrying more semantic information, such as the different cystic or solid regions of the nodules, are extracted at the second and third convolutional layers. The image classification model weights key regions through an attention module and suppresses irrelevant regions; the module outputs a weight map learned through backpropagation, in which emphasized regions are close to 1 and suppressed regions are close to 0.
Specifically, the attention module consists of a down-sampling encoding structure and an up-sampling decoding structure and is activated by a sigmoid activation function. Before the mask is multiplied with the image input to the attention module, the module adds 0.5 to all elements of the mask so as to preserve the information of the original image. Prior-art attention modules, which mainly process natural images, typically add 1 to all elements of the mask before applying it. This image classification model mainly processes ultrasound images; unlike natural images, most mask elements here are far smaller than 1, so adding 1 to every element destroys the proportional relationships between different positions of the mask. Extensive experiments showed that with this parameter set to 0.5, the image classification model performs best on ultrasound images.
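A small worked example makes the proportion argument concrete. The mask values below are illustrative, not taken from the patent: when all mask elements are well below 1, adding 1 nearly flattens the relative emphasis between positions, while adding 0.5 retains more of it.

```python
import numpy as np

# A toy attention mask whose elements are all well below 1, as the text
# says is typical for sigmoid-activated attention on ultrasound features.
mask = np.array([0.05, 0.10, 0.20])

# Relative emphasis between the strongest and weakest positions:
ratio_raw = mask[-1] / mask[0]                         # ~4.0
ratio_plus_one = (mask[-1] + 1.0) / (mask[0] + 1.0)    # ~1.14: nearly flattened
ratio_plus_half = (mask[-1] + 0.5) / (mask[0] + 0.5)   # ~1.27: better preserved
```

The +1 shift leaves almost no contrast between attended and suppressed positions, which is the distortion the 0.5 shift is meant to reduce.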
Specifically, the modules of all layers other than the attention module in the image classification model are activated with the PReLU function, which prevents some neurons from remaining inactive, and therefore contributing nothing, in the early stage of training.
And S3, acquiring an image to be classified, and preprocessing the image to be classified to generate a second image set.
And S4, classifying the images in the second image set according to the image classification model to obtain image categories, wherein the image categories comprise malignant images and benign images.
Specifically, the classifying the images in the second image set according to the image classification model to obtain the image categories includes:
inputting the images in the second image set into a first convolution layer of the image classification model to obtain a first feature map, inputting the first feature map into a second convolution layer to obtain a second feature map, and inputting the second feature map into a third convolution layer to obtain a third feature map;
pooling the third feature map through a global average pooling layer, processing the result through a first and a second fully connected layer to obtain a classification output value, and inputting the classification output value into the classification layer, which uses a softmax function to classify it and obtain the image category of the corresponding ultrasound image. The image categories of the ultrasound image include malignant images and benign images.
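The classification head described in the two steps above can be sketched as follows. The hidden width of the fully connected layers is not given in the text, so the value used here (64) is an assumption.

```python
import torch
from torch import nn

class ClassificationHead(nn.Module):
    """Global average pooling, two fully connected layers, and softmax over
    the two classes (benign / malignant). The hidden width is an assumption;
    the input width matches the 256-channel third convolutional layer."""

    def __init__(self, in_ch: int = 256, hidden: int = 64, classes: int = 2):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # global average pooling
        self.fc1 = nn.Linear(in_ch, hidden)   # first fully connected layer
        self.act = nn.PReLU()
        self.fc2 = nn.Linear(hidden, classes) # second fully connected layer

    def forward(self, x):
        z = self.pool(x).flatten(1)
        # Softmax turns the classification output values into class
        # probabilities over {benign, malignant}.
        return torch.softmax(self.fc2(self.act(self.fc1(z))), dim=1)
```

Each row of the output sums to 1, so the predicted category is simply the argmax over the two probabilities.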
According to the embodiment, the image classification method provided by the invention obtains the image classification model by training the residual channel attention network, and classifies the image to be classified according to the image classification model to obtain the image category. The image classification model is optimized, and the accuracy of classifying the ultrasonic images by the image classification model is improved.
Fig. 2 is a schematic diagram of an electronic device 1 according to an embodiment of the invention. The electronic device 1 is a device capable of automatically performing numerical calculation and/or information processing in accordance with instructions that are preset or stored in advance. The electronic device 1 may be a computer, a single network server, a server group composed of a plurality of network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing, a form of distributed computing, is a super virtual computer composed of a group of loosely coupled computers.
In the present embodiment, the electronic device 1 includes, but is not limited to, a memory 11, a processor 12, and a network interface 13, which are communicatively connected to each other through a system bus; the memory 11 stores an image classification program 10 executable by the processor 12. While FIG. 2 only shows the electronic device 1 with the components 11-13 and the image classification program 10, it will be understood by those skilled in the art that the structure shown in FIG. 2 does not limit the electronic device 1, which may comprise fewer or more components than shown, combine some components, or arrange the components differently.
The memory 11 includes working memory and at least one type of readable storage medium. The working memory provides a cache for the operation of the electronic device 1; the readable storage medium may be a non-volatile storage medium such as flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disk. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as its hard disk; in other embodiments, it may be an external storage device of the electronic device 1, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card. In this embodiment, the readable storage medium of the memory 11 mainly includes a program storage area and a data storage area: the program storage area stores the operating system and the various application programs installed in the electronic device 1, such as the code of the image classification program 10 in an embodiment of the present invention; the data storage area stores data created according to the use of the device, such as various types of data that have been output or are to be output.
Processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 12 is generally used for controlling the overall operation of the electronic apparatus 1, such as performing control and processing related to data interaction or communication with other devices. In this embodiment, the processor 12 is configured to execute the program code stored in the memory 11 or process data, such as executing the image classification program 10.
The network interface 13 may comprise a wireless network interface or a wired network interface, and the network interface 13 is used for establishing a communication connection between the electronic device 1 and a client (not shown).
Optionally, the electronic device 1 may further include a user interface, the user interface may include a Display (Display), an input unit such as a Keyboard (Keyboard), and the optional user interface may further include a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic apparatus 1 and for displaying a visualized user interface.
In one embodiment of the present invention, the image classification program 10 when executed by the processor 12 implements the following steps S1-S5.
And S1, acquiring a training image, and preprocessing the training image to generate a first image set.
In one embodiment, the training image is an ultrasound image of an annotated image class. The first image set includes a plurality of training images that have been preprocessed. Specifically, the ultrasonic image is a single thyroid ultrasound nodule focus image. The pretreatment step comprises: after the ultrasonic image is obtained, the position of a node in the ultrasonic image is identified, a circumscribed rectangle of the node is taken, the training image or the image to be classified is zoomed to a preset size (for example 128 x 128) according to the size of the longest side of the circumscribed rectangle to obtain a zoomed image, and one side of the zoomed image is superposed with one longest side of the circumscribed rectangle. If the longest side of the circumscribed rectangle is the upper side and the lower side, the upper side of the zoomed image is superposed with the upper side of the circumscribed rectangle; and if the longest side of the external rectangle is the left side and the right side, the left side of the zoomed image is superposed with the left side of the external rectangle.
S2, training the convolutional neural network according to the first image set to obtain an image classification model, wherein the image classification model comprises a first convolutional layer and a second convolutional layer, and the first convolutional layer and the second convolutional layer are convolutional layers formed by multiplying a residual error module and an attention module.
In one embodiment, the first convolutional layer and the second convolutional layer are convolutional networks formed by multiplying a residual module and an attention module. The residual error module is used for deepening the learning depth of the convolutional neural network so as to improve the performance of the model, and the attention module is used for removing the influence of noise points in the image. And after deep semantic information of the image of the first image set is extracted through the residual error module, screening is carried out through the attention module multiplied by the residual error module, and key deep semantic information required by image classification is obtained.
In an embodiment, the image classification model further comprises a third convolutional layer, which is a convolutional network composed of residual modules. The third convolutional layer only adopts a residual module and does not adopt an attention module so as to prevent the model from generating an overfitting phenomenon.
In an embodiment, the image classification model further comprises a classification layer, and the classification layer adopts a logistic regression function softmax to classify the images processed by other layers of the image classification model.
Specifically, the residual module is formed by connecting 1 multiplied by 1 convolution of n channels with 3 multiplied by 3 convolution of n channels and then connecting 1 multiplied by 1 convolution of 4n channels, wherein n is the number of channels. The number of channels of the first convolution layer, the second convolution layer and the third convolution layer is respectively 64, 128 and 256.
The 64 channels of the first convolutional layer are used to extract 64-dimensional features, i.e., relatively simple dot and line structures, which vary in thickness direction. The 128 channels of the second convolutional layer further combine the 64-dimensional features extracted by the first convolutional layer into more 128-dimensional features. The third convolution layer further breaks up and combines the features into 128-dimensional features with better universal expression.
For example, for the benign and malignant classification of nodule lesions in the thyroid ultrasound image, more obvious point structures such as calcifications in the nodules or edge isoline structures are extracted at the first convolution layer, and more semantic information features such as different cystic or solid regions of the nodules are extracted at the second convolution layer and the third convolution layer. The image classification model performs key weighting on different regions through an attention module, suppresses other irrelevant regions, and outputs a weight map learned through back propagation, wherein the key weighted region is close to 1, and the suppressed region is close to 0.
Specifically, the attention module consists of a down-sampling coding structure and an up-sampling decoding structure and is activated by a sigmoid activation function. The attention module adds 0.5 to all elements of the mask before overlaying the image input to the attention module through the mask to maintain the integrity of the artwork information. Prior art attention modules typically add 1 to all elements of the mask before overlaying the image, primarily to process natural images. The image classification model is mainly used for processing ultrasonic images, and different from natural images, under the condition that most elements of a mask are far smaller than 1, the proportion relation of different positions of the mask is damaged by adding 1 to all the elements. Through a large number of experimental tests, the parameter is adjusted to 0.5, and the image classification model has the best effect when being used for processing the ultrasonic image under the parameter setting.
Specifically, modules of all layers except the attention module in the image classification model are activated by adopting a pRelu function, so that the condition that part of neurons in the early training stage of the image classification model are not activated and cannot act is prevented.
And S3, acquiring an image to be classified, and preprocessing the image to be classified to generate a second image set.
And S4, classifying the images in the second image set according to the image classification model to obtain image categories, wherein the image categories comprise malignant images and benign images.
Specifically, the classifying the images in the second image set according to the image classification model to obtain the image categories includes:
inputting the images in the second image set into a first convolution layer of the image classification model to obtain a first feature map, inputting the first feature map into a second convolution layer to obtain a second feature map, and inputting the second feature map into a third convolution layer to obtain a third feature map;
pooling the third feature map through a global average pooling layer, processing the result through a first fully connected layer and a second fully connected layer to obtain a classification output value, and inputting the classification output value into a classification layer, which classifies it with a logistic regression function to obtain the image category of the corresponding ultrasound image. The image categories of the ultrasound image include malignant images and benign images.
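The classification head described in this step can be sketched as follows. The layer sizes, the random weight values, and the 0.5 decision threshold are illustrative assumptions, not values specified in the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def classify(feature_map, w1, b1, w2, b2, threshold=0.5):
    """Global-average-pool a (C, H, W) feature map, pass the result
    through two fully connected layers, and apply a logistic (sigmoid)
    output to obtain the probability of the 'malignant' class."""
    pooled = feature_map.mean(axis=(1, 2))      # GAP: (C, H, W) -> (C,)
    hidden = np.maximum(pooled @ w1 + b1, 0.0)  # first fully connected layer
    logit = hidden @ w2 + b2                    # second fully connected layer -> scalar
    prob = 1.0 / (1.0 + np.exp(-logit))         # logistic regression output
    return ("malignant" if prob >= threshold else "benign"), prob

# Toy head: 8 feature channels -> 4 hidden units -> 1 logit.
fmap = rng.standard_normal((8, 5, 5))
w1, b1 = rng.standard_normal((8, 4)), np.zeros(4)
w2, b2 = rng.standard_normal(4), 0.0
label, prob = classify(fmap, w1, b1, w2, b2)
```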
As can be seen from the foregoing embodiment, the electronic device 1 provided by the present invention trains a residual channel attention network to obtain an image classification model, and classifies an image to be classified according to that model to obtain an image category. The image classification model is optimized, improving the accuracy with which it classifies ultrasound images.
In other embodiments, the image classification program 10 may also be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to implement the present invention. A module referred to in the present invention is a series of computer program instruction segments capable of performing a specific function, used to describe the execution process of the image classification program 10 in the electronic device 1.
Fig. 3 is a block diagram of an image classification apparatus 10 according to an embodiment of the present invention.
In an embodiment of the present invention, the image classification apparatus 10 includes a first image module 110, a model training module 120, a second image module 130, and a model classification module 140. Exemplarily:
the first image module 110 is configured to obtain a training image, and perform preprocessing on the training image to generate a first image set;
the model training module 120 is configured to train a convolutional neural network according to the first image set to obtain an image classification model, where the image classification model includes a first convolutional layer and a second convolutional layer, and the first convolutional layer and the second convolutional layer are convolutional layers formed by multiplying a residual module and an attention module;
the second image module 130 is configured to obtain an image to be classified, and pre-process the image to be classified to generate a second image set;
the model classification module 140 is configured to classify the images in the second image set according to the image classification model to obtain image categories, where the image categories include malignant images and benign images.
The functions or operation steps of the first image module 110, the model training module 120, the second image module 130, and the model classification module 140 when executed are substantially the same as those of the above embodiments, and are not repeated herein.
In addition, the embodiment of the present invention further provides a computer-readable storage medium, which may be any one of, or any combination of, a hard disk, a multimedia card, an SD card, a flash memory card, an SMC, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, and the like. The computer-readable storage medium includes an image classification program 10, which, when executed by a processor, performs the following operations:
a1, acquiring a training image, and preprocessing the training image to generate a first image set;
a2, training a convolutional neural network according to the first image set to obtain an image classification model, wherein the image classification model comprises a first convolutional layer and a second convolutional layer, and the first convolutional layer and the second convolutional layer are convolutional layers formed by multiplying a residual module and an attention module;
a3, acquiring an image to be classified, and preprocessing the image to be classified to generate a second image set;
and A4, classifying the images in the second image set according to the image classification model to obtain image categories, wherein the image categories comprise malignant images and benign images.
The embodiment of the computer-readable storage medium of the present invention is substantially the same as the embodiment of the image classification method and the electronic device, and will not be described herein again.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. An image classification method, comprising:
acquiring a training image, and preprocessing the training image to generate a first image set;
training a convolutional neural network according to the first image set to obtain an image classification model, wherein the image classification model comprises a first convolutional layer and a second convolutional layer, and the first convolutional layer and the second convolutional layer are convolutional layers formed by multiplying a residual module and an attention module;
acquiring an image to be classified, and preprocessing the image to be classified to generate a second image set;
and classifying the images in the second image set according to the image classification model to obtain image categories, wherein the image categories comprise malignant images and benign images.
2. The image classification method according to claim 1, wherein the training image and the image to be classified are a single thyroid ultrasound nodule lesion image, and the preprocessing includes:
identifying the positions of the nodules in the training image or the image to be classified;
acquiring a circumscribed rectangle of the nodule, zooming the training image or the image to be classified to a preset size to obtain a zoomed image according to the size of the longest side of the circumscribed rectangle, wherein one side of the zoomed image is coincided with the longest side of the circumscribed rectangle.
3. The image classification method of claim 1, characterized in that the image classification model further comprises a third convolutional layer, which is a convolutional network composed of residual modules.
4. The image classification method according to claim 1, characterized in that the residual module consists of a 1×1 convolution with n channels, followed by a 3×3 convolution with n channels, followed by a 1×1 convolution with 4n channels, where n is the number of channels.
5. The image classification method according to claim 1, characterized in that the attention module adds 0.5 to all elements of the mask before covering the image input to the attention module by the mask.
6. The image classification method according to claim 1, characterized in that the attention module is activated with a sigmoid activation function and the other modules in the image classification model are activated with a PReLU function.
7. The image classification method of claim 1, wherein the image classification model further comprises a classification layer that employs a logistic regression function to classify images processed by other layers of the image classification model.
8. An electronic device, comprising: a memory, a processor, the memory having stored thereon an image classification program executable on the processor, the image classification program when executed by the processor implementing the steps of the image classification method as follows:
acquiring a training image, and preprocessing the training image to generate a first image set;
training a convolutional neural network according to the first image set to obtain an image classification model, wherein the image classification model comprises a first convolutional layer and a second convolutional layer, and the first convolutional layer and the second convolutional layer are convolutional layers formed by multiplying a residual module and an attention module;
acquiring an image to be classified, and preprocessing the image to be classified to generate a second image set;
and classifying the images in the second image set according to the image classification model to obtain image categories, wherein the image categories comprise malignant images and benign images.
9. A computer readable storage medium having stored thereon an image classification program executable by one or more processors to implement the steps of an image classification method as follows:
acquiring a training image, and preprocessing the training image to generate a first image set;
training a convolutional neural network according to the first image set to obtain an image classification model, wherein the image classification model comprises a first convolutional layer and a second convolutional layer, and the first convolutional layer and the second convolutional layer are convolutional layers formed by multiplying a residual module and an attention module;
acquiring an image to be classified, and preprocessing the image to be classified to generate a second image set;
and classifying the images in the second image set according to the image classification model to obtain image categories, wherein the image categories comprise malignant images and benign images.
10. An image classification apparatus, characterized in that the apparatus comprises:
the first image module is used for acquiring a training image, and preprocessing the training image to generate a first image set;
the model training module is used for training the convolutional neural network according to the first image set to obtain an image classification model, the image classification model comprises a first convolutional layer and a second convolutional layer, and the first convolutional layer and the second convolutional layer are convolutional layers formed by multiplying a residual module and an attention module;
the second image module is used for acquiring an image to be classified, and preprocessing the image to be classified to generate a second image set;
and the model classification module is used for classifying the images in the second image set according to the image classification model to obtain image categories, wherein the image categories comprise malignant images and benign images.
CN202011284218.6A 2020-11-16 2020-11-16 Image classification method, device and storage medium Pending CN112396103A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011284218.6A CN112396103A (en) 2020-11-16 2020-11-16 Image classification method, device and storage medium

Publications (1)

Publication Number Publication Date
CN112396103A true CN112396103A (en) 2021-02-23

Family

ID=74600563

Country Status (1)

Country Link
CN (1) CN112396103A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609598A (en) * 2017-09-27 2018-01-19 武汉斗鱼网络科技有限公司 Image authentication model training method, device and readable storage medium storing program for executing
CN109376767A (en) * 2018-09-20 2019-02-22 中国科学技术大学 Retina OCT image classification method based on deep learning
CN111259982A (en) * 2020-02-13 2020-06-09 苏州大学 Premature infant retina image classification method and device based on attention mechanism
CN111368937A (en) * 2020-03-19 2020-07-03 京东方科技集团股份有限公司 Image classification method and device, and training method, device, equipment and medium thereof

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112494063A (en) * 2021-02-08 2021-03-16 四川大学 Abdominal lymph node partitioning method based on attention mechanism neural network
CN112494063B (en) * 2021-02-08 2021-06-01 四川大学 Abdominal lymph node partitioning method based on attention mechanism neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination