CN110909756A - Convolutional neural network model training method and device for medical image recognition - Google Patents


Info

Publication number
CN110909756A
CN110909756A (application CN201811088294.2A)
Authority
CN
China
Prior art keywords
image
region
mask
neural network
medical image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811088294.2A
Other languages
Chinese (zh)
Inventor
苏宁
郭文强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201811088294.2A priority Critical patent/CN110909756A/en
Publication of CN110909756A publication Critical patent/CN110909756A/en
Pending legal-status Critical Current

Classifications

    • G06F18/2148 — Generating training patterns; bootstrap methods characterised by the process organisation or structure, e.g. boosting cascade
    • G06F18/24 — Classification techniques
    • G06N3/045 — Combinations of networks
    • G06N3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N3/084 — Backpropagation, e.g. using gradient descent
    • G06N3/09 — Supervised learning
    • G06T7/0012 — Biomedical image inspection
    • G06T7/11 — Region-based segmentation
    • G06T7/136 — Segmentation; edge detection involving thresholding
    • G06T7/155 — Segmentation; edge detection involving morphological operators
    • G06T7/194 — Segmentation involving foreground-background segmentation
    • G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/26 — Segmentation of patterns in the image field, e.g. clustering-based techniques; detection of occlusion
    • G06V10/56 — Extraction of image or video features relating to colour
    • G06V10/82 — Image or video recognition or understanding using neural networks
    • G16H20/40 — ICT for therapies relating to mechanical, radiation or invasive therapies, e.g. surgery, laser therapy, dialysis or acupuncture
    • G16H30/20 — ICT for handling medical images, e.g. DICOM, HL7 or PACS
    • G16H30/40 — ICT for processing medical images, e.g. editing
    • G16H40/67 — ICT for remote operation of medical equipment or devices
    • G16H50/20 — ICT for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H50/70 — ICT for mining of medical data, e.g. analysing previous cases of other patients
    • G06T2207/10024 — Color image
    • G06T2207/20081 — Training; Learning
    • G06T2207/20084 — Artificial neural networks [ANN]
    • G06T2207/30096 — Tumor; Lesion
    • G06V2201/031 — Recognition of patterns in medical or anatomical images of internal organs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Primary Health Care (AREA)
  • Artificial Intelligence (AREA)
  • Epidemiology (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Radiology & Medical Imaging (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Pathology (AREA)
  • Quality & Reliability (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Surgery (AREA)
  • Urology & Nephrology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a convolutional neural network model training method and device for medical image recognition, a method and device for recognizing medical images, a computer-readable storage medium, and an electronic device. The training method comprises the following steps: converting a medical image to the HSV color space; binarizing the S channel of the HSV image to obtain a binarized image; applying morphological processing to the binarized image to obtain a region of interest; extracting positive and negative samples from the region of interest using preset target-region masks; and training the convolutional neural network model with the positive and negative samples. The invention improves the efficiency and accuracy of medical image analysis.

Description

Convolutional neural network model training method and device for medical image recognition
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a convolutional neural network model training method and apparatus for medical image recognition, a method and apparatus for recognizing a medical image, a computer-readable storage medium, and an electronic device.
Background
The main task of pathology and laboratory medicine is disease diagnosis in the course of medical treatment, including examining abnormal cells in blood and bone marrow via biopsy, exfoliative cytology, and fine-needle cytology, in order to provide a definitive diagnosis for the clinic and determine the nature of the disease. Pathological image analysis has long been the most direct method of disease diagnosis. Continuous improvement in medical pathological image scanning and analysis systems has made high-resolution morphological digitization of medical sample images practical, creating opportunities for research on computer-aided medical image analysis. Computer-aided analysis of digitized pathological images helps reduce clinicians' workload and improves the efficiency with which pathologists and laboratory physicians analyze medical pathological images.
Traditionally, researchers extracted hand-crafted features from medical pathological images using various morphological methods and then trained a classifier with a machine learning model, achieving automatic classification of clinical samples but with insufficient accuracy. In recent years, deep learning has developed rapidly in the field of artificial intelligence; for computer vision tasks such as object tracking and image classification, convolutional neural networks now outperform all traditional machine learning methods. Deep convolutional neural networks automatically extract highly abstract features and are data-driven, so they can provide more accurate and reliable classification results than traditional pattern recognition algorithms. Consequently, the analysis, recognition, and auxiliary diagnosis of medical pathological images using convolutional neural networks is one of the subjects requiring intensive study in the medical imaging field.
Lesion-tissue identification based on convolutional neural networks faces several technical problems. First, the ultra-high resolution of digital medical pathological images poses a great challenge to preprocessing: existing image-segmentation techniques are generally designed for low-resolution images and cannot effectively handle ultra-high-resolution digital pathological images with enormous data volumes. Second, in medical pathological image recognition the samples include pathological specimens at different stages as well as puncture samples used for early screening, and the tissue morphology, shapes, and area ratios differ across digital images, so computation cost and sample-extraction accuracy become a pair of goals that are difficult to balance. Finally, although convolutional neural networks perform excellently on general image classification, in pathological image recognition a network trained from scratch over the huge sample space produced by many high-resolution digital images converges with difficulty or falls into a local optimum too early, degrading the analysis and processing efficiency of the whole system.
Disclosure of Invention
In order to perform efficient computer-aided lesion-tissue identification and localization on high-resolution medical pathological images, the invention provides a convolutional neural network model training method and device for medical image recognition, a method and device for recognizing medical images, a computer-readable storage medium, and an electronic device.
According to an embodiment of the present invention, there is provided a convolutional neural network model training method for medical image recognition, the method including: converting a medical image to the HSV color space; binarizing the S channel of the HSV image to obtain a binarized image; applying morphological processing to the binarized image to obtain a region of interest; extracting positive and negative samples from the region of interest using preset target-region masks; and training the convolutional neural network model with the positive and negative samples.
Optionally, the morphological processing comprises dilation, erosion, opening, and closing operations.
Optionally, morphologically processing the binarized image to obtain the region of interest includes: performing morphological processing on the binarized image to obtain the outer contours of one or more tissue regions; generating a circumscribed rectangle for the outer contour of each tissue region; keeping each circumscribed rectangle whose area exceeds a preset threshold and discarding the rest; for any two remaining rectangles in which one fully contains the other, keeping the rectangle with the larger area and discarding the smaller; and using the area defined by the remaining circumscribed rectangles as the region of interest.
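The rectangle-filtering rules above can be sketched in plain Python (function and variable names are assumptions; rectangles are `(x, y, w, h)` tuples):

```python
def filter_boxes(boxes, min_area):
    """Keep circumscribed rectangles above an area threshold, then drop
    any rectangle fully contained in another kept rectangle."""
    def contains(a, b):
        # True if rectangle a fully contains rectangle b.
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        return (ax <= bx and ay <= by and
                bx + bw <= ax + aw and by + bh <= ay + ah)

    # Step 1: drop boxes below the area threshold.
    kept = [b for b in boxes if b[2] * b[3] >= min_area]
    # Step 2: drop any box fully contained in another kept box.
    result = []
    for b in kept:
        if not any(other is not b and contains(other, b) for other in kept):
            result.append(b)
    return result
```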
Optionally, the preset target-region masks include a lesion-region identification mask and a normal-region identification mask, and the method further includes: computing annotation information in XML format with the PNPoly algorithm to generate the lesion-region identification mask; and generating the normal-region identification mask from the lesion-region identification mask and the binarized image.
Optionally, computing the XML-format annotation information with the PNPoly algorithm to generate the lesion-region identification mask includes: generating a lesion-area mask and a rejection mask from two pieces of XML-format annotation information; and subtracting the rejection mask from the lesion-area mask to obtain the lesion-region identification mask. Generating the normal-region identification mask from the lesion-region identification mask and the binarized image includes: subtracting the lesion-region identification mask from the binarized image to obtain the normal-region identification mask.
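The mask arithmetic described above reduces to element-wise set subtraction. A minimal sketch, assuming the masks are binary uint8 arrays (a representation the patent does not specify):

```python
import numpy as np

def build_masks(lesion_mask, rejection_mask, tissue_binary):
    """Subtract the rejection mask from the raw lesion mask to get the
    lesion-region identification mask, then subtract that from the tissue
    binarization to get the normal-region identification mask."""
    lesion_id_mask = np.logical_and(lesion_mask, np.logical_not(rejection_mask))
    normal_id_mask = np.logical_and(tissue_binary, np.logical_not(lesion_id_mask))
    return lesion_id_mask.astype(np.uint8), normal_id_mask.astype(np.uint8)
```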
Optionally, the annotation information stores the vertex coordinates of multiple polygonal annotation regions, and the method includes: ordering the polygonal annotation regions by descending area to set the priority with which the PNPoly algorithm is applied to each region.
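The PNPoly test referenced above is the classic even-odd ray-crossing point-in-polygon algorithm. A minimal implementation (variable names assumed):

```python
def pnpoly(x, y, vertices):
    """Classic PNPoly test: count crossings of a horizontal ray from
    (x, y) to the right; an odd count means the point is inside.
    `vertices` is a list of (vx, vy) polygon corners."""
    inside = False
    j = len(vertices) - 1
    for i in range(len(vertices)):
        xi, yi = vertices[i]
        xj, yj = vertices[j]
        # Edge (j, i) straddles the ray's y level?
        if (yi > y) != (yj > y):
            x_cross = (xj - xi) * (y - yi) / (yj - yi) + xi
            if x < x_cross:
                inside = not inside
        j = i
    return inside
```

Processing polygons in descending area order, as the patent describes, lets smaller nested annotation polygons (e.g. exclusions drawn inside a lesion outline) take effect after their enclosing region has been rasterized.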
Optionally, the preset target-region mask is generated at level-4 of the medical image's pyramid, while the positive and negative samples are extracted at level-0 of the pyramid.
Optionally, the method further comprises: and randomly dithering the image color parameters of the positive sample, and randomly turning horizontally and vertically to expand the capacity of the positive sample.
Optionally, the image color parameter includes at least one of brightness, contrast, hue and saturation of the image.
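The augmentation described in the two clauses above can be sketched with NumPy alone (the patent's pipeline presumably uses TensorFlow image ops; the jitter ranges and names below are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Minimal augmentation sketch: random brightness and contrast jitter
    plus random horizontal/vertical flips. `image` is float32 in [0, 1]."""
    out = image.astype(np.float32)
    # Random brightness jitter.
    out = np.clip(out + rng.uniform(-0.1, 0.1), 0.0, 1.0)
    # Random contrast jitter around the image mean.
    factor = rng.uniform(0.8, 1.2)
    out = np.clip((out - out.mean()) * factor + out.mean(), 0.0, 1.0)
    # Random horizontal / vertical flips.
    if rng.random() < 0.5:
        out = out[:, ::-1]
    if rng.random() < 0.5:
        out = out[::-1, :]
    return out
```

Hue and saturation jitter, also listed above, would additionally require a round trip through HSV space and are omitted here for brevity.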
Optionally, the method further comprises: the positive and negative examples are stored using TFRecord and extracted using a multi-threaded input data processing framework in tensflow.
Optionally, before converting the medical image to the HSV color space, the method further comprises: stain-normalizing the medical image using a stain normalization algorithm.
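The patent does not name a specific stain normalization algorithm. As a much-simplified stand-in, per-channel mean/std matching against a reference slide (Reinhard-style, here applied directly in RGB rather than a perceptual color space) illustrates the idea:

```python
import numpy as np

def normalize_stain(image, reference):
    """Simplified stain-normalization sketch (an assumption, not the
    patent's method): match each channel's mean and standard deviation
    to those of a reference slide. Inputs are float32 RGB arrays."""
    out = image.astype(np.float32).copy()
    for ch in range(3):
        src = out[..., ch]
        ref = reference[..., ch].astype(np.float32)
        std = src.std()
        if std > 0:
            # Standardize the source channel, then rescale to the
            # reference channel's statistics.
            out[..., ch] = (src - src.mean()) / std * ref.std() + ref.mean()
    return out
```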
Optionally, training the convolutional neural network model with the positive and negative samples comprises: adding shortcut connections between the feature-map output layers and the global average pooling layer, so that the feature-map output of each residual module acts directly on the global average pooling layer and the softmax layer.
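The classifier head that such shortcut connections produce can be sketched with NumPy (shapes and names are assumptions): every residual stage's feature map is globally average-pooled, the pooled nodes are concatenated, and one weight matrix maps them to softmax probabilities.

```python
import numpy as np

def multiscale_head(feature_maps, weights):
    """Sketch of a multi-scale head: each residual stage's output is
    shortcut-connected to global average pooling, and all pooled nodes
    are jointly weighted into the softmax classifier.

    feature_maps: list of arrays, each (H_i, W_i, C_i) -- one per stage.
    weights:      array of shape (sum_i C_i, num_classes).
    """
    # Global average pooling of every stage's output, then concatenate.
    pooled = np.concatenate([fm.mean(axis=(0, 1)) for fm in feature_maps])
    logits = pooled @ weights
    # Numerically stable softmax over classes.
    e = np.exp(logits - logits.max())
    return e / e.sum()
```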
Optionally, the weight of the connection between a feature-map output layer and the global average pooling layer is w_{j,c}, and the method further comprises decomposing the weight w_{j,c} using the following equation, so that paths containing different numbers of residual modules have independent weights:
[Equation image BDA0001803708180000031 in the original publication]
where p_c is the output probability of the path for class c; A_j is the output of the j-th node of the global average pooling layer; i and j are integers greater than 0; f is a mapping comprising all function mappings inside a residual module; y is the output of a residual module; l indicates that, from the l-th residual module onward, the output of each subsequent residual module is shortcut-connected to the global average pooling layer, which performs output mapping and weighted computation on each feature-map output layer; and k, an integer greater than 0, is the number of residual modules.
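The exact decomposition formula is only available as an image in the original publication. For orientation, the standard global-average-pooling-to-softmax relation that such a per-path decomposition refines is (this equation is background, not the patent's decomposition itself):

```latex
p_c \;=\; \frac{\exp\!\left(\sum_{j} w_{j,c}\, A_j\right)}
               {\sum_{c'} \exp\!\left(\sum_{j} w_{j,c'}\, A_j\right)}
```

The patent's equation further splits each w_{j,c} so that paths passing through different numbers of residual modules carry independent weights.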
Optionally, the method further comprises: training the convolutional neural network model with the positive and negative samples to obtain a stage model; generating a prediction heat map with the stage model; re-extracting samples, with overlap, from the misjudgment aggregation areas in the prediction heat map; and further training the convolutional neural network model with the re-extracted samples together with the previously extracted positive and negative samples.
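The hard-example mining step above compares the stage model's heat map with the ground truth and re-extracts samples only where errors cluster. A sketch (thresholds, the 3x3 clustering criterion, and names are assumptions):

```python
import numpy as np

def mine_hard_regions(heatmap, labels, error_threshold=0.5, min_cluster=4):
    """Flag heat-map cells whose prediction error exceeds a threshold,
    then keep only cells lying in aggregated misjudged areas, here
    approximated by counting flagged neighbours in a 3x3 window."""
    errors = np.abs(heatmap - labels) > error_threshold
    hard = []
    h, w = errors.shape
    for i in range(h):
        for j in range(w):
            if not errors[i, j]:
                continue
            # Count misjudged cells in the surrounding 3x3 window.
            window = errors[max(0, i - 1):i + 2, max(0, j - 1):j + 2]
            if window.sum() >= min_cluster:
                hard.append((i, j))
    return hard
```

The returned coordinates would then drive overlapped re-extraction of training slices from the corresponding image regions.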
An embodiment of the present invention further provides a convolutional neural network model training device for medical image recognition, comprising: a conversion module configured to convert a medical image to the HSV color space; a binarization module configured to binarize the S channel of the HSV image to obtain a binarized image; a region extraction module configured to apply morphological processing to the binarized image to obtain a region of interest; a sample extraction module configured to extract positive and negative samples from the region of interest using preset target-region masks; and a training module configured to train the convolutional neural network model with the positive and negative samples.
An embodiment of the present invention further provides a method for recognizing a medical image, comprising: converting the medical image to the HSV color space; binarizing the S channel of the HSV image to obtain a binarized image; applying morphological processing to the binarized image to obtain a region of interest; and identifying the region of interest with a convolutional neural network model trained by any one of the above convolutional neural network model training methods for medical image recognition.
Optionally, identifying the region of interest with the trained convolutional neural network model includes: acquiring the resolution of the medical image and the area ratio of its tissue region; determining the resolution of the heat map to be constructed according to the resolution of the medical image to be identified, the area ratio of the tissue region, and a preset threshold; extracting slices from the tissue region with overlap; inputting the extracted slices into the convolutional neural network model for prediction, the predicted probability of each slice forming a probability matrix; and generating a prediction heat map from the probability matrix.
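The heat-map construction above can be sketched as a sliding window over the tissue region (the interface is an assumption; in practice each slice would be an image patch fed to the trained model rather than the mask itself):

```python
import numpy as np

def build_heatmap(tissue_mask, predict_fn, slice_size, stride):
    """Extract slices with overlap (stride < slice_size), score each with
    the model (`predict_fn` returns a lesion probability), and assemble
    the probabilities into the heat-map matrix."""
    h, w = tissue_mask.shape
    rows = (h - slice_size) // stride + 1
    cols = (w - slice_size) // stride + 1
    heatmap = np.zeros((rows, cols), dtype=np.float32)
    for r in range(rows):
        for c in range(cols):
            y, x = r * stride, c * stride
            patch = tissue_mask[y:y + slice_size, x:x + slice_size]
            # Only score slices that actually contain tissue.
            if patch.any():
                heatmap[r, c] = predict_fn(patch)
    return heatmap
```

Choosing `stride` relative to `slice_size` sets the overlap, and hence the heat-map resolution, which the patent ties to the image resolution and tissue area ratio.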
An embodiment of the present invention further provides an apparatus for recognizing a medical image, comprising: a conversion module configured to convert the medical image to the HSV color space; a binarization module configured to binarize the S channel of the HSV image to obtain a binarized image; a region extraction module configured to apply morphological processing to the binarized image to obtain a region of interest; and an identification module configured to identify the region of interest with a convolutional neural network model trained by any one of the above convolutional neural network model training methods for medical image recognition.
An embodiment of the present invention further provides a computer storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement any one of the above-mentioned convolutional neural network model training methods for medical image recognition or any one of the above-mentioned methods for recognizing medical images.
An embodiment of the present invention further provides an electronic device, including: a processor; and a memory having computer readable instructions stored thereon which, when executed by the processor, implement any of the above convolutional neural network model training methods for medical image recognition or any of the above methods for recognizing medical images.
The technical scheme provided by the embodiment of the invention can have the following beneficial effects:
the convolutional neural network model training method and device for medical image recognition, the method and device for recognizing medical images, the computer-readable storage medium and the electronic device according to the embodiments of the present invention provide an efficient and accurate solution for computer-aided medical pathological image diagnosis, so that the analysis efficiency of medical pathological images is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 shows a schematic diagram of an exemplary system architecture 100 to which the convolutional neural network model training method or apparatus for medical image recognition, or the medical image recognition method or apparatus, of an embodiment of the present invention may be applied.
FIG. 2 illustrates a schematic structural diagram of a computer system suitable for implementing an electronic device of an embodiment of the invention.
FIG. 3 is a flowchart illustrating a convolutional neural network model training method for medical image recognition, according to an exemplary embodiment.
FIG. 4 shows a flow diagram of an image staining normalization process according to an embodiment of the invention.
FIG. 5 is a diagram showing the effect of processing a slice sample in a 512 × 512 region on level-1.
Fig. 6 shows a before-and-after comparison of binarization for the grayscale image and the three HSV channels.
Fig. 7 (a)-(d) show before-and-after comparisons of the dilation, erosion, opening, and closing operations on the binarized image, respectively.
FIG. 8 illustrates the processing of a tissue detection algorithm according to an embodiment of the invention.
FIG. 9 illustrates the structure of an annotation file according to an embodiment of the present invention.
FIGS. 10 (a) and (b) show the mask output before optimization and the imaged annotation region mask after optimization, respectively.
FIG. 11 shows a flowchart of a whole-slide image (WSI) preprocessing system based on mask segmentation, according to an embodiment of the present invention.
Fig. 12 shows a schematic diagram of a PNPoly algorithm according to an embodiment of the present invention.
FIG. 13 shows a mask segmentation based tissue detection algorithm in accordance with an embodiment of the present invention.
FIG. 14 is a diagram illustrating an example of data amplification according to an embodiment of the invention.
Fig. 15 shows an integrated deployment diagram for ResNet according to an embodiment of the invention.
Figure 16 shows a block diagram of Multiscale-ResNeXt-50, according to an embodiment of the invention.
FIG. 17 shows a flowchart of an algorithm for cascading hard case mining, according to an embodiment of the invention.
FIG. 18 shows an experimental analysis of ResNet-50 performance with different numbers of residual blocks.
Fig. 19 (a)-(f) show, as a function of the number of training iterations: neural network training accuracy, area under the ROC (receiver operating characteristic) curve (AUC), precision, recall, the number of false positive samples, and the number of false negative samples.
Fig. 20 (a) and (b) show the lesion-region localization predictions of the Multiscale-ResNeXt-50 model on a normal-tissue WSI (Test_047) and a lesion-tissue WSI (Test_026) from the test set, respectively.
Fig. 21 shows a flow chart of a method of identifying a medical image according to an embodiment of the invention.
FIG. 22 shows a diagram of heatmap construction, according to an embodiment of the invention.
FIG. 23 is a diagram illustrating another heatmap construction, according to an embodiment of the invention.
FIG. 24 is a block diagram illustrating a convolutional neural network model training apparatus for medical image recognition, according to an exemplary embodiment.
FIG. 25 is a block diagram illustrating an apparatus for recognizing medical images according to an exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations or operations have not been shown or described in detail to avoid obscuring aspects of the invention.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 shows a schematic diagram of an exemplary system architecture 100 of a convolutional neural network model training method or apparatus for medical pathology image recognition, or a medical pathology image recognition method or apparatus, to which an embodiment of the present invention may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of image capture devices 101, 102, 103, networks 104 and 106, a server 105, and an image processing device 107. The network 104 serves as a medium for providing a communication link between the image capturing devices 101, 102, 103 and the server 105, and the network 106 serves as a medium for providing a communication link between the server 105 and the image processing device 107. Networks 104 and 106 may include various connection types, such as wired or wireless communication links, or fiber optic cables, among others.
It should be understood that the number of image capturing devices, networks, servers, and image processing devices in fig. 1 is illustrative only. There may be any number of image capture devices, networks, servers, and image processing devices, as desired for implementation. For example, server 105 may be a cluster of multiple servers.
The image capturing devices 101, 102, 103 may be various medical imaging devices, including but not limited to digital medical imaging devices, such as CT, MRI, ultrasound, nuclear medicine (PET, SPECT), etc., for taking pictures of the patient and obtaining medical images. The server 105 may be a server that provides various services (including storage).
The user may send medical images acquired by the image acquisition devices 101, 102, 103 to the server 105 via the network 104. The server 105 stores the transmitted medical images. The image processing device 107 may extract and process the medical image from the server 105 via the network 106, train a convolutional neural network, or recognize the medical image using the trained convolutional neural network.
FIG. 2 illustrates a schematic structural diagram of a computer system suitable for implementing an electronic device of an embodiment of the invention.
It should be noted that the computer system 200 of the electronic device shown in fig. 2 is only an example, and should not bring any limitation to the functions and the scope of the application of the embodiment of the present invention.
As shown in fig. 2, the computer system 200 includes a Central Processing Unit (CPU) 201 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 202 or a program loaded from a storage section 208 into a Random Access Memory (RAM) 203. In the RAM 203, various programs and data necessary for system operation are also stored. The CPU 201, the ROM 202, and the RAM 203 are connected to each other via a bus 204. An input/output (I/O) interface 205 is also connected to bus 204.
The following components are connected to the I/O interface 205: an input portion 206 including a keyboard, a mouse, and the like; an output section 207 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 208 including a hard disk and the like; and a communication section 209 including a network interface card such as a LAN card, a modem, or the like. The communication section 209 performs communication processing via a network such as the internet. A drive 210 is also connected to the I/O interface 205 as needed. A removable medium 211 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 210 as necessary, so that a computer program read out therefrom is mounted into the storage section 208 as necessary.
In particular, according to an embodiment of the present invention, the processes described below with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 209 and/or installed from the removable medium 211. The computer program executes various functions defined in the system of the present application when executed by a Central Processing Unit (CPU) 201.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method as described in the embodiments below. For example, the electronic device may implement the steps shown in fig. 3 to 6.
Before explaining the technical solutions of the embodiments of the present invention in detail, some related technical solutions, terms and principles are described below.
Convolutional Neural Network (CNN)
CNN is a multi-layered supervised learning neural network that deals with image-related machine learning problems.
A typical CNN consists of convolutional layers (Convolution), pooling layers (Pooling), and fully connected layers (Fully Connected). The lower hidden layers generally consist of alternating convolutional and pooling layers: the convolutional layers enhance the original signal characteristics of the image and reduce noise through the convolution operation, while the pooling layers reduce the amount of computation while preserving image rotation invariance, according to the principle of local image correlation. The fully connected layer sits at the top of the CNN; its input is the feature map obtained by the feature extraction of the convolutional and pooling layers, and its output is connected to a classifier, which classifies the input image using logistic regression, Softmax regression, or a Support Vector Machine (SVM).
The CNN training process generally uses gradient descent to minimize a loss function: the weight parameters of all layers in the network are adjusted in reverse, layer by layer, through a loss layer connected behind the fully connected layer, and the accuracy of the network improves with repeated iterative training. The training sample set of a CNN is usually composed of vector pairs of the form (input vector, ideal output vector), and the weight parameters of all layers of the network may be initialized with small, distinct random numbers before training starts. Because a CNN is in essence an input-to-output mapping, and a large number of input-to-output mapping relationships can be learned without any precise mathematical expression relating inputs and outputs, a CNN trained with a sample set composed of known vector pairs acquires the ability to map between input-output pairs.
Softmax layer
The Softmax function maps multiple scalars into a probability distribution in which each output component lies in (0, 1). The Softmax function is often used in the last layer of a neural network, as the output layer for multi-class classification.
Residual Neural Network (ResNet)
Typical network structures for CNNs include ResNet, AlexNet, VGGNet, GoogleNet, SENet, and the like.
Compared with other network structures, ResNet's most distinctive feature is that it sets up bypass (shortcut) branches that connect the input directly to later layers, so that those later layers can directly learn the residual. This alleviates the loss of original information that occurs, to a greater or lesser extent, as a traditional CNN passes information forward, thereby protecting the integrity of the data.
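The shortcut connection can be sketched in a few lines (a NumPy toy in which a simple linear map F stands in for the convolutional stack; names are illustrative only, not the patent's implementation):

```python
import numpy as np

def residual_block(x, F):
    """y = F(x) + x: the block only has to learn the residual F(x) = y - x.

    If the weights of F drift toward zero, the block degenerates to the
    identity map, so stacking more blocks cannot lose the input signal.
    """
    return F(x) + x

x = np.array([1.0, -2.0, 3.0])
F = lambda v: 0.1 * v          # stand-in for a conv + BN + ReLU stack
y = residual_block(x, F)
```

In a real ResNet, F would be two or three convolutional layers, but the bypass arithmetic is exactly this elementwise addition.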
ImageNet data set
The ImageNet dataset is a large visual database for visual object recognition software research. More than 14 million image URLs have been manually annotated by ImageNet to indicate the objects in the pictures; bounding boxes are also provided in at least one million of the images.
ASAP (automatic Slide Analysis Platform, automatic slice Analysis Platform)
ASAP is an open source platform for histopathology WSI (Whole slide image) analysis, which integrates functions of browsing, marking and the like. ASAP is built based on a number of mature open source software packages, such as OpenSlide, Qt, OpenCV, and the like.
TensorFlow
TensorFlow is Google's second-generation artificial intelligence learning system, developed on the basis of DistBelief, and its name comes from its operating principle. Tensor means an N-dimensional array, Flow means computation based on a dataflow graph, and TensorFlow describes the computation process in which tensors flow from one end of the dataflow graph to the other. TensorFlow is a system that carries complex data structures into an artificial neural network for analysis and processing.
TFRecord data format
TFRecord is a data format that allows arbitrary data to be converted into formats supported by TensorFlow, which makes TensorFlow datasets more easily compatible with network application architectures.
PNPoly algorithm
The algorithm was proposed by W. Randolph Franklin. Its idea is as follows: cast a ray from the point to be tested, and count the number of intersection points between the ray and the boundary of the irregular region. If the number of intersection points is odd, the point lies inside the polygon; otherwise, it lies outside. This algorithm works for any irregular shape.
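The crossing-number test can be transcribed directly (a self-contained Python sketch of Franklin's PNPoly; variable names are ours, not from the patent):

```python
def pnpoly(vertices, x, y):
    """Return True if point (x, y) lies inside the polygon given by vertices.

    Casts a horizontal ray from the point and toggles `inside` each time the
    ray crosses an edge; an odd number of crossings means the point is inside.
    Works for arbitrary irregular (including non-convex) polygons.
    """
    inside = False
    n = len(vertices)
    j = n - 1
    for i in range(n):
        xi, yi = vertices[i]
        xj, yj = vertices[j]
        if (yi > y) != (yj > y):
            # x-coordinate where the edge crosses the ray's height
            x_cross = (xj - xi) * (y - yi) / (yj - yi) + xi
            if x < x_cross:
                inside = not inside
        j = i
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
```

Each polygon labeling area exported by ASAP is a vertex list of exactly this shape, which is why the test applies per annotation region.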
The principle and implementation details of the technical solution of the embodiments of the present invention are explained in detail below.
FIG. 3 is a flowchart illustrating a convolutional neural network model training method for medical image recognition, according to an exemplary embodiment. As shown in FIG. 3, the model training method may be performed by any computing device and may include the following steps 310 and 350.
In step 310, the medical image is converted to the HSV color gamut.
HSV (Hue, Saturation, Value) is a color space created by A. R. Smith in 1978, also known as the hexagonal cone model (Hexcone Model), based on the intuitive nature of color. The parameters of a color in this model are: hue (H), saturation (S), and value (V).
In an embodiment, before step 310, i.e. before converting the medical image to the HSV color gamut, the method may further comprise: the medical image is dye-normalized using a dye normalization algorithm.
The dyeing homogenization treatment is carried out on the medical image to avoid the problem of inconsistent sample coloring strength caused by a series of changing factors such as color strength, sample fading, cell substance extravasation and the like. In an example, a dye normalization algorithm is firstly adopted to perform dye homogenization on all full-slice scanning images (WSI), i.e., medical images (for example, in an Ndpi format), and this process can effectively improve the robustness of the subsequent region segmentation method based on the color gamut space.
FIG. 4 shows a flow diagram of an image staining normalization process according to an embodiment of the invention.
As shown in fig. 4, the image staining normalization process includes the following steps 410-460.
In step 410, image I is converted from RGB space to RGB-OD (Optical Density) space.
In step 420, the pixels with optical density value (OD Intensity) less than β in RGB-OD space are removed.
In step 430, an image I of RGB-OD is subjected to SVD (Singular Value Decomposition) and projected onto a space R spanned by two maximum Singular vectors.
In step 440, each pixel is normalized by unit length in space R.
In step 450, α and 100- α are used as thresholds in space R to find the vectors with the largest and smallest phase angles as the H component and E component, respectively.
In step 460, the H and E components are converted to RGB-OD domains, resulting in a normalized H & E staining vector.
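Steps 410-460 can be outlined as follows (a NumPy sketch of the Macenko-style estimation the flow describes; the values of β, α and the final reference-matrix mapping of step 460 are placeholders, so the sketch stops at the stain-vector estimation of step 450):

```python
import numpy as np

def estimate_he_vectors(rgb, beta=0.15, alpha=1.0, i0=255.0):
    """Estimate H and E stain vectors from an RGB image (H x W x 3, uint8)."""
    # Step 410: RGB -> optical density (RGB-OD) space
    od = -np.log10((rgb.reshape(-1, 3).astype(np.float64) + 1.0) / i0)
    # Step 420: remove pixels with optical density value less than beta
    od = od[np.all(od > beta, axis=1)]
    # Step 430: SVD; project onto the plane of the two largest singular vectors
    _, _, vt = np.linalg.svd(od, full_matrices=False)
    plane = od @ vt[:2].T
    # Step 440: normalize each projected pixel to unit length
    plane /= np.linalg.norm(plane, axis=1, keepdims=True)
    # Step 450: alpha / (100 - alpha) percentile phase angles give H and E
    phi = np.arctan2(plane[:, 1], plane[:, 0])
    lo, hi = np.percentile(phi, alpha), np.percentile(phi, 100 - alpha)
    h = vt[:2].T @ np.array([np.cos(lo), np.sin(lo)])
    e = vt[:2].T @ np.array([np.cos(hi), np.sin(hi)])
    return h, e  # step 460 would map these through the reference H&E matrix

rng = np.random.default_rng(0)
img = rng.integers(30, 220, size=(64, 64, 3), dtype=np.uint8)
h_vec, e_vec = estimate_he_vectors(img)
```

On a real slide the two returned 3-vectors are the per-channel directions of hematoxylin and eosin absorption in OD space.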
The normalized H & E stain vector is then transformed into the OD domain using the following equations (1) - (3), such that a stain-normalized WSI in RGB space can be obtained, as follows.
The idea of the Color Deconvolution algorithm is that the color of each pixel in the OD domain can be represented by a linear combination of stain vectors. According to the Lambert-Beer law, the relationship between the intensity of light transmitted through the sample slide, the staining intensity of the sample, and the absorption factor is given by equation (0-1):
$I_{o,c} = I_c \, e^{-A\mu_c}$    (0-1)
wherein A is the amount of stain acting, $I_c$ is the incident light, $I_{o,c}$ is the emergent light, $\mu_c$ is the light absorption coefficient, and the subscript c denotes one of the RGB channels. It can be seen that in this color space model the intensity of each color component is a non-linear function of the staining intensity of the sample and cannot be used directly to quantify the staining components; therefore, equation (0-1) needs to be transformed into the RGB-OD domain, i.e., equation (0-2) below:
$OD_c = -\log_{10}\frac{I_{o,c}}{I_c} = A\,\mu_c$    (0-2)
the image space Ψ in the RGB-OD domain has the following relationship to the space defined by the staining components:
$\Psi_{OD} = C\,\Psi_{HE}$    (0-3)

where C denotes the stain matrix.
Let the H&E mixed-stain image be I = (C, Ψ_RGB). After it is transferred to the OD domain, it is denoted $I_{OD}$. Pixel points with an optical density value less than β are removed in the OD domain, the image pixel data are projected onto the plane spanned by the singular vectors corresponding to the two largest singular values in the OD domain, the vector represented by each pixel point is normalized to unit length, and, using α and 100-α as phase-angle percentile thresholds, the two vectors $[A_H, A_E]$ represented by the largest and smallest phase angles in that plane are found. The matrix form is shown in the following equation (1):
$[A_H \;\; A_E] = \begin{bmatrix} A_{H,R} & A_{E,R} \\ A_{H,G} & A_{E,G} \\ A_{H,B} & A_{E,B} \end{bmatrix}$    (1)
$A_H$ and $A_E$ represent the maximum staining intensities of the H component and the E component in the input image, respectively. Taking the reference H&E matrix as the transformation matrix and transforming $A_H$ and $A_E$ from the HE domain into the OD domain, the normalized staining vector can be obtained, as in equation (2) below:
$[\hat{A}_H \;\; \hat{A}_E] = M_{HE}\,[A_H \;\; A_E]$    (2)

where $M_{HE}$ is the reference H&E OD matrix.
Using equation (0-3), the pathological tissue scan image normalized by staining intensity can then be calculated, as shown in equation (3) below.
$\hat{\Psi}_{OD} = [\hat{A}_H \;\; \hat{A}_E]\,\Psi_{HE}$    (3)
The staining normalization algorithm balances the staining intensity between different samples based on a reference H & E (hematoxylin-eosin staining) Optical Density (OD) matrix.
The method uses the staining normalization algorithm to perform preliminary processing on the WSIs in the data set; FIG. 5 shows the processing effect for a 512×512 region of a slice sample at level-1. Fig. 5 (a) shows source image samples: the upper image in fig. 5 (a) shows blue-shifted nuclei and the lower image shows red-shifted nuclei. Fig. 5 (b) shows the images obtained by applying the staining normalization algorithm to the top and bottom images of fig. 5 (a). Comparing the two sets of images, the staining of the two processed images in (b) is more balanced than that of the two unprocessed images in (a).
After the images are dyed and homogenized, the obtained images in the RGB format are converted into HSV color gamut.
In addition, when processing the WSI of a pathology image, the method may first use a closing operation to fill tissue cavities inside large tissue areas, and then use an opening operation to eliminate scattered external tissue fragments in the WSI caused by tearing and other factors during sampling. The opening operation erodes the image and then dilates it; the closing operation dilates the image and then erodes it.
In one embodiment, the images of the HSV color gamut may be subjected to a close process and an open process.
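On a binary mask, the close-then-open sequence can be sketched with plain NumPy (a toy 4-neighbour implementation for illustration; a real pipeline would more likely use OpenCV's `cv2.morphologyEx` with `MORPH_CLOSE` / `MORPH_OPEN`):

```python
import numpy as np

def dilate(m):
    """Binary dilation with a 3x3 cross structuring element."""
    p = np.pad(m, 1)
    return (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
            | p[1:-1, :-2] | p[1:-1, 2:]).astype(m.dtype)

def erode(m):
    """Binary erosion: the dual of dilation on the inverted mask."""
    return 1 - dilate(1 - m)

def close_(m):  # dilation then erosion: fills small cavities
    return erode(dilate(m))

def open_(m):   # erosion then dilation: removes small fragments
    return dilate(erode(m))

tissue = np.ones((7, 7), dtype=np.uint8)
tissue[3, 3] = 0                      # a one-pixel "cavity" in the tissue
speck = np.zeros((7, 7), dtype=np.uint8)
speck[0, 0] = 1                       # an isolated "fragment"
```

Closing fills the cavity in `tissue`, and opening erases the isolated `speck`, mirroring the two roles described above.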
In step 320, the S channel in the HSV color gamut is binarized to obtain a binarized image.
An S-channel based tissue detection algorithm will be described below.
The tissue detection algorithm aims to extract the tissue region in the WSI of a pathology image, and balances accuracy and efficiency: on the one hand, it obtains the precise outline of the tissue in the WSI and removes irrelevant background such as cavities and cracks inside the tissue region; on the other hand, it uses a blurred tissue region at large scale to pre-judge the tissue region and accelerate tissue-region detection. To achieve both goals, a coarse-grained and a fine-grained tissue-region mask must be obtained at the same time. The fine-grained fine mask can be obtained by binarizing the original image, while the coarse-grained fuzzy mask requires further morphological processing on top of it. Color-gamut space conversion, binarization (for example, OTSU (maximum between-class variance) threshold segmentation) and morphological processing are combined and experimentally analyzed, and the tissue-region identification mask is generated from a single channel of the HSV space. The results are shown in FIG. 6.
Fig. 6 shows a front-back comparison diagram of binarization processing of a grayscale image and images of three channels HSV. Fig. 6 (a) shows a front-to-back comparison of OTSU on a gray scale; (b) showing a front-back comparison graph of OTSU for the H channel of the HSV color gamut; (c) showing a front-back comparison graph of OTSU for S channel of HSV color gamut; (d) a front-to-back comparison of OTSU for the V channel of the HSV gamut is shown.
As can be seen from fig. 6, after the binarization processing, the image contrast before and after the processing is greatly improved for both the grayscale image and the HSV three-channel image.
Comparing the regions marked by the rightmost oblique ellipses in the binarized images in (a) - (d) of fig. 6, it can be seen that the OTSU algorithm cannot well process the tissue holes and the tissue sparse regions in the WSI for the binarization of the H channel, and the OTSU algorithm can maximally erase the tissue sparse regions in the WSI when acting on the S channel. Therefore, through the color gamut conversion from RGB to HSV and the binarization processing of the separated S channel by using the OTSU algorithm, a fine tissue area mask image with fine granularity can be obtained.
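For illustration, an OTSU threshold on the S channel might be sketched as follows (NumPy only; a real pipeline would more likely call OpenCV's `cv2.threshold` with `cv2.THRESH_OTSU` on the S channel returned by `cv2.cvtColor(..., cv2.COLOR_RGB2HSV)` — this sketch merely makes the between-class-variance criterion explicit):

```python
import numpy as np

def otsu_threshold(channel):
    """Return the threshold maximizing between-class variance on a uint8 channel."""
    hist = np.bincount(channel.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # class-0 probability up to each t
    mu = np.cumsum(prob * np.arange(256))  # class-0 mean mass up to each t
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0
    return int(np.argmax(sigma_b))

def binarize_s_channel(s):
    """Fine-grained tissue mask: 1 where saturation exceeds the OTSU threshold."""
    return (s > otsu_threshold(s)).astype(np.uint8)

# bimodal toy "S channel": low-saturation background, high-saturation tissue
s = np.concatenate([np.full(500, 20, np.uint8), np.full(500, 200, np.uint8)])
mask = binarize_s_channel(s.reshape(25, 40))
```

On a real slide the saturated (stained) tissue forms the high mode and the pale background the low mode, which is what makes the S channel a good separation target.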
In step 330, the binarized image is morphologically processed to obtain a region of interest.
In one embodiment, the morphological processing may include: dilation (expansion), erosion, opening, and closing.
Fig. 7 (a) - (d) show the comparison before and after the expansion, erosion, opening, and closing processes of the binarized image, respectively.
In one embodiment, the morphological processing of the binarized image to obtain the region of interest (tissue region) comprises: performing morphological processing on the binary image to obtain the outer contour of one or more tissue areas; generating a circumscribed rectangle frame for the outer contour of each tissue area; judging whether the area of each circumscribed rectangular frame is larger than a preset threshold value or not, if so, keeping the circumscribed rectangular frame, and if not, ignoring the circumscribed rectangular frame; judging whether a full inclusion relationship exists between any two external rectangular frames in the external rectangular frames, if so, keeping the external rectangular frame with a larger area in any two external rectangular frames, and neglecting the external rectangular frame with a smaller area in any two external rectangular frames; the finally obtained circumscribed rectangular frame is used as the region of interest.
Specifically, the S channel subjected to binarization processing may be opened and closed to obtain a coarse-grained fuzzy mask, so as to extract an outer contour of the tissue region, and generate a circumscribed rectangular frame for the outer contour detected in the WSI, which is shown in (c), (d), and (e) in fig. 8.
FIG. 8 illustrates the processing of a tissue detection algorithm according to an embodiment of the invention. As shown in fig. 8, where (a) shows a WSI, a binarization process is performed on the WSI to obtain a fine-grained mask shown in (b), an S channel of the fine-grained mask in (b) is opened and closed to obtain a coarse-grained mask shown in (c), then an outline of a tissue region is extracted from the coarse-grained mask, as shown in (d), and then a circumscribed rectangle frame is generated for the detected outline in the WSI, as shown in (e).
In order to reduce broken and scattered detection areas, a thresholding strategy may be designed, for example using 100000 as a threshold and ignoring rectangular boxes whose area is smaller than the threshold. In addition, on the programming side, for rectangular boxes of different sizes, if a full inclusion relationship exists between boxes, the box with the smaller area is ignored to avoid repeated sampling. Applying this processing to the plurality of circumscribed rectangular boxes in (e) of fig. 8, i.e., filtering out boxes with smaller areas that are contained within a large region, yields the optimized circumscribed rectangular boxes shown in (f), and the regions defined by the optimized boxes are used as the identification region of interest (ROI) of the medical image.
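The two filtering rules above (area threshold, then removal of fully contained boxes) can be sketched as follows; boxes are `(x, y, w, h)` tuples, the 100000 threshold matches the example value in the text, and distinct boxes are assumed:

```python
def filter_boxes(boxes, min_area=100000):
    """Keep boxes above the area threshold; drop boxes fully inside another."""
    def area(b):
        return b[2] * b[3]

    def contains(outer, inner):
        ox, oy, ow, oh = outer
        ix, iy, iw, ih = inner
        return (ox <= ix and oy <= iy
                and ix + iw <= ox + ow and iy + ih <= oy + oh)

    big = [b for b in boxes if area(b) >= min_area]
    kept = []
    for b in big:
        # ignore b if some other retained candidate fully contains it
        if not any(other != b and contains(other, b) for other in big):
            kept.append(b)
    return kept

boxes = [(0, 0, 1000, 1000),    # large tissue region      -> kept
         (100, 100, 400, 400),  # fully contained           -> dropped
         (2000, 0, 50, 50)]     # tiny fragment             -> dropped by area
```

Only the large outer box survives, which is exactly the optimization step from (e) to (f) in fig. 8.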
In step 340, a preset target area mask is used to extract a positive sample and a negative sample in the region of interest.
In an embodiment, the preset target area mask includes a lesion area identification mask and a normal area identification mask, and the method may further include: calculating marking information in an xml format through a PNPoly algorithm to generate a lesion area identification mask; and generating a normal region identification mask based on the lesion region identification mask and the binarized image.
In one embodiment, calculating labeling information in xml format by PNPoly algorithm to generate lesion region identification mask may include: respectively generating a mask of a lesion area and a rejection mask by using the labeling information in two xml formats; subtracting the eliminating mask from the lesion area mask to obtain a lesion area identification mask; and the generating of the normal region identification mask based on the lesion region identification mask and the binarized image comprises: and subtracting the lesion area identification mask from the binarized image to obtain a normal area identification mask.
The labeling information may store the vertex coordinates of a plurality of polygon labeling areas, and the method may further include: adjusting the priority with which the PNPoly algorithm is executed for each polygon labeling area, in descending order of the areas of the polygon labeling areas.
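The two subtractions just described (lesion identification mask = lesion-area mask minus rejection mask; normal identification mask = binarized tissue mask minus lesion identification mask) are plain set differences on binary masks. An illustrative NumPy fragment (array names are ours):

```python
import numpy as np

def mask_subtract(a, b):
    """Set difference on binary masks: keep pixels of a that are not in b."""
    return (a.astype(bool) & ~b.astype(bool)).astype(np.uint8)

tissue = np.ones((4, 4), np.uint8)          # binarized tissue mask
annotation = np.zeros((4, 4), np.uint8)
annotation[0:3, 0:3] = 1                    # pathologist's lesion polygon
rejection = np.zeros((4, 4), np.uint8)
rejection[1, 1] = 1                         # normal tissue inside the polygon

lesion_mask = mask_subtract(annotation, rejection)  # lesion identification mask
normal_mask = mask_subtract(tissue, lesion_mask)    # normal identification mask
```

The rejected pixel ends up in the normal mask rather than the lesion mask, which is the behaviour the optimized ASAP flow is meant to produce.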
In the present embodiment, the target area mask is set in advance using the ASAP platform, but the present invention is not limited thereto; other platforms may be used, as long as the setting of the target area mask can be achieved.
The ASAP stores the labeling information by using an extensible markup language XML, and the labeling information stores the vertex coordinates of each polygonal labeling area. FIG. 9 illustrates a structure of a markup file according to an embodiment of the present invention.
The ASAP source code does not consider the case of overlapping or embedding between polygons, and therefore, when the overlapping or embedding occurs between polygons, only two mask images can be output separately, which brings great inconvenience to the subsequent processing of the present WSI preprocessing system, as shown in (a) of fig. 10, which shows the mask output before optimization.
Since some diseased areas simultaneously contain larger areas of normal tissue, the pathologist needs to use the area labeling tool of ASAP to remove the normal tissue areas from the pathology labeling area. Therefore, the invention optimizes the flow of ASAP's PNPoly algorithm (see fig. 11 and 12, where fig. 11 shows a flow chart of a WSI preprocessing system based on mask segmentation according to an embodiment of the invention, and fig. 12 shows a schematic diagram of the PNPoly algorithm according to an embodiment of the invention), adjusting the priority with which the PNPoly algorithm is executed for each polygon in descending order of the polygon labeling areas' sizes; the optimized algorithm can directly output a mask image in which the normal tissue inside a lesion region has been removed. The optimized imaged labeling area mask is shown in fig. 10 (b).
The sample extraction strategy is explained below.
In an embodiment, a preset target area mask may be generated at a level-4 of an image pyramid of the medical image, and the positive sample and the negative sample may be extracted at a level-0 of the image pyramid of the medical image.
FIG. 13 shows a mask-segmentation-based tissue detection algorithm in accordance with an embodiment of the present invention. As shown in fig. 13, a cascaded tissue detector from low resolution to high resolution is constructed using the inter-layer mapping of the image pyramid. In order to quickly determine the tissue area of the pathological image under limited memory resources, the extraction and conversion of the mask are completed at level-4, and samples are extracted at level-0 (40X magnification) through image-pyramid coordinate mapping and are labeled as positive samples and negative samples, respectively. In this process, positive and negative samples are balanced: the positive samples are extracted first, and then negative samples are randomly extracted according to the number of positive samples.
In practice, the total area of normal tissue is much larger than that of the lesion annotation regions, so the class distribution of the training data must be balanced by balanced sampling to make the subsequent CNN training more stable. A balanced sampling strategy based on the number of positive samples is therefore adopted. The identification method uses OpenSlide, a digital pathology image-processing library for the Python programming language, to load, crop, and otherwise process the pathology images; OpenSlide describes positions in the pyramid structure at different levels. Taking a positive sample (lesion region) as an example: to quickly determine the tissue region of a pathology image under limited memory resources, the downsampled image at pyramid level-4 (a 2^4 = 16-fold downsampling of the original image) is loaded for analysis. The original image and the annotation mask are loaded at level-4, the original image undergoes stain normalization and tissue detection, and a coarse mask of the tissue region (ignoring cavities or cracks within the tissue) is obtained. The annotation mask at level-4 is then scanned with a window reduced by the same downsampling factor; if the proportion of white pixels in the window reaches a preset threshold, the window is mapped to level-0, tissue detection is performed inside the window at level-0, and the white-pixel proportion is counted again, so that background regions inside the tissue, such as cavities and cracks, are accurately excluded at fine granularity. The extraction process is shown in fig. 13.
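Mapping the level-4 scan back to level-0 amounts to multiplying window coordinates and side lengths by the downsample factor 2^4 = 16. A minimal sketch under that assumption, using plain nested-list masks rather than OpenSlide objects (the helper names are hypothetical; the real system would read the mapped regions with OpenSlide):

```python
DOWNSAMPLE = 2 ** 4   # level-4 sits 16x below level-0 in the image pyramid

def white_ratio(mask, x, y, size):
    """Fraction of foreground (white) pixels in a size x size window."""
    return sum(mask[r][c]
               for r in range(y, y + size)
               for c in range(x, x + size)) / float(size * size)

def candidate_level0_regions(mask_l4, win_l4, threshold=0.5):
    """Scan the level-4 mask with the reduced window; every window whose
    white-pixel ratio reaches the threshold is mapped to a level-0 region
    (coordinates and side length scaled by the downsample factor), where
    the fine-grained tissue check and patch extraction would then run."""
    h, w = len(mask_l4), len(mask_l4[0])
    regions = []
    for y in range(0, h - win_l4 + 1, win_l4):
        for x in range(0, w - win_l4 + 1, win_l4):
            if white_ratio(mask_l4, x, y, win_l4) >= threshold:
                regions.append((x * DOWNSAMPLE, y * DOWNSAMPLE,
                                win_l4 * DOWNSAMPLE))
    return regions
```

For the 256 × 256 level-0 patches used by the system, the reduced window at level-4 would be 256 // 16 = 16 pixels on a side.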
The sample extraction process takes samples in a non-overlapping manner. According to an embodiment of the present invention, the extracted sample size is 256 × 256. Sample data are packaged in the TFRecords format under the TensorFlow framework; each record contains the sample's image, label, coordinates, source WSI (whole-slide image), storage path, and other information. This process is shown in FIG. 11 (using a WSI with annotated lesion regions as an example).
In an embodiment, the method may further include: the image color parameters of the positive sample are randomly dithered and randomly horizontally and vertically flipped to expand the capacity of the positive sample. Optionally, the image color parameter may include at least one of brightness, contrast, hue, and saturation of the image.
During patch-based sample extraction, the image color parameters, including brightness, contrast, hue, and saturation, are randomly jittered within a threshold, and random horizontal and vertical flips are applied; this augments the positive samples and expands their number. A fast mode and a slow mode are provided for the random color jitter: the fast mode randomly transforms only brightness and saturation so as to speed up preprocessing. Alternatively, the color parameters may be jittered in the slow mode, which adjusts all four parameters (brightness, contrast, hue, and saturation) to obtain a larger adjustment range.
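A minimal pure-Python sketch of the fast-mode jitter plus random flips (the document's pipeline uses TensorFlow image ops; `jitter_pixel` and `augment` here are illustrative stand-ins operating on RGB tuples in [0, 1]):

```python
import colorsys
import random

def jitter_pixel(rgb, rng, fast=True, max_delta=0.1):
    """Randomly perturb one RGB pixel via HSV.  The fast mode jitters only
    brightness (V) and saturation (S); fast=False additionally perturbs
    hue (contrast adjustment is omitted in this sketch)."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    s = min(1.0, max(0.0, s + rng.uniform(-max_delta, max_delta)))
    v = min(1.0, max(0.0, v + rng.uniform(-max_delta, max_delta)))
    if not fast:
        h = (h + rng.uniform(-max_delta, max_delta)) % 1.0
    return colorsys.hsv_to_rgb(h, s, v)

def augment(image, rng, fast=True):
    """Color-jitter every pixel, then randomly flip the patch
    horizontally and/or vertically."""
    out = [[jitter_pixel(px, rng, fast) for px in row] for row in image]
    if rng.random() < 0.5:
        out = [row[::-1] for row in out]   # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1]                    # vertical flip
    return out
```

Because flips and small color shifts preserve the pathology label, each positive patch can be re-emitted several times with different jitter to expand the positive-sample capacity.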
FIG. 14 is a diagram illustrating an example of data amplification according to an embodiment of the invention.
In step 350, the convolutional neural network model is trained using the positive and negative samples.
In one embodiment, TFRecords may be used to store the positive and negative samples, and the multi-threaded input-data-processing framework in TensorFlow may be used to read them. Storing the sample data uniformly in TFRecords records the multi-dimensional information of each sample effectively and reduces the number of input/output (IO) operations when reading samples. Extracting sample information from the TFRecords with TensorFlow's multi-threaded input pipeline improves the efficiency of reading sample data.
Training the convolutional neural network model using the positive and negative samples comprises: adding shortcut connections between the feature-map output layers and the global average pooling layer, so that the feature-map output of each residual module acts directly on the global average pooling layer and the softmax layer.
The weight of the connection between a feature-map output layer and the global average pooling layer is w_{j,c}, and the method further comprises decomposing the weight w_{j,c} using equation (4) below, so that paths containing different numbers of residual modules have independent weights:
p_c = softmax( Σ_j [ w_{l,j,c} · a(y_l)_j + Σ_{i=l}^{k-1} w_{i,j,c} · a(f(y_i))_j ] )    Equation (4)
wherein p_c is the output probability of class c, A_j is the output of the j-th node of the global average pooling layer (a denotes the average-pooling mapping), i and j are integers greater than 0, f is the mapping comprising all function mappings inside a residual module, y is the output of a residual module, l indicates that from the l-th residual module onward the output of each subsequent residual module is shortcut-connected to the global average pooling layer, which performs output mapping and weighted calculation on each feature-map output layer, and k is the number of residual modules, an integer greater than 0.
Deep learning model training will be described in detail below.
The characteristics of pathological elements in medical images usually appear in receptive fields of different scales, so a CNN for medical pathology image analysis should be made adaptable to multi-scale features. Specifically, stacking different numbers of 3 × 3 convolutional layers produces receptive fields of different sizes. Features extracted from different receptive fields therefore have multi-scale characteristics, and the model should select among multi-scale features during training, so as to reduce feature redundancy and improve the model's utilization of multi-scale features.
A closer look at ResNet shows that, through the effect of its shortcut (skip) connections, ResNet can be represented as an implicit superposition of exponentially many combinations of Residual modules (Residual blocks): when the network contains n Residual blocks, ResNet can be regarded as an ensemble of 2^n networks, as shown in fig. 15.
Based on the 3-layer Residual block shown in fig. 15, the input-output relationship thereof can be described using table 1 below.
TABLE 1
As shown in Table 1, eight information-flow paths exist in the network. Since the convolutional layers in a Residual block all use 3 × 3 convolution kernels, the receptive fields formed by stacks of an equal number of Residual blocks have the same size, and the paths can therefore be divided into 4 classes by equivalent receptive-field size. If the 4 path classes in Table 1 are each given a weight w_i, the network can selectively adapt to features extracted under different receptive fields; this weight w_i represents the connection weight between a feature-map output layer and the global average pooling layer in the model. Considering the Residual block shown in FIG. 15: for the k-th Residual block in ResNet, let its output be y_k; its input comes from the output of the previous Residual block, denoted y_{k-1}, as in equation (5) below.
y = h(x) = F(x, {W_i}) + x    Equation (5)
This equation (5) can be rewritten as the following equation (6).
y_k = y_{k-1} + f(y_{k-1})    Equation (6)
wherein the mapping f contains all function mappings inside the Residual block. Generalizing the above equation to a general form gives the output relationship of each Residual block in ResNet, as shown in equation (7) below.
y_k = y_0 + Σ_{i=0}^{k-1} f(y_i)    Equation (7)
Taking the expansion in the above formula from an arbitrary starting block l instead of 0, the following equation (8) can be obtained:
y_k = y_l + Σ_{i=l}^{k-1} f(y_i)    Equation (8)
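The unrolled form can be checked numerically: applying the blocks one by one and summing the residual-branch outputs from any starting block l give the same result. Here `f` is an arbitrary stand-in for the residual-branch mapping:

```python
def f(y):
    """Stand-in for the residual-branch mapping f (any function works)."""
    return 0.5 * y + 1.0

def forward(y0, k):
    """Apply k residual blocks sequentially: y_i = y_{i-1} + f(y_{i-1})."""
    y = y0
    for _ in range(k):
        y = y + f(y)
    return y

def unrolled(y0, k, l=0):
    """Unrolled identity: y_k = y_l + sum_{i=l}^{k-1} f(y_i)."""
    ys = [y0]
    for _ in range(k):
        ys.append(ys[-1] + f(ys[-1]))
    return ys[l] + sum(f(ys[i]) for i in range(l, k))
```

The identity holds for every choice of l, which is exactly what lets later feature maps be shortcut-connected from any intermediate block onward.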
in the ResNet model, the classifier consists of a global average pooling layer and a softmax layer. Let the j-th node output of the global average pooling layer be A_j, let p_c be the output probability of class c, and let w_{j,c} be the weight in the transformation matrix from the average-pooled output to the probability output; the following equation (9) can be obtained:
p_c = softmax( Σ_j w_{j,c} · A_j )    Equation (9)
substituting equation (8) into equation (9), where a denotes the average-pooling mapping (Average pool), the following equation (10) can be obtained:
p_c = softmax( Σ_j w_{j,c} · [ a(y_l)_j + Σ_{i=l}^{k-1} a(f(y_i))_j ] )    Equation (10)
as can be seen from the above equation, all information-flow paths in ResNet share the same weight w_{j,c}. To extend the multi-scale adaptability of ResNet, the weight w_{j,c} is decomposed so that paths containing different numbers of Residual blocks have independent weights; equation (10) can then be rewritten as equation (11) below:
p_c = softmax( Σ_j [ w_{l,j,c} · a(y_l)_j + Σ_{i=l}^{k-1} w_{i,j,c} · a(f(y_i))_j ] )    Equation (11)
this weight decomposition is equivalent to the network performing global average pooling on the feature-map outputs of several earlier layers, so compared with the original structure, shortcut connections are added between each feature-map output layer and the global average pooling layer; the specific structure can be seen in fig. 15. The output of the global average pooling layer gains depth information representing information paths with different receptive-field sizes, and the weight matrix from the global-average-pooled output to the probability output likewise gains depth information representing the model's output response to feature maps under different receptive fields, as in the following equation (12), which shows the multi-scale weighted feature-map output:
A_{i,j} = a(f(y_i))_j ,    p_c = softmax( Σ_i Σ_j w_{i,j,c} · A_{i,j} )    Equation (12)
the ResNet-50 network structure combined with the multi-scale feature expression and the corresponding equivalent receptive field size of each level are shown in the following Table 2. The network model described in Table 2 is referred to as Multiscale-ResNet-50 model. Starting from the last Residual block of Conv3_ x, the output of each subsequent Residual block can be quickly connected to the global average pooling layer, and the global average pooling layer performs output mapping and weighting calculation on each feature map output layer. W0 to W9 are mapping weight matrices between 10 signatures with different receptive fields to the global average pooling layer, such as the output signature of Conv3_ x and the output signatures of Residual blocks of Conv4_ x and Conv5_ x, respectively.
TABLE 2
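The weighted multi-scale classifier head described above can be sketched in pure Python. This is an illustrative reconstruction under the stated decomposition, not the system's TensorFlow code: each shortcut-connected feature map is globally average-pooled and projected to class logits by its own weight matrix, and the summed logits pass through softmax.

```python
import math

def gap(feature_map):
    """Global average pooling over an H x W x C feature map (nested lists)."""
    h, w = len(feature_map), len(feature_map[0])
    c = len(feature_map[0][0])
    return [sum(feature_map[y][x][k] for y in range(h) for x in range(w))
            / (h * w) for k in range(c)]

def multiscale_head(feature_maps, weights):
    """Each shortcut-connected feature map gets its own weight matrix W_i
    (channels x classes), so paths with different receptive fields carry
    independent weights; the summed logits go through softmax."""
    n_classes = len(weights[0][0])
    logits = [0.0] * n_classes
    for fm, W in zip(feature_maps, weights):
        pooled = gap(fm)
        for k, a_k in enumerate(pooled):
            for c in range(n_classes):
                logits[c] += a_k * W[k][c]
    m = max(logits)
    e = [math.exp(z - m) for z in logits]
    s = sum(e)
    return [v / s for v in e]
```

Setting all W_i equal recovers the original single-weight head of equation (10), which is why the decomposition adds expressiveness without changing the baseline behavior it generalizes.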
The Multiscale-ResNeXt-50 model, based on transfer learning and multi-scale features, adopts a ResNet-50 pre-trained on the ImageNet [54] data set as the pre-training model. ResNet-50 is divided into four convolution blocks (Conv blocks) according to the depth of the convolutional layers in the Residual units; the system loads and freezes the pre-trained parameters of the first two Conv blocks of ResNet-50, retaining the robust shallow graphical features learned from big-data training. Conv4_x and conv5_x are redesigned with parallel stacked paths following the ResNeXt idea, which increases the model's capacity to express high-level abstract features. The weighted average pooling (weighted average pool) in the output layer denotes the weighted global average pooling corresponding to the multi-scale feature expression. Figure 16 shows a block diagram of Multiscale-ResNeXt-50 according to an embodiment of the invention. Table 3 below compares the structure of ResNet-50 with the Multiscale-ResNeXt-50 model proposed by this system:
TABLE 3
In an embodiment, the method further includes cascaded hard-example mining: training the convolutional neural network model with the positive and negative samples to obtain a stage-one model; generating a prediction heat map using the stage-one model; re-extracting samples in an overlapping manner from the misjudgment clusters in the prediction heat map; and further training the convolutional neural network model using the re-extracted samples together with the previously extracted positive and negative samples.
A relatively high false-positive rate is common in medical image recognition tasks. If the convolutional neural network treats samples of the two labels equally during learning, iterations are wasted on normal samples that provide no useful parameter updates, and CNN convergence time increases. The method re-adds hard examples to the training set through cascaded network optimization: a CNN is first trained on the original training set; the converged network then generates prediction heat maps for the WSIs in the training set; samples are re-extracted from the clusters of misjudgments (mainly false positives) on the heat maps and added to the training set; and the model is then fine-tuned on the expanded training set, which reduces its false-positive rate. The algorithm flow is shown in fig. 17.
For example, the Multiscale-ResNeXt-50 model may be trained with the sample data to obtain a stage-one diagnostic model. The stage-one model generates prediction heat maps for the WSIs, and samples are re-extracted in an overlapping manner from the misjudgment clusters in the heat maps to expand the original training set. The Multiscale-ResNeXt-50 model is then fine-tuned on the expanded training set to obtain a stage-two diagnostic model, realizing the tissue analysis and recognition model.
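The two-stage cascade can be sketched as a simple control loop. All five arguments below are hypothetical callables/collections standing in for the real training, heat-map inference, and overlapping patch re-extraction routines:

```python
def cascaded_hard_example_mining(train, predict_heatmap, resample,
                                 dataset, wsis):
    """Stage one: train on the original set.  Then generate heat maps for
    the training WSIs, re-extract overlapping samples from the misjudgment
    (mostly false-positive) clusters, and fine-tune on the expanded set."""
    stage_one = train(dataset)
    hard_examples = []
    for wsi in wsis:
        heatmap = predict_heatmap(stage_one, wsi)
        hard_examples.extend(resample(wsi, heatmap))  # overlapping re-extraction
    stage_two = train(dataset + hard_examples, init=stage_one)
    return stage_two
```

Initializing stage two from the converged stage-one weights makes the second pass a fine-tuning step rather than training from scratch.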
The evaluation of the deep learning model will be specifically described below.
A stage-one diagnostic model was obtained from the Multiscale-ResNeXt-50 model described above. The adaptability of the high-dimensional abstract features extracted on ImageNet to the objects studied by this system is shown in FIG. 18, a schematic diagram of the experimental analysis of ResNet-50 performance with different numbers of frozen Residual blocks.
Taking a prostate image data set as an example, the comprehensive analysis of fig. 18 shows that when more than two Residual blocks are frozen, the large difference in high-dimensional features between data sets prevents the model from fitting the features of the prostate data set well, and the network's prediction performance drops. Conversely, when fewer than two Residual blocks are frozen, the parameters of the first two blocks can be updated, which amounts to forgetting and relearning the robust low-dimensional features obtained from ImageNet, and this also costs accuracy at convergence.
Therefore, based on the results of fig. 18, the system freezes only the parameters of conv2_x and conv3_x, reconstructs conv4_x and conv5_x into a parallel-path stacked structure, and further studies the effect of different cardinalities on the accuracy of ResNeXt-50, as in Table 4 below, which shows the ResNeXt-50 accuracy at different cardinalities.
TABLE 4
Table 5 below compares the performance of ResNeXt-50 at different cardinalities.
In the experiments with cardinality 2 and 4, ResNeXt-50 improved little over the baseline, while its performance increased markedly as the number of parallel paths grew. Notably, at cardinality 16, ResNeXt-50 already reaches the performance of the 101-layer ResNet; in average forward efficiency, the model with 32 parallel paths only slightly exceeds ResNet-101, yet its accuracy and AUC improve by 2.35% and 3.28%, respectively. Note also that setting the group-convolution depth according to the cardinality keeps the parameter count of ResNeXt-50 essentially consistent with the baseline.
TABLE 5
On this basis, shortcut connections are established from the outputs of the Residual blocks of conv4_x and conv5_x to the average pooling layer, enabling the classifier to automatically learn the contribution of feature maps under receptive fields of different scales to image classification. The final network structure is shown in Table 5: the network with multi-scale feature expression has 10 more shortcut connections than the original network, and the resulting increase in parameter count is negligible. Table 5 also gives the equivalent receptive field of each Residual block with respect to the input layer. We evaluated the Multiscale-ResNeXt-50 model by comparison. One alternative is to sample a breast WSI at multiple magnifications (40X, 20X, and 10X), train several models for the different input scales, and finally improve performance through model fusion. Considering, however, that fusing multiple models reduces prediction efficiency and multiplies the computational cost, two multi-scale models were trained and fused at the two magnifications 40X (the sample-extraction magnification used herein) and 20X, and the results were compared with the multi-scale feature expression method of this system, as shown in Table 6 below. Extracting samples at 20X magnification hardly improves model accuracy, while fusing the models trained at the 20X and 40X scales improves on the original model by about 2%. However, because two models must run simultaneously, prediction efficiency drops greatly.
The Multiscale-ResNeXt-50 proposed by this system replaces sample-level multi-model fusion with fusion of feature receptive fields, improving prediction performance without reducing the prediction efficiency of the original model. Compared with multi-scale model fusion, model accuracy and AUC improve by 3.74% and 3.96%, respectively.
TABLE 6
Fig. 19 (a)-(f) respectively show, as functions of the number of training iterations, the training accuracy, the area under the ROC (receiver operating characteristic) curve (AUC), the precision, the recall, and the numbers of false-positive and false-negative samples of the neural network training.
Using the pathological tissue morphology identification system proposed and constructed here, a prediction heat map can be generated for a WSI; the lesion-localization predictions of the Multiscale-ResNeXt-50 model on a normal-tissue WSI (Test_047) and a lesion-tissue WSI (Test_026) from the test set are shown in (a) and (b) of FIG. 20, respectively. Specifically, in (a) and (b) of fig. 20, the left image shows the WSI and the right image its prediction. The redder a pixel in the image (the whiter it appears in grayscale), the lower the lesion probability (as shown in fig. 20 (b)); conversely, the bluer a pixel (the blacker it appears in grayscale), the higher the lesion probability (see fig. 20 (a)).
The method of identifying medical images, i.e. predictive data visualization, will be explained in detail below.
Fig. 21 shows a flow chart of a method of identifying a medical image according to an embodiment of the invention. As shown in fig. 21, a method of recognizing a medical image according to an embodiment of the present invention may include the following steps.
In step 410, the medical image is converted to the HSV color gamut.
In step 420, the S channel in the HSV gamut is binarized to obtain a binarized image.
In step 430, the binarized image is morphologically processed to obtain a region of interest.
In step 440, a region of interest is identified using a convolutional neural network model trained according to the convolutional neural network model training method for medical image identification described above.
For example, the above-mentioned steps 410-430 include preprocessing the medical pathology image: stain homogenization, RGB-to-HSV color-gamut conversion, erosion-kernel and dilation-kernel processing, threshold segmentation, morphological processing, and other means are used to extract the outline of the tissue region in the current medical pathology image, and its bounding box serves as the region of interest (ROI) for recognition. The result of subtracting the removal mask from the annotation mask is used as the identification mask of the lesion region. When the bounding box is scanned with a sliding window, the background is first removed according to the tissue mask, dividing the image into tissue and background regions; meanwhile, a mask generated from the annotations further divides the tissue region into positive and negative samples. The sample image data and their corresponding information are packaged into TFRecords, the unified data storage format that TFRecord provides for TensorFlow.
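The HSV-based tissue detection of steps 410-430 can be sketched in pure Python. This is a minimal illustration, not the system's OpenCV pipeline: saturation thresholding separates stained tissue from the white slide background, and one dilation pass stands in for the morphological processing (the 0.1 threshold and helper names are assumptions):

```python
import colorsys

def tissue_mask(rgb_image, s_threshold=0.1):
    """Binarize the saturation (S) channel of the HSV conversion: stained
    tissue is saturated, the white slide background is not.  Pixels are
    RGB tuples in [0, 1]."""
    return [[1 if colorsys.rgb_to_hsv(*px)[1] > s_threshold else 0
             for px in row] for row in rgb_image]

def dilate(mask):
    """One pass of 3x3 morphological dilation, filling small holes so the
    tissue region becomes a connected blob."""
    h, w = len(mask), len(mask[0])
    return [[1 if any(mask[r][c]
                      for r in range(max(0, y - 1), min(h, y + 2))
                      for c in range(max(0, x - 1), min(w, x + 2)))
             else 0
             for x in range(w)]
            for y in range(h)]
```

A matching erosion pass (the dual of `dilate`) would then remove isolated noise pixels before the bounding box of the surviving blob is taken as the ROI.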
The annotation information in xml format is computed with the PNPoly algorithm to generate the lesion annotation-region mask, and the normal-region mask is generated by combination with the binary tissue-region mask. Using the inter-layer mapping of the image pyramid, the annotation-region mask and the normal-region mask are scanned at level-4, and samples are extracted at level-0 (40X magnification) according to the mask information, labeled positive and negative respectively. A balanced positive/negative extraction strategy is adopted: positive samples are extracted first, and negative samples are then randomly extracted according to the number of positives. The sample data are serialized with TFRecord, recording multi-dimensional information such as the image, label, coordinates, source WSI, and storage path of each sample. The data in the TFRecord are preloaded, and the data set is divided into a training set and a test set at a ratio of, for example, 3:1 according to the number of samples. WSIs on which the pathologists cannot reach a consistent judgment may all be assigned to the training set, with only their positive samples extracted for processing.
In one example, for each digital medical pathology image, the paths of the annotation mask and the removal mask are first re-calibrated by file-name matching, and all source images and their attached masks are automatically preloaded. Unreadable image files are flagged and, if present, removed; otherwise the process proceeds to the next step.
The steps 410-430 are similar to the steps 310-330 in the convolutional neural network model training method for medical image recognition described in fig. 3, and therefore, the description thereof is omitted here.
In the prediction-data visualization step, the resolution of the sample pathology image is first obtained, and the area ratio of the tissue region in the image is computed by binarization. When the number of sample pixels is less than 50000 × 50000 or the area ratio is less than 30%, a pixel-based lesion prediction heat map (probability heat map) is constructed at Level-5 of the image pyramid; if neither condition holds, the heat map is constructed at Level-7 to reduce the amount of computation, as shown in fig. 22, a diagram of heat-map construction according to an embodiment of the invention.
In the prediction-data visualization step, to generate the heat map, an all-zero initial matrix is constructed. Each matrix element position is taken as a pixel coordinate and mapped to a coordinate at Level-0 of the pyramid of the sample pathology image; centered on that coordinate, a 256 × 256 sample is taken. During sampling, the tissue proportion inside the sampling window is first computed with the sample tissue-region mask obtained beforehand; tissue samples are extracted with 20% as the threshold, recorded in order, and serialized into TFRecord; the probability value of the pixel is then obtained by forward propagation through the trained Multiscale-ResNeXt-50, as shown in fig. 23, another diagram of heat-map construction according to an embodiment of the invention.
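The heat-map fill loop just described can be sketched as follows. `tissue_fraction` and `predict` are hypothetical callables standing in for the mask lookup and the trained network's forward pass; the Level-5 downsample factor is taken to be 2^5 = 32:

```python
HEATMAP_DOWNSAMPLE = 2 ** 5   # Level-5: one heat-map pixel per 32 level-0 pixels
PATCH = 256                   # sample size at level-0

def build_heatmap(width_l0, height_l0, tissue_fraction, predict,
                  threshold=0.2):
    """Fill an all-zero matrix with lesion probabilities.  Each element
    maps to a level-0 centre coordinate; a PATCH-sized sample is scored
    only when its tissue fraction reaches the threshold."""
    rows = height_l0 // HEATMAP_DOWNSAMPLE
    cols = width_l0 // HEATMAP_DOWNSAMPLE
    heat = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cx, cy = c * HEATMAP_DOWNSAMPLE, r * HEATMAP_DOWNSAMPLE
            if tissue_fraction(cx, cy, PATCH) >= threshold:
                heat[r][c] = predict(cx, cy, PATCH)
    return heat
```

In the real system the candidate patches would be batched through TFRecord and the network rather than scored one call at a time.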
In the prediction-data visualization step, after the probability distribution matrix is obtained, it is converted into a lesion-region (for example, cancer) localization and identification map based on the probability heat distribution.
The lesion-tissue morphology identification system based on the convolutional neural network is thus constructed; it mainly comprises a digital medical pathology image (WSI) preprocessing module, a deep convolutional network prediction model module, and a prediction-data visualization module.
In an example, the resolution and the tissue area ratio information of the sample image to be predicted can be utilized to adaptively extract test samples with sliding windows with different overlapping proportions, and the test samples are mapped to the upper layer of the image pyramid to perform pixel-by-pixel lesion probability prediction.
A specific example will be given below to illustrate a convolutional neural network model training method for medical image recognition. The method mainly comprises the following steps or modules.
Step 1: digital medical pathology image preprocessing
S11: the method comprises the steps of performing color standardization on a digital medical image, converting RGB into HSV, further extracting an S channel in an HSV space, performing threshold segmentation processing, opening and closing morphological processing, obtaining a fine mask with fine granularity and a fuzzy mask with coarse granularity, extracting the outline of a tissue area in the current medical image, and dividing the image into the tissue area and a background area.
S12: and calculating marking information in an xml format by optimizing the PNPoly algorithm of the ASAP to generate a mask picture of the lesion marking area. And calculating and generating a normal region mask by combining the tissue region binary mask.
S13: and scanning a marked area mask and a normal area mask under the level-4 by utilizing the image pyramid inter-layer mapping, extracting samples under the amplification factors of the level-0 and 40X according to the mask information, and respectively marking the samples as positive samples and negative samples. In the process, an extraction strategy of positive and negative sample balance is adopted, firstly positive samples are extracted, and then the negative samples are randomly extracted according to the number of the positive samples. And the TFRecord is used for uniformly storing sample data, so that the multi-dimensional information in the sample is effectively recorded, and the IO times in the sample reading process are reduced.
Step 2: deep learning training assessment
S21: the color parameters of the positive sample image are randomly jittered and randomly turned horizontally and vertically, so that data enhancement is implemented and the capacity of the positive sample is expanded.
S22: and extracting sample information in TFRecodes by using a multithread input data processing framework in Tensorflow, and improving the efficiency of reading sample data.
S23: the Multiscale-ResNeXt-50 model is trained using sample data to obtain a one-stage diagnostic model. A first-stage diagnosis model is utilized to generate a prediction heat map for the WSI, and a sample is extracted from a misjudgment gathering area in the heat map in an overlapping mode, so that the original training set is expanded. The Multiscale-ResNeXt-50 model was fine-tuned using the extended training set. And acquiring a two-stage diagnosis model to realize a tissue analysis and identification model.
And step 3: predictive data visualization
S31: and acquiring the resolution ratio and the area ratio of the tissue area of the medical pathological image of the sample, and determining the resolution of the constructed heat map according to the two items of information and a preset threshold value.
S32: and mapping the element coordinates on the zero matrix to a square area on the pyramid Level-0 of the medical pathological image to be predicted, extracting a tissue area sample, and performing forward propagation by combining a trained model to obtain a probability value.
S33: and after the probability distribution matrix is obtained, converting the probability distribution matrix into a lesion area positioning identification map based on probability heat distribution.
Further, in the step S11:
remove the transparency channel from the source digital medical pathology image S to obtain an RGB image; convert it from RGB space to HSV space to obtain the binarized image M_0 of the pathology image; fill scattered cavities inside large tissue regions with a dilation kernel; eliminate noise with an erosion kernel; segment the independent tissue-region elements; and finally extract the outline of the tissue region in the pathology image and the tissue mask image M_1, which distinguishes the background from the tissue region. At the same time, generate a bounding box for each separated white block in M_1.
Further, in the step S12:
using the two xml annotation files, generate the lesion-region mask M_obj and the removal mask M_exc respectively; take the result of M_obj − M_exc as the lesion-region identification mask M_pos, and the result of M_0 − (M_obj − M_exc) as the normal-region identification mask M_neg. During extraction, record the source pathology image, sample number, sample label, storage path, sample center coordinates, and other information, and save them in txt form; after the pathology-image scan is finished, read the txt information, read the samples, and serialize them into TFRecord.
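The mask arithmetic above can be sketched directly; masks are nested 0/1 lists and the subtraction is a per-pixel set difference (`lesion_and_normal_masks` is a hypothetical helper name):

```python
def lesion_and_normal_masks(m0, m_obj, m_exc):
    """M_pos = M_obj - M_exc (annotation minus removed normal tissue);
    M_neg = M_0 - M_pos (tissue that is not lesion)."""
    h, w = len(m0), len(m0[0])
    m_pos = [[1 if m_obj[y][x] and not m_exc[y][x] else 0
              for x in range(w)] for y in range(h)]
    m_neg = [[1 if m0[y][x] and not m_pos[y][x] else 0
              for x in range(w)] for y in range(h)]
    return m_pos, m_neg
```

Every tissue pixel thus ends up in exactly one of M_pos or M_neg, which is what makes the later positive/negative patch labels unambiguous.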
Further, in the step S13:
and uniformly storing sample data by using the TFRecord, and recording multi-dimensional information in the sample, wherein the multi-dimensional information comprises a sample image data file, a sample label, a central coordinate on an original image Level-0 coordinate system, a file name of a medical pathological image from a sample source, a sample file name, a sample storage path and the like.
Further, in the step S21:
the image color parameters of the positive samples, such as brightness, contrast, hue and saturation, are randomly jittered within a certain threshold, and random horizontal and vertical flips are applied, implementing data enhancement and expanding the positive sample capacity. Two modes, fast and slow, are provided for the random jitter of the color parameters; the fast mode only randomly transforms brightness and saturation so as to appropriately accelerate preprocessing.
Further, step S21 can be replaced by a step S21', in which the color parameters are randomly jittered in the slow mode. The slow mode adjusts all four parameters, namely brightness, contrast, hue and saturation, so as to obtain a larger adjustment range.
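A minimal NumPy sketch of the fast/slow jitter modes (the jitter ranges are assumptions; in a TensorFlow pipeline the tf.image random ops would play this role, and hue rotation, also part of the slow mode, is omitted here for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

def jitter(img, fast=True):
    """Randomly perturb colour parameters of a float RGB image in [0, 1].
    Fast mode jitters brightness and saturation only; slow mode
    additionally jitters contrast."""
    out = img + rng.uniform(-0.1, 0.1)                   # brightness shift
    grey = out.mean(axis=-1, keepdims=True)
    out = grey + (out - grey) * rng.uniform(0.8, 1.2)    # saturation scale
    if not fast:
        out = 0.5 + (out - 0.5) * rng.uniform(0.8, 1.2)  # contrast scale
    # Random horizontal / vertical flips complete the augmentation.
    if rng.random() < 0.5:
        out = out[:, ::-1]
    if rng.random() < 0.5:
        out = out[::-1]
    return np.clip(out, 0.0, 1.0)
```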
Further, in the step S22:
first, the file list storing the sample data in the training TFRecord format is obtained, and an input file queue is generated and maintained using TensorFlow's input queue mechanism. Meanwhile, the data enhancement operation of step S21 runs in parallel in multiple threads through the multithreading mechanism provided by TensorFlow.
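The queued, multithreaded input design can be sketched framework-free with Python's queue and threading modules (in TensorFlow this role is played by the queue-runner / tf.data machinery; the function names here are illustrative):

```python
import queue
import threading

def input_pipeline(file_list, parse_fn, batch_size, num_workers=4):
    """Sketch of the queued input design: a filename queue feeds worker
    threads that parse (and would augment) samples in parallel; the
    consumer then assembles batches."""
    files = queue.Queue()
    samples = queue.Queue()
    for f in file_list:
        files.put(f)

    def worker():
        while True:
            try:
                f = files.get_nowait()
            except queue.Empty:
                return
            for s in parse_fn(f):   # read one record file's samples
                samples.put(s)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    batch = []
    while not samples.empty():
        batch.append(samples.get())
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```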
Further, in the step S23:
persistent model parameters trained for the Multiscale-ResNeXt-50 model are loaded, and AdamOptimizer is used as the optimizer to train the deep neural network model; when evaluating the network, the area under the ROC curve (AUC), the true positive rate (TPR), precision and recall are used as the evaluation criteria of the model.
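The evaluation criteria can be computed directly; a small NumPy sketch (rank-based AUC ignoring tied scores; the 0.5 decision threshold is an assumption, not stated in the text):

```python
import numpy as np

def evaluate(y_true, y_score, thresh=0.5):
    """Compute TPR/recall, precision and ROC AUC from binary labels and
    predicted lesion probabilities."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    pred = y_score >= thresh
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    recall = tp / (tp + fn)              # TPR is the same quantity as recall
    precision = tp / (tp + fp)
    # AUC: probability that a random positive outranks a random negative
    # (Mann-Whitney rank formulation).
    order = np.argsort(y_score)
    ranks = np.empty(len(y_score))
    ranks[order] = np.arange(1, len(y_score) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    auc = (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
    return {"TPR": recall, "Precision": precision, "Recall": recall, "AUC": auc}
```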
Further, in the step S31:
when the trained model is used for prediction, the resolution information of the medical pathological image to be predicted is obtained first, and the mask M0 of step S11 is then computed. The proportion of white pixels within the tissue region bounding boxes relative to the whole image is calculated. When either condition is met, namely the number of sample pixels is less than 50000 × 50000 or the average area ratio is less than 30%, a pixel-based lesion prediction heat map is constructed on Level-5 of the sample medical pathological image pyramid to obtain good delineation resolution; if neither condition is met, the prediction heat map is constructed on Level-7 to reduce the amount of computation.
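The level-selection rule above amounts to a small decision function; a sketch using the stated thresholds:

```python
def choose_heatmap_level(width, height, tissue_ratio,
                         pixel_limit=50000 * 50000, ratio_limit=0.30):
    """Pick the pyramid level on which to build the prediction heat map:
    small or sparsely covered slides afford the finer Level-5, otherwise
    Level-7 keeps the computation tractable."""
    if width * height < pixel_limit or tissue_ratio < ratio_limit:
        return 5   # finer heat map, better delineation
    return 7       # coarser heat map, less computation
```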
Further, in the step S32:
an all-zero initial matrix Mpre with the same dimensions as the prediction heat map in step S31 is constructed. Each matrix element position is mapped as a coordinate onto the digital medical pathological image S of step S11, and the 256 × 256 region around the mapped point is analyzed; when the proportion of white pixels in the region exceeds 15%, the region inside the sliding window is extracted as a sample to be computed. During this process the information of each sample is recorded in txt; after scanning is finished, the txt is read, and the samples are read and serialized into TFRecord. The probability value of each sample is obtained by forward propagation through the trained ResNet-50, and is mapped and assigned to the element at the corresponding position of the initial zero matrix Mpre.
The matrix Mpre is updated so that the value of each element is the lesion probability of the 256 × 256 sample whose center coordinate it maps to in the coordinate system of the digital medical pathological image S, thereby giving the lesion region probability distribution of the original medical pathological image. The probability values lie in the interval (0, 1); the larger the value, the more likely the model judges the region to be diseased. The probability matrix Mpre is converted into an RGB three-channel image: the smaller the value of an element of Mpre, the more the corresponding pixel of the RGB image tends to blue (appearing black in grayscale); conversely, the larger the value, the more the pixel tends to red (appearing white in grayscale). A heat image from which the affected area can be judged visually is thus obtained.
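The conversion from Mpre to an RGB heat image can be sketched with a simple blue-to-red mapping (the exact colormap is not specified in the text and is assumed here):

```python
import numpy as np

def prob_to_heatmap(m_pre):
    """Map a probability matrix in (0, 1) to an RGB image: low values
    tend to blue, high values to red."""
    p = np.clip(np.asarray(m_pre, dtype=float), 0.0, 1.0)
    rgb = np.zeros(p.shape + (3,), dtype=np.uint8)
    rgb[..., 0] = (255 * p).astype(np.uint8)          # red grows with risk
    rgb[..., 2] = (255 * (1.0 - p)).astype(np.uint8)  # blue for low risk
    return rgb
```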
The invention discloses a medical pathological image analysis, identification and auxiliary diagnosis system based on a convolutional neural network, which comprises the following modules:
the digital medical pathological image preprocessing module: used for color gamut conversion, dilation and erosion of the digital medical pathological image, obtaining a binary image, extracting the contour of the tissue region in the current medical pathological image, and dividing the image into a tissue region and a background region. Further, a sample identification mask is generated using the xml-format annotation information, and positive and negative samples are extracted from the divided tissue regions according to the pixel proportion of the mask-identified area. Finally, sample data are stored uniformly using TFRecord, with multi-dimensional information recorded in each sample.
Deep learning training and evaluation module: sample information in TFRecords is extracted using the multithreaded input data processing framework in TensorFlow, and data enhancement is performed on the positive samples therein to expand the positive sample capacity. With the preprocessed samples as input, Multiscale-ResNeXt-50 is adopted as the neural network model, and the deep network model is trained and evaluated.
Prediction data visualization module: after the probability distribution matrix is obtained, it is converted into a lesion area localization identification map based on probability heat distribution. To acquire the probability distribution matrix, the resolution of the constructed heat map is first determined from the resolution of the sample medical pathological image, the tissue area ratio, and a preset threshold; then the element coordinates on the zero matrix are mapped to square areas on pyramid Level-0 of the medical pathological image to be predicted, tissue area samples are extracted, and forward propagation with the trained model yields the probability value distribution matrix.
Preferably, the digital medical pathology image preprocessing module comprises a tissue region extraction module, a multi-mask sample extraction module and a sample serialization module.
Further, the tissue region extraction module first removes the transparency channel from the source digital medical pathological image S to obtain an RGB image, then converts it into HSV space to obtain a binary image M0 of the digital medical pathological image. Scattered cavities inside large tissue areas are filled using a dilation kernel, noise is eliminated using an erosion kernel, and independent tissue region elements are segmented out. Finally, the contour of the tissue region in the medical pathological image and a tissue mask image M1 are extracted, distinguishing the background from the tissue region. At the same time, a circumscribed bounding box is generated on M1 for each separated white block.
Further, the multi-mask sample extraction module uses two xml annotation files to generate a lesion region mask Mobj and an exclusion mask Mexc respectively; the result of Mobj - Mexc is taken as the lesion region identification mask Mpos, and the result of M0 - (Mobj - Mexc) as the normal region identification mask Mneg. During extraction, information such as the source medical pathological image, sample number, sample label, storage path and sample center coordinates is recorded and stored in txt form; after scanning of the medical pathological image is finished, the txt information is read, and the samples are read and serialized into TFRecord.
Further, the sample serialization module stores sample data uniformly using TFRecord and records multi-dimensional information in each sample, including the sample image data file, the sample label, the center coordinate in the Level-0 coordinate system of the original image, the file name of the source medical pathological image, the sample file name, the sample storage path, and the like.
Preferably, the deep learning training evaluation module comprises a data enhancement module, a multi-thread input module and a neural network training module.
Further, the data enhancement module randomly jitters the image color parameters of the positive samples, such as brightness, contrast, hue and saturation, within a certain threshold, applies random horizontal and vertical flips, enhances the data and expands the positive sample capacity. Two modes, fast and slow, are provided for the random jitter of the color parameters; the fast mode only randomly transforms brightness and saturation so as to appropriately accelerate preprocessing.
It should be noted that the data enhancement module may also randomly jitter the color parameters in the slow mode instead. The slow mode adjusts all four parameters, namely brightness, contrast, hue and saturation, so as to obtain a larger adjustment range.
Further, the multithread input module first obtains the file list storing the sample data in the training TFRecord format, and generates and maintains an input file queue using TensorFlow's input queue mechanism. Meanwhile, the data enhancement module runs in parallel in multiple threads through the multithreading mechanism provided by TensorFlow.
Further, the neural network training module adopts Multiscale-ResNeXt-50 as the neural network model, loads persistent model parameters pre-trained on ImageNet, updates the parameters of all network layers, uses weighted cross entropy as the loss function and AdamOptimizer as the optimizer to train the deep neural network model, and when evaluating the network uses the area under the ROC curve (AUC), the true positive rate (TPR), precision and recall as the evaluation criteria of the model.
Preferably, the predictive data visualization module comprises a predictive heat map construction module, a forward propagation module, and a predictive heat map generation module.
Preferably, the prediction heat map construction module first acquires the resolution information of the medical pathological image to be predicted, then obtains the mask M0 from the tissue region extraction module and calculates the proportion of white pixels within the tissue region bounding boxes relative to the whole image. When either condition is met, namely the number of sample pixels is less than 50000 × 50000 or the average area ratio is less than 30%, a pixel-based lesion prediction heat map is constructed on Level-5 of the sample medical pathological image pyramid to obtain good delineation resolution; if neither condition is met, the prediction heat map is constructed on Level-7 to reduce the amount of computation.
Preferably, the forward propagation module constructs an all-zero initial matrix Mpre with the same dimensions as the prediction heat map, maps each matrix element position as a coordinate onto the digital medical pathological image S, and analyzes the 256 × 256 region around the mapped point; when the proportion of white pixels in the region exceeds 15%, the region inside the sliding window is extracted as a sample to be computed. During this process the information of each sample is recorded in txt; after scanning is finished, the txt is read, and the samples are read and serialized into TFRecord. The probability value of each sample is obtained by forward propagation through the trained ResNet-50, and is mapped and assigned to the element at the corresponding position of the initial zero matrix Mpre.
Preferably, in the prediction heat map generation module, the matrix Mpre is updated so that the value of each element is the lesion probability of the 256 × 256 sample whose center coordinate it maps to in the coordinate system of the digital medical pathological image S, giving the lesion region probability distribution of the original medical pathological image. The probability values lie in the interval (0, 1); the larger the value, the more likely the model judges the region to be diseased. The probability matrix Mpre is converted into an RGB three-channel image: the smaller the value of an element of Mpre, the more the corresponding pixel of the RGB image tends to blue (appearing black in grayscale); conversely, the larger the value, the more the pixel tends to red (appearing white in grayscale). A heat image from which the affected area can be judged visually is thus obtained.
Preferably, the medical pathology image recognition system based on the convolutional neural network is composed of a storage medium and a computer device, wherein the storage medium stores the computer programs implementing the functions of the three modules, and the computer device contains the medium. The program is written in Python with the TensorFlow framework under Linux and is divided into four parts: preprocessing, neural network training, neural network evaluation, and prediction heat map generation. When a computer processor executes the program files of these four parts, the specific functions described for the digital medical pathological image preprocessing module, the deep learning training and evaluation module, and the prediction data visualization module can be realized respectively.
The following are embodiments of the apparatus of the present invention that may be used to perform the above-described embodiments of the image processing method of the present invention. For details that are not disclosed in the embodiments of the apparatus of the present invention, please refer to the embodiments of the image processing method of the present invention.
FIG. 24 is a block diagram illustrating a convolutional neural network model training apparatus for medical image recognition, according to an exemplary embodiment. The model training apparatus, as shown in FIG. 24, includes but is not limited to: a conversion module 510, a binarization processing module 520, a region extraction module 530, a sample extraction module 540, and a training module 550.
The conversion module 510 is arranged to convert the medical image to an HSV gamut.
The binarization processing module 520 is configured to perform binarization processing on the S channel in the HSV color gamut to obtain a binarized image.
The region extraction module 530 is configured to perform morphological processing on the binarized image to obtain a region of interest.
The sample extraction module 540 is configured to extract positive and negative samples, respectively, within the region of interest using a preset target area mask.
The training module 550 is configured to train the convolutional neural network model using the positive samples and the negative samples.
FIG. 25 is a block diagram illustrating an apparatus for recognizing medical images according to an exemplary embodiment. The apparatus for recognizing medical images, as shown in fig. 25, includes but is not limited to: a conversion module 610, a binarization processing module 620, an area extraction module 630 and an identification module 640.
The conversion module 610 is arranged to convert the medical image to be identified to the HSV color gamut.
The binarization processing module 620 is configured to perform binarization processing on the S channel in the HSV color gamut to obtain a binarized image.
The region extraction module 630 is configured to perform morphological processing on the binarized image to obtain a region of interest.
The recognition module 640 is configured to recognize the region of interest using a convolutional neural network model trained using the convolutional neural network model training method for medical image recognition described above.
Embodiments of the present invention also provide a computer storage medium having a computer program stored thereon, the computer program, when executed by a processor, implementing any one of the above convolutional neural network model training methods for medical image recognition or any one of the above methods for recognizing a medical image.
An embodiment of the present invention further provides an electronic device, including: a processor; and a memory having computer readable instructions stored thereon which, when executed by the processor, implement any of the above convolutional neural network model training methods for medical image recognition or any of the above methods for recognizing medical images.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
1. according to the method, the source high-resolution image data are preprocessed according to the labeling information, and efficient, accurate and rapid regional sample extraction and data sample automatic serialization are realized on the high-resolution medical pathological image samples through color gamut conversion, morphological processing and multi-mask information.
2. The invention adopts the technical scheme of enhancing only the positive samples to obtain a training set with balanced positive and negative samples, while weakening factors irrelevant to image recognition, thereby improving the generalization performance and robustness of the neural network model.
3. A method for medical image preprocessing based on binary masks is provided. On the one hand, the ASAP framework is applied to implement an improved XML annotation imaging algorithm, converting textual annotation information into a target area identification mask. On the other hand, a stain normalization algorithm, the maximum between-class variance (Otsu) algorithm and a series of morphological operations are experimentally verified; on this basis, a WSI (Whole Slide Image) preprocessing system based on mask segmentation is implemented, and samples are extracted from the tissue regions and lesion annotation regions of the WSI medical pathological image by combining image pyramid inter-layer mapping, providing a strategy for balanced sampling of positive and negative samples.
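The conversion from polygon annotations to an identification mask rests on a point-in-polygon test; claim 4 names the PNPoly algorithm for this. A minimal sketch of W. R. Franklin's PNPoly:

```python
def pnpoly(x, y, vertices):
    """Point-in-polygon test (PNPoly): cast a horizontal ray from (x, y)
    and toggle `inside` on each edge crossing. `vertices` is a list of
    (x, y) polygon vertex tuples."""
    inside = False
    n = len(vertices)
    j = n - 1
    for i in range(n):
        xi, yi = vertices[i]
        xj, yj = vertices[j]
        # Edge straddles the ray's y, and the crossing lies to the right of x.
        if (yi > y) != (yj > y) and \
           x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside
```

Rasterizing an xml polygon into a mask then amounts to evaluating this test for each pixel of the target level (or, more efficiently, per scanline).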
4. The invention uses the TFRecord format to store sample data uniformly and records multi-dimensional information in each sample; at the input end of the neural network, the sample information in the TFRecords is extracted with multiple threads using the multithreaded input data processing framework in TensorFlow, greatly improving the efficiency with which the neural network reads sample data.
5. The method combines the design ideas of transfer learning and ResNeXt parallel stacking residual paths, further improves a ResNet-50 model based on the characteristic that medical image features have multi-scale, enables the feature map output of each residual module to directly act on a classifier, enables the classifier to comprehensively consider the feature map output under the equivalent receptive field of each scale, and provides a Multiscale-ResNeXt-50 model structure by combining the improvements so as to improve the expression capability of a neural network on the medical image features.
6. In the testing process, the invention adaptively extracts test samples with different overlap percentages using information such as the resolution and tissue area ratio of the source sample image, so that prediction heat maps under different source image resolutions strike the right balance between processing efficiency and heat map fineness.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units. The components shown as modules or units may or may not be physical units, i.e. may be located in one place or may also be distributed over a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the disclosed solution.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiment of the present invention.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (20)

1. A convolutional neural network model training method for medical image recognition, the method comprising:
converting the medical image to an HSV color gamut;
carrying out binarization processing on an S channel in the HSV color gamut to obtain a binarized image;
performing morphological processing on the binary image to obtain an interested area;
respectively extracting a positive sample and a negative sample in the region of interest by using a preset target region mask; and
training the convolutional neural network model using the positive samples and the negative samples.
2. The method of claim 1, wherein the morphological processing comprises: dilation, erosion, opening and closing operations.
3. The method according to claim 1, wherein the morphologically processing the binarized image to obtain a region of interest comprises:
performing morphological processing on the binary image to obtain the outer contour of one or more tissue areas;
generating a circumscribed rectangle frame for the outer contour of each tissue region;
judging whether the area of each circumscribed rectangular frame is larger than a preset threshold value or not, if so, keeping the circumscribed rectangular frame, and if not, ignoring the circumscribed rectangular frame;
judging whether a full inclusion relationship exists between any two external rectangular frames in the external rectangular frames, if so, keeping the external rectangular frame with a larger area in any two external rectangular frames, and neglecting the external rectangular frame with a smaller area;
and using the finally obtained area defined by the circumscribed rectangle as the region of interest.
4. The method of claim 1, wherein the predetermined target area mask comprises a lesion area identification mask and a normal area identification mask, the method further comprising:
calculating marking information in an xml format through a PNPoly algorithm to generate the lesion area identification mask; and
and generating the normal region identification mask based on the lesion region identification mask and the binarized image.
5. The method of claim 4,
the calculating labeling information in xml format by PNPoly algorithm to generate the lesion region identification mask includes:
respectively generating a mask of a lesion area and a rejection mask by using the labeling information in two xml formats;
subtracting the elimination mask from the lesion area mask to obtain the lesion area identification mask; and
the generating of the normal region identification mask based on the lesion region identification mask and the binarized image includes:
and subtracting the lesion area identification mask from the binarized image to obtain the normal area identification mask.
6. The method of claim 4, wherein the label information stores vertex coordinates of a plurality of polygon label areas, the method comprising:
and adjusting the priority of executing the PNPoly algorithm by each polygon labeling area by taking the descending order of the areas of the polygon labeling areas as a reference.
7. The method of claim 1, wherein the preset target region mask is generated at level-4 of an image pyramid of the medical image, and the positive and negative samples are extracted at level-0 of the image pyramid of the medical image.
8. The method of claim 1, further comprising:
and randomly dithering the image color parameters of the positive sample, and randomly turning horizontally and vertically to expand the capacity of the positive sample.
9. The method of claim 8, wherein the image color parameters comprise at least one of brightness, contrast, hue, and saturation of the image.
10. The method of claim 1, further comprising: storing the positive and negative samples using TFRecord and extracting them using a multi-threaded input data processing framework in TensorFlow.
11. The method of claim 1, further comprising, prior to converting the medical image to the HSV color gamut: stain-normalizing the medical image using a stain normalization algorithm.
12. The method of claim 1, wherein training the convolutional neural network model using positive and negative samples comprises:
and adding shortcut connection between the feature map output layer and the global average pooling layer, so that the feature map output of each residual module directly acts on the global average pooling layer and the softmax layer.
13. The method of claim 12, wherein the weight of the connection between the feature map output layer and the global average pooling layer is wj,c, the method further comprising: decomposing the weight wj,c using the following equation, so that paths with different numbers of residual modules have independent weights:
[Equation: decomposition of the weight wj,c; rendered as an image (FDA0001803708170000021) in the original filing]
wherein pc is the output probability of the path for class c, Aj is the output of the jth node of the global average pooling layer, i and j are integers greater than 0, f is a mapping comprising all function mappings in the residual modules, y is the output of a residual module, l indicates that from the lth residual module onward the output of each subsequent residual module is shortcut-connected to the global average pooling layer, the global average pooling layer performs output mapping and weighted calculation on each feature map output layer, and k is the number of residual modules, k being an integer greater than 0.
14. The method of claim 1, further comprising:
training the convolutional neural network model by using the positive sample and the negative sample to obtain a stage model;
generating a prediction heat map using the staging model;
carrying out sample re-extraction on misjudgment gathering areas in the prediction heat map in an overlapping mode; and
the convolutional neural network model is further trained using the re-extracted samples and the previously extracted positive and negative samples.
15. A convolutional neural network model training apparatus for medical image recognition, comprising:
a conversion module configured to convert the medical image to an HSV color gamut;
the binarization processing module is used for carrying out binarization processing on the S channel in the HSV color gamut to obtain a binarization image;
the region extraction module is used for performing morphological processing on the binary image to obtain a region of interest;
the sample extraction module is used for respectively extracting a positive sample and a negative sample in the region of interest by utilizing a preset target region mask; and
a training module configured to train the convolutional neural network model using the positive samples and the negative samples.
16. A method of identifying a medical image, the method comprising:
converting the medical image to an HSV color gamut;
carrying out binarization processing on an S channel in the HSV color gamut to obtain a binarized image;
performing morphological processing on the binary image to obtain a region of interest;
identifying the region of interest using a convolutional neural network model trained using the convolutional neural network model training method for medical image identification according to any one of claims 1-14.
17. The method according to claim 16, wherein the identifying the region of interest using the convolutional neural network model trained using the convolutional neural network model training method for medical image recognition according to any one of claims 1 to 14 comprises:
acquiring the resolution of the medical image and the area ratio of a tissue region;
determining the resolution of the constructed heat map according to the resolution of the acquired medical image, the area ratio of the tissue region and a preset threshold;
extracting slices from the tissue region with overlap;
inputting the extracted slices into the convolutional neural network model for prediction, and forming a probability matrix by the probability of each slice obtained through prediction; and
a prediction heat map is generated based on the probability matrix.
18. An apparatus for recognizing a medical image, the apparatus comprising:
a conversion module configured to convert the medical image to an HSV color gamut;
the binarization processing module is used for carrying out binarization processing on the S channel in the HSV color gamut to obtain a binarization image;
the region extraction module is used for performing morphological processing on the binary image to obtain a region of interest;
an identification module arranged to identify the region of interest using a convolutional neural network model trained using the convolutional neural network model training method for medical image identification according to any one of claims 1-14.
19. A computer storage medium on which a computer program is stored which, when being executed by a processor, carries out a convolutional neural network model training method for medical image recognition as set forth in any one of claims 1-14 or a method of recognizing a medical image as set forth in any one of claims 16-17.
20. An electronic device, comprising:
a processor; and
memory having stored thereon computer readable instructions which, when executed by the processor, implement the convolutional neural network model training method for medical image recognition of any one of claims 1-14 or the method of recognizing a medical image of any one of claims 16-17.
CN201811088294.2A 2018-09-18 2018-09-18 Convolutional neural network model training method and device for medical image recognition Pending CN110909756A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811088294.2A CN110909756A (en) 2018-09-18 2018-09-18 Convolutional neural network model training method and device for medical image recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811088294.2A CN110909756A (en) 2018-09-18 2018-09-18 Convolutional neural network model training method and device for medical image recognition

Publications (1)

Publication Number Publication Date
CN110909756A true CN110909756A (en) 2020-03-24

Family

ID=69813577

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811088294.2A Pending CN110909756A (en) 2018-09-18 2018-09-18 Convolutional neural network model training method and device for medical image recognition

Country Status (1)

Country Link
CN (1) CN110909756A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106932408A * 2017-03-20 2017-07-07 山东农业大学 Image-processing-based method for detecting contaminated silk cocoons on paper cocooning frames
CN107122776A * 2017-04-14 2017-09-01 重庆邮电大学 Road traffic sign detection and recognition method based on convolutional neural networks
CN107239799A * 2017-05-26 2017-10-10 西安电子科技大学 Polarimetric SAR image classification method based on Pauli decomposition and a deep residual network
CN107886127A * 2017-11-10 2018-04-06 深圳市唯特视科技有限公司 Histopathology image classification method based on convolutional neural networks
CN108416379A * 2018-03-01 2018-08-17 北京羽医甘蓝信息技术有限公司 Method and apparatus for processing cervical cell images

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DAYONG WANG et al., "Deep Learning for Identifying Metastatic Breast Cancer", arXiv *
GEERT LITJENS et al., "1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset", GigaScience *
RICHARD CHEN et al., "Identifying Metastases in Sentinel Lymph Nodes with Deep Convolutional Neural Networks", arXiv *
HU Fengsong et al., "Research on License Plate Location Based on Multiple Features and SURF Algorithm", Computer Engineering and Applications *

Cited By (53)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111127466A (en) * 2020-03-31 2020-05-08 上海联影智能医疗科技有限公司 Medical image detection method, device, equipment and storage medium
CN111652839A (en) * 2020-04-21 2020-09-11 上海市杨浦区市东医院 Tooth colorimetric detection method and system based on rapid regional full convolution neural network
CN111461068B (en) * 2020-04-27 2023-10-17 湖南自兴智慧医疗科技有限公司 Chromosome metaphase map identification and segmentation method
CN111461068A (en) * 2020-04-27 2020-07-28 湖南自兴智慧医疗科技有限公司 Chromosome metaphase map identification and segmentation method
CN111724314A (en) * 2020-05-08 2020-09-29 天津大学 Method for detecting and removing special mark in medical image
CN111798408A (en) * 2020-05-18 2020-10-20 中国科学院宁波工业技术研究院慈溪生物医学工程研究所 Endoscope interference image detection and grading system and method
CN111626208A (en) * 2020-05-27 2020-09-04 北京百度网讯科技有限公司 Method and apparatus for detecting small targets
CN111626208B (en) * 2020-05-27 2023-06-13 阿波罗智联(北京)科技有限公司 Method and device for detecting small objects
CN111739035A (en) * 2020-06-30 2020-10-02 腾讯科技(深圳)有限公司 Image processing method, device and equipment based on artificial intelligence and storage medium
CN111753916A (en) * 2020-06-30 2020-10-09 苏州慧维智能医疗科技有限公司 Automatic marking method and system for small lesions in high-definition medical images
CN111739035B (en) * 2020-06-30 2022-09-30 腾讯科技(深圳)有限公司 Image processing method, device and equipment based on artificial intelligence and storage medium
CN111798428A (en) * 2020-07-03 2020-10-20 南京信息工程大学 Automatic segmentation method for multiple tissues of skin pathological image
CN111798428B (en) * 2020-07-03 2023-05-30 南京信息工程大学 Automatic segmentation method for multiple tissues of skin pathology image
CN111797786A (en) * 2020-07-09 2020-10-20 郑州中普医疗器械有限公司 Detection method for in vitro biological samples, four-classification, computer device and computer-readable storage medium
CN112001442A (en) * 2020-08-24 2020-11-27 北京达佳互联信息技术有限公司 Feature detection method and device, computer equipment and storage medium
CN112001442B (en) * 2020-08-24 2024-03-19 北京达佳互联信息技术有限公司 Feature detection method, device, computer equipment and storage medium
CN112116079A (en) * 2020-09-22 2020-12-22 视觉感知(北京)科技有限公司 Solution for data transmission between neural networks
JP2023539483A (en) * 2020-11-02 2023-09-14 ▲騰▼▲訊▼科技(深▲セン▼)有限公司 Medical image processing methods, devices, equipment, storage media and computer programs
JP7391267B2 (en) 2020-11-02 2023-12-04 ▲騰▼▲訊▼科技(深▲セン▼)有限公司 Medical image processing methods, devices, equipment, storage media and computer programs
EP4198889A4 (en) * 2020-11-02 2024-02-07 Tencent Technology (Shenzhen) Company Limited Image processing method apparatus based on artificial intelligence, and computer device and storage medium
CN112307991A (en) * 2020-11-04 2021-02-02 北京临近空间飞行器系统工程研究所 Image recognition method, device and storage medium
CN113903432A (en) * 2020-11-18 2022-01-07 苏州中德双智科创发展有限公司 Image resolution improving method and device, electronic equipment and storage medium
CN112488942A (en) * 2020-12-02 2021-03-12 北京字跳网络技术有限公司 Method, device, equipment and computer readable medium for repairing image
WO2022132966A1 (en) * 2020-12-15 2022-06-23 Mars, Incorporated Systems and methods for identifying cancer in pets
CN112560968B (en) * 2020-12-21 2022-08-19 齐鲁工业大学 HER2 image classification method and system based on convolution and residual error network
CN112560968A (en) * 2020-12-21 2021-03-26 齐鲁工业大学 HER2 image classification method and system based on convolution and residual error network
CN112614119A (en) * 2020-12-28 2021-04-06 上海市精神卫生中心(上海市心理咨询培训中心) Medical image region-of-interest visualization method, device, storage medium and equipment
CN112614119B (en) * 2020-12-28 2024-04-12 上海市精神卫生中心(上海市心理咨询培训中心) Medical image region of interest visualization method, device, storage medium and equipment
CN112308174A (en) * 2020-12-31 2021-02-02 南京华格信息技术有限公司 Method for fusing and identifying data of multiple infrared sensors for bird repelling in airport
CN112308174B (en) * 2020-12-31 2021-04-02 南京华格信息技术有限公司 Method for fusing and identifying data of multiple infrared sensors for bird repelling in airport
CN113223730A (en) * 2021-03-30 2021-08-06 武汉市疾病预防控制中心 Artificial intelligence-based malaria classification method and equipment
CN113689950A (en) * 2021-07-14 2021-11-23 广东省人民医院 Method, system and storage medium for identifying blood vessel distribution pattern of liver cancer IHC staining pattern
CN113743194A (en) * 2021-07-23 2021-12-03 北京眼神智能科技有限公司 Face silence living body detection method and device, electronic equipment and storage medium
CN113743194B (en) * 2021-07-23 2024-02-02 北京眼神智能科技有限公司 Face silence living body detection method and device, electronic equipment and storage medium
CN113706390A (en) * 2021-10-29 2021-11-26 苏州浪潮智能科技有限公司 Image conversion model training method, image conversion method, device and medium
TWI781027B (en) * 2021-12-22 2022-10-11 國立臺南大學 Neural network system for staining images and image staining conversion method
TWI803223B (en) * 2022-03-04 2023-05-21 國立中正大學 Method for detecting object of esophageal cancer in hyperspectral imaging
CN114612717A (en) * 2022-03-09 2022-06-10 四川大学华西医院 AI model training label generation method, training method, use method and device
CN114638292A (en) * 2022-03-10 2022-06-17 中国医学科学院北京协和医院 Artificial intelligence pathology auxiliary diagnosis system based on multi-scale analysis
CN114821194B (en) * 2022-05-30 2023-07-25 深圳市科荣软件股份有限公司 Equipment running state identification method and device
CN114821194A (en) * 2022-05-30 2022-07-29 深圳市科荣软件股份有限公司 Equipment running state identification method and device
US20240020824A1 (en) * 2022-07-12 2024-01-18 Imvaria Inc. Machine learning models for automated diagnosis of disease database entities
CN115170935A (en) * 2022-09-09 2022-10-11 南通商翼信息科技有限公司 Trash can state identification method and device understood according to images
CN115620062A (en) * 2022-10-24 2023-01-17 推想医疗科技股份有限公司 Liver blood vessel classification method, device, equipment and medium
CN115620075A (en) * 2022-12-16 2023-01-17 南昌大学 Method, system and equipment for generating data set for leukocyte classification model
CN116029895A (en) * 2023-02-23 2023-04-28 广州佰锐网络科技有限公司 AI virtual background implementation method, system and computer readable storage medium
CN116029895B (en) * 2023-02-23 2023-08-04 广州佰锐网络科技有限公司 AI virtual background implementation method, system and computer readable storage medium
CN116030201A (en) * 2023-03-28 2023-04-28 美众(天津)科技有限公司 Method, device, terminal and storage medium for generating multi-color hairstyle demonstration image
CN116030201B (en) * 2023-03-28 2023-06-02 美众(天津)科技有限公司 Method, device, terminal and storage medium for generating multi-color hairstyle demonstration image
CN116824258A (en) * 2023-06-30 2023-09-29 哈尔滨工业大学 Construction site smoke dust detection method based on back projection
CN116824258B (en) * 2023-06-30 2024-05-14 哈尔滨工业大学 Construction site smoke dust detection method based on back projection
CN117671341A (en) * 2023-11-28 2024-03-08 广州市玄武无线科技股份有限公司 Commodity identification modeling method and device
CN117671341B (en) * 2023-11-28 2024-08-16 广州市玄武无线科技股份有限公司 Commodity identification modeling method and device

Similar Documents

Publication Publication Date Title
CN110909756A (en) Convolutional neural network model training method and device for medical image recognition
BenTaieb et al. Adversarial stain transfer for histopathology image analysis
Sommer et al. Learning-based mitotic cell detection in histopathological images
Pan et al. Mitosis detection techniques in H&E stained breast cancer pathological images: A comprehensive review
CN109754007A (en) Peplos intelligent measurement and method for early warning and system in operation on prostate
Xu et al. An improved faster R-CNN algorithm for assisted detection of lung nodules
CN115601602A (en) Cancer tissue pathology image classification method, system, medium, equipment and terminal
CN117015796A (en) Method for processing tissue images and system for processing tissue images
CN111444844A (en) Liquid-based cell artificial intelligence detection method based on variational self-encoder
Razavi et al. MiNuGAN: Dual segmentation of mitoses and nuclei using conditional GANs on multi-center breast H&E images
KR20200136004A (en) Method for detecting cells with at least one malformation in a cell sample
Vidyarthi et al. Classification of breast microscopic imaging using hybrid CLAHE-CNN deep architecture
CN113096080A (en) Image analysis method and system
CN116797791A (en) Mitochondrial segmentation and classification method based on deep learning
Dabass et al. A hybrid U-Net model with attention and advanced convolutional learning modules for simultaneous gland segmentation and cancer grade prediction in colorectal histopathological images
Pedersen et al. H2G-Net: A multi-resolution refinement approach for segmentation of breast cancer region in gigapixel histopathological images
CN108154107B (en) Method for determining scene category to which remote sensing image belongs
Kalaiselvi et al. Improved classification of brain tumor in MR images using RNN classification framework
Yancey Deep Feature Fusion for Mitosis Counting
Chang et al. Modified YOLO network model for metaphase cell detection in antinuclear antibody images
CN113887652A (en) Remote sensing image dim target detection method based on form and multi-example learning
Mahapatra et al. Truncated normal mixture prior based deep latent model for color normalization of histology images
Kaoungku et al. Colorectal Cancer Histology Image Classification Using Stacked Ensembles
Polejowska et al. Impact of Visual Image Quality on Lymphocyte Detection Using YOLOv5 and RetinaNet Algorithms
EP4123580A1 (en) Processing of images containing overlapping particles

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200324