CN114743195A - Thyroid cell pathology digital image recognizer training method and image recognition method


Info

Publication number
CN114743195A
Authority
CN
China
Prior art keywords: image, image block, classifier, image blocks, sub
Prior art date
Legal status
Granted
Application number
CN202210384671.7A
Other languages
Chinese (zh)
Other versions
CN114743195B (en)
Inventor
陈旭琳
姚沁玥
汪进
陈睿
Current Assignee
Severson Guangzhou Medical Technology Service Co ltd
Original Assignee
Severson Guangzhou Medical Technology Service Co ltd
Priority date
Filing date
Publication date
Application filed by Severson Guangzhou Medical Technology Service Co ltd
Priority to CN202210384671.7A
Publication of CN114743195A
Application granted
Publication of CN114743195B
Status: Active

Classifications

    • G06F18/2415: Pattern recognition; classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045: Neural networks; combinations of networks
    • G06N3/08: Neural networks; learning methods
    • G06T7/0012: Image analysis; biomedical image inspection
    • G06T7/73: Image analysis; determining position or orientation of objects or cameras using feature-based methods
    • G06T2207/20081: Training; Learning
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06T2207/30096: Tumor; Lesion


Abstract

Embodiments of the present application provide a thyroid cell pathology digital image recognizer training method, an image recognition method, a training device, an image recognition apparatus, and a computer-readable storage medium. The thyroid cell pathology digital image recognizer comprises a classifier and a target detector, and the training method comprises the following steps: acquiring a first image set for training the thyroid cell pathology digital image recognizer; training the classifier using a first loss value corresponding to the first image set; and, in response to determining from the labels that an image block input to the classifier is a first-type image block, training the target detector with a second loss value. The training method provided by the embodiments of the present application can train the thyroid cell pathology digital image recognizer when only the type or content of each image block is labeled and the positions of positive cells are not labeled, so that the recognizer can both judge whether positive cells exist in an image and locate the positive cells in the image.

Description

Thyroid cell pathology digital image recognizer training method and image recognition method
Technical Field
The application relates to the field of digital image processing, in particular to a thyroid cell pathology digital image recognizer training method and an image recognition method.
Background
Fine-needle aspiration cytology is an important means for doctors to diagnose whether a nodule is benign or malignant. In this method, a doctor punctures the lesion area with a fine needle, draws cells or small tissue fragments from the lesion by lifting and plunging the needle, prepares and stains a cell slide, reads the slide under a microscope, and completes the pathological diagnosis. However, searching for positive cells on a cell slide and judging the type of lesion is very challenging work: it places high demands on the professional ability of the pathologist, and the pathologist is prone to reading fatigue. It is therefore desirable to digitize the cytology slide into a cytology slide digital image, use digital image processing technology to locate and classify the positive cells in that image, and give a diagnosis suggestion for the whole slide image, thereby effectively improving the doctor's diagnostic efficiency and reducing the misjudgment rate.
However, locating the positive cells in an image with digital image processing technology requires a large number of samples labeled with the positions of the positive cells to train the recognizer, and the cost of such labeling is high.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a thyroid cell pathology digital image recognizer training method, an image recognition method, a training device, an image recognition apparatus, and a computer-readable storage medium. These can train the thyroid cell pathology digital image recognizer when only the type or content of an image block is labeled and the position of positive cells is not labeled, so that the recognizer can not only determine whether an image contains positive cells but also locate the positive cells in the image.
In a first aspect, an embodiment of the present application provides a thyroid cell pathology digital image recognizer training method, where the thyroid cell pathology digital image recognizer includes a classifier and a target detector, and the method includes:
a first image set acquisition step of acquiring a first image set for training the thyroid cell pathology digital image recognizer; the first image set comprises image blocks carrying labels, the labels are used for indicating the content contained in the image blocks, and the first image set comprises first-type image blocks and second-type image blocks, where the content of a first-type image block includes positive cells and the content of a second-type image block includes only negative cells and a negative environment;
a classifier first training step of training the classifier using a first loss value corresponding to the first image set; the classifier is configured to receive an image block carrying a label and output a first feature map and a classification confidence corresponding to the image block; the classification confidence is used for indicating whether the image block input into the classifier is a first-type or a second-type image block; and the first loss value corresponding to the first image set is determined according to the classification confidence and the labels carried by the image blocks;
a target detector training step of training the target detector with a second loss value in response to determining from the label that the image block input to the classifier is a first-type image block; the target detector is configured to receive the first feature map output by the classifier and output a second feature map; the elements of the second feature map correspond to candidate regions in the image block, and their values correspond to the probability that the candidate regions contain positive cells; and the second loss value is determined according to the classification confidences obtained by inputting the sub image blocks generated from the candidate regions into the classifier, the labels carried by the image blocks corresponding to the sub image blocks, and the elements of the second feature map.
In a second aspect, an embodiment of the present application provides an image recognition method, where the method is performed by a thyroid cell pathology digital image recognizer that includes a classifier and a target detector, and the method includes:
acquiring an image block; the image blocks comprise a first type of image blocks and a second type of image blocks, the content of the first type of image blocks comprises positive cells, and the content of the second type of image blocks comprises only negative cells and a negative environment;
inputting the image block into the classifier to obtain a first feature map; the first feature map corresponds to the image block input to the classifier, and the classification confidence is used for indicating whether the image block input to the classifier is a first-type or a second-type image block;
inputting the first feature map into the target detector to obtain at least one second feature map;
determining a classification confidence according to the first feature map and the at least one second feature map;
and, after all elements of the at least one second feature map are normalized and subjected to non-maximum suppression, taking the candidate regions corresponding to elements whose values are larger than a first set value as the regions of the image block that contain positive cells.
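A minimal sketch of this localization step follows; it assumes a sigmoid as the normalization and assumes `boxes` already maps each element of the second feature maps to its candidate region's coordinates in the image block, neither of which is specified by the patent.

```python
# Sketch only: sigmoid is assumed as the normalization, and `boxes` is assumed
# to hold each element's candidate-region coordinates (x1, y1, x2, y2).
import torch
from torchvision import ops

def locate_positive_cells(second_maps, boxes, first_set_value=0.5, iou_threshold=0.3):
    scores = torch.sigmoid(torch.cat([m.flatten() for m in second_maps]))  # normalize
    keep = ops.nms(boxes, scores, iou_threshold)      # non-maximum suppression
    keep = keep[scores[keep] > first_set_value]       # elements above the first set value
    return boxes[keep]                                # regions containing positive cells
```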
In a third aspect, an embodiment of the present application provides a training apparatus, including at least one control processor and a memory, which is communicatively connected to the at least one control processor; the memory stores instructions executable by the at least one control processor to enable the at least one control processor to perform a thyroid cytopathology digital image recognizer training method as in the first aspect.
In a fourth aspect, an embodiment of the present application provides an image recognition apparatus, including at least one control processor and a memory, wherein the memory is in communication connection with the at least one control processor; the memory stores instructions executable by the at least one control processor to enable the at least one control processor to perform the image recognition method of the second aspect.
In a fifth aspect, the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores computer-executable instructions for causing a computer to execute the thyroid cell pathology digital image recognizer training method as in the first aspect or the image recognition method as in the second aspect.
Embodiments of the present application include: a thyroid cell pathology digital image recognizer training method, an image recognition method, a training device, an image recognition apparatus, and a computer-readable storage medium. In the scheme provided by the embodiments of the present application, the labels carried by the image blocks only mark the type of each image block and do not mark the positions of the positive cells within it; the image blocks are used to train the classifier, and the output of the classifier is used to train the target detector, so that the thyroid cell pathology digital image recognizer can classify the type of an image block through the classifier and can also locate the positive cells in the image block through the target detector.
Drawings
The accompanying drawings are included to provide a further understanding of the claimed subject matter and are incorporated in and constitute a part of this specification, illustrate embodiments of the subject matter and together with the description serve to explain the principles of the subject matter and not to limit the subject matter.
The present application is further described with reference to the following figures and examples;
FIG. 1 is a digital image of a cytology slide made from a cytology slide;
FIG. 2 is a block of images taken from a digital image of a cytological slide;
FIG. 3 is a block diagram of a thyroid cell pathology digital image identifier according to an embodiment of the present application;
FIG. 4 is a candidate area framed from an image block;
FIG. 5 is a flowchart illustrating the steps of a thyroid cell pathology digital image recognizer training method according to an embodiment of the present application;
FIG. 6 is a flowchart illustrating the steps of a thyroid cell pathology digital image recognizer training method according to an embodiment of the present application;
FIG. 7 is a flowchart illustrating the steps of a thyroid cell pathology digital image recognizer training method according to an embodiment of the present application;
FIG. 8 is a schematic block diagram of a thyroid cell pathology digital image identifier according to another embodiment of the present application;
FIG. 9 is a schematic diagram of a workflow of a thyroid cell pathology digital image identifier provided by an embodiment of the present application;
FIG. 10 is a flowchart illustrating the steps of a thyroid cell pathology digital image recognizer training method according to another embodiment of the present application;
FIG. 11 is a flowchart illustrating the steps of a thyroid cell pathology digital image recognizer training method according to another embodiment of the present application;
FIG. 12 is a schematic diagram of a target detector structure and a workflow provided by an embodiment of the present application;
FIG. 13 is a flowchart illustrating the steps of a thyroid cell pathology digital image recognizer training method according to another embodiment of the present application;
FIG. 14 is a flowchart illustrating the steps of a thyroid cell pathology digital image recognizer training method according to another embodiment of the present application;
FIG. 15 is a flowchart illustrating the steps of a thyroid cell pathology digital image recognizer training method according to another embodiment of the present application;
FIG. 16 is a flowchart illustrating the steps of a thyroid cell pathology digital image recognizer training method according to another embodiment of the present application;
FIG. 17 is a flowchart illustrating the steps of a thyroid cell pathology digital image recognizer training method according to another embodiment of the present application;
FIG. 18 is a flowchart illustrating the steps of a thyroid cell pathology digital image recognizer training method according to another embodiment of the present application;
FIG. 19 is a schematic diagram of a training apparatus according to an embodiment of the present application;
FIG. 20 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present application.
Detailed Description
The present application is further described with reference to the following figures and specific examples. The described embodiments should not be considered as limiting the present application; all other embodiments obtained by a person skilled in the art without creative effort fall within the scope of protection of the present application.
In the description of the present application, the terms "first", "second", "third" and "fourth", where used, serve only to distinguish technical features and are not to be understood as indicating or implying relative importance, the number of the indicated technical features, or the precedence of the indicated technical features. It should be noted that although functional blocks are partitioned in the device schematics and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the block partitioning or the order in the flowcharts.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and that the embodiments provided herein may be combined with each other without conflict.
Unless defined otherwise, terms such as "define", "arrange", "mount" and "connect" are to be construed broadly, and all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the specific meaning of the above terms in this application can be reasonably determined by a person skilled in the art in view of the technical solution. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to limit the application.
The embodiments of the present application will be further explained with reference to the drawings.
Fine-needle aspiration cytology is an important means for doctors to diagnose whether a nodule is benign or malignant. In this method, a doctor punctures the lesion area with a fine needle, draws cells or small tissue fragments from the lesion by lifting and plunging the needle, prepares and stains a cell slide, reads the slide under a microscope, and completes the pathological diagnosis. However, finding positive cells on a cell slide and determining the type of lesion is a very challenging task: it places high demands on the professional ability of the pathologist, and the pathologist is prone to reading fatigue. It is therefore desirable to digitize the cytology slide into a cytology slide digital image, use digital image processing technology to locate and classify the positive cells in that image, and give a diagnosis suggestion for the whole slide image, thereby effectively improving the doctor's diagnostic efficiency and reducing the misjudgment rate.
However, locating the positive cells in an image with digital image processing technology requires a large number of samples labeled with the positions of the positive cells to train the recognizer, and the cost of such labeling is high.
In order to solve the problem of locating positive cells in an image block while reducing labeling cost, an embodiment of the present application provides a training method for the thyroid cell pathology digital image recognizer provided herein, which includes a classifier and a target detector. In the inference process, the classifier classifies the type of an image block so as to judge the type of lesion, and the target detector determines candidate regions containing positive cells so as to locate the positive cells in the image block. In the training process, the target detector is trained using the output of the classifier, so that its training can be completed when only the type of each image block is labeled and the positions of the positive cells within it are not; the trained target detector then locates positive cells in an image block by determining the candidate regions that contain them. FIG. 1 is a cytology slide digital image made from a cytology slide. Since a cytopathology digital image generally has a resolution on the order of tens of thousands of pixels per side, in some cases it needs to be divided into a plurality of image blocks as shown in FIG. 2. The thyroid cell pathology digital image recognizer provided by the embodiments of the present application determines whether a lesion is malignant by determining whether positive cells are contained in the image blocks and provides the locations of the positive cells within them, so that the type of lesion can be verified from the located positive cells.
Example one
Fig. 3 is a schematic structural diagram of the modules of a thyroid cell pathology digital image recognizer provided in an embodiment of the present application; the recognizer includes a classifier and a target detector. The classifier is configured to receive an image block, perform linear transformation and nonlinear transformation on it, and output a first feature map and a classification confidence. The image block may be a first-type image block or a second-type image block: the content of a first-type image block includes positive cells, while the content of a second-type image block includes only negative cells and a negative environment. The first feature map corresponds to the image block input to the classifier, and the classification confidence is used to indicate whether that image block is a first-type or a second-type image block. The target detector is configured to receive the first feature map, perform linear transformation and nonlinear transformation on it, and output a second feature map. The elements of the second feature map correspond to candidate regions in the image block; that is, the position of each element in the second feature map corresponds to the coordinates and size of a candidate region in the image block, and the value of the element corresponds to the probability that the candidate region contains positive cells. In some embodiments, a candidate region is a region framed in the image block as shown in FIG. 4.
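For concreteness, the two modules can be sketched in PyTorch as follows. This is a minimal illustration only: the ResNet-50 backbone, the channel counts, the detector head, and all names are assumptions made for the sketch, not details given by the patent.

```python
# Illustrative sketch of the recognizer's two modules. The ResNet-50 backbone,
# channel counts, and detector head are assumptions, not the patent's design.
import torch
import torch.nn as nn
from torchvision import models

class Classifier(nn.Module):
    """Backbone CNN -> first feature map; channel average -> FC -> softmax."""
    def __init__(self, num_classes: int = 6):
        super().__init__()
        resnet = models.resnet50(weights=None)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # drop pool + fc
        self.fc = nn.Linear(2048, num_classes)  # the "first fully-connected layer"

    def forward(self, image_block: torch.Tensor):
        f1 = self.backbone(image_block)               # first feature map, (B, C, H, W)
        vector = f1.mean(dim=(2, 3))                  # average each channel's matrix
        confidence = torch.softmax(self.fc(vector), dim=1)  # classification confidence
        return f1, confidence

class TargetDetector(nn.Module):
    """Linear (conv) and nonlinear (ReLU) transforms of F1 -> second feature map."""
    def __init__(self, in_channels: int = 2048):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(256, 1, kernel_size=1),  # one score per candidate region
        )

    def forward(self, f1: torch.Tensor) -> torch.Tensor:
        # Each element of the output corresponds to a candidate region of the
        # image block; its value scores whether that region contains positive cells.
        return self.head(f1)
```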
Referring to FIG. 5, the training method for the above thyroid cell pathology digital image recognizer in this embodiment includes, but is not limited to, step S110, step S120, and step S130:
step S110: acquiring a first image set for training a thyroid cell pathology digital image recognizer; wherein the first image set comprises image blocks carrying labels indicating the content comprised by the image blocks.
This step may be referred to as a first image set acquisition step. The first image set comprises a first type of image blocks and a second type of image blocks, the content of the first type of image blocks comprises positive cells, and the content of the second type of image blocks comprises only negative cells and a negative environment.
Step S120: a classifier is trained using a first loss value corresponding to the first set of images.
This step may be referred to as the classifier first training step.
For example, the first loss value may be an output value of a cross entropy loss function or an output value of a mean square error loss function. In some embodiments, the first loss function corresponding to the first set of images is a cross entropy loss function as follows:
$$\mathcal{L}_{cls} = -\left[\, y_I \log p_I + (1 - y_I)\log(1 - p_I) \,\right] \qquad (1)$$
where $\mathcal{L}_{cls}$ is the first loss value corresponding to the first image set, and the base of $\log$ is $e$; $y_I$ is the label of image block $I$ in the first image set, with $y_I = 1$ indicating that image block $I$ is a first-type image block and $y_I = 0$ indicating that it is a second-type image block; and $p_I$ is the classification confidence that image block $I$ is a first-type image block, i.e. the probability that image block $I$ is a first-type image block.
In some embodiments, the labels carried by the image blocks are used to indicate the content contained in the image blocks. The content of a first-type image block includes positive cells, where positive cells include follicular tumor cells, eosinophilic tumor cells, papillary thyroid carcinoma cells, medullary thyroid carcinoma cells, atypical cells of undefined significance, and the like. The content of a second-type image block includes only negative cells and a negative environment, where negative cells include benign follicular nodule cells, erythrocytes, lymphocytes, phagocytes, cyst wall cells, and the like, and the negative environment includes environments such as blood, dust on the glass slide, and blank regions. It should be noted that the content of a second-type image block does not include positive cells. In these embodiments, the first loss function corresponding to the first image set is a cross-entropy loss function as follows:
$$\mathcal{L}_{cls} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{c=0}^{K-1} y_{n,c}\,\log p_{n,c} \qquad (2)$$
where $\mathcal{L}_{cls}$ is the first loss value corresponding to the first image set; $N$ is the number of samples, i.e. the number of image blocks in the first image set, and $n$ denotes the $n$-th image block; $K$ is the number of categories; $y_{n,c}$ is the label of the $n$-th image block $I$: $y_{n,c} = 1$ if the category label of image block $I$ is $c$, and $y_{n,c} = 0$ otherwise; and $p_{n,c}$ is the classification confidence that image block $I$ belongs to category $c$, i.e. the probability that image block $I$ belongs to category $c$, i.e. an element of the first confidence vector.
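As an illustration, the cross-entropy of formulas (1)-(2) can be computed as follows; this is a sketch in PyTorch, and the epsilon guard and the function name are not from the patent.

```python
# Sketch of formulas (1)-(2): natural-log cross-entropy between classification
# confidences and labels. The epsilon guarding log(0) is an assumption.
import torch
import torch.nn.functional as F

def first_loss(confidences: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """confidences: (N, K) softmax outputs p_{n,c}; labels: (N,) indices in 0..K-1."""
    # Computes -(1/N) * sum_n sum_c y_{n,c} * log p_{n,c}
    return F.nll_loss(torch.log(confidences + 1e-12), labels)
```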
In these embodiments, when the content of an image block does not include positive cells, or includes only negative cells and/or a negative environment, the category of the image block is negative and c = 0; the image block is a second-type image block.
When the content of the image block includes follicular tumor cells, the category of the image block is follicular tumor and c = 1; the image block is a first-type image block.
When the content of the image block includes eosinophilic tumor cells, the category of the image block is eosinophilic tumor and c = 2; the image block is a first-type image block.
When the content of the image block includes papillary thyroid carcinoma cells, the category of the image block is papillary thyroid carcinoma and c = 3; the image block is a first-type image block.
When the content of the image block includes medullary thyroid carcinoma cells, the category of the image block is medullary thyroid carcinoma and c = 4; the image block is a first-type image block.
When the content of the image block includes atypical cells of undefined significance, the category of the image block is atypical and c = 5; the image block is a first-type image block.
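For illustration, the category indices above can be held in a simple lookup table; the key names are hypothetical, not identifiers from the patent.

```python
# Hypothetical lookup table for the category index c; the key names are
# illustrative only.
CATEGORY_INDEX = {
    "negative": 0,                        # second-type image block
    "follicular_tumor": 1,                # first-type image blocks: c in 1..5
    "eosinophilic_tumor": 2,
    "papillary_thyroid_carcinoma": 3,
    "medullary_thyroid_carcinoma": 4,
    "atypical_undefined_significance": 5,
}
```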
In some embodiments, the classifier includes a backbone convolutional neural network and a first fully-connected layer, and step S120 may include: inputting the labeled image block into the backbone convolutional neural network, which performs linear transformation and nonlinear transformation on the image block to obtain the first feature map corresponding to the image block; averaging the matrix of each channel in the first feature map (averaging a matrix means summing all its elements and dividing by the number of elements) to obtain the feature vector corresponding to the image block; inputting the feature vector into the first fully-connected layer and performing nonlinear transformation on its output to obtain a first confidence vector, whose elements are classification confidences, i.e. probabilities indicating that the content of the image block includes follicular tumor cells, eosinophilic tumor cells, papillary thyroid carcinoma cells, medullary thyroid carcinoma cells, atypical cells of undefined significance, or only negative cells and a negative environment (the classification confidence can thus indicate whether the image block input to the classifier is a first-type or a second-type image block); and determining the output value of the first loss function corresponding to the first image set, i.e. formula (2), according to the elements of the first confidence vector and the labels carried by the image blocks, and optimizing the parameters of the backbone convolutional neural network and the first fully-connected layer with a back-propagation algorithm according to that output value. The output value of the first loss function corresponding to the first image set is the first loss value corresponding to the first image set.
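A minimal sketch of this training step for one batch, reusing the `Classifier` module and `first_loss` from the sketches above; the optimizer and batching are assumptions.

```python
# Sketch of step S120 for one batch; optimizer and batching are assumptions.
def classifier_first_training_step(classifier, optimizer, image_blocks, labels):
    f1, confidence = classifier(image_blocks)   # first feature maps + confidences
    loss = first_loss(confidence, labels)       # formula (2)
    optimizer.zero_grad()
    loss.backward()                             # back-propagation
    optimizer.step()                            # update backbone + first FC layer
    return f1.detach(), loss.item()
```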
Step S130: the target detector is trained with the second loss value in response to determining from the labels that the image block input to the classifier is a first type of image block.
This step may be referred to as a target detector training step.
The target detector is configured to receive the first feature map output by the classifier and output a second feature map; the elements of the second feature map correspond to candidate regions in the image block, and the values thereof correspond to the probability that the candidate regions contain positive cells; the second loss value is determined according to the classification confidence degree obtained by inputting the sub image blocks generated by the candidate region into the classifier and the elements of the second feature map.
Before training the target detector with the output of the classifier and the second loss function, it is necessary to judge, from the label carried by the image block, whether the image block currently input to the classifier is a first-type image block. If it is, the first feature map and classification confidence generated by inputting the image block into the classifier are used to train the target detector; if it is not, they are not used to train the target detector.
In some embodiments, the target detector is a convolutional neural network, and step S130 may include: judging whether the image block currently input to the classifier is a first-type image block according to its label, and if so, performing the following steps: inputting the first feature map corresponding to the image block into the target detector, which performs linear transformation and nonlinear transformation on it to obtain a second feature map; sorting all elements of the second feature map by value and applying non-maximum suppression to obtain a plurality of elements, and generating a plurality of sub image blocks from the candidate regions corresponding to those elements; inputting the sub image blocks into the classifier to obtain the classification confidence of each; and determining the output value of the second loss function according to the classification confidences of the sub image blocks, the labels carried by the image blocks corresponding to the sub image blocks, and the values of the elements of the second feature map, then optimizing the parameters of the target detector with a back-propagation algorithm according to that output value. The output value of the second loss function is the second loss value.
In other embodiments, step S130 may include: inputting the first feature map output by the classifier into the target detector, which performs linear transformation and nonlinear transformation on it to obtain at least one second feature map; sorting all elements of the at least one second feature map by value and applying non-maximum suppression to obtain a plurality of elements, and generating a plurality of sub image blocks from the candidate regions corresponding to those elements; inputting the sub image blocks into the backbone convolutional neural network and then averaging to obtain a plurality of feature vectors, in one-to-one correspondence with the sub image blocks; inputting each feature vector into a third fully-connected layer and performing nonlinear transformation on its output to obtain a second confidence vector, whose elements are classification confidences; and determining the output value of the second loss function according to the elements of the second confidence vector and the values of the elements of the second feature map, then optimizing the parameters of the target detector with a back-propagation algorithm according to that output value. The output value of the second loss function is the second loss value.
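The sub-image-block generation common to both variants can be sketched as follows; `element_to_region` (mapping an element's position to its candidate region) and `crop` are hypothetical helpers, since the patent does not specify their form.

```python
# Sketch of sub-image-block generation: rank the second-feature-map elements,
# apply non-maximum suppression, and crop the top-M surviving candidate regions.
# element_to_region and crop are hypothetical helpers not given by the patent.
import torch
from torchvision import ops

def generate_sub_blocks(image_block, second_map, m, element_to_region, crop):
    scores = second_map.flatten()
    boxes = torch.stack([element_to_region(i, second_map.shape)
                         for i in range(scores.numel())])   # (x1, y1, x2, y2) per element
    keep = ops.nms(boxes, scores, iou_threshold=0.3)[:m]    # sort by value + NMS, top-M
    return [crop(image_block, boxes[i]) for i in keep], scores[keep]
```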
In some embodiments, the sub-functions of the second loss function are as follows:
$$L_n(W_d) = \sum_{i=1}^{M}\sum_{j=1}^{M} \mathbb{1}\!\left[\, p^c_{R'_i} < p^c_{R'_j} \,\right] \max\!\left(0,\; s_{R'_i} - s_{R'_j}\right) \qquad (3)$$
where $n$ denotes the $n$-th image block; $R'_i$ denotes the $i$-th sub image block and $M$ the number of sub image blocks; $p^c_{R'_i}$ is the classification confidence that sub image block $R'_i$ belongs to category $c$; $s_{R'_i}$ is the value of the element of the second feature map corresponding to sub image block $R'_i$; and $W_d$ denotes the parameters of the target detector. In $\max(0,\cdot)$, $\max$ takes the larger of its two arguments. When $p^c_{R'_i} < p^c_{R'_j}$, i.e. the classification confidence that sub image block $R'_i$ belongs to category $c$ is less than that of sub image block $R'_j$, the optimization objective of the loss function is to make $s_{R'_i} < s_{R'_j}$; that is, the result of the optimization is that the element values of the second feature map output by the target detector are ranked in the same order as the classification confidences.
The second loss function can then be written as follows:
$$L_{det} = \sum_{n=1}^{N_1} L_n(W_d) \qquad (4)$$
where the value of $L_{det}$ is the second loss value and $N_1$ is the number of first-type image blocks in the first image set. It can be seen that formula (4) is a superposition of formula (3).
In these embodiments, when the content of a sub image block does not include positive cells, or includes only negative cells and/or a negative environment, the category of the sub image block is negative and c = 0, and the image block corresponding to the sub image block is a second-type image block. When the content of the sub image block includes follicular tumor cells, its category is follicular tumor and c = 1, and the corresponding image block is a first-type image block. When it includes eosinophilic tumor cells, its category is eosinophilic tumor and c = 2, and the corresponding image block is a first-type image block. When it includes papillary thyroid carcinoma cells, its category is papillary thyroid carcinoma and c = 3, and the corresponding image block is a first-type image block. When it includes medullary thyroid carcinoma cells, its category is medullary thyroid carcinoma and c = 4, and the corresponding image block is a first-type image block. When it includes atypical cells of undefined significance, its category is atypical and c = 5, and the corresponding image block is a first-type image block.
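Read as a pairwise ranking hinge, formulas (3)-(4) can be sketched as follows; the averaging over violating pairs is an assumption of the sketch.

```python
# Sketch of formulas (3)-(4) as a pairwise ranking hinge: wherever the
# classifier ranks sub block i below sub block j for class c, the detector
# score s_i is pushed below s_j. Averaging over violating pairs is an assumption.
import torch

def second_loss(p_c: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
    """p_c: (M,) confidences for class c; s: (M,) second-feature-map values."""
    violating = (p_c[:, None] < p_c[None, :]).float()       # pairs with p_i < p_j
    hinge = torch.clamp(s[:, None] - s[None, :], min=0.0)   # max(0, s_i - s_j)
    return (violating * hinge).sum() / violating.sum().clamp(min=1.0)
```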
In order to further improve the classification accuracy of the classifier, as shown in FIG. 6, the thyroid cell pathology digital image recognizer training method provided in the embodiments of the present application may further include the following steps:
step S210: inputting a first feature map corresponding to the first type of image or the second type of image block into a target detector to obtain a second feature map, sequencing elements in the second feature map according to the numerical values and inhibiting non-maximum values to obtain top-M elements, and generating top-M sub-image blocks according to candidate areas corresponding to the top-M elements;
this step may be referred to as a sub image block generating step. In this step, the first feature map generated by either the first type image block or the second type image block is input to the target detector to generate top-M sub-image blocks.
Step S220: training a classifier with a first loss value corresponding to top-M sub image blocks;
the step can be called a second training step of the classifier, wherein a first loss value corresponding to the top-M sub image blocks is determined according to a classification confidence obtained by inputting the top-M sub image blocks into the classifier and labels carried by the image blocks corresponding to the top-M sub image blocks. In some embodiments, the classifier further includes a third fully connected layer, and step S220 may include:
inputting one sub image block in top-M sub image blocks into a backbone convolutional neural network, and performing linear transformation and nonlinear transformation on the one sub image block through the backbone convolutional neural network to obtain a first characteristic diagram corresponding to the one sub image block;
averaging the matrixes of all channels in the first characteristic diagram corresponding to the sub-image block to obtain a characteristic vector corresponding to the sub-image block;
inputting the feature vector corresponding to the one sub image block into a third fully-connected layer, and performing nonlinear transformation on the output of the third fully-connected layer to obtain a third confidence vector, whose elements are classification confidences; repeating the above steps until all top-M sub image blocks have been processed;
and determining the output value of the first loss function corresponding to the top-M sub image blocks, i.e. formula (5), according to the elements of the third confidence vector and the labels carried by the image blocks corresponding to the top-M sub image blocks, and optimizing the parameters of the backbone convolutional neural network and the third fully-connected layer with a back-propagation algorithm according to that output value. The output value of the first loss function corresponding to the top-M sub image blocks is the first loss value corresponding to the top-M sub image blocks.
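A sketch of this second training step, reusing `first_loss` from above; stacking equally sized sub image blocks into one batch is an assumption.

```python
# Sketch of step S220: the top-M sub image blocks inherit the label of their
# parent image block (formula (5)). Equal sub-block sizes are assumed so they
# can be stacked into one batch.
import torch

def classifier_second_training_step(classifier, optimizer, sub_blocks, parent_label):
    batch = torch.stack(sub_blocks)                            # (M, 3, h, w)
    labels = torch.full((len(sub_blocks),), parent_label, dtype=torch.long)
    _, confidence = classifier(batch)      # per the patent, via the third FC layer
    loss = first_loss(confidence, labels)  # formula (5)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```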
In some embodiments, the first loss function corresponding to top-M sub image blocks is a cross-entropy loss function as follows:
$$\mathcal{L}_{cls}^{top} = -\frac{1}{M_{top}}\sum_{n=1}^{M_{top}}\sum_{c=0}^{K-1} y_{n,c}\,\log p_{n,c} \qquad (5)$$
where $\mathcal{L}_{cls}^{top}$ is the first loss value corresponding to the top-M sub image blocks; $M_{top}$ is the number of sub image blocks, i.e. M; $n$ denotes the $n$-th sub image block; $K$ is the number of categories; $y_{n,c}$ is the label of the image block $I$ corresponding to sub image block $R'_n$: $y_{n,c} = 1$ if the category label of that image block is $c$, and $y_{n,c} = 0$ otherwise; and $p_{n,c}$ is the classification confidence that sub image block $R'_n$ belongs to category $c$, i.e. the probability that it belongs to category $c$, i.e. an element of the third confidence vector.
In these embodiments, when the content of the image block corresponding to a sub image block does not include positive cells, or includes only negative cells and/or a negative environment, the category of the sub image block is negative and c = 0;
when the content of the corresponding image block includes follicular tumor cells, the category of the sub image block is follicular tumor and c = 1;
when it includes eosinophilic tumor cells, the category is eosinophilic tumor and c = 2;
when it includes papillary thyroid carcinoma cells, the category is papillary thyroid carcinoma and c = 3;
when it includes medullary thyroid carcinoma cells, the category is medullary thyroid carcinoma and c = 4;
when it includes atypical cells of undefined significance, the category is atypical and c = 5.
In some embodiments, steps S120 and S130 may be performed repeatedly before performing steps S210 and S220, to further increase the probability that the top-M sub image blocks obtained by inputting a first-type image block into the target detector contain positive cells.
In order to improve the robustness of the classifier and further improve the recognition accuracy of the thyroid cell pathology digital image recognizer, as shown in FIG. 7, in the training method provided by the embodiments of the present application the classifier first training step includes:
Step S310: inputting the image blocks corresponding to the top-M sub image blocks into the classifier, performing linear transformation and nonlinear transformation on them through the classifier to obtain a first feature map, and averaging the matrix of each channel in the first feature map to obtain the feature vector of the image block corresponding to the top-M sub image blocks;
it is understood that, since all image blocks in the first image set are input to the classifier in step S120 (the first training step of the classifier) and all the image blocks are subjected to linear transformation and nonlinear transformation by the classifier, the image blocks corresponding to the top-M sub image blocks are also input to the classifier in step S120 (the first training step of the classifier) and generate the first feature map.
The thyroid cell pathology digital image recognizer training method provided by the embodiments of the present application may further include the following steps:
Step S320: generating a concatenated feature vector from the top-M sub image blocks and the feature vector of the image block corresponding to the top-M sub image blocks;
this step may be referred to as a concatenated feature vector generation step. In some embodiments, step S320 may include: selecting top-T sub image blocks from top-M sub image blocks; inputting the top-T sub image blocks into a classifier, and performing linear transformation and nonlinear transformation on the top-T sub image blocks through the classifier to obtain top-T first feature maps; taking an average value of the matrixes of all channels in the top-T first characteristic graphs to obtain top-T characteristic vectors; cascading top-T characteristic vectors and one characteristic vector to obtain cascadeA feature vector. For example, for n vectors
Figure BDA0003594425570000081
Cascading, which is equivalent to connecting the n vectors end to generate a vector
Figure BDA0003594425570000082
In some embodiments, the top-T sub image blocks correspond to the top-T elements with the largest values among the top-M elements obtained by sorting the elements of the second feature map by value and applying non-maximum suppression.
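A sketch of the concatenation itself; the ordering of the vectors within the concatenation is an assumption.

```python
# Sketch of building the concatenated ("cascade") feature vector: the top-T
# sub-block feature vectors joined end to end with the parent image block's
# feature vector. The ordering inside the concatenation is an assumption.
import torch

def concatenated_feature_vector(sub_vectors_top_t, block_vector):
    # T vectors of shape (C,) plus one of shape (C,) -> a single ((T+1)*C,) vector
    return torch.cat(list(sub_vectors_top_t) + [block_vector], dim=0)
```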
Step S330: the classifier is trained with first penalty values corresponding to the concatenated feature vectors.
This step may be referred to as a third training step of the classifier, where the first loss value corresponding to the concatenated feature vectors is determined according to the classification confidence obtained by inputting the concatenated feature vectors into the classifier and the labels carried by the image blocks corresponding to the concatenated feature vectors. In some embodiments, the classifier further includes a second fully connected layer, and step S330 may include:
inputting the cascade characteristic vector into a second full-connection layer, and carrying out nonlinear transformation on the output of the second full-connection layer to obtain a fourth confidence coefficient vector; wherein the elements in the fourth confidence vector are classification confidence;
and determining the output value of the first loss function corresponding to the concatenated feature vector, i.e. formula (6), according to the elements of the fourth confidence vector and the label carried by the image block corresponding to the fourth confidence vector, and optimizing the parameters of the target detector and the second fully-connected layer with a back-propagation algorithm according to that output value; the image block corresponding to the fourth confidence vector is the image block corresponding to the concatenated feature vector, i.e. the image block corresponding to the top-M sub image blocks. The output value of the first loss function corresponding to the concatenated feature vector is the first loss value corresponding to the concatenated feature vector.
In some embodiments, the first loss function corresponding to the concatenated feature vectors is a cross-entropy loss function as follows:
$$\mathcal{L}_{cls}^{concat} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{c=0}^{K-1} y_{n,c}\,\log p^{concat}_{n,c} \qquad (6)$$
where $\mathcal{L}_{cls}^{concat}$ is the first loss value corresponding to the concatenated feature vectors; $N$ is the number of samples, i.e. the number of image blocks in the first image set, and $n$ denotes the $n$-th image block; $K$ is the number of categories; $y_{n,c}$ is the label of the $n$-th image block $I$ corresponding to the concatenated feature vector: $y_{n,c} = 1$ if the category label of the image block $I$ corresponding to the concatenated feature vector is $c$, and $y_{n,c} = 0$ otherwise; and $p^{concat}_{n,c}$ is the classification confidence that the concatenated feature vector belongs to category $c$, i.e. the probability that image block $I$ belongs to category $c$, i.e. an element of the fourth confidence vector.
In these embodiments, when the content of the image block corresponding to the concatenated feature vector does not include positive cells, or includes only negative cells and/or a negative environment, the category of the concatenated feature vector is negative and c = 0;
when the content of the corresponding image block includes follicular tumor cells, the category of the concatenated feature vector is follicular tumor and c = 1;
when it includes eosinophilic tumor cells, the category is eosinophilic tumor and c = 2;
when it includes papillary thyroid carcinoma cells, the category is papillary thyroid carcinoma and c = 3;
when it includes medullary thyroid carcinoma cells, the category is medullary thyroid carcinoma and c = 4;
when it includes atypical cells of undefined significance, the category is atypical and c = 5.
It is noted that in some embodiments, the parameters and structures of the first fully-connected layer, the second fully-connected layer, and the third fully-connected layer of the classifier are different, i.e., the first fully-connected layer, the second fully-connected layer, and the third fully-connected layer are three different fully-connected layers; in other embodiments, the parameters and structure of the first fully-connected layer and the third fully-connected layer of the classifier may be identical, i.e., the first fully-connected layer and the third fully-connected layer are the same fully-connected layer. Since the second fully-connected layer corresponds to the concatenated feature vector, which has a higher dimension than the feature vector, the parameters and structure of the second fully-connected layer are necessarily different from those of the first and third fully-connected layers.
It should be noted that, in a general case, the number of execution times of all the steps provided in the embodiments of the present application is not limited, that is, each step may be repeatedly executed alone or in combination with other steps as needed.
Example two:
The thyroid cell pathology digital image recognizer provided by this embodiment has the structure shown in FIG. 8 and includes a classifier and a target detector, where the classifier includes a backbone convolutional neural network and a fully-connected layer. In some embodiments, the backbone convolutional neural network may be a convolutional neural network such as ResNet50; in other embodiments, it may be a convolutional neural network designed as needed.
The thyroid cell pathology digital image recognizer training method provided by this embodiment includes the following steps:
a first image set acquisition step, in which a first image set for training the thyroid cell pathology digital image recognizer is acquired; the first image set comprises image blocks carrying labels, the labels are used for indicating the content contained in the image blocks, and the first image set comprises first-type image blocks and second-type image blocks (that is, a given image block may be either a first-type or a second-type image block), where the content of a first-type image block includes positive cells and the content of a second-type image block includes only negative cells and a negative environment. In some embodiments, positive cells include follicular tumor cells, eosinophilic tumor cells, papillary thyroid carcinoma cells, medullary thyroid carcinoma cells, and atypical cells of undefined significance; negative cells include benign follicular nodule cells, erythrocytes, lymphocytes, phagocytes, cyst wall cells, and the like; and the negative environment includes environments such as blood, dust on the glass slide, and blank regions;
a classifier first training step, in which the classifier is trained using a first loss value corresponding to the first image set; the classifier is configured to receive an image block carrying a label and output a first feature map and a classification confidence corresponding to the image block; the classification confidence is used for indicating whether the image block input into the classifier is a first-type or a second-type image block; and the first loss value corresponding to the first image set is determined according to the classification confidence and the labels carried by the image blocks;
a target detector training step, in which the target detector is trained with a second loss value in response to determining from the label that the image block input to the classifier is a first-type image block; the target detector is configured to receive the first feature map output by the classifier and output a second feature map; the elements of the second feature map correspond to candidate regions in the image block, and their values correspond to the probability that the candidate regions contain positive cells; and the second loss value is determined according to the classification confidences obtained by inputting the sub image blocks generated from the candidate regions into the classifier and the elements of the second feature map.
The thyroid cell pathology digital image recognizer training method provided in this embodiment is described below with reference to FIG. 9.
In this embodiment, as shown in FIG. 10, the classifier first training step includes the following sub-steps:
step S121: inputting the image block with the label into a backbone convolutional neural network, and performing linear transformation (convolution) and nonlinear transformation (nonlinear activation) on the image block through the backbone convolutional neural network to obtain a first characteristic diagram corresponding to the image block;
in this embodiment, the first feature map is denoted as F1 ∈ R^(C×H×W), i.e. the first feature map F1 has C channels, each channel corresponding to an H×W matrix.
Step S122: taking an average value of matrixes of all channels in a first characteristic diagram corresponding to the image blocks to obtain characteristic vectors corresponding to the image blocks;
in this embodiment, the feature vector corresponding to the image block is denoted as F12 ∈ R^(C×1); averaging a matrix means summing all elements in the matrix and dividing by the number of matrix elements.
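For illustration, a minimal PyTorch sketch of this channel-wise averaging (global average pooling) follows; the tensor shape is an assumption made for the example and is not taken from the patent (a ResNet50 backbone would give C = 2048).

    import torch

    # First feature map F1 for one image block: (batch, C, H, W).
    first_feature_map = torch.randn(1, 2048, 16, 16)
    # Average each H x W channel matrix: sum of all elements / number of elements.
    feature_vector = first_feature_map.mean(dim=(2, 3))
    print(feature_vector.shape)  # torch.Size([1, 2048]) -> one C-dimensional vector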
Step S123: inputting the feature vector corresponding to the image block into the full-connection layer, and performing nonlinear transformation on the output of the full-connection layer to obtain a first confidence coefficient vector;
in this embodiment, performing nonlinear transformation on the output of the fully connected layer means activating the output of the fully connected layer with a SoftMax function; the first confidence vector is denoted y^(1) ∈ R^(6×1), and its element y^(1)_c is a classification confidence: y^(1)_0 indicates the probability that the content in the image block includes only negative cells and a negative environment, and y^(1)_1 to y^(1)_5 respectively indicate the probabilities that the content in the image block includes follicular tumor cells, eosinophilic tumor cells, papillary thyroid carcinoma cells, medullary thyroid carcinoma cells, and atypical cells of undefined significance.
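A minimal sketch of step S123 under the same assumptions (C = 2048, six classes); the layer and variable names are illustrative and do not come from the patent.

    import torch
    import torch.nn as nn

    fc = nn.Linear(2048, 6)                # the (first) fully connected layer
    feature_vector = torch.randn(1, 2048)  # F12 from step S122
    logits = fc(feature_vector)
    y1 = torch.softmax(logits, dim=1)      # SoftMax activation: 6 confidences summing to 1
    # y1[0, 0] -> negative only; y1[0, 1:] -> the five positive cell types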
In some embodiments, there may be a plurality of fully connected layers, that is, the fully connected layers include a first fully connected layer, a second fully connected layer, and a third fully connected layer; the fully connected layer in step S123 is the first fully connected layer.
Step S124: determining an output value of a first loss function corresponding to the first image set according to the elements in the first confidence vector and the labels carried by the image blocks, and optimizing the parameters of the backbone convolutional neural network and the fully connected layer by using a back propagation algorithm according to the output value of the first loss function corresponding to the first image set.
In this embodiment, the first loss function corresponding to the first image set is the same as formula (2) in the first embodiment. In some embodiments, the fully-connected layer in step S124 is a first fully-connected layer. The output value of the first loss function corresponding to the first image set is the first loss value corresponding to the first image set.
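Since formula (2) is defined in the first embodiment and is not reproduced in this section, the following sketch assumes a standard multi-class cross-entropy as the first loss; the backbone is a small stand-in, not the patent's network.

    import torch
    import torch.nn as nn

    # Hypothetical stand-ins for the backbone convolutional neural network
    # and the first fully connected layer.
    backbone = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                             nn.AdaptiveAvgPool2d(1), nn.Flatten())
    fc1 = nn.Linear(64, 6)
    optimizer = torch.optim.SGD(list(backbone.parameters()) + list(fc1.parameters()), lr=1e-3)

    image_blocks = torch.randn(8, 3, 224, 224)  # a labelled mini-batch
    labels = torch.randint(0, 6, (8,))          # 0 = negative only, 1..5 = positive types

    logits = fc1(backbone(image_blocks))
    loss = nn.functional.cross_entropy(logits, labels)  # assumed form of the first loss
    optimizer.zero_grad()
    loss.backward()   # back propagation
    optimizer.step()  # optimizes backbone and fully connected layer parameters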
In the present embodiment, as shown in fig. 11, the target detector training step includes the following sub-steps:
step S131: judging whether the content in the image block comprises any one of the following contents according to the label carried by the image block: follicular tumor cells, eosinophilic tumor cells, papillary thyroid carcinoma cells, medullary thyroid carcinoma cells, atypical cells of undefined significance;
if not, the target detector training step using the current image block for training is ended; if yes, the following steps are executed:
step S132: inputting the first feature map corresponding to the image block into a target detector, and performing linear transformation (convolution) and nonlinear transformation (ReLU function activation) on the first feature map corresponding to the image block through the target detector to obtain at least one second feature map;
in this embodiment, the target detector is a convolutional neural network whose structure is shown in fig. 12; the number of second feature maps is three, denoted F2^(1), F2^(2) and F2^(3); each second feature map F2^(i) has 6 channels, each channel corresponding to an Hi×Wi matrix, the spatial sizes of the three maps being different. In some embodiments, the number of second feature maps output by the target detector may be one or more; it is only necessary to correspondingly reduce or increase the number of layers of the convolutional neural network of the target detector.
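A sketch of one way such a detector could produce three second feature maps of decreasing resolution, each with 6 channels, follows; the channel widths, strides and input size are assumptions, since fig. 12 is not reproduced here.

    import torch
    import torch.nn as nn

    class Detector(nn.Module):
        def __init__(self, in_channels=2048):
            super().__init__()
            self.stage1 = nn.Sequential(nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU())
            self.down1 = nn.Sequential(nn.Conv2d(256, 256, 3, stride=2, padding=1), nn.ReLU())
            self.down2 = nn.Sequential(nn.Conv2d(256, 256, 3, stride=2, padding=1), nn.ReLU())
            self.heads = nn.ModuleList([nn.Conv2d(256, 6, 1) for _ in range(3)])

        def forward(self, f1):
            s1 = self.stage1(f1)
            s2 = self.down1(s1)
            s3 = self.down2(s2)
            # three second feature maps; each spatial position corresponds to a
            # candidate region whose size depends on the map's resolution
            return [head(s) for head, s in zip(self.heads, (s1, s2, s3))]

    maps = Detector()(torch.randn(1, 2048, 16, 16))
    print([m.shape for m in maps])  # 6 channels each, decreasing H x W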
Step S133: sequencing all elements in at least one second feature map according to the numerical values and inhibiting non-maximum values to obtain a plurality of elements, and generating a plurality of sub image blocks according to candidate areas corresponding to the plurality of elements;
step S134: inputting a plurality of sub image blocks into a backbone convolutional neural network, and then taking a mean value to obtain a plurality of second feature vectors; wherein each second feature vector corresponds to one of the plurality of sub image blocks, respectively;
inputting a plurality of sub image blocks into the backbone convolutional neural network and then taking the mean value means: inputting the plurality of sub image blocks into the backbone convolutional neural network, performing convolution (linear transformation) and activation (nonlinear transformation) on the sub image blocks through the backbone convolutional neural network to obtain the first feature maps corresponding to the sub image blocks, and then averaging the matrix of each channel in the first feature map corresponding to each sub image block. Averaging a matrix means summing all elements in the matrix and dividing by the number of matrix elements.
Step S135: inputting a plurality of second characteristic vectors into the full-connection layer, and performing nonlinear transformation on the output of the full-connection layer to obtain second confidence coefficient vectors;
in this embodiment, performing nonlinear transformation on the output of the fully connected layer means activating the output of the fully connected layer with a SoftMax function; the second confidence vector is denoted y^(2) ∈ R^(6×1), and its element y^(2)_c is a classification confidence: when c is 0, y^(2)_0 indicates the probability that the content in the image block includes only negative cells and a negative environment; when c is 1 to 5, y^(2)_c respectively indicates the probability that the content in the image block includes follicular tumor cells, eosinophilic tumor cells, papillary thyroid carcinoma cells, medullary thyroid carcinoma cells, or atypical cells of undefined significance. In some embodiments, the fully connected layer in step S135 is a third fully connected layer.
Step S136: determining the output value of the second loss function according to the elements in the second confidence vector and the numerical values of the corresponding elements in the second feature map, and optimizing the parameters of the target detector by using a back propagation algorithm according to the output value of the second loss function.
The output value of the second penalty function is the second penalty value.
It is understood that the second confidence vectors correspond to the second feature vectors, the second feature vectors correspond to the second feature maps, the second feature maps correspond to the sub image blocks, and the sub image blocks correspond to the image blocks that produce the sub image blocks. In this embodiment, the second loss function is consistent with equation (3) in embodiment one.
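Formula (3) is defined in the first embodiment and is not reproduced in this section; purely for illustration, the following hypothetical loss rewards agreement between each second-feature-map element value and the classifier's positive-class confidence for the corresponding sub image block.

    import torch

    p = torch.rand(8, requires_grad=True)   # element values for 8 candidate regions
    q = torch.rand(8)                       # classifier confidence that each sub block is positive
    second_loss = torch.mean((p - q) ** 2)  # hypothetical form; small when they agree
    second_loss.backward()                  # gradients flow back into the target detector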
In order to further improve the classification accuracy of the classifier, as shown in fig. 6, the training method of the thyroid cell pathology digital image recognizer provided in this embodiment may further include the following steps:
a sub-image block generation step, namely inputting a first feature map corresponding to a first type image block or a second type image block into the target detector to obtain a second feature map, sequencing elements in the second feature map according to the numerical values and carrying out non-maximum suppression to obtain top-M elements, and generating top-M sub-image blocks according to candidate areas corresponding to the top-M elements;
it is noted that in this step, the first feature map generated by either the first type image block or the second type image block is input into the target detector to generate top-M sub-image blocks.
A second training step of the classifier, namely training the classifier by using the first loss value corresponding to the top-M sub image blocks; the first loss value corresponding to the top-M sub image blocks is determined according to the classification confidence coefficient obtained by inputting the top-M sub image blocks into the classifier and the labels carried by the image blocks corresponding to the top-M sub image blocks.
In this embodiment, as shown in fig. 13, the second training step of the classifier includes the following sub-steps:
step S221: inputting one sub image block in top-M sub image blocks into a backbone convolutional neural network, and performing linear transformation (convolution) and nonlinear transformation (activation) on one sub image block through the backbone convolutional neural network to obtain a first characteristic diagram corresponding to one sub image block;
step S222: averaging the matrixes of all channels in the first characteristic diagram corresponding to one sub-image block to obtain a characteristic vector corresponding to one sub-image block;
step S223: inputting a feature vector corresponding to one sub-image block into the full connection layer, and performing nonlinear transformation on the output of the full connection layer to obtain a third confidence coefficient vector;
in this embodiment, performing nonlinear transformation on the output of the fully connected layer means activating the output of the fully connected layer with a SoftMax function; the third confidence vector is denoted y^(3) ∈ R^(6×1), and its element y^(3)_c is a classification confidence: when c is 0, y^(3)_0 indicates the probability that the content in the image block includes only negative cells and a negative environment; when c is 1 to 5, y^(3)_c respectively indicates the probability that the content in the image block includes follicular tumor cells, eosinophilic tumor cells, papillary thyroid carcinoma cells, medullary thyroid carcinoma cells, or atypical cells of undefined significance. In some embodiments, the fully connected layer in step S223 is a third fully connected layer.
Step S224: and determining an output value of a first loss function corresponding to the top-M sub image blocks according to elements in the third confidence coefficient vector and labels carried by the image blocks corresponding to the top-M sub image blocks, and optimizing parameters of the backbone convolutional neural network and the full connection layer by using a back propagation algorithm according to the output value of the first loss function corresponding to the top-M sub image blocks.
The output value of the first loss function corresponding to top-M sub image blocks is the first loss value corresponding to top-M sub image blocks. In this embodiment, the first loss function corresponding to top-M sub image blocks is consistent with equation (5) in the first embodiment. In some embodiments, the fully connected layer in step S224 is a third fully connected layer.
In order to further improve the recognition accuracy of the thyroid cell pathology digital image recognizer and further improve the robustness of the classifier, as shown in fig. 7, in the training method of the thyroid cell pathology digital image recognizer provided by the embodiment of the present application,
a classifier first training step comprising: inputting the image blocks corresponding to the top-M sub image blocks into a classifier, performing linear transformation and nonlinear transformation on the images corresponding to the top-M sub image blocks through the classifier to obtain a first feature map, and averaging the matrix of each channel in the first feature map to obtain the feature vector of the image block corresponding to the top-M sub image blocks;
it is to be understood that, since all image blocks in the first image set are input to the classifier in the first training step of the classifier, and all the image blocks are subjected to linear transformation and nonlinear transformation by the classifier, the image blocks corresponding to top-M sub image blocks are also input to the classifier in the first training step of the classifier, and the first feature map is generated.
The thyroid cell pathology digital image recognizer training method provided by the embodiment of the application further comprises the following steps:
a cascade feature vector generation step, namely generating a cascade feature vector according to the top-M sub image blocks and the feature vectors of the image blocks corresponding to the top-M sub image blocks;
a third training step of the classifier, namely training the classifier by utilizing the cascade characteristic vectors, the labels carried in the image blocks corresponding to the cascade characteristic vectors and the first loss function corresponding to the cascade characteristic vectors;
in this embodiment, as shown in fig. 14, the concatenated feature vector generation step may include the following sub-steps:
step S321: selecting top-T sub image blocks from top-M sub image blocks;
in this embodiment, the top-T sub image blocks correspond to the top-T elements with the largest value among the top-M elements obtained by sorting the elements in the second feature map according to the magnitude of the values and performing non-maximum suppression.
Step S322: inputting the top-T sub image blocks into a classifier, and performing convolution (linear transformation) and activation (nonlinear transformation) on the top-T sub image blocks through the classifier to obtain top-T first feature maps;
step S323: taking an average value of the matrixes of all channels in the top-T first characteristic graphs to obtain top-T characteristic vectors;
in this embodiment, averaging the matrix means summing all the elements in the matrix and dividing by the number of matrix elements.
Step S324: cascading the top-T characteristic vectors and the characteristic vectors of the image blocks corresponding to the top-M sub image blocks to obtain cascaded characteristic vectors;
for example, concatenating n vectors v1, v2, …, vn ∈ R^(C×1) is equivalent to connecting the n vectors end to end to generate a vector v ∈ R^(nC×1).
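A one-line PyTorch equivalent of this end-to-end concatenation, with illustrative sizes:

    import torch

    vectors = [torch.randn(2048) for _ in range(4)]  # e.g. top-T = 3 sub-block vectors + 1 image-block vector
    concatenated = torch.cat(vectors, dim=0)         # connect end to end
    print(concatenated.shape)                        # torch.Size([8192]) = n * C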
In this embodiment, as shown in fig. 15, the third training step of the classifier may include the following sub-steps:
step S331: inputting the cascade characteristic vector into a full-connection layer, and carrying out nonlinear transformation on the output of the full-connection layer to obtain a fourth confidence coefficient vector;
in this embodiment, performing nonlinear transformation on the output of the fully connected layer means activating the output of the fully connected layer with a SoftMax function; the fourth confidence vector is denoted y_concat ∈ R^(6×1), and its element y_concat,c is a classification confidence: when c is 0, y_concat,0 indicates the probability that the content in the image block includes only negative cells and a negative environment; when c is 1 to 5, y_concat,c respectively indicates the probability that the content in the image block includes follicular tumor cells, eosinophilic tumor cells, papillary thyroid carcinoma cells, medullary thyroid carcinoma cells, or atypical cells of undefined significance. In some embodiments, the fully connected layer in step S331 is a second fully connected layer.
Step S332: and determining an output value of a first loss function corresponding to the cascade characteristic vector according to elements in the fourth confidence coefficient vector and the label carried by the image block corresponding to the cascade characteristic vector, and optimizing parameters of the target detector and the full connection layer by using a back propagation algorithm according to the output value of the first loss function corresponding to the cascade characteristic vector.
The output value of the first penalty function for the concatenated feature vector is the first penalty value for the concatenated feature vector.
In this embodiment, the first loss function corresponding to the concatenated feature vector is consistent with equation (6) in the first embodiment. In some embodiments, the fully connected layer in step S332 is a second fully connected layer.
Beneficial effect 1: by training the target detector with the output of the classifier, training of the target detector can be completed even when only the type of each image block is labelled and the positions of positive cells within the image block are not, so that the target detector learns to locate positive cells in an image block by determining the candidate regions that contain them.
Beneficial effect 2: the output of the classifier corresponding to an image block is used to train the target detector only when the image block input into the classifier is judged to be a first type image block. For a cytopathology digital image, the thyroid cytopathology digital image recognizer must distinguish not only benign follicular nodules from positive cells (malignant cells), but also the negative environment (negative background) from positive cells; a second type image block may contain negative cells such as follicular nodule cells, red blood cells, lymphocytes, phagocytes and cyst wall cells, as well as various negative environments such as blood, dust on the glass slide and blank regions. In this case, if a candidate region obtained by the target detector contains positive cells, the output value of the second loss function is small (that is, the reward obtained is large); conversely, if the candidate region does not contain positive cells, the output value of the second loss function is large (the reward obtained is small). If second type image blocks were also used to train the target detector, the output value of the second loss function would be small even when the candidate region obtained by the target detector contains only negative cells and a negative environment, so the trained target detector would tend to give results containing only negative cells and a negative environment, and the effect obtained by training the target detector with the classifier output only for first type image blocks would be lost. Moreover, for a cytopathology digital image, among the many image blocks made from the same cell slide only a few are first type image blocks while a large number are second type image blocks, so the adverse effect of also using second type image blocks to train the target detector would be even more pronounced.
Beneficial effect 3: the image blocks are input into the target detector to generate top-M sub image blocks, and the top-M sub image blocks are used to train the classifier; this increases the number of training samples and thereby further improves the classification accuracy of the classifier. It should be noted that when a first type image block is input into the target detector to generate top-M sub image blocks, some sub image blocks may not contain positive cells, owing to the limited precision with which the target detector locates positive cells, yet the label carried by the corresponding image block still indicates a positive image block; this is equivalent to training the classifier with wrongly labelled samples, so a person skilled in the art would generally not train the classifier with sub image blocks generated by the target detector, in order to avoid such samples. In the embodiment of the present application, however, the elements in the second feature map output by the target detector have been sorted and subjected to non-maximum suppression, yielding the top-M sub image blocks most likely to contain positive cells. Even if some of the top-M sub image blocks contain no positive cells, their proportion is small, so the classifier converges in the wrong direction only to a small degree; and as the target detector training step (step S130) is repeatedly executed, the precision with which the target detector locates positive cells becomes higher and higher, and the probability and proportion of sub image blocks without positive cells among the top-M sub image blocks become lower and lower. That is, the cost of training the classifier with wrongly labelled sub image blocks decreases as training proceeds, and the final effect is that the classification accuracy of the classifier becomes higher and higher.
Beneficial effect 4: when image blocks are input into the target detector to generate top-M sub image blocks that are used to train the classifier, and the output of the classifier corresponding to an image block is used to train the target detector only when that image block is judged to be a first type image block, combining the two brings an additional benefit. Because the target detector is trained with the classifier output only for first type image blocks, that is, only with the first feature maps and classification confidences corresponding to image blocks containing positive cells, the target detector is inclined to locate positive cells and searches for the regions most similar to positive cells. Thus, when a second type image block (an image block containing only negative cells and the negative environment) is input into the target detector and sub image blocks are generated, the negative cells contained in these sub image blocks are those whose morphology is most similar or closest to that of positive cells, such as benign follicular nodule cells; training the classifier with sub image blocks containing such negative cells effectively improves the classifier's ability to distinguish true positive cells from the negative cells most similar in morphology to positive cells.
Beneficial effect 5: the feature vectors corresponding to the top-M sub image blocks and the feature vector of the image block corresponding to the top-M sub image blocks are concatenated to generate a cascade feature vector, and the cascade feature vector is used to train the classifier; this improves the robustness and classification accuracy of the classifier, for the following reasons. An image block of a cytopathology digital image may contain positive cells, negative cells, the negative environment and other contents, so the feature vector corresponding to the image block may contain features of all of these, and training the classifier with the feature vector of the image block alone introduces considerable interference. Since the top-M sub image blocks are generated after non-maximum suppression, their contents tend to be positive cells or the cells closest to positive cells (such as benign follicular nodule cells); that is, the influence and interference of most negative cells and of the negative environment are excluded. Training the classifier with the cascade feature vector therefore combines the local features of the top-M sub image blocks with the global features of the image block, largely eliminating the interference of irrelevant contents in the image block and improving the robustness and classification accuracy of the classifier.
Beneficial effect 6: in some embodiments, the parameters and structures of the first, second and third fully connected layers differ, that is, they are three distinct fully connected layers, which prevents different training steps from interfering with one another. The parameters of the first fully connected layer are optimized with the image blocks in the first image set, while the candidate regions corresponding to the elements in the second feature map have different sizes, and the features of a sub image block corresponding to a larger candidate region are closer to the features of the whole image block. If the target detector were trained directly with the classification confidences generated by the first fully connected layer, the output value of the first loss function generated by sub image blocks corresponding to larger candidate regions would be smaller during target detector training (the reward generated by such sub image blocks would be larger), so in the second feature map output by the trained target detector the values of elements corresponding to larger candidate regions would exceed those corresponding to smaller candidate regions; that is, the target detector would tend to frame larger regions. In a cytopathology digital image, however, positive cells may exist as single cells or as clumps of varying sizes. In view of this, in the training method provided by the embodiment of the present application, the first fully connected layer is optimized in the classifier first training step, the third fully connected layer is optimized in the classifier second training step, and the classification confidences used to train the target detector are generated via the third fully connected layer, the structures and parameters of the first and third fully connected layers being mutually independent. This arrangement isolates the influence of the different training steps on the fully connected layers and prevents the target detector from tending to frame larger regions, so that the target detector can more accurately select, with candidate region frames of different sizes, positive cells existing as single cells or as clumps of different sizes.
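A minimal sketch of the three mutually independent fully connected layers described above; the dimensions C and T are illustrative assumptions.

    import torch.nn as nn

    C, T, num_classes = 2048, 3, 6
    fc1 = nn.Linear(C, num_classes)            # optimized in the classifier first training step
    fc3 = nn.Linear(C, num_classes)            # optimized in the classifier second training step;
                                               # also generates the confidences that train the detector
    fc2 = nn.Linear((T + 1) * C, num_classes)  # optimized in the classifier third training step
    # separate parameters: optimizing one step cannot bias the other heads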
Beneficial effect 7: in some embodiments, the target detector outputs a plurality of second feature maps; in that case, the candidate regions corresponding to elements in the same second feature map have the same size, while candidate regions corresponding to elements in different second feature maps have different sizes, which makes it more convenient and intuitive to frame positive cell clumps of different sizes, or individual positive cells, with candidate regions.
Embodiment three
The thyroid cell pathology digital image recognizer provided in this embodiment is shown in fig. 3 and comprises a classifier and a target detector. The classifier is configured to receive an image block, perform linear transformation and nonlinear transformation on the received image block, and output a first feature map and a classification confidence. The image blocks comprise a first type of image blocks and a second type of image blocks; the content of the first type of image blocks includes positive cells, and the content of the second type of image blocks includes only negative cells and a negative environment. The first feature map corresponds to the image block input into the classifier, and the classification confidence is used to indicate whether the image block input into the classifier is a first type image block or a second type image block. The target detector is configured to receive the first feature map, perform linear transformation and nonlinear transformation on it, and output a second feature map. The elements in the second feature map correspond to candidate regions in the image block, that is, the position of each element in the second feature map corresponds to the coordinates and size of one candidate region in the image block, and the numerical value of the element corresponds to the probability that the candidate region contains positive cells. In some embodiments, the candidate region is a framed region in the image block as shown in fig. 4.
As shown in fig. 16, the image recognition method provided by the present embodiment is executed by a thyroid cell pathology digital image recognizer, and includes the following steps:
step S1000: acquiring an image block; the acquired image blocks comprise a first type of image blocks and a second type of image blocks;
the content of the first type image block comprises positive cells, and the content of the second type image block comprises only negative cells and a negative environment. In some embodiments, the positive cells include follicular tumor cells, eosinophilic tumor cells, thyroid papillary carcinoma cells, medullary thyroid carcinoma cells, atypical cells of undefined significance; the negative cells comprise benign follicular nodule cells, erythrocytes, lymphocytes, phagocytes and cyst wall cells, and the negative environment comprises blood, dust of glass slices, blank and other various environments.
Step S2000: inputting the image block into a classifier, and outputting a first feature map after linear transformation and nonlinear transformation are carried out on the image block through the classifier;
in this step, the first feature map corresponds to an image block of the input classifier, and the classification confidence is used to indicate that the image block of the input classifier is a first type image block or a second type image block;
step S3000: inputting the first characteristic diagram into a target detector, and performing linear transformation and nonlinear transformation on the first characteristic diagram through the target detector to obtain at least one second characteristic diagram;
step S4000: determining a classification confidence according to the first feature map and the at least one second feature map;
step S5000: and after all elements of at least one second feature map are normalized and subjected to non-maximum value inhibition, taking a candidate region corresponding to the element with the value larger than the first set value as a region containing positive cells in the image block.
It can be understood that a candidate region is output as a region containing positive cells only when the numerical value of the corresponding element in the second feature map is higher than the first set value. When the image block is a first type image block, the values of elements in the second feature map are higher than the first set value, and the corresponding candidate regions are output as regions containing positive cells; when the image block is a second type image block, the values of elements in the second feature map are not higher than the first set value, and no candidate region is output as a region containing positive cells.
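A sketch of step S5000 under assumptions: the element values are normalized with a sigmoid (the patent does not name the normalization used), non-maximum suppression is omitted for brevity, and elements exceeding the first set value are reported.

    import torch

    first_set_value = 0.5
    second_map = torch.randn(6, 16, 16)             # one second feature map
    probs = torch.sigmoid(second_map)               # normalized element values in [0, 1]
    positive = (probs > first_set_value).nonzero()  # rows of (channel, row, col) indices
    # each surviving index maps back to a candidate region containing positive cells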
In some embodiments, as shown in fig. 8, the classifier in the thyroid cytopathology digital image recognizer comprises a backbone convolutional neural network and a fully connected layer; as shown in fig. 17, step S2000 may include the following sub-steps:
step S2001: inputting an image block into a backbone convolutional neural network, and performing linear transformation and nonlinear transformation on the image block through the backbone convolutional neural network to obtain a first characteristic diagram;
in some embodiments, the linear transformation and the nonlinear transformation of the one image block are performed by the backbone convolutional neural network, which is equivalent to performing convolution and activation on the one image block by the backbone convolutional neural network.
In some embodiments, as shown in fig. 18, step S4000 may include the following sub-steps:
step S4001: taking an average value of the matrixes of all the channels in the first characteristic diagram to obtain a characteristic vector;
averaging the matrix means summing all elements in the matrix and dividing by the number of matrix elements.
Step S4002: sequencing all elements in at least one second characteristic diagram according to the numerical values and carrying out non-maximum suppression to obtain top-T elements, and generating top-T sub image blocks according to candidate areas corresponding to the top-T elements;
step S4003: inputting the top-T sub image blocks into a backbone convolutional neural network to obtain top-T first feature maps;
step S4004: taking an average value of the matrixes of all channels in the top-T first characteristic graphs to obtain top-T characteristic vectors;
step S4005: cascading top-T characteristic vectors and one characteristic vector to obtain a cascading characteristic vector;
step S4006: and inputting the cascade characteristic vectors into the full-connection layer to obtain classification confidence.
For example, concatenating n vectors v1, v2, …, vn ∈ R^(C×1) is equivalent to connecting the n vectors end to end to generate a vector v ∈ R^(nC×1).
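Putting steps S4001 to S4006 together, a condensed sketch follows; the dimensions, the value of T, the stand-in detector output and the cropping are illustrative assumptions (non-maximum suppression is omitted for brevity).

    import torch
    import torch.nn as nn

    C, T = 64, 3
    backbone = nn.Sequential(nn.Conv2d(3, C, 3, padding=1), nn.ReLU())
    fc = nn.Linear((T + 1) * C, 6)            # the fully connected layer of step S4006

    image_block = torch.randn(1, 3, 128, 128)
    f1 = backbone(image_block)                # first feature map
    v = f1.mean(dim=(2, 3))                   # S4001: one feature vector, shape (1, C)

    sub_blocks = torch.randn(T, 3, 64, 64)    # S4002: crops of the top-T candidate regions
    sub_vecs = backbone(sub_blocks).mean(dim=(2, 3))  # S4003/S4004: top-T feature vectors (T, C)

    concat = torch.cat([sub_vecs.flatten().unsqueeze(0), v], dim=1)  # S4005: (1, (T+1)*C)
    confidence = torch.softmax(fc(concat), dim=1)                    # S4006: classification confidence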
In some embodiments, the classification confidence is a probability for indicating that the content in the image block includes positive cells, or negative cells and a negative environment. In other embodiments, the classification confidence is a probability for indicating that the content in the image patch includes follicular tumor cells, or eosinophilic tumor cells, or papillary thyroid carcinoma cells, or medullary thyroid carcinoma cells, or atypical cells of ambiguous significance, or negative cells and a negative environment.
It can be understood that the thyroid cell pathology digital image recognizer in this embodiment (embodiment three) can be obtained by training with the thyroid cell pathology digital image recognizer training method provided in embodiment one and embodiment two. The structures, actions or processes, and inputs/outputs of the target detector, the backbone convolutional neural network and the fully connected layer in this embodiment (embodiment three) may be consistent with those of the target detector, the backbone convolutional neural network and the fully connected layer in embodiment one and embodiment two. The fully connected layer in this embodiment (embodiment three) may be any one of the first fully connected layer, the second fully connected layer or the third fully connected layer in embodiment one and embodiment two; in particular, it may be the second fully connected layer in embodiment one and embodiment two.
The image recognition method provided by the embodiment of the present application can not only classify the type of an image block through the classifier, and thus judge the type of lesion, but also determine candidate regions containing positive cells through the target detector, and thus locate the positive cells in the image block; after obtaining the lesion type output by the image recognition method, a doctor can therefore further verify and confirm the lesion type using the positive cell regions framed by the target detector. Moreover, the image block and the top-T sub image blocks are used to generate the cascade feature vector, and the cascade feature vector is used to generate the classification confidence, which improves the robustness and classification accuracy of the classifier.
It should be noted that, the specific implementation and the corresponding technical effects of the image recognition method according to the embodiment of the present application may be referred to the specific implementation and the corresponding technical effects of the thyroid cell pathology digital image recognizer training method described above; vice versa, that is, the specific implementation of the thyroid cell pathology digital image recognizer training method and the corresponding technical effects can correspond to the specific implementation of the reference image recognition method and the corresponding technical effects.
As shown in fig. 19, an embodiment of the present application provides a training apparatus, including: a memory 1210, a control processor 1220, and a computer program stored on the memory 1210 and executable on the control processor 1220.
The control processor 1220 and memory 1210 may be connected by a bus or other means.
The non-transitory software programs and instructions required to implement the training method of the above embodiments are stored in the memory 1210; when executed by the control processor 1220, they perform the thyroid cytopathology digital image recognizer training method of the above embodiments, for example, method steps S110 to S130 in fig. 5, method steps S210 to S220 in fig. 6, method steps S310 to S330 in fig. 7, method steps S121 to S124 in fig. 10, method steps S131 to S136 in fig. 11, method steps S221 to S224 in fig. 13, method steps S321 to S324 in fig. 14, and method steps S331 to S332 in fig. 15 described above.
As shown in fig. 20, an embodiment of the present application provides an image recognition apparatus, including: memory 1310, a control processor 1320, and computer programs stored on memory 1310 and executable on control processor 1320.
The control processor 1320 and memory 1310 may be connected by a bus or other means.
Non-transitory software programs and instructions necessary to implement the image recognition method of the above-described embodiment are stored in the memory 1310 and, when executed by the control processor 1320, perform the image recognition method of the above-described embodiment, for example, method steps S1000 to S5000 in fig. 16, method step S2001 in fig. 17, and method steps S4001 to S4006 in fig. 18 described above.
The above-described embodiments of the apparatus and device are merely illustrative, and the units illustrated as separate components may or may not be physically separate, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
Furthermore, the present application provides a computer-readable storage medium storing computer-executable instructions, which can be used to enable a computer to execute the thyroid cell pathology digital image recognizer training method or the image recognition method provided in the present application, for example, to execute the above-described method steps S110 to S130 in fig. 5, method steps S210 to S220 in fig. 6, method steps S310 to S330 in fig. 7, method steps S121 to S124 in fig. 10, method steps S131 to S136 in fig. 11, method steps S221 to S224 in fig. 13, method steps S321 to S324 in fig. 14, and method steps S331 to S332 in fig. 15, or the above-described method steps S1000 to S5000 in fig. 16, method step S2001 in fig. 17, and method steps S4001 to S4006 in fig. 18.
One of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and includes any information delivery media as known to those skilled in the art.
Although the present invention has been described with reference to a few embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (13)

1. A thyroid cell pathology digital image recognizer training method, wherein the thyroid cell pathology digital image recognizer comprises a classifier and a target detector, the method comprising:
a first image set acquisition step of acquiring a first image set for training the thyroid cell pathology digital image recognizer; the first image set comprises image blocks carrying labels, the labels are used for indicating contents contained in the image blocks, the first image set comprises a first type of image blocks and a second type of image blocks, the contents of the first type of image blocks comprise positive cells, and the contents of the second type of image blocks comprise negative cells and negative environments;
a classifier first training step of training the classifier with a first loss value corresponding to the first image set; the classifier is configured to receive the image block carrying the label and output a first feature map and a classification confidence corresponding to the image block; the classification confidence is used for indicating that the image block input into the classifier is a first type image block or a second type image block; the first loss value corresponding to the first image set is determined according to the classification confidence and the label carried by the image block;
a target detector training step, in response to the fact that the image blocks input into the classifier are the first type of image blocks according to the labels, training the target detector by using a second loss value; the target detector is configured to receive a first feature map output by the classifier and output a second feature map; elements of the second feature map correspond to candidate regions in the image block, and the numerical values of the elements correspond to the probability that the candidate regions contain positive cells; and the second loss value is determined according to the classification confidence degree obtained by inputting the sub image block generated by the candidate region into the classifier and the elements of the second feature map.
2. The method of claim 1, further comprising:
a sub-image block generation step, namely inputting a first feature map corresponding to a first type image block or a second type image block into the target detector to obtain a second feature map, sequencing elements in the second feature map according to the numerical values and inhibiting non-maximum values to obtain top-M elements, and generating top-M sub image blocks according to candidate areas corresponding to the top-M elements;
a classifier second training step of training the classifier with a first loss value corresponding to the top-M sub image blocks; the first loss value corresponding to the top-M sub image blocks is determined according to the classification confidence coefficient obtained by inputting the top-M sub image blocks into the classifier and the labels carried by the image blocks corresponding to the top-M sub image blocks.
3. The method of claim 2, wherein,
the first training step of the classifier comprises the following steps:
inputting the image blocks corresponding to the top-M sub image blocks into the classifier, performing linear transformation and nonlinear transformation on the image corresponding to the top-M sub image blocks through the classifier to obtain a first feature map, and averaging the matrix of each channel in the first feature map to obtain feature vectors of the image blocks corresponding to the top-M sub image blocks;
the method further comprises the following steps:
generating a cascade feature vector, namely generating a cascade feature vector according to the top-M sub image blocks and the feature vector of the image block corresponding to the top-M sub image blocks;
a third training step of the classifier, which trains the classifier by using a first loss value corresponding to the cascade feature vector; and determining the first loss value corresponding to the cascade feature vector according to a classification confidence coefficient obtained by inputting the cascade feature vector into the classifier and a label carried by an image block corresponding to the cascade feature vector.
4. The method of claim 3, wherein the classifier comprises a backbone convolutional neural network, a first fully-connected layer, a second fully-connected layer, and a third fully-connected layer;
the first training step of the classifier comprises the following steps: optimizing parameters of the backbone convolutional neural network and the first fully-connected layer using a first loss value corresponding to the first image set;
the second training step of the classifier comprises the following steps: optimizing parameters of the backbone convolutional neural network and the third fully-connected layer by using a first loss value corresponding to the top-M sub image blocks;
the third training step of the classifier comprises the following steps: optimizing parameters of the backbone convolutional neural network and the second fully-connected layer using first penalty values corresponding to the concatenated eigenvectors;
the target detector training step includes: and inputting the sub image blocks generated by the candidate area into the backbone convolutional neural network, then averaging to obtain the feature vectors corresponding to the sub image blocks, inputting the feature vectors corresponding to the sub image blocks into a third full-connection layer, and performing nonlinear transformation to obtain classification confidence coefficients.
5. The method of claim 1, wherein the classifier comprises a backbone convolutional neural network, a first fully-connected layer, a second fully-connected layer, and a third fully-connected layer; the positive cells comprise follicular tumor cells, eosinophilic tumor cells, thyroid papillary carcinoma cells, medullary thyroid carcinoma cells and atypical cells with undefined meanings; the first training step of the classifier comprises the following steps:
inputting an image block carrying a label into the backbone convolutional neural network, and performing linear transformation and nonlinear transformation on the image block through the backbone convolutional neural network to obtain a first characteristic diagram corresponding to the image block;
taking an average value of the matrixes of all channels in the first characteristic diagram corresponding to the image blocks to obtain characteristic vectors corresponding to the image blocks;
inputting the feature vector corresponding to the image block into the first fully-connected layer, and performing nonlinear transformation on the output of the first fully-connected layer to obtain a first confidence coefficient vector; wherein elements in the first confidence vector are classification confidences indicating that content in the image patch includes follicular tumor cells, or eosinophilic tumor cells, or papillary thyroid carcinoma cells, or medullary thyroid carcinoma cells, or atypical cells of ambiguous meaning, or negative cells and a negative context;
determining the first loss value corresponding to the first image set according to elements in the first confidence coefficient vector and the label carried by the image block, and optimizing the parameters of the backbone convolutional neural network and the first full-connected layer by using a back propagation algorithm according to the first loss value corresponding to the first image set.
6. The method of claim 5, wherein the target detector training step comprises:
responding to the label carried by the image block, and determining that the content in the image block comprises any one of the following: follicular tumor cells, eosinophilic tumor cells, papillary thyroid carcinoma cells, medullary thyroid carcinoma cells, atypical cells of undefined significance, the following steps being performed:
inputting the first feature map corresponding to the image block into the target detector, and performing linear transformation and nonlinear transformation on the first feature map corresponding to the image block through the target detector to obtain at least one second feature map;
sequencing all elements in the at least one second feature map according to the numerical values and inhibiting non-maximum values to obtain a plurality of elements, and generating a plurality of sub image blocks according to candidate areas corresponding to the plurality of elements;
inputting the plurality of sub image blocks into the backbone convolutional neural network and then averaging to obtain a plurality of feature vectors; wherein the plurality of feature vectors correspond to the plurality of sub image blocks one-to-one;
inputting the plurality of feature vectors into the third fully-connected layer, and performing nonlinear transformation on the output of the third fully-connected layer to obtain a plurality of second confidence vectors; wherein elements in the second confidence vector are classification confidences indicating that content in the image patch includes follicular tumor cells, or eosinophilic tumor cells, or papillary thyroid carcinoma cells, or medullary thyroid carcinoma cells, or atypical cells of ambiguous meaning, or negative cells and a negative context;
and determining the second loss value according to the elements in the second confidence coefficient vector, the labels carried by the image blocks corresponding to the sub image blocks and the numerical values of the elements in the at least one second feature map, and optimizing the parameters of the target detector by using a back propagation algorithm according to the second loss value.
7. The method of claim 6, further comprising:
a sub-image block generation step, namely inputting a first feature map corresponding to a first type image block or a second type image block into the target detector to obtain a second feature map, sequencing elements in the second feature map according to the numerical values and inhibiting non-maximum values to obtain top-M elements, and generating top-M sub image blocks according to candidate areas corresponding to the top-M elements;
a second training step of the classifier, comprising:
inputting one sub image block in the top-M sub image blocks into the backbone convolutional neural network, and performing linear transformation and nonlinear transformation on the one sub image block through the backbone convolutional neural network to obtain a first characteristic diagram corresponding to the one sub image block;
averaging the matrixes of all channels in the first feature map corresponding to the sub-image block to obtain a feature vector corresponding to the sub-image block;
inputting the feature vector corresponding to the sub-image block into the third fully-connected layer, and performing nonlinear transformation on the output of the third fully-connected layer to obtain a third confidence coefficient vector; wherein elements in the third confidence vector are classification confidences indicating that content in the image patch includes follicular tumor cells, or eosinophilic tumor cells, or papillary thyroid carcinoma cells, or medullary thyroid carcinoma cells, or atypical cells of ambiguous meaning, or negative cells and a negative context;
determining a first loss value corresponding to the top-M sub image blocks according to elements in the third confidence coefficient vector and labels carried by the image blocks corresponding to the top-M sub image blocks, and optimizing parameters of the backbone convolutional neural network and the third full connection layer by using a back propagation algorithm according to the first loss value corresponding to the top-M sub image blocks.
8. The method of claim 7, wherein,
the first training step of the classifier comprises the following steps:
inputting the image blocks corresponding to the top-M sub image blocks into the classifier, and performing linear transformation and nonlinear transformation on the images corresponding to the top-M sub image blocks through the classifier to obtain a first feature map;
taking an average value of the matrixes of all channels in the first characteristic diagram to obtain characteristic vectors of image blocks corresponding to top-M sub image blocks;
the method further comprises the following steps:
a cascade feature vector generation step, comprising:
selecting top-T sub image blocks from the top-M sub image blocks;
inputting the top-T sub image blocks into the classifier, and performing linear transformation and nonlinear transformation on the top-T sub image blocks through the classifier to obtain top-T first feature maps;
taking an average value of the matrixes of all channels in the top-T first feature maps to obtain top-T feature vectors;
cascading the top-T characteristic vectors and the characteristic vectors of the image blocks corresponding to the top-M sub image blocks to obtain cascaded characteristic vectors;
a third training step of the classifier, comprising:
inputting the cascade characteristic vector into the second fully-connected layer, and performing nonlinear transformation on the output of the second fully-connected layer to obtain a fourth confidence coefficient vector; wherein elements in the fourth confidence vector are classification confidences indicating that content in the image patch includes follicular tumor cells, or eosinophilic tumor cells, or papillary thyroid carcinoma cells, or medullary thyroid carcinoma cells, or atypical cells of ambiguous meaning, or negative cells and a negative context;
and determining a first loss value corresponding to the cascade feature vector according to elements in the fourth confidence vector and a label carried by an image block corresponding to the cascade feature vector, and optimizing parameters of the target detector and the second fully-connected layer by using a back propagation algorithm according to the first loss value corresponding to the cascade feature vector.
9. An image recognition method, wherein the method is performed by a thyroid cell pathology digital image recognizer comprising a classifier and a target detector, the method comprising:
acquiring an image block; wherein image blocks comprise a first type and a second type: the content of a first-type image block includes positive cells, and the content of a second-type image block includes negative cells and a negative background;
inputting the image block into the classifier to obtain a first feature map; wherein the first feature map corresponds to the image block input into the classifier, and the classification confidence indicates whether the image block input into the classifier is a first-type image block or a second-type image block;
inputting the first feature map into the target detector to obtain at least one second feature map;
determining a classification confidence according to the first feature map and the at least one second feature map;
and, after normalizing all elements of the at least one second feature map and applying non-maximum suppression, taking the candidate regions corresponding to elements whose values are greater than a first set value as the regions of the image block that contain positive cells.
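The recognition flow of this claim, read as code: classifier feature map, detector score maps, normalization, non-maximum suppression, then a threshold on the surviving candidates. The sketch below assumes each second-feature-map element already has an associated candidate box and that sigmoid is the normalization; torchvision's nms stands in for the suppression step, and both 0.5 values are placeholders for the unspecified "first set value" and IoU cutoff.

```python
# Hedged sketch of locating positive-cell regions at inference time.
import torch
from torchvision.ops import nms

FIRST_SET_VALUE = 0.5  # placeholder for the claim's "first set value"

def positive_regions(scores, boxes):
    """scores: (N,) elements of the second feature map(s);
    boxes: (N, 4) candidate regions (x1, y1, x2, y2), one per element."""
    probs = torch.sigmoid(scores)                  # normalize all elements
    keep = nms(boxes, probs, iou_threshold=0.5)    # non-maximum suppression
    keep = keep[probs[keep] > FIRST_SET_VALUE]     # keep high-value candidates
    return boxes[keep]                             # regions containing positive cells
```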
10. The method of claim 9, wherein the classifier comprises a backbone convolutional neural network and a fully-connected layer, and
the inputting the image block into the classifier to obtain a first feature map comprises:
inputting one image block into the backbone convolutional neural network to obtain a first feature map;
the determining a classification confidence according to the first feature map and the at least one second feature map comprises:
averaging the matrices of all channels in the first feature map to obtain a feature vector;
sorting all elements of the at least one second feature map by value and applying non-maximum suppression to obtain the top-T elements, and generating top-T sub image blocks from the candidate regions corresponding to the top-T elements;
inputting the top-T sub image blocks into the backbone convolutional neural network to obtain top-T first feature maps;
averaging the matrices of all channels in the top-T first feature maps to obtain top-T feature vectors;
cascading the top-T feature vectors and the feature vector to obtain a cascaded feature vector;
inputting the cascaded feature vector into the fully-connected layer to obtain a classification confidence; wherein the classification confidence is a probability indicating that the content of the image block includes positive cells, or negative cells and a negative background.
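Putting claim 10 together: one backbone forward pass for the block, detector scores to pick the top-T candidates, a second backbone pass for the cropped sub blocks, and the cascaded vector into the fully-connected layer. In this sketch, `detector` and `crop_region` are hypothetical stand-ins the patent does not specify, and the shapes follow the earlier sketches.

```python
# Hedged end-to-end sketch of computing the classification confidence.
import torch
from torchvision.ops import nms

def classify_block(block, backbone, detector, fc, boxes, crop_region, T=4):
    """block: (1, 3, H, W) image block; boxes: (N, 4) candidate regions, one
    per detector output element; crop_region is a hypothetical helper that
    crops and resizes a region of the block to the sub-block input size."""
    fmap = backbone(block)                       # first feature map
    feat = fmap.mean(dim=(2, 3)).flatten()       # block feature vector
    scores = detector(fmap).flatten()            # elements of the second feature map
    probs = torch.sigmoid(scores)
    keep = nms(boxes, probs, iou_threshold=0.5)[:T]   # top-T after suppression
    subs = torch.cat([crop_region(block, boxes[i]) for i in keep])
    sub_feats = backbone(subs).mean(dim=(2, 3))  # top-T feature vectors
    cascaded = torch.cat([sub_feats.flatten(), feat]).unsqueeze(0)
    return torch.softmax(fc(cascaded), dim=1)    # classification confidence
```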
11. A training apparatus comprising at least one control processor and a memory communicatively connected to the at least one control processor; wherein the memory stores instructions executable by the at least one control processor to enable the at least one control processor to perform the thyroid cell pathology digital image recognizer training method of any one of claims 1 to 8.
12. An image recognition apparatus comprising at least one control processor and a memory communicatively connected to the at least one control processor; wherein the memory stores instructions executable by the at least one control processor to enable the at least one control processor to perform the image recognition method of claim 9 or claim 10.
13. A computer-readable storage medium storing computer-executable instructions for causing a computer to perform the thyroid cell pathology digital image recognizer training method of any one of claims 1 to 8 or the image recognition method of claim 9 or claim 10.
CN202210384671.7A 2022-04-13 2022-04-13 Thyroid cell pathology digital image recognizer training method and image processing method Active CN114743195B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210384671.7A CN114743195B (en) 2022-04-13 2022-04-13 Thyroid cell pathology digital image recognizer training method and image processing method

Publications (2)

Publication Number Publication Date
CN114743195A 2022-07-12
CN114743195B (en) 2022-12-09

Family

ID=82280987

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210384671.7A Active CN114743195B (en) 2022-04-13 2022-04-13 Thyroid cell pathology digital image recognizer training method and image processing method

Country Status (1)

Country Link
CN (1) CN114743195B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190065817A1 (en) * 2017-08-29 2019-02-28 Konica Minolta Laboratory U.S.A., Inc. Method and system for detection and classification of cells using convolutional neural networks
WO2022007337A1 (en) * 2020-07-07 2022-01-13 广州金域医学检验中心有限公司 Tumor cell content evaluation method and system, and computer device and storage medium
CN112508850A (en) * 2020-11-10 2021-03-16 广州柏视医疗科技有限公司 Deep learning-based method for detecting malignant area of thyroid cell pathological section
CN113139931A (en) * 2021-03-17 2021-07-20 杭州迪英加科技有限公司 Thyroid slice image classification model training method and device
CN113139930A (en) * 2021-03-17 2021-07-20 杭州迪英加科技有限公司 Thyroid slice image classification method and device, computer equipment and storage medium
CN114187277A (en) * 2021-12-14 2022-03-15 赛维森(广州)医疗科技服务有限公司 Deep learning-based thyroid cytology multi-type cell detection method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHANG FENG et al.: "Classification of thyroid nodule images based on the TV model and GoogLeNet", Application Research of Computers *
WANG KE et al.: "Thyroid SPECT image diagnosis based on the ResNet model", Journal of Hebei University of Science and Technology *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7312510B1 (en) 2022-08-16 2023-07-21 之江実験室 Whole-slide pathological image classification system and construction method considering tumor microenvironment
JP2024027079A (en) * 2022-08-16 2024-02-29 之江実験室 Whole-slide pathological image classification system and construction method considering tumor microenvironment
CN115170571A (en) * 2022-09-07 2022-10-11 赛维森(广州)医疗科技服务有限公司 Method and device for identifying pathological images of hydrothorax and ascites cells and medium
CN115170571B (en) * 2022-09-07 2023-02-07 赛维森(广州)医疗科技服务有限公司 Method for identifying pathological image of hydrothorax and ascites cells, image identification device and medium

Also Published As

Publication number Publication date
CN114743195B (en) 2022-12-09

Similar Documents

Publication Publication Date Title
CN110298298B (en) Target detection and target detection network training method, device and equipment
CN114743195B (en) Thyroid cell pathology digital image recognizer training method and image processing method
US10121245B2 (en) Identification of inflammation in tissue images
US11748877B2 (en) System and method associated with predicting segmentation quality of objects in analysis of copious image data
CN110648322B (en) Cervical abnormal cell detection method and system
EP2615572A1 (en) Image segmentation based on approximation of segmentation similarity
US20120093411A1 (en) Active Segmentation for Groups of Images
CN111445478A (en) Intracranial aneurysm region automatic detection system and detection method for CTA image
JP2017107543A (en) Method and system for automated analysis of cell images
CN112633382A (en) Mutual-neighbor-based few-sample image classification method and system
CN111291825A (en) Focus classification model training method and device, computer equipment and storage medium
US11893811B2 (en) Method for object detection using hierarchical deep learning
CN108305253A A whole-slide pathology diagnosis method based on multi-magnification deep learning
CN113269257A (en) Image classification method and device, terminal equipment and storage medium
WO2022064222A1 (en) A method of processing an image of tissue and a system for processing an image of tissue
CN116580394A (en) White blood cell detection method based on multi-scale fusion and deformable self-attention
CN117152484B (en) Small target cloth flaw detection method based on improved YOLOv5s
CN111126401A (en) License plate character recognition method based on context information
CN115812223A (en) Root cause analysis of process cycle images based on deep learning
CN111899259A (en) Prostate cancer tissue microarray classification method based on convolutional neural network
US20240054639A1 (en) Quantification of conditions on biomedical images across staining modalities using a multi-task deep learning framework
CN111950544A (en) Method and device for determining interest region in pathological image
CN116868226A (en) Detection of annotated regions of interest in images
CN115984559B (en) Intelligent sample selection method and related device
CN115775226B Medical image classification method based on Transformer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant