CN114037868B - Image recognition model generation method and device - Google Patents

Image recognition model generation method and device

Info

Publication number
CN114037868B
CN114037868B (application CN202111302552.4A)
Authority
CN
China
Prior art keywords
image recognition
recognition model
training
data set
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111302552.4A
Other languages
Chinese (zh)
Other versions
CN114037868A (en)
Inventor
王晓梅
张仕侨
章万韩
蔡博君
朱逢亮
范晓华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yice Technology Co ltd
Original Assignee
Hangzhou Yice Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yice Technology Co ltd
Priority to CN202111302552.4A
Publication of CN114037868A
Application granted
Publication of CN114037868B
Legal status: Active
Anticipated expiration

Classifications

    • G06F 18/24 Pattern recognition; analysing; classification techniques
    • G06F 18/2155 Generating training patterns; bootstrap methods characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G06N 3/045 Neural networks; architecture; combinations of networks
    • G06N 3/08 Neural networks; learning methods
    • G06T 7/0012 Image analysis; inspection of images; biomedical image inspection
    • G06T 2207/10056 Image acquisition modality; microscopic image
    • G06T 2207/20081 Special algorithmic details; training; learning
    • G06T 2207/20084 Special algorithmic details; artificial neural networks [ANN]
    • G06T 2207/30024 Subject of image; cell structures in vitro; tissue sections in vitro


Abstract

The invention discloses a method and an apparatus for generating an image recognition model. The method comprises: processing the unlabeled data in a second training data set with a first image recognition model to obtain the class of each unlabeled item, screening out the objects whose class is positive, and resetting the objects whose class is negative, thereby obtaining a third training data set; and training on the processed third training data set to obtain a second image recognition model. The invention solves the technical problems that the TCT slide analysis approach adopted in the related art is limited and cannot quickly yield relatively comprehensive information.

Description

Image recognition model generation method and device
Technical Field
The invention relates to the technical field of information processing, in particular to a method and a device for generating an image recognition model.
Background
Most existing work on cervical liquid-based thin-layer cytology testing (TCT) detects and classifies slides manually: a physical pathological slide is observed under a microscope to determine whether it is positive. However, the number of slides to be processed is large, the manual analysis process is prone to errors, and a TCT slide cannot be analyzed in depth, so the information obtained by analyzing the slides is severely limited.
Automated detection has also emerged in response to the aforementioned shortcomings of manual detection, for example by using conventional algorithms to extract features from the texture, color, and shape of images. However, the categories in TCT images are too similar to one another for traditional algorithms to identify accurately.
For another example, as artificial intelligence technology has evolved, people have begun to train artificial intelligence models to solve various problems. Specifically, a reference neural network model is first obtained by training with labeled data; a classification neural network model is then trained from the reference network with both labeled and unlabeled image data, so that the feature information in the reference model can be reused by the classification model and its efficiency improved. The classification model selects unlabeled image data with high information content for labeling, and the reference model is retrained and adjusted with the updated labeled data until it reaches a preset condition, at which point the updated reference model is taken as the target model. In this approach, however, the labeled and unlabeled image data train the reference model into a classification model, the classification model labels selected unlabeled data to produce updated labeled data, and the updated labeled data update the reference model; the image data used to train the reference model are fixed throughout, which imposes certain limitations.
Currently, deep learning is applied in a variety of fields, including image processing, for example using a neural network to locate a target object in an image to be processed. Compared with traditional algorithms, deep learning draws on large data sets and eases the problem that targets in images are difficult to identify; for images with fine-grained differences in particular, it has obvious advantages. It is also applied to image screening, for example using a classification network to separate images into positive and normal so that positive images can be isolated quickly, saving time. In TCT digital images, however, the pictures are very large in pixels and the visual morphology of different diseased cells is similar, making them difficult for simple neural network algorithms to distinguish.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The application provides a method and a device for generating an image recognition model, which at least solve the technical problems that the TCT slide information analysis mode adopted in the related technology has limitation and relatively comprehensive information cannot be quickly obtained.
According to an aspect of the embodiments of the present invention, there is provided a method for generating an image recognition model, including: acquiring a first training data set, wherein the first training data set comprises positive objects in the annotation data; training an initial network model by using the first training data set to obtain a first image recognition model, wherein the first image recognition model is obtained by supervised learning, and an application scenario of the first image recognition model comprises at least one of the following: target detection, image segmentation, and image classification; processing unlabeled data in a second training data set by using the first image recognition model to obtain the classes of the unlabeled data in the second training data set, screening out the objects whose class is positive among the unlabeled data in the second training data set, and resetting the objects whose class is negative, to obtain a third training data set, wherein the second training data set comprises unlabeled data whose diagnosis results are negative; performing semi-supervised learning training on the initial network model by using the annotation data in the first training data set and the data in the third training data set, and, when the application scenario of the second image recognition model is inconsistent with that of the first image recognition model, modifying the data labels in the third training data set into image labels corresponding to the application scenario of the second image recognition model according to a preset rule, wherein the application scenario of the second image recognition model at least comprises image classification; and training with the processed third training data set to obtain the second image recognition model.
Optionally, obtaining a first training data set comprises: acquiring the labeled data; screening the labeled data to obtain positive objects in the labeled data; and taking the positive objects in the labeling data as the first training data set.
Optionally, obtaining the annotation data includes: collecting original samples; scanning the original samples to obtain a predetermined number of training samples; sending the training samples to one or more annotation terminals, where an annotation operation is performed on the training samples at the one or more annotation terminals; and acquiring the annotation data fed back by the one or more annotation terminals.
Optionally, the obtaining the annotation data fed back by the one or more annotation terminals includes: sending the annotation data fed back by the one or more annotation terminals to a review terminal, wherein the review terminal executes review operation on the annotation data fed back by the one or more annotation terminals; annotation data reviewed by the reviewing terminal is obtained.
Optionally, after the second image recognition model is obtained through training, the method for generating an image recognition model further includes: acquiring verification data; verifying the second image recognition model with the verification data to obtain a verification result; judging, based on the verification result, whether the second image recognition model meets a predetermined condition, to obtain a judgment result; stopping training when the judgment result shows that the second image recognition model meets the predetermined condition; and continuing to train the second image recognition model when the judgment result shows that it does not meet the predetermined condition, until the second image recognition model passes verification.
According to another aspect of the embodiments of the present invention, there is provided an apparatus for generating an image recognition model, including: the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a first training data set, and the first training data set comprises positive objects in labeling data; a second obtaining module, configured to train an initial network model with the first training data set to obtain a first image recognition model, where the first image recognition model is obtained by using supervised learning, and an application scenario of the first image recognition model includes at least one of: target detection, image segmentation and image classification; a third obtaining module, configured to process unlabeled data in a second training data set by using the first image recognition model to obtain a category of the unlabeled data in the second training data set, and screen an object whose category is positive in the unlabeled data in the second training data set, and reset the object whose category is negative to obtain a third training data set, where the second training data set includes unlabeled data in a negative diagnosis result; a modification module, configured to perform semi-supervised learning training on the initial network model by using the annotation data in the first training data set and the data in the third training data set, and modify a data tag in the third training data set to an image tag corresponding to an application scene of the second image recognition model according to a preset rule when the application scene of the second image recognition model is inconsistent with the application scene of the first image recognition model, where an application scene of the second image recognition model at least includes: classifying the images; and the fourth acquisition module is used for training by using the processed third training data set to obtain the second image recognition model.
Optionally, the first obtaining module includes: the first obtaining sub-module is used for obtaining the marking data; the second obtaining submodule is used for screening the labeled data to obtain a positive object in the labeled data; a determining unit, configured to use a positive object in the annotation data as the first training data set.
Optionally, the first obtaining sub-module includes: an acquisition unit for collecting original samples; a scanning unit for scanning the original samples to obtain a predetermined number of training samples; a sending unit for sending the training samples to one or more annotation terminals, where the training samples undergo an annotation operation at the one or more annotation terminals; and an obtaining unit for acquiring the annotation data fed back by the one or more annotation terminals.
Optionally, the obtaining unit includes: a sending subunit, configured to send the annotation data fed back by the one or more annotation terminals to a review terminal, where the review terminal performs a review operation on the annotation data fed back by the one or more annotation terminals; and the acquisition subunit is used for acquiring the annotation data reviewed by the review terminal.
Optionally, the apparatus further comprises: a fifth obtaining module for acquiring verification data after the second image recognition model is obtained through training; a verification module for verifying the second image recognition model with the verification data to obtain a verification result; a sixth obtaining module for judging, based on the verification result, whether the second image recognition model meets a predetermined condition, to obtain a judgment result; a stopping module for stopping training when the judgment result shows that the second image recognition model meets the predetermined condition; and a training module for continuing to train the second image recognition model when the judgment result shows that it does not meet the predetermined condition, until the second image recognition model passes verification.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, which includes a stored computer program, wherein when the computer program is executed by a processor, the computer-readable storage medium controls an apparatus to execute the method for generating an image recognition model according to any one of the above descriptions.
According to another aspect of the embodiments of the present invention, there is also provided a processor for executing a computer program, where the computer program executes to execute the method for generating an image recognition model according to any one of the above.
In the embodiment of the invention, a first training data set is obtained, wherein the first training data set comprises positive objects in the labeling data; training an initial network model by using a first training data set to obtain a first image recognition model, wherein the first image recognition model is obtained by using a supervised learning mode, and an application scene of the first image recognition model comprises at least one of the following: target detection, image segmentation and image classification; processing the unlabelled data in the second training data set by using the first image recognition model to obtain the class of the unlabelled data in the second training data set, screening to obtain an object with a positive class in the unlabelled data in the second training data set, and resetting the object with a negative class to obtain a third training data set, wherein the second training data set comprises the unlabelled data in the negative diagnosis result; performing semi-supervised learning training on the initial network model by using the annotation data in the first training data set and the data in the third training data set, and modifying the data labels in the third training data set into image labels corresponding to the application scenes of the second image recognition model according to a preset rule when the application scenes of the second image recognition model are inconsistent with the application scenes of the first image recognition model, wherein the application scenes of the second image recognition model at least comprise: classifying the images; and training by using the processed third training data set to obtain a second image recognition model. By the image recognition model generation method provided by the embodiment of the invention, the training data of the training image recognition model is expanded, the purpose of taking the false positive data as the training data is achieved, the technical effect of improving the accuracy of the image recognition model is achieved, and the technical problems that the TCT slide information analysis mode adopted in the related technology is limited and relatively comprehensive information cannot be quickly obtained are solved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow diagram of a method of generating an image recognition model according to an embodiment of the invention;
FIG. 2 is a schematic illustration of a segmentation of a training sample according to an embodiment of the present invention;
FIG. 3 is a flow diagram of an alternative method of generating an image recognition model according to an embodiment of the present invention;
FIG. 4 is a post-processing flow diagram according to an embodiment of the invention;
fig. 5 is a schematic diagram of an apparatus for generating an image recognition model according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
In accordance with an embodiment of the present invention, there is provided a method embodiment of a method for generating an image recognition model, it being noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer-executable instructions and that while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than here.
Fig. 1 is a flowchart of a method for generating an image recognition model according to an embodiment of the present invention, as shown in fig. 1, the method for generating an image recognition model includes the following steps:
step S101, a first training data set is obtained, wherein the first training data set comprises positive objects in the annotation data.
In this embodiment, the annotation data are samples that have already been labeled, i.e., images whose annotation result is positive; these samples are used to train part of the image recognition model.
The first training data consists of samples marked as abnormal, for example samples detected as positive. In cervical cancer screening, these are the slides whose corresponding image diagnosis results are positive for cervical cancer.
Step S102, training an initial network model by using a first training data set to obtain a first image recognition model, wherein the first image recognition model is obtained by using a supervised learning mode, and the application scene of the first image recognition model comprises at least one of the following: target detection, image segmentation and image classification.
Step S103, processing the unlabeled data in the second training data set by using the first image recognition model to obtain the classes of the unlabeled data in the second training data set, screening to obtain objects with positive classes in the unlabeled data in the second training data set, and resetting the objects with negative classes to obtain a third training data set, wherein the second training data set comprises the unlabeled data in the negative diagnosis result.
In this embodiment, unlabeled data are samples that have not been annotated, for which the model can predict in advance whether they contain positive objects.
It should be noted that the unlabeled data are negative for cervical cancer, i.e., the slide diagnosis results corresponding to the unlabeled slide images are negative.
Because some objects are mispredicted as positive during inference, i.e., are false positives, the second training data consists of the false-positive samples obtained by predicting the unlabeled data.
Step S104, performing semi-supervised learning training on the initial network model by using the label data in the first training data set and the data in the third training data set, and modifying the data labels in the third training data set into image labels corresponding to the application scenes of the second image recognition model according to a preset rule when the application scenes of the second image recognition model are inconsistent with the application scenes of the first image recognition model, wherein the application scenes of the second image recognition model at least comprise: and (5) classifying the images.
If the second image recognition model is a classification model, the labels used to train it are category labels, and pixel-region information cannot be used directly; the label data of the third training set therefore needs to be processed and converted to the scenario to which the second model applies. The same situation arises when the first model is an object detection model and the second is an image classification model, as sketched below.
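A minimal sketch of this label conversion, assuming detection annotations are stored as bounding boxes with subtype labels; the function name, data layout, and binary positive/negative mapping are illustrative assumptions, not an API prescribed by the patent.

```python
# Convert pixel-region (detection) labels into image-level class labels by
# cropping each box and mapping its subtype to a binary category.
from typing import List, Tuple

from PIL import Image

POSITIVE_CLASSES = {"ASC-US", "ASC-H", "LSIL", "HSIL"}  # subtype names used in the patent

def boxes_to_classification_samples(
    image_path: str,
    boxes: List[Tuple[int, int, int, int]],   # (x1, y1, x2, y2) pixel regions
    box_labels: List[str],
) -> List[Tuple[Image.Image, int]]:
    image = Image.open(image_path)
    samples = []
    for (x1, y1, x2, y2), label in zip(boxes, box_labels):
        crop = image.crop((x1, y1, x2, y2))          # box region becomes one sample
        category = 1 if label in POSITIVE_CLASSES else 0  # 1 = positive, 0 = negative
        samples.append((crop, category))
    return samples
```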
And step S105, training by using the processed third training data set to obtain a second image recognition model.
In this embodiment, the training data for training the second image recognition model include both positive objects and false-positive objects, so the resulting image recognition model is trained to higher accuracy.
As can be seen from the above, in the embodiment of the present invention, a first training data set may be obtained, where the first training data set includes positive objects in the annotation data; training an initial network model by using a first training data set to obtain a first image recognition model, wherein the first image recognition model is obtained by using a supervised learning mode, and an application scene of the first image recognition model comprises at least one of the following: target detection, image segmentation and image classification; processing the unlabelled data in the second training data set by using the first image recognition model to obtain the class of the unlabelled data in the second training data set, screening to obtain an object with a positive class in the unlabelled data in the second training data set, and resetting the object with a negative class to obtain a third training data set, wherein the second training data set comprises the unlabelled data in the negative diagnosis result; performing semi-supervised learning training on the initial network model by using the label data in the first training data set and the data in the third training data set, and modifying the data labels in the third training data set into image labels corresponding to the application scenes of the second image recognition model according to a preset rule when the application scenes of the second image recognition model are inconsistent with the application scenes of the first image recognition model, wherein the application scenes of the second image recognition model at least comprise: classifying the images; and training by using the processed third training data set to obtain a second image recognition model, and then recognizing the target sample by using the image recognition model to determine whether the target sample is positive, so that the training data of the training image recognition model is expanded, the false positive data is also used as the training data, and the technical effect of improving the accuracy of the image recognition model is achieved.
Therefore, the generation method of the image recognition model provided by the embodiment of the invention solves the technical problems that the TCT slide information analysis mode adopted in the related technology has limitation and relatively comprehensive information cannot be quickly obtained.
In the above step S101, a first training data set is obtained, which includes: acquiring label data; screening the labeled data to obtain positive objects in the labeled data; and taking the positive objects in the labeling data as a first training data set.
In this embodiment, the annotation data is first obtained, and then the annotation data is filtered, so that the first training data set can be obtained from the positive objects in the annotation data.
In the above embodiment, obtaining the annotation data includes: collecting original samples; scanning the original samples to obtain a predetermined number of training samples; sending the training samples to one or more annotation terminals, where the training samples undergo an annotation operation; and acquiring the annotation data fed back by the one or more annotation terminals.
In this embodiment, the original samples may be obtained first, then the original samples are scanned to obtain a predetermined number of training samples, and the training samples are sent to one or more labeling terminals.
In the embodiment of the invention, data annotation may be performed by multiple pathologists or experts. For example, the first round may be annotated by one doctor, who gives 10 regions of interest per slide (i.e., per original specimen). The second round may be annotated by two pathologists who label the regions of interest in the same batch of slides back to back, so that annotations from different pathologists or experts are obtained for each slide, improving objectivity.
Wherein the slide may be a TCT slide.
In the above embodiment, the obtaining annotation data fed back by one or more annotation terminals includes: sending the annotation data fed back by the one or more annotation terminals to a review terminal, wherein the review operation is executed on the annotation data fed back by the one or more annotation terminals at the review terminal; annotation data reviewed by the reviewing terminal is obtained.
For example, a third round of annotation can be performed by a senior pathologist or expert, in which the annotation boxes from the first and second rounds are merged for each slide. In a fourth round, an expert physician performs a final review of each slide, yielding the annotation data used in the next stage.
It should be noted that, in this embodiment, the original specimens may be PNG images together with test slides, for example 38,403 PNG images and 1,000 test slides. The data set may come from a third-party testing laboratory that collected case slides from a number of different regions, for example 918 slides. The 918 slides, each 19 mm in radius, were scanned with a 20x scanning objective and 7 scanning layers, finally yielding digital slides of 300,000 x 300,000 pixels. To ensure the diversity of the data set, pathologists annotated 10 regional images per slide, each roughly (7,800-10,000) x (4,140-10,000) pixels. Subject to computational constraints, each large regional image can be cut uniformly in steps into 2,100 x 2,100 PNG images; by observation, a 2,100 x 2,100 tile contains a reasonably good number of cells. The cutting step size is given by the formula s = 2100 - m/n, where m = mod(l, 2100) and n = [l/2100]; here l denotes the image length, mod denotes the remainder function, m the remainder, n is l divided by 2100 and rounded down, and s is the step size.
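A minimal Python sketch of the step-size formula as written above; the tiling loop that applies it is an assumption, since the patent gives only the formula.

```python
# Cutting step size per the patent's formula: s = 2100 - m/n,
# with m = mod(l, 2100) and n = floor(l / 2100).
def cut_step(l: int, tile: int = 2100) -> float:
    assert l >= tile, "image must be at least one tile long"
    n = l // tile          # n = [l / 2100], rounded down
    m = l % tile           # m = mod(l, 2100)
    return tile - m / n    # s = 2100 - m/n

def tile_starts(l: int, tile: int = 2100) -> list:
    """Start coordinates of tiles along one axis: an assumed application of
    the step size, since the patent does not spell out the loop."""
    s = cut_step(l, tile)
    starts, pos = [], 0.0
    while pos + tile <= l:
        starts.append(int(round(pos)))
        pos += s
    return starts
```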
Fig. 2 is a schematic diagram of the segmentation of a training sample according to an embodiment of the present invention. When the image is cut by step size, objects at the image edge may be cut through. Analysis shows that a tile can contain a certain number of cell nuclei without containing the centers of whole cells, so using the nucleus to determine whether a whole cell has been cut is not a good approach. In the embodiment of the invention, a target cell is handled by judging whether its center lies inside the tile: the tile containing the center of a cell's bounding box is the only tile in which that cell carries the annotation, as sketched below.
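A minimal sketch of this center rule; the box tuple and tile-origin representations are assumptions for illustration.

```python
# Assign a cell annotation only to the tile that contains the center of its
# bounding box. box = (x1, y1, x2, y2); tile_origin = (x, y) top-left corner.
def box_center(box):
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2.0, (y1 + y2) / 2.0

def tile_owns_box(tile_origin, box, tile_size: int = 2100) -> bool:
    """True only for the single tile in which this cell should be annotated."""
    cx, cy = box_center(box)
    ox, oy = tile_origin
    return ox <= cx < ox + tile_size and oy <= cy < oy + tile_size
```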
Finally, after cutting the digital slide images, 38,403 images were obtained. To ensure that every slide contributes to both the training set and the validation set, in this embodiment the images of each slide are divided between the two sets; finally 34,910 images are used as the training set and 3,493 images as the validation set, as sketched below.
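A sketch of the per-slide split; the roughly 10% validation fraction is inferred from the 34,910/3,493 split and is an assumption, as is the input layout.

```python
# Split every slide's tiles between training and validation sets, so each
# slide is represented in both, per the paragraph above.
import random

def split_per_slide(tiles_by_slide: dict, val_fraction: float = 0.1, seed: int = 0):
    rng = random.Random(seed)
    train, val = [], []
    for slide_id, tiles in tiles_by_slide.items():
        tiles = list(tiles)
        rng.shuffle(tiles)
        k = max(1, int(len(tiles) * val_fraction))  # at least one validation tile per slide
        val.extend(tiles[:k])
        train.extend(tiles[k:])
    return train, val
```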
Fig. 3 is a flowchart of an alternative method for generating an image recognition model according to an embodiment of the present invention. As shown in Fig. 3, TCT target detection annotation data is first obtained by annotating TCT positive-diagnosis slides, and a target detection model (i.e., the first image recognition model) is trained with that annotation data. TCT negative-diagnosis slides are then obtained and segmented into unannotated negative TCT data. The target detection model performs inference on the unannotated negative TCT data; the predicted false-positive targets in the unannotated data, together with the positive targets in the annotated data, serve as the training data for an image classification model (i.e., the second image recognition model).
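The Fig. 3 pipeline can be sketched at a high level as follows; the helper functions are passed in as parameters because the patent prescribes no concrete API, so every name here is a placeholder.

```python
# Skeleton of the three-stage pipeline: supervised detection, false-positive
# mining on negative slides, and classifier training on the combined data.
def build_second_model(train_detector, cut_into_tiles, make_classification_dataset,
                       train_classifier, positive_annotations, negative_slides):
    # Stage 1 (supervised): train the detector on annotated positive data.
    detector = train_detector(positive_annotations)

    # Stage 2: cut negative-diagnosis slides into tiles and run inference;
    # every target detected on a negative slide is by definition a false positive.
    false_positives = []
    for slide in negative_slides:
        for tile in cut_into_tiles(slide):
            false_positives.extend(detector.predict(tile))  # placeholder interface

    # Stage 3: combine false-positive boxes with the original positive cell
    # boxes into a new classification data set and train the classifier
    # (ResNeXt101 in the embodiment described below).
    dataset = make_classification_dataset(positive_annotations, false_positives)
    return train_classifier(dataset)
```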
With the method for generating an image recognition model provided by this embodiment, cells in the digital image are recognized by deep learning, and cell-level predictions are converted into slide-level predictions, specifically by means of a semi-supervised module and a post-processing module.
In general, a semi-supervised model mainly uses the target boxes predicted by a detection model as new data, and then improves the original model with that new data. In the semi-supervised mode of this embodiment, the target boxes predicted by the target detection model are used as new data for the classification model. The pipeline is divided into three parts: the first trains a target detection model on the first data set; the second cuts negative slides, feeds them into the detection model, and combines the detected targets with the positive cells in the data set into a new classification data set; the third trains a new classification network on that classification data set.
In addition, three commonly used two-stage networks and one one-stage network were selected for training and testing in this embodiment: Mask R-CNN, Cascade R-CNN, DetectoRS, and EfficientDet, with multiple experiments performed on Cascade R-CNN and DetectoRS. To enhance network robustness, the Albumentations data-augmentation library is added; Albumentations is a data-enhancement library built on the OpenCV image vision library that provides a simple interface to different image tasks for researchers. Meanwhile, multi-scale image input is used to improve the network's local view. Because the augmentations in the Albumentations library that affect image colors have a large influence on the result, CLAHE is preferentially selected for data enhancement in this embodiment.
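A minimal sketch of such an augmentation pipeline using the Albumentations API; the probabilities, parameters, and bounding-box format are assumptions rather than values from the patent, with CLAHE placed first per the preference stated above.

```python
# Albumentations pipeline with the transforms named in this embodiment.
import albumentations as A

train_transform = A.Compose(
    [
        A.CLAHE(p=0.5),                     # preferred color enhancement in this embodiment
        A.RandomBrightnessContrast(p=0.3),  # random brightness/contrast changes
        A.GaussianBlur(p=0.2),              # Gaussian blur with random parameters
        A.RandomGamma(p=0.2),               # random Gamma transformation
    ],
    # Keep detection boxes consistent with the transformed image.
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)

# Usage: augmented = train_transform(image=img, bboxes=boxes, labels=labels)
```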
In the final test, the predictions on cytopathological images contained a large number of false positives. Unlabeled, purely negative slides were therefore selected and predicted; everything detected as positive on them is a false positive. The false-positive target boxes were collected and combined with the original positive cell boxes into a classification data set, on which a classification network was trained, as shown in Fig. 3. ResNeXt101 was chosen as the classification network. After a slide passes through the classification network, its negative/positive determination is obtained by the post-processing method.
Further, since the ultimate goal is a negative/positive determination for a single slide or even many slides, the predicted cell-level results must be converted into slide-level results. In pathology theory, a cell slide is identified as positive when one or more positive cells are present on it. That criterion suits human observation but not deep learning in the presence of false positives. Therefore, after combining physicians' recommendations and a posteriori experience, the positive subtype classes are given a fixed priority ranking: ASC-US < ASC-H < LSIL < HSIL. Specifically, all positive cell coordinates and category scores of each of the 1,000 slides are first predicted, and the category scores are averaged per class. The 1,000 slides are sorted from small to large by their mean ASC-US score, then re-sorted with ASC-H as the criterion, and the same operation is applied in turn with LSIL and HSIL. Fig. 4 is a flowchart of the post-processing according to an embodiment of the present invention. As shown in Fig. 4, all detected target images of the above four categories on the same slide are passed through the classification network again, which outputs the probability of each target belonging to each category. By this sequential sorting method, the output values of all categories are sorted from small to large, finally yielding a ranking of the 1,000 slides. The accuracy of this part is judged with the number of negative slides among the 1,000 slides as the base and the quantile of the first 60% as the dividing point.
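The sequential sorting can be sketched as follows, using Python's stable sort so that each later key (up to HSIL) becomes the primary criterion, matching the priority order above; the per-slide score layout is an assumption.

```python
# Rank slides from most likely negative to most likely positive by repeated
# stable sorts over the mean score of each positive subtype.
from statistics import mean

PRIORITY = ["ASC-US", "ASC-H", "LSIL", "HSIL"]  # low -> high priority

def rank_slides(slide_scores: dict) -> list:
    """slide_scores: {slide_id: {subtype: [cell scores]}}."""
    means = {
        sid: {c: mean(scores.get(c) or [0.0]) for c in PRIORITY}
        for sid, scores in slide_scores.items()
    }
    order = list(means)
    for c in PRIORITY:                  # stable re-sorts: ASC-US first, HSIL last,
        order.sort(key=lambda sid: means[sid][c])  # so HSIL ends up the primary key
    return order
```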
It should be noted that, during training, a series of data-augmentation modes commonly used in the Albumentations library may be selected, such as randomly changing brightness, contrast, and saturation, blurring the image with a random Gaussian function, applying a random Gamma transformation, and applying CLAHE.
Training details: training ran for 12 epochs in total, and the learning rate was reduced in stages by a factor of 0.1 at the 8th and 11th epochs. The initial learning rate was set to 0.005 with a batch size of 2. The RGB mean and standard deviation of the data set are [224.889, 221.787, 232.288] and [48.087, 50.326, 32.562]; the default RGB mean and standard deviation used for initial normalization are [123.675, 116.28, 103.53] and [58.395, 57.12, 57.375]. The image size for initial training is (1333, 800). The multi-scale sizes in the detection algorithm are (1333, 800), (1333, 1024), (1666, 1024), (2048, 1600), (2048, 2048), (2666, 800), (2666, 1600), and (2666, 2048).
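A sketch of this schedule in PyTorch terms; the patent reports only the values above, so the optimizer type, momentum, and the placeholder model are assumptions.

```python
# 12 epochs, lr 0.005, decayed 0.1x at epochs 8 and 11, batch size 2.
import torch

model = torch.nn.Linear(10, 2)  # placeholder standing in for the detection network
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[8, 11], gamma=0.1)

for epoch in range(12):
    # ... one training epoch with batch size 2 ...
    scheduler.step()  # staged learning-rate decay
```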
As an alternative embodiment, after the second image recognition model is obtained through training, the method for generating an image recognition model further includes: acquiring verification data; verifying the second image recognition model with the verification data to obtain a verification result; judging, based on the verification result, whether the second image recognition model meets a predetermined condition, to obtain a judgment result; stopping training when the judgment result shows that the predetermined condition is met; and continuing to train the second image recognition model when it is not, until the second image recognition model passes verification.
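A minimal sketch of this verify-and-continue loop, assuming the predetermined condition is a score threshold on the verification data; the threshold, round limit, and callable interfaces are all assumptions.

```python
# Train until the verification result meets the predetermined condition.
def train_until_verified(model, train_one_round, evaluate, threshold=0.97, max_rounds=100):
    for _ in range(max_rounds):
        if evaluate(model) >= threshold:   # predetermined condition met: stop
            return model
        train_one_round(model)             # otherwise continue training
    return model
```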
In the embodiment of the invention, four judgment indexes are adopted as criteria for slide-level detection: precision, recall, accuracy, and, matching the real scenario as discussed with pathology experts, the screening rate. The formulas are as follows: precision = TP/(TP + FP), recall = TP/(TP + FN), and accuracy = (TP + TN)/(TP + FP + TN + FN), where TP denotes the number of correctly predicted positives, FP the number of incorrectly predicted positives, FN the number of incorrectly predicted negatives, and TN the number of correctly predicted negatives. In addition, a judgment index for the 1,000 slides is defined from information collected in the clinical scenario: a ranking is produced for the 1,000 slides, and the negative/positive boundary is drawn at a percentage of the total. For example, suppose the 1,000 slides contain 900 negative and 100 positive slides, and the negative-screening algorithm produces a ranking; if only 1 of the first 360 slides is positive, the negative-prediction accuracy of the first 40% is 99.72%. The concrete formula is p_a = M/(a% x N), where p_a denotes the negative-prediction accuracy of the first a percent, M the number of correctly predicted negative slides among them, and N the total number of negative slides.
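The four indexes and the screening formula translate directly to code; the assert reproduces the worked example above (40% of 900 negatives = 360 slides, one error).

```python
# Slide-level metrics from the formulas above.
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)

def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    return (tp + tn) / (tp + fp + tn + fn)

def screening_accuracy(m: int, a: float, n: int) -> float:
    """p_a = M / (a% x N): negative-prediction accuracy of the first a percent."""
    return m / (a / 100.0 * n)

# Worked example from the text: 359 of the first 360 slides correct -> 99.72%.
assert abs(screening_accuracy(359, 40, 900) - 0.9972) < 1e-4
```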
In the embodiment of the invention, after detection with DetectoRS, 383 negative slides were selected for prediction, yielding 62,099 false-positive pictures. Combined with the 84,985 positives annotated in the original data set, ResNet50 and ResNeXt101 were each trained; the Accuracy results are shown in Table 1:
TABLE 1
Method Accuracy
ResNet50 92.36%
ResNeXt101 97.60%
As can be seen, ResNeXt101 is more accurate than ResNet50, so ResNeXt101 can be selected as the classification network. The Accuracy formula is: Accuracy = (TP + TN)/GT, where TP denotes the number of correctly predicted positives, TN the number of correctly predicted negatives, and GT the number of all labels.
In addition, the images yield a number of target boxes after passing through the target detection and classification networks. However, because the image was cut with overlapping strips, the same target may appear in repeated boxes. The predicted target boxes therefore need IoU (intersection over union) de-duplication. The IoU formula is:
IoU(A, B) = |A ∩ B| / |A ∪ B|
where A and B denote two target boxes, ∩ denotes the intersection, and ∪ denotes the union. The de-duplicated results were used to rank the slides with the post-processing method (a code sketch is given below). The final tests were performed on the 1,000 verification slides, giving the results shown in Table 2:
TABLE 2
[Table 2 in the original is an image: for screening rates from the top 25% to the top 85%, it lists the negative-prediction accuracy of DetectoRS alone versus DetectoRS + ResNeXt101.]
The percentages in the first row of the table indicate that the algorithm ranks negative slides from greater to lesser likelihood of negativity, from the top 25% to the top 85% of the total. The second and third rows give the negative-prediction accuracy of the two methods, i.e., p_a, or the screening rate. The table shows that DetectoRS + ResNeXt101, with the classification network, clearly outperforms DetectoRS performing target detection alone: at a 70% screening rate, 697 negative slides are screened out with 0 errors, for 100.00% accuracy; at 75%, 747 slides with 2 errors, 99.73%; at 80%, 796 slides with 5 errors, 99.37%; at 85%, 846 slides with 12 errors, 98.58%. The overall effect is higher than the detection-only scheme.
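A minimal sketch of the IoU de-duplication described above; the greedy keep-first strategy and the 0.5 threshold are assumptions, since the patent gives only the IoU formula.

```python
# IoU-based de-duplication of boxes repeated across overlapping tiles.
def iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)          # intersection area
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)          # union area
    return inter / union if union else 0.0

def dedupe(boxes, thresh=0.5):
    """Keep a box only if it does not overlap an already-kept box above thresh;
    in practice boxes would be pre-sorted by confidence (an assumption)."""
    kept = []
    for box in boxes:
        if all(iou(box, k) < thresh for k in kept):
            kept.append(box)
    return kept
```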
Example 2
According to another aspect of the embodiment of the present invention, there is also provided an apparatus for generating an image recognition model, and fig. 5 is a schematic diagram of the apparatus for generating an image recognition model according to the embodiment of the present invention, as shown in fig. 5, the apparatus for generating an image recognition model includes: a first obtaining module 51, a second obtaining module 52, a third obtaining module 53, a modification module 54 and a fourth obtaining module 55. The following describes an apparatus for generating the image recognition model.
The first obtaining module 51 is configured to obtain a first training data set, where the first training data set includes positive objects in the annotation data.
The second obtaining module 52 is configured to train the initial network model by using the first training data set to obtain a first image recognition model, where the first image recognition model is obtained by using supervised learning, and an application scenario of the first image recognition model includes at least one of: target detection, image segmentation and image classification.
The third obtaining module 53 is configured to process unlabeled data in the second training data set by using the first image recognition model to obtain a category of the unlabeled data in the second training data set, and screen an object whose category is positive in the unlabeled data in the second training data set, and reset the object whose category is negative to obtain a third training data set, where the second training data set includes data that is not labeled in the negative diagnosis result.
A modifying module 54, configured to perform semi-supervised learning training on the initial network model by using the labeled data in the first training data set and the data in the third training data set, and modify the data labels in the third training data set into image labels corresponding to the application scenes of the second image recognition model according to a preset rule when the application scenes of the second image recognition model are inconsistent with the application scenes of the first image recognition model, where the application scenes of the second image recognition model at least include: and (5) classifying the images.
And a fourth obtaining module 55, configured to use the processed third training data set to train to obtain a second image recognition model.
It should be noted here that the first acquiring module 51, the second acquiring module 52, the third acquiring module 53, the modifying module 54 and the fourth acquiring module 55 correspond to steps S101 to S105 in embodiment 1, and the modules are the same as the corresponding steps in implementation examples and application scenarios, but are not limited to what is disclosed in embodiment 1. It should be noted that the modules described above as part of the apparatus may be implemented in a computer system such as a set of computer executable instructions.
As can be seen from the above, in the embodiment of the present invention, first, the first obtaining module obtains the first training data set, where the first training data set includes the positive objects in the labeling data; then, a second acquisition module is used for training the initial network model by using a first training data set to obtain a first image recognition model, wherein the first image recognition model is obtained by using a supervised learning mode, and the application scene of the first image recognition model comprises at least one of the following: target detection, image segmentation and image classification; then, a third acquisition module is used for processing the unlabeled data in the second training data set by using the first image recognition model to obtain the classes of the unlabeled data in the second training data set, and objects with positive classes in the unlabeled data in the second training data set are obtained through screening and are reset to be negative to obtain a third training data set, wherein the second training data set comprises the unlabeled data in the negative diagnosis result; and then using a modification module to perform semi-supervised learning training on the initial network model by using the annotation data in the first training data set and the data in the third training data set, and modifying the data labels in the third training data set into image labels corresponding to the application scenes of the second image recognition model according to a preset rule when the application scenes of the second image recognition model are inconsistent with the application scenes of the first image recognition model, wherein the application scenes of the second image recognition model at least comprise: classifying the images; and training by using the fourth acquisition module and the processed third training data set to obtain a second image recognition model. By the image recognition model generation device provided by the embodiment of the invention, the training data of the training image recognition model is expanded, the purpose of taking the false positive data as the training data is achieved, the technical effect of improving the accuracy of the image recognition model is achieved, and the technical problems that the TCT slide information analysis mode adopted in the related technology is limited and relatively comprehensive information cannot be quickly obtained are solved.
Optionally, the first obtaining sub-module includes: an acquisition unit for collecting original samples; a scanning unit for scanning the original samples to obtain a predetermined number of training samples; a sending unit for sending the training samples to one or more annotation terminals, where the training samples undergo an annotation operation at the one or more annotation terminals; and an obtaining unit for acquiring the annotation data fed back by the one or more annotation terminals.
Optionally, the obtaining unit includes: the sending subunit is configured to send the annotation data fed back by the one or more annotation terminals to the review terminal, where the review operation is performed on the annotation data fed back by the one or more annotation terminals at the review terminal; and the acquisition subunit is used for acquiring the annotation data reviewed by the review terminal.
Optionally, the apparatus further comprises: the fifth acquisition module is used for acquiring verification data after the second image recognition model is obtained through training; the verification module is used for verifying the second image identification model by using the verification data to obtain a verification result; the sixth obtaining module is used for judging whether the second image recognition model meets the preset condition or not based on the verification result to obtain a judgment result; the stopping module is used for stopping training when the judgment result shows that the second image recognition model meets the preset condition; and the training module is used for continuing training the second image recognition model when the judgment result shows that the second image recognition model does not meet the preset condition until the second image recognition model meets the preset condition after verification.
Example 3
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium including a stored computer program, wherein when the computer program is executed by a processor, the apparatus in which the computer-readable storage medium is located is controlled to execute the method for generating the image recognition model according to any one of the above.
Example 4
According to another aspect of the embodiments of the present invention, there is also provided a processor, configured to execute a computer program, where the computer program executes the method for generating the image recognition model according to any one of the above aspects.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or may not be executed.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also fall within the protection scope of the present invention.

Claims (10)

1. A method for generating an image recognition model, comprising:
acquiring a first training data set, wherein the first training data set consists of positive objects in annotation data;
training an initial network model by using the first training data set to obtain a first image recognition model, wherein the first image recognition model is obtained in a supervised learning manner, and an application scenario of the first image recognition model comprises at least one of the following: target detection, image segmentation, and image classification;
processing unlabeled data in a second training data set by using the first image recognition model to obtain categories of the unlabeled data in the second training data set, screening out objects whose category is positive from the unlabeled data in the second training data set, and resetting objects whose category is negative, to obtain a third training data set, wherein the second training data set comprises false positive samples obtained by prediction on the unlabeled data;
performing semi-supervised learning training on the initial network model by using the annotation data in the first training data set and the data in the third training data set, and, when an application scenario of a second image recognition model is inconsistent with the application scenario of the first image recognition model, modifying the data labels in the third training data set into image labels corresponding to the application scenario of the second image recognition model according to a preset rule, wherein the application scenario of the second image recognition model comprises at least: image classification;
and training by using the processed third training data set to obtain the second image recognition model.
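To make the data flow of claim 1 concrete, the following is a minimal sketch assuming a binary positive/negative task; the Example type, the predict callable standing in for the trained first image recognition model, and the abnormal/normal image tags are illustrative assumptions, not definitions taken from the claims.

    from dataclasses import dataclass
    from typing import Callable, List, Optional

    @dataclass
    class Example:
        image_id: str
        label: Optional[str]  # "positive", "negative", or None when unlabeled

    def build_third_training_set(
        predict: Callable[[Example], str],  # stand-in for the first image recognition model
        unlabeled: List[Example],
    ) -> List[Example]:
        """Classify the unlabeled data of the second training data set, keep the
        objects predicted positive as pseudo-labeled samples, and reset the
        negatives, yielding the third training data set."""
        third_set: List[Example] = []
        for ex in unlabeled:
            if predict(ex) == "positive":
                ex.label = "positive"   # screened-in pseudo-positive
                third_set.append(ex)
            else:
                ex.label = None         # negative objects are reset, not kept
        return third_set

    def remap_to_image_labels(third_set: List[Example]) -> List[Example]:
        """When the second model's application scenario (image classification)
        differs from the first model's, rewrite the data labels as image-level
        tags per a preset rule; the abnormal/normal rule here is illustrative."""
        for ex in third_set:
            ex.label = "abnormal" if ex.label == "positive" else "normal"
        return third_set

The screened third training data set, together with the annotation data of the first training data set, then feeds the semi-supervised training that produces the second image recognition model.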
2. The method of claim 1, wherein acquiring the first training data set comprises:
acquiring the annotation data;
screening the annotation data to obtain the positive objects in the annotation data;
and taking the positive objects in the annotation data as the first training data set.
3. The method of claim 2, wherein acquiring the annotation data comprises:
collecting original samples;
scanning the original samples to obtain a predetermined number of training samples;
sending the training samples to one or more annotation terminals, wherein a labeling operation is performed on the training samples at the one or more annotation terminals;
and acquiring the annotation data fed back by the one or more annotation terminals.
4. The method of claim 3, wherein acquiring the annotation data fed back by the one or more annotation terminals comprises:
sending the annotation data fed back by the one or more annotation terminals to a review terminal, wherein the review terminal performs a review operation on the annotation data fed back by the one or more annotation terminals;
and acquiring the annotation data reviewed by the review terminal.
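The annotate-then-review flow of claims 3 and 4 can be pictured as a small data pipeline. The sketch below models the annotation and review terminals as plain callables; the round-robin dispatch and every name in it are assumptions made for illustration only.

    from typing import Callable, Dict, List

    Annotation = Dict[str, str]  # e.g. {"image_id": ..., "label": ...}; hypothetical

    def collect_reviewed_annotations(
        training_samples: List[str],
        annotation_terminals: List[Callable[[str], Annotation]],
        review_terminal: Callable[[Annotation], Annotation],
    ) -> List[Annotation]:
        """Dispatch each training sample to an annotation terminal, then pass
        every returned annotation through the review terminal before it may
        enter the first training data set."""
        reviewed: List[Annotation] = []
        for i, sample in enumerate(training_samples):
            terminal = annotation_terminals[i % len(annotation_terminals)]
            raw = terminal(sample)                 # labeling operation
            reviewed.append(review_terminal(raw))  # review operation
        return reviewed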
5. The method of any of claims 1 to 4, wherein after training the second image recognition model, the method further comprises:
acquiring verification data;
verifying the second image recognition model by using the verification data to obtain a verification result;
judging, based on the verification result, whether the second image recognition model meets a preset condition, to obtain a judgment result;
stopping training when the judgment result shows that the second image recognition model meets the preset condition;
and when the judgment result shows that the second image recognition model does not meet the preset condition, continuing to train the second image recognition model until, after verification, the second image recognition model meets the preset condition.
6. An apparatus for generating an image recognition model, comprising:
a first acquisition module, configured to acquire a first training data set, wherein the first training data set consists of positive objects in annotation data;
a second acquisition module, configured to train an initial network model by using the first training data set to obtain a first image recognition model, wherein the first image recognition model is obtained in a supervised learning manner, and an application scenario of the first image recognition model comprises at least one of the following: target detection, image segmentation, and image classification;
a third acquisition module, configured to process unlabeled data in a second training data set by using the first image recognition model to obtain categories of the unlabeled data in the second training data set, screen out objects whose category is positive from the unlabeled data in the second training data set, and reset objects whose category is negative, to obtain a third training data set, wherein the second training data set comprises false positive samples obtained by prediction on the unlabeled data;
a modification module, configured to perform semi-supervised learning training on the initial network model by using the annotation data in the first training data set and the data in the third training data set, and, when an application scenario of a second image recognition model is inconsistent with the application scenario of the first image recognition model, modify the data labels in the third training data set into image labels corresponding to the application scenario of the second image recognition model according to a preset rule, wherein the application scenario of the second image recognition model comprises at least: image classification;
and a fourth acquisition module, configured to train with the processed third training data set to obtain the second image recognition model.
7. The apparatus of claim 6, wherein the first acquisition module comprises:
a first acquisition sub-module, configured to acquire the annotation data;
a second acquisition sub-module, configured to screen the annotation data to obtain the positive objects in the annotation data;
and a determining sub-module, configured to take the positive objects in the annotation data as the first training data set.
8. The apparatus of claim 7, wherein the first acquisition sub-module comprises:
a collection unit, configured to collect original samples;
a scanning unit, configured to scan the original samples to obtain a predetermined number of training samples;
a sending unit, configured to send the training samples to one or more annotation terminals, wherein a labeling operation is performed on the training samples at the one or more annotation terminals;
and an acquisition unit, configured to acquire the annotation data fed back by the one or more annotation terminals.
9. A computer-readable storage medium, comprising a stored computer program, wherein, when the computer program is executed by a processor, the device on which the computer-readable storage medium resides is controlled to perform the method for generating an image recognition model according to any one of claims 1 to 5.
10. A processor, configured to run a computer program, wherein the computer program, when run, performs the method for generating an image recognition model according to any one of claims 1 to 5.
CN202111302552.4A 2021-11-04 2021-11-04 Image recognition model generation method and device Active CN114037868B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111302552.4A CN114037868B (en) 2021-11-04 2021-11-04 Image recognition model generation method and device

Publications (2)

Publication Number Publication Date
CN114037868A CN114037868A (en) 2022-02-11
CN114037868B (en) 2022-07-01

Family

ID=80136382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111302552.4A Active CN114037868B (en) 2021-11-04 2021-11-04 Image recognition model generation method and device

Country Status (1)

Country Link
CN (1) CN114037868B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117152746B * 2023-10-27 2024-03-26 Southern Medical University Method for acquiring cervical cell classification parameters based on YOLOV5 network


Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
WO2019104003A1 (en) * 2017-11-21 2019-05-31 Beth Israel Deaconess Medical Center, Inc Systems and methods for automatically interpreting images of microbiological samples
KR102565074B1 * 2018-08-06 2023-08-08 Shimadzu Corporation Teacher label image correction method, learning completion model creation method, and image analysis device
CN109190567A * 2018-09-10 2019-01-11 Harbin University of Science and Technology Automatic detection method for abnormal cervical cells based on deep convolutional neural networks
US11704487B2 (en) * 2019-04-04 2023-07-18 Beijing Jingdong Shangke Information Technology Co., Ltd. System and method for fashion attributes extraction

Patent Citations (8)

Publication number Priority date Publication date Assignee Title
CN108416370A (en) * 2018-02-07 2018-08-17 深圳大学 Image classification method, device based on semi-supervised deep learning and storage medium
CN111008643A (en) * 2019-10-29 2020-04-14 平安科技(深圳)有限公司 Image classification method and device based on semi-supervised learning and computer equipment
CN110909803A (en) * 2019-11-26 2020-03-24 腾讯科技(深圳)有限公司 Image recognition model training method and device and computer readable storage medium
CN111028224A (en) * 2019-12-12 2020-04-17 广西医准智能科技有限公司 Data labeling method, model training device, image processing method, image processing device and storage medium
WO2021186174A1 (en) * 2020-03-17 2021-09-23 Seechange Technologies Limited Machine-learning data handling
CN111814832A (en) * 2020-06-11 2020-10-23 上海联影智能医疗科技有限公司 Target detection method, device and storage medium
CN112560964A (en) * 2020-12-18 2021-03-26 深圳赛安特技术服务有限公司 Method and system for training Chinese herbal medicine pest and disease identification model based on semi-supervised learning
CN113222928A (en) * 2021-05-07 2021-08-06 北京大学第一医院 Artificial intelligent urothelial cancer recognition system for urocytology

Non-Patent Citations (1)

Title
Ma Zhanfei; Yang Shuying; Guo Guangfeng. Detector generation algorithm and model based on ISM characteristics. Control and Decision. 2015, pp. 528-534. *

Also Published As

Publication number Publication date
CN114037868A (en) 2022-02-11

Similar Documents

Publication Publication Date Title
US11681418B2 (en) Multi-sample whole slide image processing in digital pathology via multi-resolution registration and machine learning
US9684958B2 (en) Image processing device, program, image processing method, computer-readable medium, and image processing system
WO2021139258A1 (en) Image recognition based cell recognition and counting method and apparatus, and computer device
Zhang et al. Automated semantic segmentation of red blood cells for sickle cell disease
CN109952614A Classification system and method for biological particles
US20100002920A1 (en) Mitotic Figure Detector and Counter System and Method for Detecting and Counting Mitotic Figures
CN113793336B (en) Method, device and equipment for detecting blood cells and readable storage medium
CN111079620B (en) White blood cell image detection and identification model construction method and application based on transfer learning
CN112365471B (en) Cervical cancer cell intelligent detection method based on deep learning
CN111986183A (en) Chromosome scattergram image automatic segmentation and identification system and device
CN114240978B (en) Cell edge segmentation method and device based on adaptive morphology
Pandit et al. Literature review on object counting using image processing techniques
Chidester et al. Discriminative bag-of-cells for imaging-genomics
CN113658174A (en) Microkaryotic image detection method based on deep learning and image processing algorithm
CN112819821A (en) Cell nucleus image detection method
CN115294377A (en) System and method for identifying road cracks
CN115359264A Deep learning identification method for densely distributed adherent cells
CN114037868B (en) Image recognition model generation method and device
Sabino et al. Toward leukocyte recognition using morphometry, texture and color
Vale et al. Automatic segmentation and classification of blood components in microscopic images using a fuzzy approach
CN111414930B (en) Deep learning model training method and device, electronic equipment and storage medium
CN114080644A (en) System and method for diagnosing small bowel cleanliness
US10146042B2 (en) Image processing apparatus, storage medium, and image processing method
Kromp et al. Machine learning framework incorporating expert knowledge in tissue image annotation
Iqbal et al. Towards Efficient Segmentation and Classification of White Blood Cell Cancer Using Deep Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant