CN115130543A - Image recognition method and device, storage medium and electronic equipment - Google Patents

Image recognition method and device, storage medium and electronic equipment

Info

Publication number
CN115130543A
Authority
CN
China
Prior art keywords
training
visual field
image
target
suspicious
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210468883.3A
Other languages
Chinese (zh)
Other versions
CN115130543B (en)
Inventor
叶虎
韩骁
蔡德
肖凯文
马兆轩
周彦宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210468883.3A
Publication of CN115130543A
Application granted
Publication of CN115130543B
Legal status: Active (granted)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30024 Cell structures in vitro; Tissue sections in vitro
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30096 Tumor; Lesion

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image recognition method and apparatus, a storage medium, and an electronic device, applicable to the field of image processing. The method comprises the following steps: acquiring a target image; and inputting the target image into a target neural network model, determining M suspicious visual field images from the target image through a target visual field classification model in the target neural network model, and predicting over the M suspicious visual field images through a target full-slice classification model in the target neural network model to obtain a predicted image label of the target image. The method and apparatus solve the technical problem of low image recognition accuracy.

Description

Image recognition method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of computers, and in particular, to an image recognition method and apparatus, a storage medium, and an electronic device.
Background
In the related art, the quality of a product needs to be inspected before it leaves the factory, and flaw detection is an important step of that inspection. Examples include detecting defects in a product's appearance (scratches, speckles, and the like) or defective spots on the display of an electronic product. Such flaws are small, and the conventional approach is manual inspection, which is error-prone, has low accuracy, and is inefficient.
Cancer cells likewise seriously threaten human health. In the field of cancer cell detection, the prior art has a specialist physician perform pathological analysis on a patient's full-field digital slice to determine whether the patient has cancer; because this method depends on a specialist, detection efficiency is low.
With the development of image recognition technology, automated image detection can be applied in many fields. However, for minor imperfections on a product, or for small cancer cells on a full-field digital slice, image recognition accuracy remains low.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
Embodiments of the present application provide an image recognition method and apparatus, a storage medium, and an electronic device, so as to at least solve the technical problem of low image recognition accuracy.
According to an aspect of an embodiment of the present application, there is provided an image recognition method, including: acquiring a target image, where the target image is an image to be recognized obtained by scanning a target object; and inputting the target image into a target neural network model, determining M suspicious visual field images from the target image through a target visual field classification model in the target neural network model, and predicting over the M suspicious visual field images through a target full-slice classification model in the target neural network model to obtain a predicted image label of the target image, where M is greater than or equal to 1 and the M suspicious visual field images are the visual field images in the target image whose predicted suspicious probability is greater than or equal to a preset probability threshold. The target neural network model is obtained by training a to-be-trained neural network model with training sample images and training visual field images until the following convergence condition is met: a first loss condition is satisfied between the known image label of the training sample image and the predicted visual field label of a target suspicious visual field image, where the target suspicious visual field image is the suspicious visual field image with the largest predicted suspicious probability in the training sample image, as determined by the to-be-trained visual field classification model in the to-be-trained neural network model.
Optionally, the convergence condition further includes: a second loss condition is satisfied between the known image label of the training sample image and the predicted image label of the training sample image determined by the to-be-trained full-slice classification model in the to-be-trained neural network model, and a third loss condition is satisfied between the known visual field label of the training visual field image and the predicted visual field label of the training visual field image determined by the to-be-trained visual field classification model in the to-be-trained neural network model.
Optionally, before the inputting of the target image into the target neural network model, the method further includes: acquiring the training sample image and the training visual field image; and performing multiple rounds of joint training on the to-be-trained visual field classification model and the to-be-trained full-slice classification model in the to-be-trained neural network model using the training sample image and the training visual field image, to obtain the target neural network model. During training, if the to-be-trained neural network model does not satisfy the convergence condition, parameters of the to-be-trained visual field classification model and the to-be-trained full-slice classification model are adjusted; if it satisfies the convergence condition, training ends, the to-be-trained neural network model at the end of training is determined as the target neural network model, and the to-be-trained visual field classification model and the to-be-trained full-slice classification model at the end of training are determined as the target visual field classification model and the target full-slice classification model in the target neural network model, respectively.
Optionally, the performing of multiple rounds of joint training on the to-be-trained visual field classification model and the to-be-trained full-slice classification model in the to-be-trained neural network model using the training sample image and the training visual field image includes performing an i-th round of joint training, where i is a positive integer greater than or equal to 1 and the visual field classification model and full-slice classification model "obtained by the 0th round of training" are the untrained to-be-trained visual field classification model and to-be-trained full-slice classification model in the to-be-trained neural network model, as follows: inputting the training visual field image into the visual field classification model obtained by the (i-1)-th round of training to obtain the predicted visual field label of the training visual field image determined by the i-th round of training; inputting the training sample image into the visual field classification model obtained by the (i-1)-th round of training and the full-slice classification model obtained by the (i-1)-th round of training to obtain the predicted visual field label of the target suspicious visual field image determined by the i-th round of training and the predicted image label of the training sample image determined by the i-th round of training; and, when the predicted visual field label of the training visual field image, the predicted visual field label of the target suspicious visual field image, and the predicted image label of the training sample image determined by the i-th round of training satisfy the convergence condition, ending the training and determining the visual field classification model and the full-slice classification model obtained by the (i-1)-th round of training as the target visual field classification model and the target full-slice classification model in the target neural network model, respectively.
Optionally, the inputting of the training sample image into the visual field classification model obtained by the (i-1)-th round of training and the full-slice classification model obtained by the (i-1)-th round of training to obtain the predicted visual field label of the target suspicious visual field image determined by the i-th round of training and the predicted image label of the training sample image determined by the i-th round of training includes: inputting the training sample image into the visual field classification model obtained by the (i-1)-th round of training to obtain the predicted visual field label of the target suspicious visual field image determined by the i-th round of training and S suspicious training visual field images determined by the i-th round of training, where S is greater than or equal to 1, the S suspicious training visual field images are the visual field images in the training sample image whose predicted suspicious probability is greater than or equal to the preset probability threshold, and the target suspicious visual field image is the training visual field image with the largest predicted suspicious probability in the training sample image; and inputting the S suspicious training visual field images into the full-slice classification model obtained by the (i-1)-th round of training to obtain the predicted image label of the training sample image determined by the i-th round of training.
Optionally, the inputting of the training sample image into the visual field classification model obtained by the (i-1)-th round of training to obtain the predicted visual field label of the target suspicious visual field image determined by the i-th round of training and the S suspicious training visual field images determined by the i-th round of training includes: inputting the training sample image into the visual field classification model obtained by the (i-1)-th round of training, and segmenting the training sample image into N training visual field images through that model; determining, from the N training visual field images, the training visual field images whose predicted suspicious probability is greater than or equal to the preset probability threshold, obtaining the S suspicious training visual field images determined by the i-th round of training, where N is greater than or equal to S; and determining the training visual field image with the largest predicted suspicious probability among the N training visual field images as the target suspicious visual field image, and determining the label corresponding to the predicted suspicious probability of the target suspicious visual field image as the predicted visual field label of the target suspicious visual field image.
Optionally, in the i-th round of joint training, the method further includes: inputting the predicted suspicious probability corresponding to the predicted visual field label of the target suspicious visual field image determined by the i-th round of training and the known probability corresponding to the known image label of the training sample image into a first loss function to obtain a first loss value; inputting the predicted suspicious probability corresponding to the predicted image label of the training sample image determined by the i-th round of training and the known probability corresponding to the known image label of the training sample image into a second loss function to obtain a second loss value; inputting the predicted suspicious probability corresponding to the predicted visual field label of the training visual field image determined by the i-th round of training and the known probability corresponding to the known visual field label of the training visual field image into a third loss function to obtain a third loss value; determining whether the first loss value satisfies the first loss condition, whether the second loss value satisfies the second loss condition, and whether the third loss value satisfies the third loss condition; and, when the first loss value satisfies the first loss condition, the second loss value satisfies the second loss condition, and the third loss value satisfies the third loss condition, determining that the predicted visual field label of the training visual field image, the predicted visual field label of the target suspicious visual field image, and the predicted image label of the training sample image determined by the i-th round of training satisfy the convergence condition.
Optionally, the predicting over the M suspicious visual field images through the target full-slice classification model in the target neural network model to obtain the predicted image label of the target image includes: performing feature extraction on the M suspicious visual field images through the target visual field classification model in the target neural network model to obtain M feature vectors; and inputting the feature average vector of the M feature vectors into a classifier in the target full-slice classification model, and classifying the feature average vector through the classifier to obtain the predicted image label of the target image.
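For illustration, the following is a minimal PyTorch sketch of this averaging-and-classification step. It is not the patent's implementation; the class name SlideClassifier and the feature dimension are assumptions.

```python
import torch
import torch.nn as nn

class SlideClassifier(nn.Module):
    """Hypothetical full-slice head: averages the M per-view feature
    vectors and classifies the mean vector into 2 classes."""
    def __init__(self, feature_dim: int = 2048, num_classes: int = 2):
        super().__init__()
        self.fc = nn.Linear(feature_dim, num_classes)

    def forward(self, view_features: torch.Tensor) -> torch.Tensor:
        # view_features: (M, feature_dim), one vector per suspicious view
        mean_feature = view_features.mean(dim=0, keepdim=True)  # (1, feature_dim)
        return self.fc(mean_feature)                            # (1, 2) slide-level logits
```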
Optionally, the determining of M suspicious visual field images from the target image through the target visual field classification model in the target neural network model includes: segmenting the target image into N visual field images through the target visual field classification model in the target neural network model, where N is greater than or equal to M; and determining the M suspicious visual field images from the N visual field images through the target visual field classification model, where the M suspicious visual field images are the visual field images among the N visual field images whose predicted suspicious probability is greater than or equal to the preset probability threshold.
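A sketch of this segmentation-and-selection step, again assuming PyTorch; the tile size, the helper names, and the use of softmax to obtain the suspicious probability are assumptions, and the patent fixes no particular threshold value.

```python
import torch

def split_into_views(image: torch.Tensor, tile: int = 256) -> torch.Tensor:
    """Grid-split a (C, H, W) image into N non-overlapping (C, tile, tile) views."""
    c, _, _ = image.shape
    views = image.unfold(1, tile, tile).unfold(2, tile, tile)
    return views.permute(1, 2, 0, 3, 4).reshape(-1, c, tile, tile)

def select_suspicious_views(views: torch.Tensor, view_model: torch.nn.Module,
                            threshold: float = 0.5):
    """Score the N views and keep the M whose predicted suspicious
    probability is at least the preset threshold."""
    with torch.no_grad():
        probs = torch.softmax(view_model(views), dim=1)[:, 1]  # P(suspicious) per view
    keep = probs >= threshold
    return views[keep], probs[keep]                            # the M views and their probs
```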
According to another aspect of the embodiments of the present application, there is also provided an image recognition apparatus, including: an acquisition module configured to acquire a target image, the target image being an image to be recognized obtained by scanning a target object; and an input module configured to input the target image into a target neural network model, determine M suspicious visual field images from the target image through a target visual field classification model in the target neural network model, and predict over the M suspicious visual field images through a target full-slice classification model in the target neural network model to obtain a predicted image label of the target image, where M is greater than or equal to 1 and the M suspicious visual field images are the visual field images in the target image whose predicted suspicious probability is greater than or equal to a preset probability threshold. The target neural network model is obtained by training a to-be-trained neural network model with training sample images and training visual field images until the following convergence condition is met: a first loss condition is satisfied between the known image label of the training sample image and the predicted visual field label of a target suspicious visual field image, where the target suspicious visual field image is the suspicious visual field image with the largest predicted suspicious probability in the training sample image, as determined by the to-be-trained visual field classification model in the to-be-trained neural network model.
According to another aspect of the embodiments of the present application, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the above-mentioned image recognition method when running.
According to yet another aspect of embodiments herein, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, causing the computer device to perform the image recognition method as above.
According to still another aspect of an embodiment of the present application, there is provided an electronic device including a memory and a processor, the memory having a computer program stored therein, the processor being configured to execute the image recognition method described above through the computer program.
In the embodiments of the application, the target neural network model includes a target visual field classification model and a target full-slice classification model. The target visual field classification model can determine, in the target image to be recognized, M suspicious visual field images whose predicted suspicious probability is greater than or equal to a preset probability threshold, thereby locating the visual fields in the target image that may contain flaws or cancer cells. The target full-slice classification model then predicts over the M suspicious visual field images to obtain the predicted image label of the target image, which improves the accuracy of image recognition.
In addition, during training of the target neural network model, a first loss condition must be satisfied between the predicted label of the suspicious visual field image with the largest predicted suspicious probability in the training sample image, as determined by the to-be-trained visual field classification model, and the known label of the training sample image. This adds a constraint loss to the to-be-trained visual field classification model and improves the recognition performance of the target neural network model, thereby solving the technical problem of low image recognition accuracy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic diagram of an application environment of an alternative image recognition method according to an embodiment of the application;
FIG. 2 is a schematic flow chart diagram of an alternative image recognition method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an alternative target neural network model structure according to an embodiment of the present application;
FIG. 4 is a schematic illustration of an alternative target image according to an embodiment of the present application;
FIG. 5 is a flow chart illustrating an alternative training process for a neural network model to be trained according to an embodiment of the present application;
FIG. 6 is a schematic illustration of alternative positive and negative cancer-cell visual fields according to an embodiment of the present application;
FIG. 7 is a schematic flow chart diagram according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an alternative overall structure according to an embodiment of the present application;
FIG. 9 is a schematic diagram of an alternative image recognition apparatus according to an embodiment of the present application;
FIG. 10 is a block diagram of a computer system architecture for an alternative electronic device according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of an alternative electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the invention can be applied to various scenes such as cloud technology, artificial intelligence and the like.
First, some nouns and terms appearing in the description of the embodiments of the present application are explained as follows:
Full-field digital slice (whole-slide image, WSI for short): the digitization of one pathological section produces one WSI.
Negative and positive: in a medical examination, negative generally means normal and positive means abnormal. The two terms are used mainly in medicine and have become general terms for the presence or absence of a finding in an examination result.
The present application is illustrated below with reference to examples:
According to an aspect of the embodiments of the present invention, an image recognition method is provided. As an optional implementation, the image recognition method may be, but is not limited to being, applied in the application environment shown in FIG. 1, which may include: terminal device 102, network 110, and server 112.
Optionally, in this embodiment, the terminal device may include, but is not limited to, at least one of the following: a mobile phone (such as an Android phone or iOS phone), a notebook computer, a tablet computer, a palmtop computer, a MID (Mobile Internet Device), a PAD, a desktop computer, a smart television, a medical device, and the like. The terminal device may be configured with a target client, which may be a game client, an instant messaging client, a browser client, a video client, a shopping client, and so on. In this embodiment, the terminal device may include, but is not limited to: memory 104, processor 106, and display 108. The memory 104 may be used for storing data, for example, the image to be recognized obtained by scanning the target object. The processor 106 may be configured to input the target image into the target neural network model. The display 108 may be used to display the target image as well as the predicted image label of the target image.
Optionally, the network 110 may include, but is not limited to: a wired network, a wireless network, wherein the wired network comprises: a local area network, a metropolitan area network, and a wide area network, the wireless network comprising: bluetooth, WIFI, and other networks that enable wireless communication.
Alternatively, the server 112 may be a single server, a server cluster composed of a plurality of servers, or a cloud server. The server 112 may include, but is not limited to: a database 114 and a processing engine 116. The database 114 may be used to store data, such as the target image. The processing engine 116 may be configured to perform the following step:
Step S12: inputting the target image into the target neural network model, determining M suspicious visual field images from the target image through the target visual field classification model in the target neural network model, and predicting over the M suspicious visual field images through the target full-slice classification model in the target neural network model to obtain the predicted image label of the target image. The above is only an example, which is not limited in this embodiment.
As an optional implementation, as shown in FIG. 2, the image recognition method includes:
step S202, acquiring a target image, wherein the target image is an image to be identified obtained by scanning a target object;
the target object may be a product to be detected, such as an electronic product, a household product, and the like, and the target image is an image captured by an image capturing device (e.g., a camera) on the product to be detected.
The target object may also be a body part of an animal or a human, such as a diseased part of a patient. The target image is a full-field digital slice obtained by scanning the affected part with a medical device.
Step S204: inputting the target image into a target neural network model, determining M suspicious visual field images from the target image through a target visual field classification model in the target neural network model, and predicting over the M suspicious visual field images through a target full-slice classification model in the target neural network model to obtain a predicted image label of the target image, where M is greater than or equal to 1 and the M suspicious visual field images are the visual field images in the target image whose predicted suspicious probability is greater than or equal to a preset probability threshold.
the target neural network model is obtained by training a to-be-trained neural network model by using a training sample image and a training visual field image until the following convergence condition is met: and a first loss condition is satisfied between a known image label of the training sample image and a predicted visual field label of a target suspicious visual field image, and the target suspicious visual field image is the suspicious visual field image with the maximum predicted suspicious probability in the training sample image determined by a to-be-trained visual field classification model in the to-be-trained neural network model.
The target neural network model structure shown in FIG. 3 includes a target visual field classification model and a target full-slice classification model. The target image is input into the target visual field classification model, which outputs M suspicious visual field images; the M suspicious visual field images are then input into the target full-slice classification model, which outputs the predicted image label of the target image. The predicted image label may be used to indicate the classification type of the target image. For example, in a product flaw detection scenario, the predicted image label can indicate whether a flaw exists in the target image and can be represented by 0 and 1, where 0 indicates no flaw and 1 indicates a flaw. In a cancer cell detection scenario, the predicted image label can indicate whether cancer cells are present in the target image, where 0 indicates no cancer cells and 1 indicates cancer cells; equivalently, 0 can indicate negative and 1 positive.
The target image may be segmented into a plurality of visual field images by the target visual field classification model. For example, for the target image shown in FIG. 4, the target visual field classification model may segment the target image into N visual field images and then determine M suspicious visual field images among them.
In a product flaw detection scenario, the M suspicious visual field images are those in which a flaw may exist; the probability that a flaw exists in a suspicious visual field image is expressed as a predicted suspicious probability, a value between 0 and 1, where a higher value indicates a higher probability that a flaw exists.
In a cancer cell detection scenario, the M suspicious visual field images are those in which cancer cells may exist; the probability that cancer cells exist in a suspicious visual field image is likewise expressed as a predicted suspicious probability between 0 and 1, where a higher value indicates a higher probability that cancer cells exist.
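Putting the two sub-models together, a hypothetical end-to-end inference sketch might look as follows (PyTorch, reusing split_into_views and select_suspicious_views from the sketch above; the backbone attribute used for feature extraction is an assumption):

```python
import torch
import torch.nn as nn

class TargetNeuralNetwork(nn.Module):
    """Hypothetical composition of the two sub-models: a view (visual field)
    classifier followed by a full-slice classifier over the M suspicious views."""
    def __init__(self, view_model: nn.Module, slide_head: nn.Module,
                 threshold: float = 0.5):
        super().__init__()
        self.view_model = view_model
        self.slide_head = slide_head    # e.g. the SlideClassifier sketched earlier
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, image: torch.Tensor) -> int:
        views = split_into_views(image)                                  # N views
        suspicious, _ = select_suspicious_views(views, self.view_model,
                                                self.threshold)          # M views
        feats = self.view_model.backbone(suspicious).flatten(1)          # (M, feature_dim), assumed attribute
        logits = self.slide_head(feats)
        return int(logits.argmax(dim=1))  # 0 = negative / no flaw, 1 = positive / flaw
```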
Optionally, the convergence condition further includes: a second loss condition is satisfied between the known image label of the training sample image and the predicted image label of the training sample image determined by the to-be-trained full-slice classification model in the to-be-trained neural network model, and a third loss condition is satisfied between the known visual field label of the training visual field image and the predicted visual field label of the training visual field image determined by the to-be-trained visual field classification model in the to-be-trained neural network model.
As an optional implementation, the to-be-trained neural network model is the neural network model before training and during the training process. During training, the to-be-trained neural network model can be constrained by the following three loss conditions:
the first loss condition includes: the predicted view label of the suspicious view image with the highest predicted suspicious probability in the training sample image is consistent with the known image label of the training sample image. The visual field image with the maximum predicted suspicious probability determined in the training sample image through the visual field classification model to be trained is called a target suspicious visual field image. Supposing that the predicted suspicious probability of the target suspicious visual field image is 0.8, the suspicious probability of the visual field image with the largest predicted suspicious probability in the training sample image is 0.8, and the predicted suspicious probabilities of other visual field images are all less than or equal to 0.8.
In a product flaw detection scenario, suppose that a suspicious visual field image with a predicted suspicious probability of at least 0.5 indicates that a flaw exists in that visual field image; it can then be inferred that a flaw exists in the training sample image containing it. A target suspicious visual field image with a predicted suspicious probability of 0.8 (0.8 is only illustrative; the specific value depends on the actual situation) therefore indicates that a flaw exists in the target suspicious visual field image and hence in the training sample image containing it. Since the known image label of the training sample image indicates whether a flaw actually exists in it (the known image label is 1 if a flaw exists and 0 otherwise), the first loss condition adds a constraint loss to the training of the to-be-trained visual field classification model, improving the recognition accuracy of the target visual field classification model in the target neural network model on the image to be recognized.
In a cancer cell detection scenario, suppose that a visual field image with a predicted suspicious probability of at least 0.5 indicates that cancer cells exist in that visual field image; it can then be inferred that cancer cells exist in the training sample image containing it. A target suspicious visual field image with a predicted suspicious probability of 0.8 therefore indicates that cancer cells exist in it and hence in the training sample image containing it. Since the known image label of the training sample image indicates whether cancer cells actually exist in it (the known image label is 1 if cancer cells exist and 0 otherwise), the first loss condition again adds a constraint loss to the training of the to-be-trained visual field classification model.
The second loss condition requires that the known image label of the training sample image be consistent with the predicted image label of the training sample image determined by the to-be-trained full-slice classification model in the to-be-trained neural network model.
In a product flaw detection scenario, suppose that a training sample image with a predicted suspicious probability of at least 0.5 indicates that a flaw exists in it. Since the known image label of the training sample image indicates whether a flaw actually exists (1 if a flaw exists, 0 otherwise), the second loss condition adds a constraint loss to the training of the to-be-trained full-slice classification model, improving the recognition accuracy of the target full-slice classification model in the target neural network model.
In a cancer cell detection scenario, suppose that a training sample image with a predicted suspicious probability of at least 0.5 indicates that cancer cells exist in it. Since the known image label of the training sample image indicates whether cancer cells actually exist (1 if they exist, 0 otherwise), the second loss condition likewise adds a constraint loss to the training of the to-be-trained full-slice classification model.
The third loss condition requires that the known visual field label of the training visual field image be consistent with the predicted visual field label of the training visual field image determined by the to-be-trained visual field classification model in the to-be-trained neural network model.
In a product flaw detection scenario, suppose that a training visual field image with a predicted suspicious probability of at least 0.5 indicates that a flaw exists in it. Since the known visual field label of the training visual field image indicates whether a flaw actually exists (1 if a flaw exists, 0 otherwise), the third loss condition adds a constraint loss to the training of the to-be-trained visual field classification model.
In a cancer cell detection scenario, suppose that a training visual field image with a predicted suspicious probability of at least 0.5 indicates that cancer cells exist in it. Since the known visual field label of the training visual field image indicates whether cancer cells actually exist (1 if they exist, 0 otherwise), the third loss condition likewise adds a constraint loss to the training of the to-be-trained visual field classification model, improving the recognition accuracy of the target visual field classification model on the image to be recognized.
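The first condition, the visual field constraint, can be sketched in a few lines; binary cross-entropy is an assumption here, since the patent does not name the loss function:

```python
import torch
import torch.nn.functional as F

def view_constraint_loss(view_probs: torch.Tensor,
                         slide_label: torch.Tensor) -> torch.Tensor:
    """First loss condition, sketched: the view with the highest predicted
    suspicious probability should agree with the slide's known image label.
    view_probs: (N,) per-view suspicious probabilities; slide_label: (1,) 0/1."""
    max_prob = view_probs.max().unsqueeze(0)          # most suspicious view
    return F.binary_cross_entropy(max_prob, slide_label.float())
```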
Optionally, before the inputting of the target image into the target neural network model, the method further includes: acquiring the training sample image and the training visual field image; and performing multiple rounds of joint training on the to-be-trained visual field classification model and the to-be-trained full-slice classification model in the to-be-trained neural network model using the training sample image and the training visual field image, to obtain the target neural network model. During training, if the to-be-trained neural network model does not satisfy the convergence condition, parameters of the to-be-trained visual field classification model and the to-be-trained full-slice classification model are adjusted; if it satisfies the convergence condition, training ends, the to-be-trained neural network model at the end of training is determined as the target neural network model, and the to-be-trained visual field classification model and the to-be-trained full-slice classification model at the end of training are determined as the target visual field classification model and the target full-slice classification model in the target neural network model, respectively.
As an alternative embodiment, the image label of the training sample image is known (the known image label is 1 if there is a defect, and the known image label is 0 if there is no defect, or the known image label is 1 if there is a cancer cell, and the known image label is 0 if there is no cancer cell). The visual field label of the training visual field image is also known (the known visual field label is 1 in the case where a defect exists, and is 0 in the case where no defect exists, or the known visual field label is 1 in the case where a cancer cell exists, and is 0 in the case where no cancer cell exists).
The to-be-trained visual field classification model and to-be-trained full-slice classification model in the to-be-trained neural network model are jointly trained over multiple rounds using a small number of training visual field images with known visual field labels and a small number of training sample images with known image labels. The training flowchart of the to-be-trained neural network model shown in FIG. 5 includes the following steps (a condensed code sketch of the training loop follows step S514 below):
Step S500: acquiring a training sample image and a training visual field image, where the image label of the training sample image is known (the known image label is 1 if a flaw exists and 0 otherwise, or 1 if cancer cells exist and 0 otherwise) and the visual field label of the training visual field image is likewise known (the known visual field label is 1 if a flaw exists and 0 otherwise, or 1 if cancer cells exist and 0 otherwise);
Step S501: inputting a training visual field image into the to-be-trained visual field classification model to obtain the predicted visual field label of the training visual field image, where the to-be-trained visual field classification model is a neural network model. In the product flaw detection scenario, the predicted visual field label indicates the predicted result of whether a flaw exists in the training visual field image (1 if a flaw is predicted, 0 otherwise); in the cancer cell detection scenario, it indicates the predicted result of whether cancer cells exist in the training visual field image (1 if cancer cells are predicted, 0 otherwise);
Step S503: determining whether the predicted visual field label of the training visual field image and the known visual field label of the training visual field image satisfy the third loss condition; if so, executing step S514, and if not, executing step S505;
Step S505: adjusting the model parameters of the to-be-trained visual field classification model to obtain an updated to-be-trained visual field classification model, and continuing to execute steps S501 and S502;
Step S502: inputting the training sample image into the to-be-trained visual field classification model to obtain S suspicious training visual field images, the target suspicious visual field image with the largest predicted suspicious probability, and the predicted visual field label of that target suspicious visual field image. The to-be-trained visual field classification model can segment a training sample image into a plurality of training visual field images and determine those whose predicted suspicious probability is greater than or equal to a preset probability threshold, yielding S suspicious training visual field images, where S is greater than or equal to 1 and the preset probability threshold can be set according to the actual situation (for example, 0.3, 0.4, or 0.5). The training visual field image with the largest predicted suspicious probability is determined as the target suspicious visual field image, and its predicted visual field label is determined from its predicted suspicious probability. In a product flaw detection scenario, a predicted suspicious probability of at least 0.5 yields a predicted visual field label of 1 (a flaw exists in the training sample image), and one below 0.5 yields a label of 0 (no flaw exists). In a cancer cell detection scenario, a predicted suspicious probability of at least 0.5 yields a predicted visual field label of 1 (cancer cells exist in the training sample image), and one below 0.5 yields a label of 0 (no cancer cells exist).
Step S504: inputting the S suspicious training visual field images into the to-be-trained full-slice classification model to obtain the predicted image label of the training sample image;
Step S506: determining whether the predicted image label of the training sample image and the known image label of the training sample image satisfy the second loss condition; if so, executing step S514, and if not, executing step S508;
Step S508: adjusting the model parameters of the to-be-trained full-slice classification model to obtain an updated to-be-trained full-slice classification model, and continuing to execute step S502;
Step S510: determining whether the predicted visual field label of the target suspicious visual field image and the known image label of the training sample image satisfy the first loss condition; if so, executing step S514, and if not, executing step S512;
s512, adjusting model parameters of the visual field classification model to be trained to obtain an updated visual field classification model to be trained, and continuing to execute S501 and S502;
Step S514: ending the training to obtain the target neural network model, which includes the trained target visual field classification model and the trained target full-slice classification model.
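As promised above, the following is a condensed, hypothetical sketch of one joint-training iteration covering steps S501-S512. It assumes PyTorch, a 0.5/top-S selection scheme, cross-entropy losses, and a full-slice model that maps a batch of S views directly to one pair of slide-level logits; none of these specifics are fixed by the patent.

```python
import torch
import torch.nn.functional as F

def train_round(view_model, slide_model, views, view_labels,
                slide_tiles, slide_label, optimizer, s: int = 32):
    """One joint-training iteration (S501-S512). views/view_labels: labelled
    training visual field images; slide_tiles: views of one training sample
    image; slide_label: its known image label as a (1,)-shaped long tensor."""
    # S501: labelled training view images through the view model (third loss)
    loss_view = F.cross_entropy(view_model(views), view_labels)

    # S502: score the sample image's views, keep the top-S suspicious ones
    probs = torch.softmax(view_model(slide_tiles), dim=1)[:, 1]
    top_idx = probs.argsort(descending=True)[:s]

    # S510: view constraint loss (first loss), most suspicious view vs. image label
    loss_constraint = F.binary_cross_entropy(probs.max().unsqueeze(0),
                                             slide_label.float())

    # S504: top-S suspicious views through the full-slice model (second loss)
    loss_slide = F.cross_entropy(slide_model(slide_tiles[top_idx]), slide_label)

    # S505/S508/S512: one gradient step adjusts both sub-models
    loss = loss_view + loss_constraint + loss_slide
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss_view.item(), loss_constraint.item(), loss_slide.item()
```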
Optionally, the performing of multiple rounds of joint training on the to-be-trained visual field classification model and the to-be-trained full-slice classification model in the to-be-trained neural network model using the training sample image and the training visual field image includes performing an i-th round of joint training, where i is a positive integer greater than or equal to 1 and the visual field classification model and full-slice classification model "obtained by the 0th round of training" are the untrained to-be-trained visual field classification model and to-be-trained full-slice classification model, as follows: inputting the training visual field image into the visual field classification model obtained by the (i-1)-th round of training to obtain the predicted visual field label of the training visual field image determined by the i-th round of training; inputting the training sample image into the visual field classification model obtained by the (i-1)-th round of training and the full-slice classification model obtained by the (i-1)-th round of training to obtain the predicted visual field label of the target suspicious visual field image determined by the i-th round of training and the predicted image label of the training sample image determined by the i-th round of training; and, when the predicted visual field label of the training visual field image, the predicted visual field label of the target suspicious visual field image, and the predicted image label of the training sample image determined by the i-th round of training satisfy the convergence condition, ending the training and determining the visual field classification model and the full-slice classification model obtained by the (i-1)-th round of training as the target visual field classification model and the target full-slice classification model in the target neural network model, respectively.
Optionally, the inputting of the training sample image into the visual field classification model obtained by the (i-1)-th round of training and the full-slice classification model obtained by the (i-1)-th round of training to obtain the predicted visual field label of the target suspicious visual field image determined by the i-th round of training and the predicted image label of the training sample image determined by the i-th round of training includes: inputting the training sample image into the visual field classification model obtained by the (i-1)-th round of training to obtain the predicted visual field label of the target suspicious visual field image determined by the i-th round of training and S suspicious training visual field images determined by the i-th round of training, where S is greater than or equal to 1, the S suspicious training visual field images are the visual field images in the training sample image whose predicted suspicious probability is greater than or equal to the preset probability threshold, and the target suspicious visual field image is the training visual field image with the largest predicted suspicious probability in the training sample image; and inputting the S suspicious training visual field images into the full-slice classification model obtained by the (i-1)-th round of training to obtain the predicted image label of the training sample image determined by the i-th round of training.
Optionally, the inputting of the training sample image into the visual field classification model obtained by the (i-1)-th round of training to obtain the predicted visual field label of the target suspicious visual field image determined by the i-th round of training and the S suspicious training visual field images determined by the i-th round of training includes: inputting the training sample image into the visual field classification model obtained by the (i-1)-th round of training, and segmenting the training sample image into N training visual field images through that model; determining, from the N training visual field images, the training visual field images whose predicted suspicious probability is greater than or equal to the preset probability threshold, obtaining the S suspicious training visual field images determined by the i-th round of training, where N is greater than or equal to S; and determining the training visual field image with the largest predicted suspicious probability among the N training visual field images as the target suspicious visual field image, and determining the label corresponding to the predicted suspicious probability of the target suspicious visual field image as the predicted visual field label of the target suspicious visual field image.
As an optional implementation, the i-th round of training is any one of the multiple rounds of training of the to-be-trained neural network model, and the (i-1)-th round is the round immediately before it.
Before the to-be-trained visual field classification model is trained, the training visual field images need to be labelled. The size of a training visual field image can be set according to the actual situation, for example 256 × 256. The labelled known visual field label can be represented by 0 or 1: in a product flaw detection scenario, 0 represents a flaw-free visual field and 1 represents a defective visual field; in the cancer cell detection scenario, 0 indicates a negative visual field and 1 a positive visual field, i.e., the image contains abnormalities. The abnormalities cover 11 lesion types, including six cytopathy types: ASCUS (atypical squamous cells of undetermined significance), LSIL (low-grade squamous intraepithelial lesion), ASCH (atypical squamous cells, cannot exclude a high-grade lesion), HSIL (high-grade squamous intraepithelial lesion), SCC (squamous cell carcinoma), and AdC (adenocarcinoma), as well as AGC (atypical glandular cells). FIG. 6 shows a positive visual field image containing cancer cells (LSIL lesion cells) and a negative visual field image containing only normal cells.
The to-be-trained visual field classification model may employ an image classification model, for example ResNet50, or another classification model such as EfficientNet. The final classifier of the model can be set to 2 classes: in a product flaw detection scenario, the 2-class result indicates whether a flaw exists in the visual field image; in the cancer cell detection scenario, it indicates whether the visual field image is positive or negative. The number of training visual field images can be small, and the model can be initialized with the weights of a model pretrained on the ImageNet dataset.
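A sketch of such a view classifier using torchvision (assuming a recent torchvision with the weights API):

```python
import torch.nn as nn
from torchvision import models

def build_view_classifier(num_classes: int = 2) -> nn.Module:
    """View classification model as described: ResNet50 initialised with
    ImageNet-pretrained weights, with its final layer replaced by a
    2-class head (flaw / no flaw, or positive / negative visual field)."""
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model
```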
For a training sample image, as in the flowchart of FIG. 7, a foreground region (for example, a cell region) may be extracted using a conventional image segmentation method, and the foreground region is then divided into fixed-size visual field images by grid segmentation, where the size can be determined according to the actual situation (for example, 256 × 256). The training sample image is processed by the visual field classification model obtained in the (i-1)-th round of training to obtain N training visual field images and their predicted suspicious probabilities. In a product flaw detection scenario, the predicted suspicious probabilities represent the probability that a flaw exists in each training visual field image; in a cancer cell detection scenario, they represent the probability (0 to 1) that cancer cells exist in each training visual field image. The N training visual field images are sorted in descending order of predicted suspicious probability, and the top S views are selected as the suspicious positive regions of the training sample image, where S can be determined according to the actual situation, for example 32, i.e., the first 32 suspicious training visual field images are selected. The predicted visual field label of the target suspicious visual field image, the one with the largest predicted suspicious probability in the training sample image, is also obtained from the visual field classification model of the (i-1)-th round. The S suspicious training visual field images are then input into the full-slice classification model obtained by the (i-1)-th round of training, which produces the predicted image label of the training sample image (representable by 0 and 1).
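A hypothetical preprocessing sketch for this step, using OpenCV and NumPy. Otsu thresholding and the 10% foreground ratio are assumptions; the patent only calls for a conventional segmentation method and grid splitting:

```python
import cv2
import numpy as np

def extract_foreground_tiles(slide_rgb: np.ndarray, tile: int = 256,
                             min_fg_ratio: float = 0.1) -> list:
    """Otsu-threshold the slide to locate foreground (e.g. cell regions),
    grid-split it into fixed-size views, and keep views with enough foreground."""
    gray = cv2.cvtColor(slide_rgb, cv2.COLOR_RGB2GRAY)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    tiles = []
    h, w = gray.shape
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            if mask[y:y + tile, x:x + tile].mean() / 255.0 >= min_fg_ratio:
                tiles.append(slide_rgb[y:y + tile, x:x + tile])
    return tiles

# Usage sketch: score the tiles with the round-(i-1) view model, sort the
# probabilities in descending order, and keep the top S = 32 as suspicious:
#   order = np.argsort(-probs); suspicious = [tiles[j] for j in order[:32]]
```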
When the predicted visual field label of the training visual field image, the predicted visual field label of the target suspicious visual field image, and the predicted image label of the training sample image satisfy the convergence condition (i.e., the first loss condition, the second loss condition and the third loss condition are all satisfied), the training is terminated to obtain the target neural network model.
Optionally, in the ith round of joint training, the method further comprises: inputting the predicted suspicious probability corresponding to the predicted view label of the target suspicious view image determined by the ith round of training and the known probability corresponding to the known image label of the training sample image into a first loss function to obtain a first loss value; inputting the predicted suspicious probability corresponding to the predicted image label of the training sample image determined by the ith round of training and the known probability corresponding to the known image label of the training sample image into a second loss function to obtain a second loss value; inputting the predicted suspicious probability corresponding to the predicted visual field label of the training visual field image determined by the ith round of training and the known probability corresponding to the known visual field label of the training visual field image into a third loss function to obtain a third loss value; determining whether the first loss value satisfies the first loss condition, whether the second loss value satisfies the second loss condition, and whether the third loss value satisfies the third loss condition; determining that the predicted view label of the training view image determined by the i-th round of training, the predicted view label of the target suspicious view image determined by the i-th round of training, and the predicted image label of the training sample image determined by the i-th round of training satisfy the convergence condition, in a case where it is determined that the first loss value satisfies the first loss condition, the second loss value satisfies the second loss condition, and the third loss value satisfies the third loss condition.
For training the full-slice classification model, a joint training method may be adopted, with the overall structure shown in fig. 8. Each training batch contains, in addition to the S suspicious visual field images extracted from the training sample image, labeled training visual field images, whose visual field classification loss also needs to be computed through the visual field classification model. By default, each training batch comprises one training sample image and two labeled training visual field images, so that a joint training is formed whose loss comprises the classification loss of the training sample image and the classification loss of the training visual field images. The classification loss of the training visual field images acts as a regularizer and improves the model. Furthermore, an additional constraint is imposed: the predicted suspicious probability of the visual field image with the highest predicted suspicious probability in a training sample image should be consistent with the known image label of that training sample image. Specifically, in each training iteration, the predicted suspicious probabilities of the suspicious visual field images in the training sample image are first computed with the visual field classification model, the visual field with the largest predicted suspicious probability is selected, and the classification loss between its predicted suspicious probability and the known image label of the training sample image is computed; this loss is referred to as the visual field constraint loss. The joint training loss therefore contains three parts: the classification loss of the training sample image (second loss condition), the classification loss of the training visual field images (third loss condition), and the visual field constraint loss of the training sample image (first loss condition).
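Under these definitions, one joint training step could be sketched as below. This is only an illustration under stated assumptions: the equal loss weights are not specified by the text, and `field_model`, `slide_model` (which maps the S views of one sample to one slide-level prediction) and the batch layout are hypothetical:

```python
import torch.nn.functional as F

def joint_training_step(field_model, slide_model, sample_views, slide_label,
                        labeled_views, view_labels, optimizer):
    logits = field_model(sample_views)                     # S suspicious views of one sample
    probs = logits.softmax(dim=1)[:, 1]
    # First loss: visual field constraint loss on the most suspicious view.
    l1 = F.binary_cross_entropy(probs.max().unsqueeze(0), slide_label.float())
    # Second loss: full-slice classification loss on the training sample image.
    slide_logits = slide_model(sample_views)               # shape (1, 2)
    l2 = F.cross_entropy(slide_logits, slide_label)
    # Third loss: classification loss on the labeled training visual field images.
    l3 = F.cross_entropy(field_model(labeled_views), view_labels)
    loss = l1 + l2 + l3                                    # equal weights assumed here
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return l1.item(), l2.item(), l3.item()
```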
As an alternative, the first loss function, the second loss function and the third loss function may each be an existing loss function, for example the cross-entropy loss function:
L = -[y*log(p) + (1-y)*log(1-p)]
where L denotes the loss value, y denotes the known label, and p is the predicted probability that the label is y.
The cross-entropy loss function serving as the first loss function is:
L1 = -[y1*log(p1) + (1-y1)*log(1-p1)]
where L1 denotes the first loss value, y1 denotes the known probability corresponding to the known image label of the training sample image (1 or 0; 1 indicates the presence of defects or cancer cells, 0 indicates their absence), and p1 is the predicted suspicious probability that the predicted visual field label of the target suspicious visual field image is y1.
The cross-entropy loss function serving as the second loss function is:
L2 = -[y2*log(p2) + (1-y2)*log(1-p2)]
where L2 denotes the second loss value, y2 denotes the known probability corresponding to the known image label of the training sample image (1 or 0; 1 indicates the presence of defects or cancer cells, 0 indicates their absence), and p2 is the predicted suspicious probability that the predicted image label of the training sample image is y2.
The cross-entropy loss function serving as the third loss function is:
L3 = -[y3*log(p3) + (1-y3)*log(1-p3)]
where L3 denotes the third loss value, y3 denotes the known probability corresponding to the known visual field label of the training visual field image (1 or 0; 1 indicates the presence of defects or cancer cells, 0 indicates their absence), and p3 is the predicted suspicious probability that the predicted visual field label of the training visual field image is y3.
The first loss condition may be that the first loss value is less than or equal to a first loss threshold, the second loss condition may be that the second loss value is less than or equal to a second loss threshold, and the third loss condition may be that the third loss value is less than or equal to a third loss threshold. Each threshold may be determined according to the actual situation and may be, for example, 0.05, 0.1, 0.15, or the like. When the first loss value satisfies the first loss condition, the second loss value satisfies the second loss condition, and the third loss value satisfies the third loss condition, the training is stopped to obtain the target neural network model.
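For illustration only, the convergence check described above amounts to comparing the three loss values against their thresholds; the default value of 0.1 used here is just one of the example thresholds mentioned:

```python
def converged(l1: float, l2: float, l3: float,
              t1: float = 0.1, t2: float = 0.1, t3: float = 0.1) -> bool:
    # Training stops only when all three loss conditions hold simultaneously.
    return l1 <= t1 and l2 <= t2 and l3 <= t3
```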
Optionally, predicting the M suspicious visual field images through the target whole-slice classification model in the target neural network model to obtain the predicted image tag of the target image includes: performing feature extraction on the M suspicious visual field images through the target visual field classification model in the target neural network model to obtain M feature vectors; and inputting the feature average vector of the M feature vectors into a classifier in the target whole-slice classification model, and classifying the feature average vector through the classifier to obtain the predicted image tag of the target image.
As an optional implementation, the input of the target full-slice classification model is the M suspicious visual field images extracted from the target image, i.e., the visual field images whose predicted suspicious probability in the target image is greater than or equal to the preset probability threshold. For each suspicious visual field image, a feature vector is first extracted by the backbone of the target visual field classification model; the feature vectors are then fed into a transformer block, which comprises multi-head attention and an MLP (multilayer perceptron) and models the interaction among the features of the suspicious visual field images to obtain new features; finally, the feature average vector over all suspicious visual field images is fed into the classifier to obtain the predicted image tag of the target image. The transformer block can also be replaced by another model such as a GNN (graph neural network).
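A rough PyTorch sketch of this aggregation head follows; the embedding width of 2048 matches a ResNet50 backbone, while the head count and number of layers are illustrative assumptions not fixed by the text:

```python
import torch
import torch.nn as nn

class FullSliceHead(nn.Module):
    """Transformer block over per-view features, then mean pooling and a 2-class classifier."""
    def __init__(self, dim: int = 2048, heads: int = 8, layers: int = 1):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)  # multi-head attention + MLP
        self.classifier = nn.Linear(dim, 2)

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (batch, M, dim) feature vectors of the M suspicious visual field images.
        enc = self.encoder(view_feats)          # model interactions among the views
        pooled = enc.mean(dim=1)                # feature average vector
        return self.classifier(pooled)          # predicted image tag logits
```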
Optionally, the determining M suspicious visual field images from the target image through a target visual field classification model in the target neural network model includes: segmenting the target image into N view images by the target view classification model in the target neural network model, wherein N is greater than or equal to M; and determining the M suspicious visual field images from the N visual field images through the target visual field classification model, wherein the M suspicious visual field images are the visual field images of which the predicted suspicious probability is greater than or equal to the preset probability threshold in the N visual field images.
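Combining the pieces above, end-to-end inference could look like the following sketch. The helpers `tile_image` (the grid segmentation step) and `feature_extractor` (the backbone feature step), as well as the 0.5 threshold, are assumptions for illustration rather than values fixed by the text:

```python
import torch

def recognize(target_image, field_model, feature_extractor, slice_head,
              tile_image, threshold: float = 0.5):
    """Classify a whole target image: tile -> score views -> aggregate suspicious views."""
    views = tile_image(target_image)                       # N visual field images, (N, C, H, W)
    with torch.no_grad():
        probs = field_model(views).softmax(dim=1)[:, 1]    # predicted suspicious probabilities
        suspicious = views[probs >= threshold]             # the M suspicious visual field images
        if suspicious.shape[0] == 0:                       # no suspicious view: report negative
            return 0
        feats = feature_extractor(suspicious)              # (M, dim) backbone feature vectors
        logits = slice_head(feats.unsqueeze(0))            # (1, M, dim) -> (1, 2)
    return logits.argmax(dim=1).item()                     # predicted image tag (0 or 1)
```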
As an alternative embodiment, the method can be applied to the fields of product flaw detection, cancer cell detection and the like.
In product flaw detection, the product to be detected is photographed to obtain a target image, and the target image is then predicted by the trained target neural network model, so that visual field images without flaws are eliminated and visual field images containing flaws are screened out. This can improve the efficiency of product defect detection.
In the cancer cell detection scene, cells of the diseased part are first scanned into a whole-slide image (WSI), and the whole WSI is then predicted by the trained target neural network model, so that negative slides are eliminated, positive slides are screened out, and the positive cell regions are provided to the pathologist for the final diagnosis. This can assist pathologists in cancer cytology diagnosis, reduce their workload and improve efficiency.
Only a small number of training visual field images are needed to train a weak visual field classification model, which reduces the dependence on finely labeled data; when the whole-slice classification model is trained, a joint training method is provided to compensate for the shortcomings of the visual field classification model; the whole-slice classification model considers the interaction among all the visual fields; and when the whole-slice classification model is trained, the model performance is further improved by the constraint on the unlabeled visual fields in the training sample image.
It can be understood that, in specific implementations of the present application, when the above embodiments are applied to specific products or technologies, user permission or consent needs to be obtained for related data such as user information, and the collection, use and processing of the related data need to comply with the relevant laws, regulations and standards of the relevant countries and regions.
It should be noted that for simplicity of description, the above-mentioned embodiments of the method are described as a series of acts, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
According to another aspect of the embodiment of the application, an image recognition device for implementing the image recognition method is also provided. As shown in fig. 9, the apparatus includes: an obtaining module 92, configured to obtain a target image, where the target image is an image to be identified obtained by scanning a target object; an input module 94, configured to input the target image into a target neural network model, determine M suspicious visual field images from the target image through a target visual field classification model in the target neural network model, and predict the M suspicious visual field images through a target full-scale classification model in the target neural network model to obtain a predicted image tag of the target image, where M is greater than or equal to 1, and the M suspicious visual field images are visual field images of which predicted suspicious probability is greater than or equal to a preset probability threshold in the target image; the target neural network model is obtained by training a to-be-trained neural network model by using a training sample image and a training visual field image until the following convergence condition is met: and a first loss condition is satisfied between a known image label of the training sample image and a predicted visual field label of a target suspicious visual field image, and the target suspicious visual field image is the suspicious visual field image with the maximum predicted suspicious probability in the training sample image determined by a to-be-trained visual field classification model in the to-be-trained neural network model.
Optionally, the convergence condition further includes: a second loss condition is satisfied between a known image label of the training sample image and a predicted image label of the training sample image determined by a full-scale classification model to be trained in the neural network model to be trained, and a third loss condition is satisfied between a known visual field label of the training visual field image and a predicted visual field label of the training visual field image determined by a visual field classification model to be trained in the neural network model to be trained.
Optionally, the above apparatus is further configured to acquire the training sample image and the training visual field image before the target image is input into the target neural network model, and to perform multi-round joint training on the to-be-trained visual field classification model and the to-be-trained full-scale classification model in the to-be-trained neural network model through the training sample image and the training visual field image to obtain the target neural network model, wherein, in the training process, if the to-be-trained neural network model does not satisfy the convergence condition, the parameters in the to-be-trained visual field classification model and the to-be-trained full-scale classification model are adjusted; if the to-be-trained neural network model satisfies the convergence condition, the training is finished, the to-be-trained neural network model at the end of training is determined as the target neural network model, and the to-be-trained visual field classification model and the to-be-trained full-scale classification model at the end of training are respectively determined as the target visual field classification model and the target full-scale classification model in the target neural network model.
Optionally, the above apparatus is further configured to perform an i-th round of joint training on the to-be-trained visual field classification model and the to-be-trained full-scale classification model in the to-be-trained neural network model, where i is a positive integer greater than or equal to 1, and the visual field classification model and the full-scale classification model obtained by the 0-th round of training are the untrained to-be-trained visual field classification model and to-be-trained full-scale classification model in the to-be-trained neural network model, which includes: inputting the training visual field image into the visual field classification model obtained by the (i-1)-th round of training to obtain a predicted visual field label of the training visual field image determined by the i-th round of training; inputting the training sample image into the visual field classification model obtained by the (i-1)-th round of training and the full-scale classification model obtained by the (i-1)-th round of training to obtain a predicted visual field label of the target suspicious visual field image determined by the i-th round of training and a predicted image label of the training sample image determined by the i-th round of training; and when the predicted visual field label of the training visual field image determined by the i-th round of training, the predicted visual field label of the target suspicious visual field image determined by the i-th round of training and the predicted image label of the training sample image determined by the i-th round of training meet the convergence condition, ending the training, and respectively determining the visual field classification model and the full-scale classification model obtained by the (i-1)-th round of training as the target visual field classification model and the target full-scale classification model in the target neural network model.
Optionally, the apparatus is further configured to input the training sample image into the visual field classification model obtained in the (i-1)-th round of training, to obtain a predicted visual field label of the target suspicious visual field image determined in the i-th round of training and S suspicious training visual field images determined in the i-th round of training, where S is greater than or equal to 1, the S suspicious training visual field images are visual field images with a predicted suspicious probability greater than or equal to the preset probability threshold in the training sample image, and the target suspicious visual field image is the training visual field image with the largest predicted suspicious probability in the training sample image; and to input the S suspicious training visual field images into the full-slice classification model obtained in the (i-1)-th round of training to obtain the predicted image label of the training sample image determined by the i-th round of training.
Optionally, the device is further configured to input the training sample image into the visual field classification model obtained through the i-1 th round of training, and segment the training sample image into N training visual field images through the visual field classification model obtained through the i-1 th round of training; determining training visual field images with the predicted suspicious probability greater than or equal to the preset probability threshold value from the N training visual field images through the visual field classification model obtained in the (i-1) th round of training, and obtaining S suspicious training visual field images determined by the ith round of training, wherein N is greater than or equal to S; and determining the training view image with the maximum predicted suspicious probability in the N training view images as the target suspicious view image through the view classification model obtained by the (i-1) th round of training, and determining a label corresponding to the predicted suspicious probability of the target suspicious view image as the predicted view label of the target suspicious view image.
Optionally, the apparatus is further configured to, in the ith round of joint training, input the predicted suspicious probability corresponding to the predicted view label of the target suspicious view image determined in the ith round of training and the known probability corresponding to the known image label of the training sample image into a first loss function, so as to obtain a first loss value; inputting the predicted suspicious probability corresponding to the predicted image label of the training sample image determined by the ith round of training and the known probability corresponding to the known image label of the training sample image into a second loss function to obtain a second loss value; inputting the predicted suspicious probability corresponding to the predicted visual field label of the training visual field image determined by the ith round of training and the known probability corresponding to the known visual field label of the training visual field image into a third loss function to obtain a third loss value; determining whether the first loss value satisfies the first loss condition, whether the second loss value satisfies the second loss condition, and whether the third loss value satisfies the third loss condition; determining that the predicted view label of the training view image determined by the i-th round of training, the predicted view label of the target suspicious view image determined by the i-th round of training, and the predicted image label of the training sample image determined by the i-th round of training satisfy the convergence condition, in a case where it is determined that the first loss value satisfies the first loss condition, the second loss value satisfies the second loss condition, and the third loss value satisfies the third loss condition.
Optionally, the device is further configured to perform feature extraction on the M suspicious visual field images through the target visual field classification model in the target neural network model to obtain M feature vectors; and inputting the feature average vectors of the M feature vectors into a classifier in the target full-scale classification model, and classifying the feature average vectors through the classifier to obtain a predicted image tag of the target image.
Optionally, the above apparatus is further configured to segment the target image into N view images through the target view classification model in the target neural network model, where N is greater than or equal to M; and determining the M suspicious visual field images from the N visual field images through the target visual field classification model, wherein the M suspicious visual field images are the visual field images of which the predicted suspicious probability is greater than or equal to the preset probability threshold in the N visual field images.
According to an aspect of the application, there is provided a computer program product comprising a computer program/instructions containing program code for performing the method illustrated by the flowchart. In such an embodiment, the computer program may be downloaded and installed from the network through the communication section 1009 and/or installed from the removable medium 1011. When executed by the CPU 1001, the computer program performs the various functions provided by the embodiments of the present application.
The above-mentioned serial numbers of the embodiments of the present application are merely for description, and do not represent the advantages and disadvantages of the embodiments.
Fig. 10 schematically shows a block diagram of a computer system of an electronic device for implementing an embodiment of the present application.
It should be noted that the computer system 1000 of the electronic device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the application scope of the embodiments of the present application.
As shown in fig. 10, the computer system 1000 includes a Central Processing Unit (CPU) 1001, which can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. The RAM 1003 also stores various programs and data necessary for system operation. The CPU 1001, the ROM 1002, and the RAM 1003 are connected to each other via a bus 1004. An Input/Output (I/O) interface 1005 is also connected to the bus 1004.
The following components are connected to the input/output interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage portion 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a local area network card, modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. The driver 1010 is also connected to the input/output interface 1005 as necessary. A removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1010 as necessary, so that a computer program read out therefrom is mounted into the storage section 1008 as necessary.
In particular, according to embodiments of the present application, the processes described in the various method flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated by the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1009 and/or installed from the removable medium 1011. When the computer program is executed by the CPU 1001, various functions defined in the system of the present application are executed.
According to another aspect of the embodiments of the present application, there is also provided an electronic device for implementing the image recognition method, where the electronic device may be the terminal device or the server shown in fig. 1. The present embodiment takes the electronic device as a terminal device as an example for explanation. As shown in fig. 11, the electronic device comprises a memory 1102 and a processor 1104, wherein the memory 1102 stores a computer program and the processor 1104 is configured to execute the steps of any of the method embodiments described above by means of the computer program.
Optionally, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
s1, acquiring a target image, wherein the target image is an image to be identified obtained by scanning a target object;
s2, inputting the target image into a target neural network model, determining M suspicious visual field images from the target image through a target visual field classification model in the target neural network model, and predicting the M suspicious visual field images through a target full-scale classification model in the target neural network model to obtain a predicted image tag of the target image, wherein M is greater than or equal to 1, and the M suspicious visual field images are visual field images of which the predicted suspicious probability is greater than or equal to a preset probability threshold in the target image;
the target neural network model is obtained by training a neural network model to be trained by using a training sample image and a training visual field image until the following convergence condition is met: and a first loss condition is satisfied between a known image label of the training sample image and a predicted visual field label of a target suspicious visual field image, and the target suspicious visual field image is a suspicious visual field image with the maximum predicted suspicious probability in the training sample image determined by the visual field classification model to be trained.
Alternatively, it can be understood by those skilled in the art that the structure shown in fig. 11 is only an illustration, and the electronic device may also be a terminal device such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 11 does not limit the structure of the above electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in fig. 11, or have a configuration different from that shown in fig. 11.
The memory 1102 may be used to store software programs and modules, such as the program instructions/modules corresponding to the image recognition method and apparatus in the embodiments of the present application, and the processor 1104 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 1102, thereby implementing the above-mentioned image recognition method. The memory 1102 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1102 may further include memory located remotely from the processor 1104, and such remote memory may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1102 may be used, but is not limited to being used, to store information such as the target image. As an example, as shown in fig. 11, the memory 1102 may include, but is not limited to, the obtaining module 92 and the input module 94 of the above image recognition apparatus. In addition, the memory may also include, but is not limited to, other module units of the image recognition apparatus, which are not described in detail in this example.
Optionally, the transmission device 1106 is used for receiving or transmitting data via a network. Examples of the network may include a wired network and a wireless network. In one example, the transmission device 1106 includes a Network Interface Card (NIC) that can be connected to a router via a network cable so as to communicate with the internet or a local area network. In one example, the transmission device 1106 is a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
In addition, the electronic device further includes: a display 1108 for displaying the target image; and a connection bus 1110 for connecting the respective module components in the above-described electronic apparatus.
In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system, and the blockchain system may be a distributed system formed by connecting a plurality of nodes through a network communication. Nodes can form a Peer-To-Peer (P2P, Peer To Peer) network, and any type of computing device, such as a server, a terminal, and other electronic devices, can become a node in the blockchain system by joining the Peer-To-Peer network.
According to an aspect of the present application, there is provided a computer-readable storage medium storing computer instructions. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to execute the image recognition method provided in the above-mentioned various alternative implementations.
Alternatively, in the present embodiment, the above-mentioned computer-readable storage medium may be configured to store a computer program for executing the steps of:
s1, acquiring a target image, wherein the target image is an image to be identified obtained by scanning a target object;
s2, inputting the target image into a target neural network model, determining M suspicious visual field images from the target image through a target visual field classification model in the target neural network model, and predicting the M suspicious visual field images through a target full-scale classification model in the target neural network model to obtain a predicted image tag of the target image, wherein M is greater than or equal to 1, and the M suspicious visual field images are visual field images of which the predicted suspicious probability is greater than or equal to a preset probability threshold in the target image;
the target neural network model is obtained by training a to-be-trained neural network model by using a training sample image and a training visual field image until the following convergence condition is met: and a first loss condition is satisfied between a known image label of the training sample image and a predicted visual field label of a target suspicious visual field image, wherein the target suspicious visual field image is the suspicious visual field image with the maximum predicted suspicious probability in the training sample image determined by the visual field classification model to be trained.
Alternatively, in this embodiment, a person skilled in the art may understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The above-mentioned serial numbers of the embodiments of the present application are merely for description, and do not represent the advantages and disadvantages of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or a part of or all or part of the technical solution contributing to the prior art may be embodied in the form of a software product stored in a storage medium, and including instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to execute all or part of the steps of the method described in the embodiments of the present application.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The foregoing is only a preferred embodiment of the present application and it should be noted that, as will be apparent to those skilled in the art, numerous modifications and adaptations can be made without departing from the principles of the present application and such modifications and adaptations are intended to be considered within the scope of the present application.

Claims (13)

1. An image recognition method, comprising:
acquiring a target image, wherein the target image is an image to be identified obtained by scanning a target object;
inputting the target image into a target neural network model, determining M suspicious visual field images from the target image through a target visual field classification model in the target neural network model, and predicting the M suspicious visual field images through a target full-slice classification model in the target neural network model to obtain predicted image tags of the target image, wherein M is greater than or equal to 1, and the M suspicious visual field images are visual field images with predicted suspicious probability greater than or equal to a preset probability threshold in the target image;
the target neural network model is obtained by training a to-be-trained neural network model by using a training sample image and a training visual field image until the following convergence condition is met: and a first loss condition is satisfied between a known image label of the training sample image and a predicted visual field label of a target suspicious visual field image, and the target suspicious visual field image is the suspicious visual field image with the maximum predicted suspicious probability in the training sample image determined by a to-be-trained visual field classification model in the to-be-trained neural network model.
2. The method of claim 1, wherein the convergence condition further comprises:
a second loss condition is satisfied between a known image label of the training sample image and a predicted image label of the training sample image determined by a full-scale classification model to be trained in the neural network model to be trained, and a third loss condition is satisfied between a known visual field label of the training visual field image and a predicted visual field label of the training visual field image determined by the visual field classification model to be trained in the neural network model to be trained.
3. The method of claim 2, wherein prior to said inputting the target image into a target neural network model, the method further comprises:
acquiring the training sample image and the training visual field image;
performing multi-round joint training on the to-be-trained visual field classification model and the to-be-trained full-scale classification model in the to-be-trained neural network model through the training sample image and the training visual field image to obtain the target neural network model, wherein, in the training process, if the to-be-trained neural network model does not satisfy the convergence condition, parameters in the to-be-trained visual field classification model and the to-be-trained full-scale classification model are adjusted; if the to-be-trained neural network model satisfies the convergence condition, the training is finished, the to-be-trained neural network model at the end of training is determined as the target neural network model, and the to-be-trained visual field classification model and the to-be-trained full-scale classification model at the end of training are respectively determined as the target visual field classification model and the target full-scale classification model in the target neural network model.
4. The method according to claim 3, wherein the multi-round joint training of the visual field classification model to be trained and the full-scale classification model to be trained in the neural network model to be trained through the training sample images and the training visual field images comprises:
performing ith round of joint training on the to-be-trained visual field classification model and the to-be-trained full-scale classification model in the to-be-trained neural network model, wherein i is a positive integer greater than or equal to 1, and the visual field classification model and the full-scale classification model obtained by the 0 th round of training are the to-be-trained visual field classification model and the to-be-trained full-scale classification model which are not trained in the to-be-trained neural network model, and the method comprises the following steps of:
inputting the training visual field image into a visual field classification model obtained by the i-1 th round of training to obtain a prediction visual field label of the training visual field image determined by the i-th round of training;
inputting the training sample image into the visual field classification model obtained by the (i-1) th round of training and the full-slice classification model obtained by the (i-1) th round of training to obtain a predicted visual field label of the target suspicious visual field image determined by the (i) th round of training and a predicted image label of the training sample image determined by the (i) th round of training;
and when the predicted visual field label of the training visual field image determined by the i-th round of training, the predicted visual field label of the target suspicious visual field image determined by the i-th round of training and the predicted image label of the training sample image determined by the i-th round of training meet the convergence condition, ending the training, and respectively determining the visual field classification model and the full-scale classification model obtained by the (i-1)-th round of training as the target visual field classification model and the target full-scale classification model in the target neural network model.
5. The method according to claim 4, wherein the inputting the training sample images into the field-of-view classification model obtained from the i-1 th round of training and the full-scale classification model obtained from the i-1 th round of training to obtain the predicted field-of-view label of the target suspicious field-of-view image determined from the i-th round of training and the predicted image label of the training sample image determined from the i-th round of training comprises:
inputting the training sample images into a visual field classification model obtained by the (i-1) th round of training to obtain a predicted visual field label of the target suspicious visual field image determined by the (i) th round of training and S suspicious training visual field images determined by the (i) th round of training, wherein S is greater than or equal to 1, the S suspicious training visual field images are visual field images with predicted suspicious probability greater than or equal to the preset probability threshold in the training sample images, and the target suspicious visual field image is a training visual field image with the maximum predicted suspicious probability in the training sample images;
and inputting the S suspicious training view images into the full-scale classification model obtained by the i-1 th round of training to obtain a predicted image label of the training sample image determined by the i-th round of training.
6. The method according to claim 5, wherein the inputting the training sample images into the visual field classification model obtained from the i-1 th round of training, obtaining the predicted visual field label of the target suspicious visual field image determined by the i-th round of training, and obtaining the S suspicious training visual field images determined by the i-th round of training comprises:
inputting the training sample image into the visual field classification model obtained by the i-1 th round of training, and segmenting the training sample image into N training visual field images through the visual field classification model obtained by the i-1 th round of training;
determining training visual field images with the predicted suspicious probability greater than or equal to the preset probability threshold value from the N training visual field images through the visual field classification model obtained in the (i-1) th round of training, and obtaining S suspicious training visual field images determined by the ith round of training, wherein N is greater than or equal to S;
and determining the training view image with the maximum predicted suspicious probability in the N training view images as the target suspicious view image through the view classification model obtained by the (i-1) th round of training, and determining a label corresponding to the predicted suspicious probability of the target suspicious view image as the predicted view label of the target suspicious view image.
7. The method according to any one of claims 4 to 6, wherein in the ith round of joint training, the method further comprises:
inputting the predicted suspicious probability corresponding to the predicted view label of the target suspicious view image determined by the ith round of training and the known probability corresponding to the known image label of the training sample image into a first loss function to obtain a first loss value;
inputting the predicted suspicious probability corresponding to the predicted image label of the training sample image determined by the ith round of training and the known probability corresponding to the known image label of the training sample image into a second loss function to obtain a second loss value;
inputting the predicted suspicious probability corresponding to the predicted visual field label of the training visual field image determined by the ith round of training and the known probability corresponding to the known visual field label of the training visual field image into a third loss function to obtain a third loss value;
determining whether the first loss value satisfies the first loss condition, whether the second loss value satisfies the second loss condition, and whether the third loss value satisfies the third loss condition;
determining that the predicted view label of the training view image determined by the i-th round of training, the predicted view label of the target suspicious view image determined by the i-th round of training, and the predicted image label of the training sample image determined by the i-th round of training satisfy the convergence condition, in a case where it is determined that the first loss value satisfies the first loss condition, the second loss value satisfies the second loss condition, and the third loss value satisfies the third loss condition.
8. The method according to claim 1, wherein the predicting the M suspicious view images through a target whole-slice classification model in the target neural network model to obtain a predicted image tag of the target image comprises:
performing feature extraction on the M suspicious visual field images through the target visual field classification model in the target neural network model to obtain M feature vectors;
and inputting the feature average vectors of the M feature vectors into a classifier in the target full-scale classification model, and classifying the feature average vectors through the classifier to obtain a predicted image tag of the target image.
9. The method of claim 1, wherein the determining M suspicious visual field images from the target images by a target visual field classification model in the target neural network model comprises:
segmenting the target image into N view images by the target view classification model in the target neural network model, wherein N is greater than or equal to M;
and determining the M suspicious visual field images from the N visual field images through the target visual field classification model, wherein the M suspicious visual field images are the visual field images of which the predicted suspicious probability is greater than or equal to the preset probability threshold in the N visual field images.
10. An image recognition apparatus, comprising:
the device comprises an acquisition module, a recognition module and a processing module, wherein the acquisition module is used for acquiring a target image, and the target image is an image to be recognized obtained by scanning a target object;
the input module is used for inputting the target image into a target neural network model, determining M suspicious visual field images from the target image through a target visual field classification model in the target neural network model, and predicting the M suspicious visual field images through a target full-slice classification model in the target neural network model to obtain a predicted image tag of the target image, wherein M is greater than or equal to 1, and the M suspicious visual field images are visual field images of which the predicted suspicious probability is greater than or equal to a preset probability threshold in the target image;
the target neural network model is obtained by training a to-be-trained neural network model by using a training sample image and a training visual field image until the following convergence condition is met: and a first loss condition is satisfied between a known image label of the training sample image and a predicted visual field label of a target suspicious visual field image, and the target suspicious visual field image is the suspicious visual field image with the maximum predicted suspicious probability in the training sample image determined by a to-be-trained visual field classification model in the to-be-trained neural network model.
11. A computer-readable storage medium, characterized in that it comprises a stored program, wherein the program is executable by a terminal device or a computer to perform the method of any one of claims 1 to 9.
12. A computer program product comprising computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the steps of the method as claimed in any one of claims 1 to 9.
13. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 9 by means of the computer program.
CN202210468883.3A 2022-04-29 2022-04-29 Image recognition method and device, storage medium and electronic equipment Active CN115130543B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210468883.3A CN115130543B (en) 2022-04-29 2022-04-29 Image recognition method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210468883.3A CN115130543B (en) 2022-04-29 2022-04-29 Image recognition method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN115130543A true CN115130543A (en) 2022-09-30
CN115130543B CN115130543B (en) 2024-04-12

Family

ID=83375854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210468883.3A Active CN115130543B (en) 2022-04-29 2022-04-29 Image recognition method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115130543B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110796199A (en) * 2019-10-30 2020-02-14 腾讯科技(深圳)有限公司 Image processing method and device and electronic medical equipment
CN110853022A (en) * 2019-11-14 2020-02-28 腾讯科技(深圳)有限公司 Pathological section image processing method, device and system and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385443A (en) * 2023-06-06 2023-07-04 珠海横琴圣澳云智科技有限公司 Image-based sample quality determination method and device
CN116385443B (en) * 2023-06-06 2023-08-11 珠海横琴圣澳云智科技有限公司 Image-based sample quality determination method and device

Also Published As

Publication number Publication date
CN115130543B (en) 2024-04-12

Similar Documents

Publication Publication Date Title
CN109670532B (en) Method, device and system for identifying abnormality of biological organ tissue image
CN108230296B (en) Image feature recognition method and device, storage medium and electronic device
JP6458394B2 (en) Object tracking method and object tracking apparatus
CN111488921B (en) Intelligent analysis system and method for panoramic digital pathological image
CN108231190B (en) Method of processing image, neural network system, device, and medium
CN112767329B (en) Image processing method and device and electronic equipment
CN111462036A (en) Pathological image processing method based on deep learning, model training method and device
US11501431B2 (en) Image processing method and apparatus and neural network model training method
KR101794578B1 (en) Method for predicting disease, recording medium and apparatus for performing the method
US11967181B2 (en) Method and device for retinal image recognition, electronic equipment, and storage medium
CN110752028A (en) Image processing method, device, equipment and storage medium
CN109785311B (en) Disease diagnosis device, electronic equipment and storage medium
US20220237400A1 (en) Video processing method and apparatus, computing device and medium
CN111382616A (en) Video classification method and device, storage medium and computer equipment
US20240112329A1 (en) Distinguishing a Disease State from a Non-Disease State in an Image
CN111340213B (en) Neural network training method, electronic device, and storage medium
CN113261012B (en) Method, device and system for processing image
CN115130543B (en) Image recognition method and device, storage medium and electronic equipment
Venkatesvara Rao et al. Real-time video object detection and classification using hybrid texture feature extraction
CN114898266A (en) Training method, image processing method, device, electronic device and storage medium
CN110910388A (en) Cancer cell image segmentation method based on U-Net and density estimation
CN113344028A (en) Breast ultrasound sequence image classification method and device
Saraswathi et al. An ensemble approach to diagnose breast cancer using fully complex-valued relaxation neural network classifier
CN111488887A (en) Image processing method and device based on artificial intelligence
CN114882314A (en) Model training method and related product, image processing method and related product

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant