CN115359325B - Training method, device, equipment and medium for image recognition model

Info

Publication number: CN115359325B (grant); published earlier as application CN115359325A
Application number: CN202211277547.7A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 田宽, 张军, 韩骁
Applicant and current assignee: Tencent Technology (Shenzhen) Co., Ltd.
Legal status: Active (granted)

Classifications

    • G: Physics; G06: Computing; G06V: Image or video recognition or understanding
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/40: Extraction of image or video features
    • G06V 10/764: Recognition using pattern recognition or machine learning, using classification, e.g. of video objects


Abstract

The application discloses a training method, device, equipment and medium for an image recognition model, and relates to the field of machine learning. The method comprises the following steps: acquiring a sample image set; performing region segmentation on the sample images in the sample image set to generate a first sample slice packet and a second sample slice packet; performing feature analysis on the sample image slices in the first sample slice packet through an image recognition model, and determining a first loss value based on the difference between the slice labels and the slice analysis results; performing feature analysis on the second sample slice packet through the image recognition model, and determining, based on the difference between the sample label and the slice packet analysis result, a second loss value corresponding to the sample image and a third loss value corresponding to the second sample slice packet; and training the image recognition model based on the loss values. The method improves the accuracy and precision with which the image recognition model recognizes the target image content in sample image slices.

Description

Training method, device, equipment and medium for image recognition model
Technical Field
The embodiment of the application relates to the field of artificial intelligence, in particular to a training method, a device, equipment and a medium for an image recognition model.
Background
Pathology is the microscopic study of cell morphology; pathological sections are usually prepared so that the lesion tissue to be examined can be observed. A microscope scanning device scans the traditional pathological section to obtain the corresponding medical pathological image, an image recognition model recognizes the image content, and the output result can be used to judge whether the lesion tissue is abnormal.
In the related art, in the training process of the image recognition model, a sample image labeled with a classification label is generally segmented to obtain sample image slices, each slice is input to the image recognition model, the slice is recognized and a recognition result is output, and the image recognition model is trained based on an error between the recognition result and the classification label of the sample image.
However, because the related art trains only on image-level classification labels, the accuracy of the output results is low and the visualization effect is poor.
Disclosure of Invention
The embodiment of the application provides a training method, a training device, equipment and a medium for an image recognition model, which can improve the accuracy of image recognition and improve the visual effect of an image after recognition. The technical scheme is as follows.
In one aspect, a method for training an image recognition model is provided, the method including:
obtaining a sample image set, wherein the sample image set comprises a sample image marked with a sample label, and the sample label is used for indicating the inclusion condition of the target image content in the sample image;
performing region segmentation on sample images in the sample image set to generate a first sample slice packet and a second sample slice packet, wherein sample image slices in the first sample slice packet are acquired from different sample images, sample image slices in the second sample slice packet are acquired from the same sample image, and sample image slices in the first sample slice packet are marked with slice labels corresponding to the sample labels;
performing feature analysis on a sample image slice in the first sample slice packet through an image recognition model, and determining a first loss value based on a difference between the slice label and a slice analysis result;
performing feature analysis on the second sample slice packet through the image recognition model, and determining a second loss value corresponding to the sample image and a third loss value corresponding to the second sample slice packet based on the difference between the sample label and the slice packet analysis result;
and training the image recognition model based on the first loss value, the second loss value and the third loss value, wherein the trained image recognition model is used for recognizing the target image content in the image.
In another aspect, an apparatus for training an image recognition model is provided, the apparatus including:
an acquisition module, configured to acquire a sample image set, wherein the sample image set comprises sample images marked with sample labels, and the sample labels are used for indicating the inclusion condition of target image content in the sample images;
the generating module is used for carrying out region segmentation on the sample images in the sample image set and generating a first sample slice packet and a second sample slice packet, wherein the sample image slices in the first sample slice packet are acquired from different sample images, the sample image slices in the second sample slice packet are acquired from the same sample image, and the sample image slices in the first sample slice packet are marked with slice labels corresponding to the sample labels;
a determination module, configured to perform feature analysis on the sample image slices in the first sample slice packet through an image recognition model and determine a first loss value based on the difference between the slice labels and the slice analysis results;
the determination module is further configured to perform feature analysis on the second sample slice packet through the image recognition model and determine, based on the difference between the sample label and the slice packet analysis result, a second loss value corresponding to the sample image and a third loss value corresponding to the second sample slice packet;
and the training module is used for training the image recognition model based on the first loss value, the second loss value and the third loss value, and the trained image recognition model is used for recognizing the target image content in the image.
In another aspect, a computer device is provided, which includes a processor and a memory, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the training method for the image recognition model according to any one of the embodiments of the present application.
In another aspect, a computer-readable storage medium is provided, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, which is loaded and executed by a processor to implement the method for training an image recognition model as described in any of the embodiments of the present application.
In another aspect, a computer program product is provided that includes computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to make the computer device execute the training method of the image recognition model in any one of the above embodiments.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
and inputting the sample image and the sample image slice into the image recognition model, training the image recognition model, and outputting an image recognition result. The sample image is marked with a sample label, the sample image slice is marked with a slice label, based on an output result of the image recognition model, the sample label and the slice label, a corresponding loss value is calculated, the image recognition model is trained, the accuracy and precision of image recognition of the image recognition model can be improved, the result output by the image recognition model can be better visualized, the problem of the slice level can be solved by the trained image recognition model, the image slice is input into the image recognition model, and the image recognition model can also output an accurate result.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a diagram illustrating a training process for an image recognition model provided by an exemplary embodiment of the present application;
FIG. 2 is a schematic illustration of an implementation environment provided by an exemplary embodiment of the present application;
FIG. 3 is a flowchart of a method for training an image recognition model provided by an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram of an image slicing process provided by an exemplary embodiment of the present application;
FIG. 5 is a flow diagram of a patch strong supervision module training method based on the image recognition model provided by the embodiment shown in FIG. 3;
FIG. 6 is a flowchart of a WSI weak supervision module training method based on the image recognition model provided by the embodiment shown in FIG. 3;
FIG. 7 is a flowchart of a method for training an image recognition model based on a fourth loss value according to another exemplary embodiment of the present application;
FIG. 8 is a diagram illustrating an improvement in visualization during training of an image recognition model according to an exemplary embodiment of the present application;
FIG. 9 is a block diagram of an apparatus for training an image recognition model according to an exemplary embodiment of the present application;
FIG. 10 is a block diagram of an apparatus for training an image recognition model according to another exemplary embodiment of the present application;
FIG. 11 is a block diagram of a computer device provided in an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
First, terms referred to in the embodiments of the present application will be briefly described.
Artificial Intelligence (AI): a theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, training technology for large image recognition models, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly comprise computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML): a multi-field interdiscipline involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It specializes in studying how a computer can simulate or realize human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
Image recognition techniques can be applied in a number of areas, such as: the system comprises the fields of traffic image identification, medical auxiliary diagnosis, home video identification and the like.
First, the present embodiment will be described taking the medical auxiliary diagnosis field as an example.
Pathology is the microscopic study of cell morphology and can complement in-situ molecular information. A tissue sample is removed from the body and placed in fixative to prepare pathological sections for microscopic observation. By scanning and stitching traditional glass slides with a fully automatic microscope scanning device, a complete digital slice (Whole Slide Image, WSI for short) can be obtained. In pathology research, digital slices have advantages such as speed and availability at any time.
For a lesion tissue to be examined, a pathologist usually needs to prepare a stained pathological section, examine all regions of the lesion tissue under a microscope or as a digital slice, find the abnormal conditions therein, and finally summarize all the abnormal conditions to reach a diagnosis conclusion. The abnormal conditions in some lesion tissues are of varied types, complex composition and unpredictable location. For kidney tissue, for example, abnormal conditions may appear in the cortex region, the medulla region, the glomeruli or the adipose (fat) region, and the abnormal condition may be cancer, a tumor, inflammation, and so on. Therefore, to detect all abnormalities present in the lesion tissue, a physician needs accurate knowledge of all possible abnormalities.
In recent years, with the rapid development of artificial intelligence, many algorithms have emerged that can effectively assist doctors in detecting abnormal conditions. These methods usually require collecting abnormal samples of every type together with normal samples and performing a large amount of manual labeling in order to train a detection or segmentation model for the abnormal samples. This is time-consuming and labor-intensive, makes it difficult to effectively cover all types of abnormal conditions, and often fails when a new type of abnormal condition appears.
Therefore, in the related art, the digital slice (WSI) corresponding to a pathological image is densely partitioned to obtain local slices (patches); the local slices are used as input, and an image recognition model recognizes and predicts each local slice; the recognition results of all local slices are then integrated to classify the WSI as a whole and obtain the final lesion recognition result for the pathological image.
However, in an actual task, besides the WSI-level label, the samples input to the image recognition model often carry a small amount of patch-level label information, and the conventional image recognition model and its training method can neither exploit this information nor complete the task of detecting abnormal conditions at the local-slice level. The recognition results therefore suffer from low accuracy and poor visualization.
In view of the above problems, the embodiment of the present application provides a method for training an image recognition model by using a global annotation and a local annotation at the same time. Namely, the WSI-level label and the patch-level/pixel-level label are simultaneously utilized to train the image recognition model, so that the recognition precision and accuracy of the image recognition model are improved, and the visualization effect of the output content of the image recognition model is improved.
It should be noted that the image data such as the pathological image data is data actively uploaded by the user; or, the data obtained after being individually authorized by the user. In the above embodiment, pathological image recognition is taken as an example for explanation, and the training method of the image recognition model provided in the embodiment of the present application may also be applied to other scenes, which is not limited to this.
It should be noted that the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, displayed data, etc.) and signals referred to in this application are authorized individually by the user or fully by each party, and the collection, use and processing of the relevant data are subject to the relevant laws and regulations and standards in the relevant countries and regions. For example, the pathological image data referred to in this application is acquired with sufficient authorization.
The embodiment of the application provides a method for training an image recognition model by simultaneously using whole-image labels and local labels, which mainly comprises two parts: a patch strong supervision module and a WSI weak supervision module. FIG. 1 illustrates the training process of the image recognition model according to an exemplary embodiment of the present application.
In the patch strong supervision module 100, an image slice 110 corresponding to a pathological image is acquired, and the image slice 110 is from a different pathological image and labeled with a slice label.
For the image slice 110, the loss is obtained as follows.
First loss value:
The image slice 110 is input into the feature extraction layer 120 of the image recognition model, which outputs the first slice feature 121 of the image slice 110; the attention layer 130 performs classification prediction on the first slice feature 121 to obtain the attention distribution 131 and the attention feature 132 of the image slice 110; a second slice feature 133 of the image slice 110 is calculated based on the attention distribution 131 and the attention feature 132; the first feature classification layer 140 predicts a first slice analysis result 141 of the image slice 110 from the second slice feature 133; and a first loss value is derived from the difference between the first slice analysis result 141 and the slice label.
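Expressed as code, this path is a feature extractor, an attention projection and a slice classifier trained with a per-slice loss. Below is a minimal PyTorch sketch of that flow; the concrete layer shapes, the cross-entropy loss and all names are our assumptions for illustration, since the patent does not disclose concrete layers:

    import torch
    import torch.nn as nn

    # Stand-ins for the numbered layers in FIG. 1; the real feature extraction
    # layer 120 would be a CNN backbone, abbreviated here to keep the sketch short.
    feature_extraction = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 2048))  # layer 120
    attention_projection = nn.Sequential(nn.Linear(2048, 512), nn.ReLU())             # part of layer 130
    slice_classifier = nn.Linear(512, 2)                                              # layer 140
    criterion = nn.CrossEntropyLoss()

    image_slices = torch.randn(8, 3, 224, 224)      # image slices 110, from different images
    slice_labels = torch.randint(0, 2, (8,))        # their slice labels ("N"/"P" as 0/1)

    first_slice_features = feature_extraction(image_slices)             # features 121, 8 x 2048
    second_slice_features = attention_projection(first_slice_features)  # features 133, 8 x 512
    logits = slice_classifier(second_slice_features)                    # analysis result 141
    first_loss = criterion(logits, slice_labels)                        # first loss value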
In the WSI weak supervision module 150, a sample pathology image 160 is acquired; the sample pathology image 160 is labeled with a sample label. The sample pathology image 160 is sliced to obtain corresponding image slices 170, which constitute a slice packet 180.
For the slice packet 180, two losses are obtained as follows.
1. Second loss value
The slice packet 180 is input into the feature extraction layer 120 of the image recognition model, which outputs the first packet features 122 of the slice packet 180; the attention layer 130 predicts the attention distribution 134 and the attention features 135 of the slice packet 180 from the first packet features 122; a second packet feature 136 of the slice packet 180 is aggregated based on the attention distribution 134 and the attention features 135; the first feature classification layer 140 predicts a first packet analysis result 142 of the slice packet 180 from the second packet feature 136; and a second loss value is obtained based on the difference between the first packet analysis result 142 of the slice packet 180 and the sample label.
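A minimal sketch of this attention-weighted aggregation for a single slice packet, under the same assumptions as the previous sketch (random tensors stand in for the outputs of layers 120 and 130):

    import torch
    import torch.nn as nn

    K = 16                                                # slices in one slice packet 180
    attention = torch.softmax(torch.randn(K, 1), dim=0)   # attention distribution 134, K x 1
    attention_features = torch.randn(K, 512)              # attention features 135, K x 512

    packet_feature = (attention * attention_features).sum(dim=0, keepdim=True)  # feature 136, 1 x 512
    packet_classifier = nn.Linear(512, 2)                 # first feature classification layer 140
    packet_logits = packet_classifier(packet_feature)     # first packet analysis result 142
    second_loss = nn.CrossEntropyLoss()(packet_logits, torch.tensor([1]))  # vs. sample label "positive"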
2. Third loss value
According to the attention distribution 134 of the slice packet 180, the K image slices 170 with the highest attention values are selected to form a slice group 171. The image slices 170 in the slice group 171 are labeled with the same sample label as the sample pathology image 160. The attention features 135 corresponding to the image slices 170 in the slice group 171 serve as the third slice features 137 of the slice group 171; the second feature classification layer 190 predicts a second slice analysis result 191 of the slice group 171 from the third slice features 137; and a third loss value is obtained based on the difference between the second slice analysis result 191 of the slice group 171 and the sample label.
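The top-K selection and the third loss can be sketched the same way; again the tensors are stand-ins, and the use of torch.topk is our assumed selection mechanism:

    import torch
    import torch.nn as nn

    K_all, K_top = 16, 4
    attention = torch.randn(K_all)                # attention distribution 134 over one packet
    attention_features = torch.randn(K_all, 512)  # attention features 135

    top_indices = torch.topk(attention, K_top).indices       # K image slices with highest attention
    third_slice_features = attention_features[top_indices]   # third slice features 137, K_top x 512
    second_classifier = nn.Linear(512, 2)                    # second feature classification layer 190
    group_logits = second_classifier(third_slice_features)   # second slice analysis result 191
    group_labels = torch.ones(K_top, dtype=torch.long)       # each selected slice inherits the sample label
    third_loss = nn.CrossEntropyLoss()(group_logits, group_labels)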
The image recognition model is trained by integrating the first loss value of the image slice 110, the second loss value of the slice packet 180, and the third loss value of the slice group 171.
Next, the implementation environment of the embodiments of the present application is described. Referring to FIG. 2, the environment involves a terminal 210 and a server 220, which are connected through a communication network 230.
In some embodiments, terminal 210 is used to send image data to server 220. In some embodiments, the terminal 210 has an application program with an image recognition function installed therein, and illustratively, the terminal 210 has an application program with a diagnosis assistance function installed therein. Such as: the terminal 210 has a search engine program, a life support application program, an instant messaging application program, a video program, a game program, and the like installed therein, which is not limited in the embodiment of the present application.
An image recognition model 221 is installed on the server 220. The image recognition model 221 can recognize pathological image data with a large data volume: when a pathological image is recognized, it is first sliced to obtain a plurality of image slices, each image slice is recognized to obtain a recognition result, and the recognition results of the image slices are then combined into the recognition result corresponding to the pathological image.
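A sketch of that slice-then-merge inference flow; the 0.5 threshold and the any-positive merge rule are our assumptions, consistent with the bag rule described later in this application:

    import torch

    def recognize_pathology_image(model, image_slices):
        """Recognize each slice, then merge slice results into an image-level result."""
        with torch.no_grad():
            probs = torch.softmax(model(image_slices), dim=1)[:, 1]  # per-slice positive probability
        return {"slice_probs": probs, "image_positive": bool((probs > 0.5).any())}

    # Usage with a stand-in model:
    stand_in = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 224 * 224, 2))
    result = recognize_pathology_image(stand_in, torch.randn(10, 3, 224, 224))
    print(result["image_positive"])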
The image recognition model 221 is divided into two parts in the training process, and one part obtains a first loss value by calculating image slices in the sample library; and the other part is used for slicing the sample images in the sample library, forming slice packets, calculating to obtain a second loss value through the slice packets, selecting partial slices based on the slice packets to form slice groups, and calculating to obtain a third loss value through the slice groups. Finally, the image recognition model 221 is trained based on the first loss value, the second loss value, and the third loss value.
The terminal may be a mobile phone, a tablet computer, a desktop computer, a portable notebook computer, an intelligent television, a vehicle-mounted terminal, an intelligent home device, and other terminal devices in various forms, which is not limited in the embodiment of the present application.
It should be noted that the server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms.
Cloud technology is a hosting technology that unifies hardware, software, network and other resources in a wide-area or local-area network to realize the computation, storage, processing and sharing of data. It is the general term for the network technology, information technology, integration technology, management platform technology, application technology and the like applied in the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support: the background services of technical network systems, such as video websites, image websites and web portals, require a large amount of computing and storage resources. With the development of the internet industry, each item may carry its own identification mark that needs to be transmitted to a background system for logical processing; data of different levels are processed separately, and all kinds of industry data need strong system background support, which can only be realized through cloud computing.
In some embodiments, the servers described above may also be implemented as nodes in a blockchain system.
With reference to the above term introductions and application scenarios, the training method of the image recognition model provided in this application is described. The method may be executed by a server, by a terminal, or by the server and the terminal jointly; in the embodiments of this application, execution by the server is taken as an example. As shown in FIG. 3, the method includes the following steps.
Step 301, a sample image set is obtained.
The sample image set comprises sample images marked with sample labels, and the sample labels are used for indicating the inclusion condition of the target image content in the sample images.
Illustratively, when the sample image is implemented as a pathological image, the sample label is used to indicate the inclusion of a pathological portion in the pathological image, and when the pathological portion is included in the pathological image, the sample label is also used to indicate an image region in which the pathological portion is located in the pathological image.
When the sample image is implemented as a traffic collection image, the sample label is used to indicate the inclusion of a vehicle in the traffic collection image, and when a vehicle is included, the sample label is also used to indicate the identity of the vehicle, such as the license plate number of the vehicle in the image. A traffic collection image is an image captured by traffic camera equipment.
When the sample image is implemented as a home video image, the sample label is used to indicate the inclusion of the living body in the home video image, and when the living body is included in the home video image, the sample label is also used to indicate the type of the living body in the home video image, such as: the home video image includes a pet (cat).
In some embodiments, the images in the sample image set are of the same type, such as: all belong to pathological images, or the image types in the sample image set are different, such as: there are parts of the pathology image, and parts of the traffic acquisition image.
In some embodiments, the sample image set is an image set obtained from an open data set, or the sample image set is an image set composed of image data authorized and uploaded by a user, which is not limited in this embodiment.
The sample label of a sample image can be annotated in the following ways: after collecting the sample image, the sample collector distinguishes the target image content in the sample image and labels the image accordingly; or, when the sample image is implemented as a pathological image, an image diagnosed and annotated by a doctor is acquired, the pathological image is labeled according to the doctor's diagnosis result, and when the image is positive, that is, a pathological area exists in the pathological image, the pathological area is annotated according to the doctor's diagnosis; or, the sample image is input into a pre-trained recognition model and the prediction result is output as the sample label, in which case the sample label is implemented as a pseudo label.
That is, taking a pathological image as an example for description, when the pathological image is negative, that is, there is no pathological area in the pathological image, the overall label of the pathological image is labeled as "negative"; when the pathological image is positive, namely a pathological area exists in the pathological image, the overall label of the pathological image is marked as 'positive', and the image area with the pathological area in the pathological image is marked.
Step 302, performing region segmentation on the sample images in the sample image set to generate a first sample slice packet and a second sample slice packet.
And carrying out region segmentation on the sample image to obtain corresponding sample slices, and combining the sample slices in different forms to correspondingly form different sample slice packets.
Taking a pathological image as an example, the pathological image is sliced, inference is performed on the local slices (patches), and the inference results of all local slices are integrated to give the final inference result of the whole pathological image. Each patch is regarded as an instance, and the pathological image as a bag, i.e., one bag contains multiple instances. If at least one instance is judged to be positive, the whole bag is positive; conversely, if all instances are judged to be negative, the bag is negative.
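In code, this bag rule reduces to a one-line predicate; a minimal sketch (the function name is ours):

    def bag_is_positive(instance_results):
        # A bag (pathological image) is positive if at least one instance (patch)
        # is judged positive; it is negative only when every instance is negative.
        return any(instance_results)

    assert bag_is_positive([False, True, False]) is True
    assert bag_is_positive([False, False]) is False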
In some embodiments, performing image region segmentation on the sample image comprises at least one of the following ways.
Firstly, the whole sample image is segmented into equal-size pieces to obtain sample image slices of consistent size, where slices at edge positions are padded with blank content so that edge slices have the same size as the other slices.
Secondly, the middle area of the sample image is segmented into equal-size pieces to obtain sample image slices of consistent size; slices of consistent size within the sample image range are cut starting from the edge of the sample image, and the leftover image parts are discarded.
Thirdly, the minimum rectangular range containing the sample image is determined as the sample extended image, the sample extended image is segmented into equal-size pieces to obtain candidate sample slices, candidate slices without image content are discarded, and the remaining candidate slices are used as the sample image slices.
FIG. 4 shows a schematic diagram of the image slicing process provided by an exemplary embodiment of the present application.
A foreground detection algorithm is used to detect the foreground region of the sample image 400, i.e. the sample extended image 410, and the sample extended image 410 is segmented into equal-size pieces to obtain the sample image slices 420.
Foreground detection refers to the process of identifying the region of interest, i.e. the region of higher visibility and significance, in the sample image 400. The sample extended image 410 is segmented into equal sizes and dense slice extraction is performed; 10000 sample image slices 420 are extracted, each of size 224 × 224.
It should be noted that the above dividing method of the sample slice is only an illustrative example, and the embodiment of the present application does not limit the obtaining method of the sample slice.
The foreground detection process of the sample image can be omitted, and the sample image is directly subjected to equal-size segmentation; the number of extracted sample image slices can be arbitrary, and the size of the sample image slices can be arbitrary; this embodiment is not limited thereto.
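For concreteness, the sketch below implements the first segmentation way above (equal-size tiling with blank-padded edges) in NumPy; the 224-pixel patch size follows FIG. 4, and the function and variable names are ours:

    import numpy as np

    def tile_image(image: np.ndarray, patch: int = 224) -> list:
        """Split an H x W x C array into patch x patch slices, zero-padding
        (blank edges) so that border slices keep the same size as the rest."""
        h, w = image.shape[:2]
        pad_h = (-h) % patch
        pad_w = (-w) % patch
        padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)))
        slices = []
        for y in range(0, padded.shape[0], patch):
            for x in range(0, padded.shape[1], patch):
                slices.append(padded[y:y + patch, x:x + patch])
        return slices

    # A 1000 x 900 image yields ceil(1000/224) * ceil(900/224) = 5 * 5 = 25 slices.
    print(len(tile_image(np.zeros((1000, 900, 3), dtype=np.uint8))))  # 25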
In some embodiments, after segmenting the sample image into sample image slices, composing the sample slice packets includes at least one of the following.
1) And summarizing the sample image slices belonging to the same sample image into the same sample slice packet to obtain the sample slice packet corresponding to each sample image.
In this case, the sample label of the sample image is the package label of the corresponding sample slice packet. For example, if the sample image is labeled "negative", the sample slice packet is correspondingly labeled "negative"; if the sample image is labeled "positive", the sample slice packet is correspondingly labeled "positive". Moreover, according to the region where the annotated target image content is located in the sample image, whether each sample image slice obtained by segmentation contains the target image content is determined, and a slice label is annotated for the slice. For example, if the annotated target image region is region 1 of the sample image, and sample image slice A contains all or part of region 1, then sample image slice A is considered to contain the target image content.
In some embodiments, when the exemplar label of the exemplar image indicates that the target image content is not included in the exemplar image, then the exemplar slice packet only requires the package label, i.e., the package label for indicating that the target image content is not included in the exemplar slice packet, without the need to label the slice label for the exemplar image slice any more.
When a sample image slice includes the target image content, its slice label may be at the category level or at the pixel level. When the slice label is implemented at the category level, it represents whether the target image content is included in the sample image slice. For example, a sample image slice of a pathological image is labeled "positive" or "negative"; optionally, if the slice includes the target image content the slice label is "P", i.e. positive, and if not the slice label is "N", i.e. negative. When the slice label is implemented at the pixel level, it characterizes where the target image content is located in the sample image slice. For example, the target image content fills the slice, or the target image content is located in region 2 of the slice. In some embodiments, the pixel-level label can also characterize the pixel proportion of the target image content in the slice, such as the ratio of the number of pixels in region 2 to the total pixels of the slice.
2) After each sample image is divided into sample image slices, a slice set of the sample image slices is obtained, and n sample image slices are randomly obtained from the slice set to form a sample slice packet, wherein n is a preset positive integer.
That is, the sample image slices in the same sample slice packet are from the same or different sample images.
Determining a package label of the sample slice packet according to the slice label corresponding to the sample image slice; or, determining a package label of the sample slice packet according to a sample label of a source sample image of the sample image slice, such as: the sample labels of the sample images from which the sample image slices are derived all indicate that no target image content exists, and thus the sample image slices do not naturally include the target image content, and the packet labels of the sample slice packets are used to indicate that no target image content is included.
When the sample image from which the sample image slice comes includes the target image content, the package label of the sample slice packet needs to be determined according to the slice label. Taking the pathological image as an example, when the slice labels of all the sample image slices are "negative", the package label of the slice package is "negative"; when the slice label of a specimen image slice present in the slice packet is "positive", the packet label of the slice packet is "positive".
Since the label exists for the sample image and is used for indicating the inclusion condition of the target image content in the sample image, after the sample image is subjected to region segmentation, the slice label of the sample image slice is determined according to the sample label of the sample image, such as: taking the pathological image as an example, when the sample label of the sample image is "negative", the slice labels of all sample image slices are naturally "negative"; and when the sample label of the sample image is positive, determining whether the sample image slice is negative or positive according to the position of the target image area marked by the sample image.
3) Sample image slices belonging to the same sample image are gathered into the same sample slice packet, and n sample image slices are also randomly drawn from the slice set to form a sample slice packet. That is, the sample slice packets include both packets obtained by segmenting a single sample image and packets composed of sample image slices segmented from different images.
Optionally, the sample image slices in the first sample slice packet are acquired from different sample images, that is, different sample images are subjected to region segmentation, and the obtained sample image slices form the first sample slice packet; the sample image slices in the second sample slice packet are acquired from the same sample image, that is, the same sample image is subjected to region segmentation, and the obtained sample image slices form the second sample slice packet.
The sample image slices in the first sample slice packet are marked with slice labels, and the slice labels correspond to the sample labels of the sample images; the second sample slice packet has a package label corresponding to the sample image, i.e., the sample label, and the sample image slices in the second sample slice packet are not labeled with slice labels.
In this embodiment of the application, if at least one sample image slice in the second sample slice packet is judged to be positive, the entire second sample slice packet is positive, that is, the sample image corresponding to the second sample slice packet is positive; conversely, if no sample image slice in the second sample slice packet is judged to be positive, the entire second sample slice packet is negative.
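A sketch of how the two packet types could be assembled, assuming the slicing above has already produced per-image slice lists; the dictionary layout and names are ours, not the patent's:

    import random

    def second_sample_slice_packet(image):
        # All slices come from the same sample image; only the bag-level
        # sample label is attached, and individual slices stay unlabeled.
        return {"slices": image["slices"], "package_label": image["sample_label"]}

    def first_sample_slice_packet(images, n):
        # Slices are drawn across different sample images; each slice keeps
        # the slice label derived from its source image's annotation.
        pool = [(s, lbl) for img in images
                for s, lbl in zip(img["slices"], img["slice_labels"])]
        return random.sample(pool, n)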
Step 303, performing feature analysis on the sample image slices in the first sample slice packet through the image recognition model, and determining a first loss value based on the difference between the slice labels and the slice analysis results.
The sample image slices in the first sample slice packet are marked with slice labels, the sample image slices in the first sample slice packet are input into an image recognition model, and corresponding slice analysis results are output through model recognition.
In some embodiments, the first loss value is determined based on a difference between a slice label of a specimen image slice annotation in the first specimen slice packet and a corresponding slice analysis result. That is, the sample image slice in the first sample slice packet is input to the image recognition model, the slice analysis result of the sample image slice in the first slice packet is output, the slice analysis result and the slice label marked on the sample slice image are subjected to loss calculation, and the first loss value is determined.
And step 304, performing feature analysis on the second sample slice packet through the image recognition model, and determining a second loss value corresponding to the sample image and a third loss value corresponding to the second sample slice packet based on the difference between the sample label and the slice packet analysis result.
The second sample slice packet has a package label corresponding to the sample image, i.e., the sample label, and the sample image slices in the second sample slice packet are not labeled with slice labels. The sample image slices in the second sample slice packet are input into the image recognition model, which outputs the corresponding slice packet analysis result after recognition.
In some embodiments, the second loss value is determined based on the difference between the sample label of the second sample slice packet and the corresponding slice packet analysis result. That is, the sample image slices in the second sample slice packet are input into the image recognition model, the slice packet analysis result of the second sample slice packet is output, and loss calculation is performed on the slice packet analysis result and the sample label to determine the second loss value.
In addition, according to the magnitude of the attention representation of each sample image slice in the second sample slice packet, K sample image slices are selected from the second sample slice packet to form a candidate sample slice packet. Each sample image slice in the candidate sample slice packet has a slice label corresponding to the sample label of the second sample slice packet.
Wherein the attention representation is obtained by performing attention analysis on each sample image slice in the second sample slice packet by an attention layer in the image recognition model.
In some embodiments, the third loss value is determined based on the difference between the slice labels of the sample image slices in the candidate sample slice packet and the corresponding slice analysis results. That is, the sample image slices in the candidate sample slice packet are input into the image recognition model, the slice analysis results of these slices are output, and loss calculation is performed on the slice analysis results and the slice labels to determine the third loss value.
It should be noted that the number of sample image slices in the candidate sample slice packet may be arbitrary, that is, K may take any value, which is not limited in this embodiment.
It should be noted that the steps 303 and 304 are two parallel steps, the step 303 may be executed first and then the step 304 is executed, the step 304 may be executed first and then the step 303 is executed, and the step 303 and the step 304 may also be executed simultaneously, which is not limited in this embodiment.
Step 305, training the image recognition model based on the first loss value, the second loss value and the third loss value.
The trained image recognition model is used for recognizing the target image content in the image.
Training the image recognition model based on the first loss value, the second loss value, and the third loss value includes, but is not limited to, one of the following ways.
1. And directly fusing the first loss value, the second loss value and the third loss value to obtain a loss value sum, and training the image recognition model based on the loss value sum.
2. And performing weighted fusion on the first loss value, the second loss value and the third loss value, namely fusing the first loss value, the second loss value and the third loss value by adopting respective corresponding weights, calculating a weighted sum to obtain a loss value sum, and training the image recognition model based on the loss value sum.
Optionally, when adjusting model parameters of the image recognition model, adjusting parameters of different feature extraction layers, feature classification layers and attention layers in the image recognition model according to the sum of loss values; and adjusting parameters of other network layers in the image recognition model according to the sum of the loss values.
It is noted that the way of training the image recognition model based on the first loss value, the second loss value and the third loss value may be arbitrary; the loss value sum calculation method of the first loss value, the second loss value, and the third loss value may be arbitrary; any kind of network layer can be included in the image recognition model; the method for adjusting the parameters of each network layer in the image recognition model according to the sum of the loss values is arbitrary; this embodiment is not limited thereto.
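As one concrete possibility, the sketch below implements way 2 (weighted fusion) as a single optimization step in PyTorch; the weight values are placeholders of our choosing, and setting them all to 1 recovers the direct fusion of way 1:

    import torch

    def fuse_losses(first, second, third, weights=(1.0, 1.0, 0.5)):
        # Weighted sum of the three losses; weights (1, 1, 1) give direct fusion.
        return weights[0] * first + weights[1] * second + weights[2] * third

    # One training step over stand-in losses derived from a dummy parameter:
    param = torch.nn.Parameter(torch.randn(4))
    optimizer = torch.optim.SGD([param], lr=1e-3)
    loss_sum = fuse_losses(param.pow(2).sum(), param.mean().pow(2), param.norm())
    optimizer.zero_grad()
    loss_sum.backward()
    optimizer.step()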
In summary, in the method provided by this embodiment, during the training of the image recognition model, the sample image slices and the slice packets corresponding to the sample images are respectively input into the image recognition model, the corresponding loss values are calculated based on the output results of the model, the sample labels and the slice labels, and the image recognition model is trained accordingly. This improves the overall accuracy of recognizing the sample image and the accuracy of recognizing the target image content within a sample image slice, allows the results output by the image recognition model to be better visualized, and enables the trained image recognition model to solve slice-level problems.
In an optional embodiment, the sum of the loss values is obtained by calculating the weighted sum of the first loss value, the second loss value and the third loss value, and the image recognition model is trained based on the sum of the loss values, so that the recognition accuracy and the accuracy of the image recognition model for recognizing the target image content are improved.
In an optional embodiment, the image recognition model comprises an attention layer, a first feature classification layer and a second feature classification layer, and the process of training the image recognition model comprises two modules, namely a patch strong supervision module and a WSI weak supervision module. The two modules are simultaneously carried out in the training process of the image recognition model. Fig. 5 is a flowchart of a training method for an image recognition model according to another exemplary embodiment of the present application, and illustrates a training process of a patch strong supervision module on the image recognition model, where the method may be executed by a server or a terminal, or by both the server and the terminal, and in this embodiment of the present application, the method is executed by the server as an example, as shown in fig. 5, the step 303 may be implemented as the following step.
Step 3031, extracting the features of the sample image slices in the first sample slice packet through the feature extraction model to obtain a first feature representation of the sample image slices.
The feature extraction model is a model pre-trained on images of the same type as the sample images, and serves as the feature extraction layer in the image recognition model.
The number of sample image slices in the first sample slice packet may be arbitrary, the sample image slices in the first sample slice packet may come from different sample images, and each sample image slice is labeled with a slice label.
Alternatively, when the sample image slices are mixed and distributed and constitute the first sample slice group, at least one of the following distribution manners may be used.
First, at least one sample image slice is selected from the sample image slices of each sample image and is allocated to the same packet to obtain a sample slice packet.
Wherein the number of sample image slices acquired from each sample image is the same or different.
And secondly, mixing sample image slices of different sample images to obtain a slice set, and randomly acquiring n sample image slices from the slice set to form a first sample slice packet.
Thirdly, a partial sample image slice is obtained from each sample image classified by different labels to form a first sample slice packet.
It should be noted that the distribution manner of the sample slice packets is only an illustrative example, and the present embodiment does not limit the distribution manner.
Alternatively, taking the example that the sample image is a pathological image, the sample image slices in the first sample slice packet are obtained by segmenting the pathological image, and the slice label of each sample image slice is used to indicate whether the sample image slice is "positive" or "negative", that is, to indicate whether the target image content is included in the sample image slice. The slice label of each sample image slice is "P" or "N" representing positive or negative, respectively; that is, if the target image content is included in the sample image slice, the slice label is "P", and if the target image content is not included in the sample image slice, the slice label is "N".
Alternatively, in the first sample slice group, there are K1 sample image slices with slice label "P" and K2 sample image slices with slice label "N".
And inputting the sample image slices in the first sample slice packet into a feature extraction model, performing feature analysis on the sample image slices by the feature extraction model, and outputting first feature representations corresponding to the sample image slices.
Schematically, the first feature representation corresponding to the sample image slices labeled "P" is denoted P_Fp1 and has size K1 × 2048; the first feature representation corresponding to the sample image slices labeled "N" is denoted N_Fp1 and has size K2 × 2048.
It is noted that the sample image may be any type of image, and the sample image slices in the first sample slice packet may be from any sample image; the mode of segmenting the sample image to obtain the sample image slice can be any mode; this embodiment is not limited thereto.
The slice labels corresponding to the sample image slices can be of any kind, and the number of the sample image slices in the first sample slice packet can be any; when the slice label of each sample image slice is used to indicate that the sample image slice is "positive" or "negative", the number of sample image slices with slice label "P" in the first sample slice group may be arbitrary, and the number of sample image slices with slice label "N" may be arbitrary; after the sample image slice passes through the feature extraction model, the size of the first feature representation output by the feature extraction model can be any; this embodiment is not limited thereto.
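The 2048-wide first feature representation matches the pooled output of a ResNet-50, so one plausible realization of the feature extraction model is the following; the backbone choice is our assumption, as the patent does not name a specific network:

    import torch
    import torchvision

    backbone = torchvision.models.resnet50(weights=None)  # assumed backbone; 2048-dim pooled output
    backbone.fc = torch.nn.Identity()                     # drop the classifier head

    sample_slices = torch.randn(5, 3, 224, 224)           # K1 + K2 sample image slices
    first_features = backbone(sample_slices)              # stacked P_Fp1 / N_Fp1 rows
    print(first_features.shape)                           # torch.Size([5, 2048])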
Step 3032, performing attention analysis on the first feature representation to obtain a first slice attention representation (Ap) and a first slice feature representation (Fp).
The image recognition model comprises an attention layer, and the first feature representation corresponding to the sample image slice in the first sample slice group is input into the attention layer for attention analysis, so as to obtain a first slice attention representation and a first slice feature representation corresponding to the first feature representation.
The Attention Mechanism stems from the study of human vision. In cognitive science, because of bottlenecks in information processing, humans selectively focus on a portion of all available information while ignoring the rest of the visible information; this mechanism is commonly referred to as the attention mechanism. Different parts of the human retina have different degrees of information processing ability, i.e. acuity, and only the foveal part has the strongest acuity. To make reasonable use of limited visual processing resources, a human needs to select a specific portion of the visual region and then focus on it.
By applying an attention mechanism, the attention layer selectively focuses on the parts of the first feature representation belonging to the positive classification or the negative classification, so as to obtain the attention distribution corresponding to the sample image slices in the first sample slice packet.
In the first sample slice packet, there are K1 sample image slices with the slice label "P" and K2 sample image slices with the slice label "N".
The first slice attention representation corresponding to the sample image slices labeled "P" is denoted P_Ap and has a size of K1 × 1, and the first slice attention representation corresponding to the sample image slices labeled "N" is denoted N_Ap and has a size of K2 × 1.
The first slice feature representation corresponding to the sample image slices labeled "P" is denoted P_Fp and has a size of K1 × 512, and the first slice feature representation corresponding to the sample image slices labeled "N" is denoted N_Fp and has a size of K2 × 512.
It should be noted that, in the first sample slice packet, the number of sample image slices with the slice label "P" may be arbitrary, and the number of sample image slices with the slice label "N" may be arbitrary; after the first feature representation corresponding to the sample image slices passes through the attention layer, the sizes of the first slice attention representation and the first slice feature representation output by the attention layer can be arbitrary; this embodiment is not limited thereto.
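As an illustrative sketch only — the embodiment leaves the attention layer's internal structure open — an attention layer with the input and output sizes described above (2048-dimensional input, K × 512 slice features, K × 1 slice attentions) could take the following two-branch form; all layer dimensions other than 2048, 512 and 1 are assumptions.

```python
import torch.nn as nn

class AttentionLayer(nn.Module):
    # Assumed structure: an embedding branch producing the slice feature
    # representation (K x 512) and an attention branch producing the
    # slice attention representation (K x 1).
    def __init__(self, in_dim=2048, feat_dim=512, attn_dim=128):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.attn = nn.Sequential(nn.Linear(feat_dim, attn_dim), nn.Tanh(),
                                  nn.Linear(attn_dim, 1))

    def forward(self, x):   # x: (K, 2048) first feature representation
        fp = self.embed(x)  # first slice feature representation, (K, 512)
        ap = self.attn(fp)  # first slice attention representation, (K, 1)
        return ap, fp
```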
Step 3033, feature calculation is carried out on the first slice attention representation and the first slice feature representation to obtain a second feature representation.
In some embodiments, since the first slice feature representation of the sample image slices in the first sample slice packet is already a feature representation at the sample-image-slice level, and the sample image slices in the first sample slice packet are labeled with slice labels, the first slice feature representation may be directly used as the second feature representation of the sample image slices in the first sample slice packet.
In some embodiments, the sample image slices in the first sample slice packet are not labeled with slice labels, so feature calculation needs to be performed on the first slice attention representation and the first slice feature representation to obtain the second feature representation of the sample image slices in the first sample slice packet.
In this embodiment, the first slice feature representation is directly used as the second feature representation of the sample image slices in the first sample slice packet.
Step 3034, performing classification analysis on the second feature representation to obtain a first slice analysis result of the sample image slices in the first sample slice packet.
The image recognition model comprises a first feature classification layer, second feature representations corresponding to the sample image slices in the first sample slice packet are input into the first feature classification layer for classification analysis, and a first slice analysis result of the sample image slices in the first sample slice packet is obtained.
In the first sample slice packet, there are K1 sample image slices with the slice label "P" and K2 sample image slices with the slice label "N". The second feature representation corresponding to the sample image slices labeled "P" is denoted P_Fp and has a size of K1 × 512, and the second feature representation corresponding to the sample image slices labeled "N" is denoted N_Fp and has a size of K2 × 512.
The second feature representations P_Fp and N_Fp are classified and analyzed by the first feature classification layer contained in the image recognition model to obtain the first slice analysis result of the sample image slices in the first sample slice packet, where the first slice analysis result comprises the analysis result of each sample image slice in the first sample slice packet.
Step 3035, a first loss value is determined based on the difference between the slice label and the first slice analysis result.
The sample image slices in the first sample slice packet are labeled with slice labels; the difference between the slice labels of the sample image slices in the first sample slice packet and the first slice analysis result is calculated to determine the first loss value. The first loss value is used for training the image recognition model.
The image recognition model comprises a first feature classification layer and an attention layer, and the image recognition model is trained based on a first loss value, namely the first feature classification layer and the attention layer which are contained in the image recognition model are trained.
It is noted that the manner of calculating the difference between the slice labels of the sample image slices in the first sample slice packet and the first slice analysis result may be arbitrary, and the network layers included in the image recognition model may be of any kind, including but not limited to an attention layer and a feature classification layer; the manner of training the image recognition model based on the first loss value may be arbitrary; this embodiment does not limit this.
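A minimal sketch of steps 3034-3035, assuming the first feature classification layer is a single linear head over the 512-dimensional second feature representation and the first loss is a cross-entropy (assumptions; as noted above, the embodiment allows any difference measure):

```python
import torch
import torch.nn as nn

classifier1 = nn.Linear(512, 2)    # first feature classification layer (assumed linear)
criterion = nn.CrossEntropyLoss()  # assumed difference measure

def first_loss(second_features, slice_labels):
    # second_features: (K1 + K2, 512) second feature representations
    # slice_labels: (K1 + K2,) long tensor, 1 for "P" and 0 for "N"
    logits = classifier1(second_features)   # first slice analysis result, one per slice
    return criterion(logits, slice_labels)  # first loss value
```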
In summary, in the method provided in this embodiment, during the training of the image recognition model, for images that require slice recognition, the slice packet corresponding to the sample images is input into the image recognition model, and the corresponding loss value is calculated based on the output result of the image recognition model and the slice labels, so as to train the image recognition model. The overall accuracy of recognizing the sample image is improved, the accuracy of recognizing the target image content in the sample image slices is also improved, and the trained image recognition model can therefore solve the slice-level problem.
In an optional embodiment, feature extraction is performed on a sample image slice through a feature extraction layer in an image recognition model to obtain a first feature representation of the sample image slice, further, feature analysis is performed on the first feature representation of the sample image slice to obtain a first slice analysis result of the sample image slice, and a loss value is calculated based on the first slice analysis result and a slice label of the sample image slice, so that an effect of training the image recognition model can be improved.
In an alternative embodiment, attention analysis is performed on the first feature representation of the sample image slice to obtain the first slice attention representation and the first slice feature representation of the sample image slice, the second feature representation of the sample image slice is calculated based on the first slice attention representation and the first slice feature representation, and classification analysis is performed on the sample image slice based on the second feature representation to obtain the first slice analysis result of the sample image slice, so that the accuracy and precision of the output result of the image recognition model can be improved.
Fig. 6 is a flowchart of a training method for an image recognition model according to another exemplary embodiment of the present application, illustrating the training process of the image recognition model in a WSI weak supervision module. The method may be performed by a server or a terminal, or by both the server and the terminal; in this embodiment of the present application, the method is described as being performed by the server. As shown in fig. 6, step 304 described above may be implemented as the following steps.
Step 3041, extracting features of the sample image slice in the second sample slice packet through the feature extraction model, so as to obtain a third feature representation of the sample image slice.
The feature extraction model is a model obtained through pre-training on images of the same type as the sample image, and also serves as the feature extraction layer in the image recognition model.
The number of sample image slices in the second sample slice packet may be arbitrary; the sample image slices in the second sample slice packet come from the same sample image, and the second sample slice packet is labeled with a packet label, that is, the sample label corresponding to that sample image.
Optionally, taking the example in which the sample image is a pathological image, the sample image slices in the second sample slice packet are obtained by segmenting the same pathological image, and the sample label labeled on the second sample slice packet indicates whether the second sample slice packet as a whole is "positive" or "negative", that is, whether the target image content is included in the second sample slice packet. The sample label labeled on the second sample slice packet is "P" or "N", representing positive or negative respectively: if the second sample slice packet includes the target image content, the sample label is "P"; if not, the sample label is "N".
Optionally, the number of sample image slices in the second sample slice packet is M.
The sample image slices in the second sample slice packet are input into the feature extraction model; the feature extraction model performs feature analysis on the sample image slices and outputs the third feature representation corresponding to the sample image slices.
Schematically, the third feature representation corresponding to the sample image slices is denoted F3 and has a size of M × 2048.
It is noted that the sample image corresponding to the second sample slice packet may be any type of image; the mode of segmenting the sample image to obtain the sample image slice can be any mode; this embodiment is not limited thereto.
The sample labels corresponding to the second sample slice packet can be of any type, and the number of sample image slices in the second sample slice packet can be any; after the sample image slice passes through the feature extraction model, the size of the third feature representation output by the feature extraction model can be any; this embodiment is not limited thereto.
Step 3042, perform an attention analysis on the third feature representation to obtain a second slice attention representation and a second slice feature representation.
The image recognition model comprises an attention layer, and the third feature representation corresponding to the sample image slice in the second sample slice packet is input into the attention layer for attention analysis, so that the second slice attention representation and the second slice feature representation corresponding to the third feature representation are obtained.
Illustratively, the number of sample image slices in the second sample slice packet is M.
The second slice attention representation corresponding to the sample image slices in the second sample slice packet is denoted A and has a size of M × 1, and the second slice feature representation corresponding to the sample image slices in the second sample slice packet is denoted F and has a size of M × 512.
It should be noted that, after the third feature representation corresponding to the sample image slice in the second sample slice packet passes through the attention layer, the size of the second slice attention representation and the second slice feature representation output by the attention layer may be arbitrary; this embodiment is not limited thereto.
Step 3043, feature aggregation is performed on the second slice attention representation and the second slice feature representation to obtain a fourth feature representation.
Wherein the Sigmoid function is used in the feature aggregation of the second slice attention representation and the second slice feature representation.
The Sigmoid function is a common S-shaped function in biology, also known as the sigmoidal growth curve. In information science, because the Sigmoid function is monotonically increasing and its inverse function is also monotonically increasing, it is often used as the activation function of a neural network, mapping variables into the interval (0,1).
The process of performing feature aggregation on the second slice attention representation and the second slice feature representation to obtain the fourth feature representation comprises the following steps.
1. A third slice attention representation of the sample image slices in the second sample slice packet is obtained by normalizing the second slice attention representation.
The normalization processing uses a sigmoid function; that is, the second slice attention representation is normalized by the sigmoid function into the range (0,1), and the third slice attention representation of the sample image slices in the second sample slice packet is obtained, denoted A_sigmoid.
Schematically, the normalization process uses the calculation shown in Formula 1 below.
Formula 1: A_sigmoid = sigmoid(A) = 1 / (1 + e^(-A))
where A is the second slice attention representation and A_sigmoid is the third slice attention representation.
2. A fourth slice attention representation of the sample image slices in the second sample slice packet is obtained by performing linear normalization processing on the third slice attention representation.
Each sample image slice in the second sample slice packet has its own third slice attention representation, and all third slice attention representations are summed to obtain their sum. Taking this sum as the denominator and each third slice attention representation as the numerator yields the fourth slice attention representation of each sample image slice in the second sample slice packet, denoted A_sigmoid_norm.
Schematically, the linear normalization process uses the calculation shown in Formula 2 below.
Formula 2: A_sigmoid_norm = A_sigmoid / Sum(A_sigmoid)
where Sum(A_sigmoid) is the sum of the third slice attention representations over all sample image slices in the second sample slice packet, A_sigmoid is the third slice attention representation, and A_sigmoid_norm is the fourth slice attention representation.
3. Multiplication processing is performed based on the fourth slice attention representation and the second slice feature representation to obtain the fourth feature representation of the second sample slice packet.
Each sample image slice in the second sample slice packet has its own fourth slice attention representation and second slice feature representation.
The fourth slice attention representations of the individual sample image slices are combined into the fourth slice attention representation of the second sample slice packet, and the second slice feature representations of the individual sample image slices are combined into the second slice feature representation of the second sample slice packet.
The fourth slice attention representation of each sample image slice in the second sample slice packet is multiplied by its second slice feature representation to obtain a per-slice fourth feature representation, and these are summed to obtain the fourth feature representation corresponding to the second sample slice packet, denoted F4.
Schematically, the multiplication process uses the calculation shown in Formula 3 below.
Formula 3: F4 = Sum(A_sigmoid_norm × F)
where F4 is the fourth feature representation, A_sigmoid_norm is the fourth slice attention representation, F is the second slice feature representation, and the sum runs over all sample image slices in the second sample slice packet.
Illustratively, the number of sample image slices in the second sample slice packet is M, the fourth slice attention representation has a size of M × 1, the second slice feature representation has a size of M × 512, and the fourth feature representation has a size of 1 × 512.
It is to be noted that Formula 1, Formula 2 and Formula 3 above may be any type of calculation formula; the method of feature aggregation for the second slice attention representation and the second slice feature representation may be arbitrary, and any type of function may be used in the feature aggregation process; the number of sample image slices in the second sample slice packet may be arbitrary, and the sizes of the second slice feature representation, the fourth slice attention representation and the fourth feature representation of the sample image slices in the second sample slice packet may be arbitrary; this embodiment is not limited thereto.
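Formulas 1-3 map directly onto tensor operations. The following is a minimal sketch using the variable names defined above (A, F, A_sigmoid, A_sigmoid_norm, F4); it is illustrative only, since, as noted, the embodiment allows other aggregation functions.

```python
import torch

def aggregate(A, F):
    # A: (M, 1) second slice attention representation
    # F: (M, 512) second slice feature representation
    A_sigmoid = torch.sigmoid(A)                  # Formula 1: normalize into (0, 1)
    A_sigmoid_norm = A_sigmoid / A_sigmoid.sum()  # Formula 2: linear normalization
    F4 = (A_sigmoid_norm * F).sum(dim=0, keepdim=True)  # Formula 3: weighted sum, (1, 512)
    return F4  # fourth feature representation of the second sample slice packet
```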
Step 3044, perform a classification analysis on the fourth feature representation to obtain a second slice packet analysis result of the sample image slice in the second sample slice packet.
The image recognition model comprises a first feature classification layer, fourth feature representations corresponding to the sample image slices in the second sample slice packet are input into the first feature classification layer for classification analysis, and a second slice packet analysis result of the sample image slices in the second sample slice packet is obtained.
Illustratively, the number of sample image slices in the second sample slice packet is M, and the corresponding fourth feature representation is denoted F4; classification analysis is performed on the fourth feature representation by the first feature classification layer contained in the image recognition model to obtain the second slice packet analysis result of the sample image slices in the second sample slice packet, where the second slice packet analysis result is a single analysis result for the aggregation of all sample image slices in the second sample slice packet.
Step 3045, a second loss value is determined based on the difference between the specimen label and the second slice packet analysis result.
The second sample slice packet is labeled with a packet label, i.e., the sample label, and the difference between the sample label of the second sample slice packet and the second slice packet analysis result is calculated to determine the second loss value. The second loss value is used for training the image recognition model.
The image recognition model comprises a first feature classification layer and an attention layer, and the image recognition model is trained based on the second loss value, namely the first feature classification layer and the attention layer which are contained in the image recognition model are trained.
It is noted that the manner of calculating the difference between the sample label of the second sample slice packet and the analysis result of the second slice packet may be arbitrary, and the network layer included in the image recognition model may be of any kind, including but not limited to the attention layer and the feature classification layer; the mode of training the image recognition model based on the second loss value may be arbitrary; this embodiment is not limited thereto.
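Continuing the sketch, steps 3044-3045 can be written as follows, again assuming a linear first feature classification layer and a cross-entropy difference measure (both assumptions, not mandated by the embodiment):

```python
import torch
import torch.nn as nn

classifier1 = nn.Linear(512, 2)    # the same (assumed) first feature classification layer
criterion = nn.CrossEntropyLoss()  # assumed difference measure

def second_loss(F4, packet_label):
    # F4: (1, 512) fourth feature representation
    # packet_label: (1,) long tensor, 1 for "P" and 0 for "N"
    logits = classifier1(F4)                # second slice packet analysis result
    return criterion(logits, packet_label)  # second loss value
```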
Step 3046, based on the values of the second slice attention representation, selecting the K sample image slices with the highest values as a candidate sample slice packet.
Each sample image slice in the second sample slice packet has a second slice attention representation; the slices are sorted by the values of their second slice attention representations, and the K slices with the highest values are selected as the candidate sample slice packet, where K is a preset value.
Generally, if the second sample slice packet includes the target image content, that is, if the packet label of the second sample slice packet is "positive", the target image content appears in the sample image slices with the highest second slice attention values; that is, the target image content is considered to be located in the candidate sample slice packet. Therefore, the sample image slices in the candidate sample slice packet are labeled with slice labels corresponding to the packet label of the second sample slice packet, i.e., corresponding to the sample label of the second sample slice packet.
Illustratively, K is 8: the 8 sample image slices with the highest second slice attention values are selected to form the candidate sample slice packet, and each of these sample image slices is labeled with a slice label identical to the sample label of the second sample slice packet.
It should be noted that the value of K may be arbitrary, and this embodiment is not limited thereto.
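The selection in step 3046 amounts to a top-K operation over the second slice attention values; a one-function sketch (variable names assumed):

```python
import torch

def select_candidates(A, F, K=8):
    # A: (M, 1) second slice attention representation
    # F: (M, 512) second slice feature representation
    topk = torch.topk(A.squeeze(1), k=K)  # indices of the K highest attention values
    return F[topk.indices]  # candidate sample slice packet features, (K, 512)
```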
Step 3047, a fifth feature representation of the sample image slices in the candidate sample slice packet is obtained based on the second slice feature representation of the sample image slices in the candidate sample slice packet.
Optionally, the fifth feature representation of the sample image slices in the candidate sample slice packet is obtained in one of, but not limited to, the following ways.
1. The fifth feature representation is calculated from the second slice feature representation of the sample image slices in the candidate sample slice packet by using a preset function.
2. The second slice feature representation of the sample image slices in the candidate sample slice packet is directly used as the fifth feature representation.
Schematically, in the present embodiment, the second slice feature representation of the sample image slices in the candidate sample slice packet is directly used as the fifth feature representation, denoted F5, with a size of K × 512.
It should be noted that the above manner of obtaining the fifth feature representation of the sample image slices in the candidate sample slice packet is only an example; if the fifth feature representation is obtained by calculating on the second slice feature representation with a preset function, the preset function can be any type of function; the size of the fifth feature representation may be arbitrary; this embodiment does not limit this.
Step 3048, performing classification analysis on the fifth feature representation to obtain a third slice analysis result of the sample image slices in the candidate sample slice packet.
The image recognition model comprises a second feature classification layer, and the fifth feature representations corresponding to the sample image slices in the candidate sample slice packet are input into the second feature classification layer for classification analysis to obtain the third slice analysis result of the sample image slices in the candidate sample slice packet.
Illustratively, the number of sample image slices in the candidate sample slice packet is 8, and the fifth feature representation corresponding to the sample image slices is denoted F5, with a size of 8 × 512.
Classification analysis is performed on the fifth feature representation F5 by the second feature classification layer contained in the image recognition model to obtain the third slice analysis result of the sample image slices in the candidate sample slice packet, where the third slice analysis result comprises the analysis result of each sample image slice in the candidate sample slice packet.
Step 3049, determining a third loss value based on the difference between the slice label and the third slice analysis result.
The sample image slices in the candidate sample slice packet are labeled with slice labels; the difference between the slice labels of the sample image slices in the candidate sample slice packet and the third slice analysis result is calculated to determine the third loss value. The third loss value is used for training the image recognition model.
The image recognition model comprises a second feature classification layer, and the image recognition model is trained based on a third loss value, namely the second feature classification layer contained in the image recognition model is trained.
It is noted that the way of calculating the difference between the slice label of the sample image slice in the candidate sample slice packet and the third slice analysis result may be arbitrary, and the network layer included in the image recognition model may be of any kind, including but not limited to a feature classification layer; the mode of training the image recognition model based on the third loss value can be any; this embodiment is not limited thereto.
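Steps 3047-3049 can then be sketched as follows, taking the fifth feature representation directly as the second slice feature representation (the option adopted in this embodiment) and assuming, as before, a linear head and a cross-entropy difference measure for the second feature classification layer:

```python
import torch
import torch.nn as nn

classifier2 = nn.Linear(512, 2)    # second feature classification layer (assumed linear)
criterion = nn.CrossEntropyLoss()  # assumed difference measure

def third_loss(F5, packet_label, K=8):
    # F5: (K, 512) fifth feature representation of the candidate sample slice packet
    # packet_label: int, 1 for "P" and 0 for "N"
    logits = classifier2(F5)                 # third slice analysis result, one per candidate slice
    labels = torch.full((K,), packet_label)  # each candidate slice inherits the packet label
    return criterion(logits, labels)         # third loss value
```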
It should be noted that steps 3043 to 3045 and steps 3046 to 3049 are two parallel groups of steps: steps 3043 to 3045 may be executed first and then steps 3046 to 3049, or steps 3046 to 3049 may be executed first and then steps 3043 to 3045, or the two groups may be executed simultaneously; this embodiment is not limited thereto.
In the method provided by this embodiment, during the training of the image recognition model, for images that require slice recognition, the sample image slices and the slice packets corresponding to the sample images are input into the image recognition model, and the corresponding loss values are calculated based on the output results of the image recognition model together with the sample labels and slice labels, so as to train the image recognition model. The overall accuracy of recognizing the sample image is improved, the accuracy of recognizing the target image content in the sample image slices is improved, the results output by the image recognition model can be better visualized, and the trained image recognition model can solve both the slice-packet-level problem and the slice-level problem.
In an optional embodiment, a third feature representation of the sample image slice is obtained by performing feature extraction on the sample image slice in the second sample slice packet, and a second slice packet analysis result of the sample image slice in the second sample slice packet is obtained by performing feature analysis on the third feature representation; and determining a second loss value based on the difference between the sample label and the second slice packet analysis result, and returning the second loss value to the image recognition model for training, so that the accuracy of the image recognition model recognition result is improved.
In an alternative embodiment, the second slice attention representation and the second slice feature representation are obtained by performing attention analysis on the third feature representation; performing feature aggregation on the second slice attention representation and the second slice feature representation to obtain a fourth feature representation; and performing classification analysis on the fourth feature representation to obtain a second slice packet analysis result of the sample image slice in the second sample slice packet, so that the accuracy of the second slice packet analysis result is improved.
In an alternative embodiment, a third slice attention representation of the sample image slices in the second sample slice packet is obtained by normalizing the second slice attention representation; obtaining a fourth slice attention representation of the sample image slice in the second sample slice packet by performing linear normalization processing on the third slice attention representation; and performing multiplication processing based on the fourth slice attention representation and the second slice feature representation to obtain a fourth feature representation of the second sample slice packet, so that the accuracy of the fourth feature representation is improved.
In an optional embodiment, the K sample image slices with the highest attention representation values are selected from the second sample slice packet to form the candidate sample slice packet, and feature analysis and classification analysis are performed on the sample image slices in the candidate sample slice packet, so that both the slice-packet-level problem and the slice-level problem can be addressed during the training of the image recognition model, improving the accuracy and precision of the image recognition model.
In an alternative embodiment, in order to better train the image recognition model, other loss values may also be used. Fig. 7 shows a training method of an image recognition model according to another embodiment of the present application, mainly used in the training process of a patch strong supervision module; the method may be executed by a server or a terminal, or by both the server and the terminal.
In step 701, a fifth slice attention representation of the sample image slices in the first sample slice packet is obtained by performing normalization processing on the first slice attention representation.
Schematically, the first slice attention of the sample image slice in the first sample slice packet is denoted Ap.
The normalization processing uses a sigmoid function; that is, the first slice attention representation is normalized by the sigmoid function into the range (0,1), and the fifth slice attention representation of the sample image slices in the first sample slice packet is obtained, denoted Ap_sigmoid.
Schematically, the normalization process uses the calculation shown in Formula 4 below.
Formula 4: Ap_sigmoid = sigmoid(Ap) = 1 / (1 + e^(-Ap))
where Ap is the first slice attention representation and Ap_sigmoid is the fifth slice attention representation.
It should be noted that any method may be used to normalize the first slice attention representation, which is not limited in this embodiment.
Step 702, performing multiplication processing based on the fifth slice attention representation and the first slice analysis result to obtain a fourth slice analysis result of the sample image slices in the first sample slice packet.
Each sample image slice in the first sample slice packet has its own fifth slice attention representation.
The fifth slice attention representation of each sample image slice in the first sample slice packet is multiplied by its first slice analysis result to obtain the fourth slice analysis result of the sample image slices in the first sample slice packet.
It should be noted that any method may be used for the multiplication processing based on the fifth slice attention representation and the first slice analysis result, which is not limited in this embodiment.
Step 703 determines a fourth loss value based on the difference between the slice label and the fourth slice analysis result.
The sample image slices in the first sample slice packet are labeled with slice labels; the difference between the slice labels of the sample image slices in the first sample slice packet and the fourth slice analysis result is calculated to determine the fourth loss value. The fourth loss value is used for training the image recognition model.
The image recognition model comprises an attention layer, and the image recognition model is trained based on the fourth loss value, namely the attention layer contained in the image recognition model is trained.
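Steps 701-703 admit a similarly compact sketch, using Formula 4 and assuming a cross-entropy difference measure over the attention-weighted analysis results (the loss form is an assumption; the embodiment does not fix it):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()  # assumed difference measure

def fourth_loss(Ap, first_slice_logits, slice_labels):
    # Ap: (K, 1) first slice attention representation
    # first_slice_logits: (K, 2) first slice analysis result
    # slice_labels: (K,) long tensor, 1 for "P" and 0 for "N"
    Ap_sigmoid = torch.sigmoid(Ap)              # Formula 4: fifth slice attention representation
    weighted = Ap_sigmoid * first_slice_logits  # fourth slice analysis result
    return criterion(weighted, slice_labels)    # fourth loss value
```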
Step 704, training the image recognition model based on the fourth loss value.
The trained image recognition model is used for recognizing the target image content in the image.
Training the image recognition model based on the fourth loss value includes, but is not limited to, one of the following ways.
1. The first loss value, the second loss value, the third loss value and the fourth loss value are directly fused to obtain a loss value sum, and the image recognition model is trained based on the loss value sum.
2. The first loss value, the second loss value, the third loss value and the fourth loss value are weighted and fused, that is, the four loss values are fused with their respective corresponding weights and a weighted sum is calculated to obtain a loss value sum, and the image recognition model is trained based on the loss value sum.
3. The image recognition model is trained directly based on the fourth loss value, without considering the first loss value, the second loss value and the third loss value.
Optionally, when adjusting the model parameters of the image recognition model, the parameters of the attention layer in the image recognition model are adjusted according to the fourth loss value or the loss value sum.
It should be noted that the way of training the image recognition model based on the fourth loss value may be arbitrary; the method of calculating the sum of the first loss value, the second loss value, the third loss value and the fourth loss value may be any method; any kind of network layer can be included in the image recognition model; the method for adjusting the parameters of each network layer in the image recognition model according to the loss values is arbitrary; this embodiment does not limit this.
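A sketch of the second option above (weighted fusion); the weights w1-w4 are hypothetical hyperparameters, since the embodiment does not prescribe values:

```python
def loss_value_sum(l1, l2, l3, l4, w=(1.0, 1.0, 1.0, 1.0)):
    # Weighted fusion of the first to fourth loss values into a loss value sum.
    return w[0] * l1 + w[1] * l2 + w[2] * l3 + w[3] * l4

# Calling loss_value_sum(...).backward() would then update, among others,
# the parameters of the attention layer, as described above.
```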
In conclusion, in the process of training the image recognition model, a step of training the image recognition model based on the fourth loss value is added, which improves the overall accuracy of recognizing the sample image and the accuracy of recognizing the target image content in the sample image slices; the results output by the image recognition model can be better visualized, and the trained image recognition model can solve both the slice-packet-level problem and the slice-level problem.
In an optional embodiment, the trained image recognition model may be applied to recognize target image content in an image: a sample image is divided into sample image slices to form the corresponding sample slice packets, and after the sample image slices are input into the image recognition model, the model can output a recognition result for the sample image and also display a corresponding visualization result to assist in observing the recognition result. Schematically, fig. 8 is a schematic diagram of the improved visualization effect in the training process of the image recognition model.
The sample image slice 800 is input into the image recognition model 810, when the image recognition model 810 performs attention analysis on the sample image slice 800, a third slice attention representation corresponding to the sample image slice 800 is obtained, and a first visualization result 820 is output, wherein the first visualization result 820 is used for representing the attention distribution situation of the sample image slice 800.
When the image recognition model 810 performs feature analysis on the second slice feature representation of the sample image slice 800, a slice anomaly probability is obtained, and a second visualization result 830 is output, where the slice anomaly probability is used to represent an anomaly probability of the sample image slice, and the second visualization result 830 is used to represent an anomaly probability distribution of the sample image slice 800.
When the second visualization result 830 output by the image recognition model 810 is observed after the sample image slice 800 is input into the image recognition model 810: a blue region in the image indicates that the corresponding region of the sample image slice 800 is "negative", with an anomaly probability of 0; a red region indicates that the corresponding region of the sample image slice 800 is "positive", with an anomaly probability of 1; regions of other colors represent different anomaly conditions in the corresponding regions of the sample image slice 800, with anomaly probabilities within the interval (0,1). A fifth slice analysis result of the sample image slice 800 is determined based on the third slice attention representation and the slice anomaly probability.
Based on the first visualization result 820 and the second visualization result 830, a third visualization result 840 is output. Illustratively, the sample image slice 800 has an anomaly probability of 0.95, and the third visualization result 840 is used to represent the anomaly condition of the sample image slice 800.
Normally, the anomaly threshold is set to 0.5: if the anomaly probability of the sample image slice 800 is lower than 0.5, the sample image slice 800 is indicated as "negative"; if the anomaly probability is equal to or higher than 0.5, the sample image slice 800 is indicated as "positive".
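A sketch of this decision rule (the 0.5 threshold is the value given above; the function name is illustrative):

```python
def slice_verdict(anomaly_prob, threshold=0.5):
    # Map a slice anomaly probability in (0, 1) to the "positive"/"negative" indication.
    return "positive" if anomaly_prob >= threshold else "negative"

# e.g. slice_verdict(0.95) -> "positive", matching the example of sample image slice 800
```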
In conclusion, the trained image recognition model can recognize sample image slices and output corresponding visualization results with high accuracy; part of the information that has high attention values but does not belong to the target sample image is filtered out, which effectively assists in observing the sample image slices and judging their anomaly conditions, improving the efficiency and accuracy of image recognition.
Fig. 9 is a block diagram of a training apparatus for an image recognition model according to an exemplary embodiment of the present application, and as shown in fig. 9, the apparatus includes the following components:
an obtaining module 910, configured to obtain a sample image set, where the sample image set includes a sample image labeled with a sample label, and the sample label is used to indicate a content of a target image in the sample image;
a generating module 920, configured to perform region segmentation on sample images in the sample image set, and generate a first sample slice packet and a second sample slice packet, where sample image slices in the first sample slice packet are acquired from different sample images, sample image slices in the second sample slice packet are acquired from the same sample image, and sample image slices in the first sample slice packet are labeled with slice labels corresponding to the sample labels;
a determining module 930 configured to perform feature analysis on the sample image slices in the first sample slice group through an image recognition model, and determine a first loss value based on a difference between the slice label and a slice analysis result;
the determining module 930 is further configured to perform feature analysis on the second sample slice packet through the image recognition model, and determine a second loss value corresponding to the sample image and a third loss value corresponding to the second sample slice packet based on a difference between the sample label and a slice packet analysis result;
a training module 940, configured to train the image recognition model based on the first loss value, the second loss value, and the third loss value, where the image recognition model obtained through training is used to recognize target image content in an image.
In an alternative embodiment, as shown in fig. 10, the determining module 930 includes:
an extracting unit 931, configured to perform feature extraction on the sample image slices in the first sample slice packet through a feature extraction model to obtain a first feature representation of the sample image slices, where the feature extraction model is a model obtained through image pre-training of the same type as the sample image;
an analyzing unit 932, configured to perform feature analysis on the first feature representation to obtain a first slice analysis result of the sample image slice in the first sample slice packet;
a determining unit 933 for determining the first loss value based on a difference between the slice label and the first slice analysis result.
In an alternative embodiment, the analyzing unit 932 is further configured to perform attention analysis on the first feature representation to obtain a first slice attention representation and a first slice feature representation; performing feature calculation on the first slice attention representation and the first slice feature representation to obtain a second feature representation; and performing classification analysis on the second feature representation to obtain a first slice analysis result of the sample image slice in the first sample slice packet.
In an optional embodiment, the determining module 930 further includes:
the extracting unit 931 is further configured to perform feature extraction on the sample image slices in the second sample slice packet through the feature extraction model to obtain a third feature representation of the sample image slices;
the analysis unit 932 is further configured to perform feature analysis on the third feature representation to obtain a second slice packet analysis result of the sample image slice in the second sample slice packet;
the determining unit 933 is further configured to determine the second loss value based on a difference between the specimen label and the second slice packet analysis result.
In an alternative embodiment, the analyzing unit 932 is further configured to perform attention analysis on the third feature representation to obtain a second slice attention representation and a second slice feature representation; performing feature aggregation on the second slice attention representation and the second slice feature representation to obtain a fourth feature representation; and performing classification analysis on the fourth feature representation to obtain a second slice packet analysis result of the sample image slice in the second sample slice packet.
In an alternative embodiment, the analyzing unit 932 is further configured to obtain a third slice attention representation of a sample image slice in the second sample slice packet by performing a normalization process on the second slice attention representation; obtaining a fourth slice attention representation of a sample image slice in the second sample slice packet by performing linear normalization processing on the third slice attention representation; and performing multiplication processing based on the fourth slice attention representation and the second slice feature representation to obtain the fourth feature representation of the second sample slice packet.
In an optional embodiment, the analyzing unit 932 is further configured to select, based on the values of the second slice attention representation, the K sample image slices with the highest values as a candidate sample slice packet, where K is a preset value, and the sample image slices in the candidate sample slice packet are labeled with slice labels corresponding to the sample labels; obtain a fifth feature representation of the sample image slices in the candidate sample slice packet based on the second slice feature representation of the sample image slices in the candidate sample slice packet; perform feature analysis on the fifth feature representation to obtain a third slice analysis result of the sample image slices in the candidate sample slice packet; and determine the third loss value based on a difference between the slice labels and the third slice analysis result.
In an optional embodiment, the training module 940 is further configured to calculate a weighted sum of the first loss value, the second loss value, and the third loss value, so as to obtain a loss value sum; and training the image recognition model based on the loss value sum.
In an optional embodiment, the training module 940 is further configured to obtain a fifth slice attention representation of the sample image slices in the first sample slice packet by performing a normalization process on the first slice attention representation; performing multiplication processing on the fifth slice attention representation and the first slice analysis result to obtain a fourth slice analysis result of the sample image slice in the first sample slice packet; determining a fourth loss value based on a difference between the slice label and the fourth slice analysis result; training the image recognition model based on the fourth loss value.
In an optional embodiment, the training module 940 is further configured to perform feature analysis on the second slice feature representation to obtain a slice anomaly probability, where the slice anomaly probability is used to represent the anomaly probability of a sample image slice in the second sample slice packet; determine a fifth slice analysis result of the sample image slices in the second sample slice packet based on the third slice attention representation and the slice anomaly probability; and display identification content based on the fifth slice analysis result, where the identification content is used to represent the anomaly condition of the sample image slices in the second sample slice packet.
In summary, in the device provided in this embodiment, during the training of the image recognition model, for images that require slice recognition, the sample image slices and the slice packets corresponding to the sample images are input into the image recognition model, the corresponding loss values are calculated based on the output results of the image recognition model together with the sample labels and slice labels, and the image recognition model is trained. The overall accuracy of recognizing the sample image is improved, the accuracy of recognizing the target image content in the sample image slices is improved, the results output by the image recognition model can be better visualized, and the trained image recognition model can solve the slice-level problem.
It should be noted that: the training device for an image recognition model provided in the above embodiment is only exemplified by the division of the above functional modules, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to complete all or part of the above described functions. In addition, the training device for the image recognition model and the training method for the image recognition model provided in the above embodiments belong to the same concept, and specific implementation processes thereof are described in detail in the method embodiments, and are not described herein again.
Fig. 11 shows a schematic structural diagram of a computer device provided in an exemplary embodiment of the present application. The computer device may be a server as shown in fig. 2.
Specifically, the computer apparatus 1100 includes a Central Processing Unit (CPU) 1101, a system Memory 1104 including a Random Access Memory (RAM) 1102 and a Read Only Memory (ROM) 1103, and a system bus 1105 connecting the system Memory 1104 and the Central Processing Unit 1101. The computer device 1100 also includes a mass storage device 1106 for storing an operating system 1113, application programs 1114, and other program modules 1115.
The mass storage device 1106 is connected to the central processing unit 1101 through a mass storage controller (not shown) connected to the system bus 1105. The mass storage device 1106 and its associated computer-readable media provide non-volatile storage for the computer device 1100. That is, mass storage device 1106 may include a computer-readable medium (not shown) such as a hard disk or Compact disk Read Only Memory (CD-ROM) drive.
Without loss of generality, computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, erasable Programmable Read-Only Memory (EPROM), electrically Erasable Programmable Read-Only Memory (EEPROM), flash Memory or other solid state Memory technology, CD-ROM, digital Versatile Disks (DVD), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media is not limited to the foregoing. The system memory 1104 and mass storage 1106 described above may collectively be referred to as memory.
According to various embodiments of the present application, the computer device 1100 may also operate by being connected, through a network such as the Internet, to remote computers on the network. That is, the computer device 1100 may connect to the network 1112 through the network interface unit 1111 connected to the system bus 1105, or may connect to other types of networks or remote computer systems (not shown) using the network interface unit 1111.
The memory further includes one or more programs, and the one or more programs are stored in the memory and configured to be executed by the CPU.
Embodiments of the present application further provide a computer device, which may be implemented as a terminal or a server as shown in fig. 2. The computer device comprises a processor and a memory, wherein at least one instruction, at least one program, a code set or an instruction set is stored in the memory, and the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by the processor to realize the training method of the image recognition model provided by the method embodiments.
Embodiments of the present application further provide a computer-readable storage medium, on which at least one instruction, at least one program, a code set, or a set of instructions is stored, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by a processor to implement the method for training an image recognition model provided by the above method embodiments.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to make the computer device execute the training method of the image recognition model in any one of the above embodiments.
Optionally, the computer-readable storage medium may include: read Only Memory (ROM), random Access Memory (RAM), solid State Drive (SSD), or optical disc. The Random Access Memory may include a resistive Random Access Memory (ReRAM) and a Dynamic Random Access Memory (DRAM). The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (13)

1. A method for training an image recognition model, the method comprising:
acquiring a sample image set, wherein the sample image set comprises sample images marked with sample labels, and the sample labels are used for indicating the inclusion condition of target image contents in the sample images;
performing region segmentation on sample images in the sample image set to generate a first sample slice packet and a second sample slice packet, wherein sample image slices in the first sample slice packet are acquired from different sample images, sample image slices in the second sample slice packet are acquired from the same sample image, the second sample slice packet has a packet label corresponding to the sample image, and the sample image slices in the first sample slice packet are labeled with slice labels corresponding to the sample labels;
performing feature analysis on a sample image slice in the first sample slice packet through an image recognition model, and determining a first loss value based on a difference between the slice label and a slice analysis result;
performing feature analysis on the second sample slice packet through the image recognition model, and determining a second loss value corresponding to the sample image based on the difference between the packet label and the slice packet analysis result;
selecting K sample image slices from the second sample slice packet to form a candidate sample slice packet, wherein each sample image slice in the candidate sample slice packet has a slice label corresponding to a packet label of the second sample slice packet, and K is a preset numerical value;
determining a third loss value based on a difference between the slice labels annotated on the sample image slices in the candidate sample slice packet and the corresponding slice analysis results;
and training the image recognition model based on the first loss value, the second loss value and the third loss value, wherein the trained image recognition model is used for recognizing the target image content in the image.
2. The method of claim 1, wherein the performing feature analysis on the sample image slices in the first sample slice packet by an image recognition model, determining a first loss value based on a difference between the slice label and a slice analysis result, comprises:
performing feature extraction on the sample image slices in the first sample slice packet through a feature extraction model to obtain a first feature representation of the sample image slices, wherein the feature extraction model is a model obtained through image pre-training of the same type as the sample images;
performing feature analysis on the first feature representation to obtain a first slice analysis result of the sample image slice in the first sample slice packet;
determining the first loss value based on a difference between the slice label and the first slice analysis result.
3. The method of claim 2, wherein said performing feature analysis on the first feature representation to obtain a first slice analysis result of the sample image slice in the first sample slice packet comprises:
performing attention analysis on the first feature representation to obtain a first slice attention representation and a first slice feature representation;
performing feature calculation on the first slice attention representation and the first slice feature representation to obtain a second feature representation;
and performing classification analysis on the second feature representation to obtain a first slice analysis result of the sample image slice in the first sample slice packet.
4. The method of claim 1, wherein the performing feature analysis on the second sample slice packet through the image recognition model, and determining a second loss value corresponding to the sample image and a third loss value corresponding to the second sample slice packet based on a difference between the sample label and a slice packet analysis result comprises:
performing feature extraction on the sample image slices in the second sample slice packet through a feature extraction model to obtain a third feature representation of the sample image slices, wherein the feature extraction model is a model obtained through image pre-training of the same type as the sample image;
performing feature analysis on the third feature representation to obtain a second slice packet analysis result of the sample image slice in the second sample slice packet;
determining the second loss value based on a difference between the packet label and the second slice packet analysis result.
5. The method of claim 4, wherein said performing feature analysis on the third feature representation to obtain a second slice packet analysis result of the sample image slice in the second sample slice packet comprises:
performing attention analysis on the third feature representation to obtain a second slice attention representation and a second slice feature representation;
performing feature aggregation on the second slice attention representation and the second slice feature representation to obtain a fourth feature representation;
and performing classification analysis on the fourth feature representation to obtain a second slice packet analysis result of the sample image slice in the second sample slice packet.
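Claims 4 and 5 together describe a packet-level classifier that aggregates all slices of one sample image; a minimal sketch under the same assumed dimensions (softmax normalization is also an assumption here; claim 6 refines this step):

```python
import torch
import torch.nn as nn

class PacketClassifier(nn.Module):
    """Attention analysis, feature aggregation over one sample image's
    slices, and packet-level classification."""
    def __init__(self, in_dim=512, hid_dim=128, num_classes=2):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.Tanh(),
                                  nn.Linear(hid_dim, 1))
        self.feat = nn.Linear(in_dim, hid_dim)
        self.cls = nn.Linear(hid_dim, num_classes)

    def forward(self, x):              # x: (M, in_dim) third feature representation
        a = self.attn(x).squeeze(-1)   # (M,) second slice attention representation
        h = self.feat(x)               # (M, hid) second slice feature representation
        w = torch.softmax(a, dim=0)    # normalize attention across the packet
        z = (w.unsqueeze(-1) * h).sum(dim=0, keepdim=True)  # fourth feature repr.
        return self.cls(z), a          # slice packet analysis result, raw attention
```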
6. The method of claim 5, wherein the feature aggregating the second slice attention representation and the second slice feature representation to obtain a fourth feature representation comprises:
obtaining a third slice attention representation of a sample image slice in the second sample slice packet by performing normalization processing on the second slice attention representation;
obtaining a fourth slice attention representation of the sample image slice in the second sample slice packet by performing linear normalization processing on the third slice attention representation;
and performing multiplication processing based on the fourth slice attention representation and the second slice feature representation to obtain the fourth feature representation of the second sample slice packet.
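Claim 6's two-stage reweighting could be realized as below; reading "normalization" as softmax and "linear normalization" as min-max scaling is an assumption of this sketch:

```python
import torch

def aggregate(attention, features, eps=1e-8):
    """Two-stage attention reweighting followed by weighted aggregation.

    attention: (M,) second slice attention representation.
    features:  (M, D) second slice feature representation.
    """
    third = torch.softmax(attention, dim=0)                        # normalization
    fourth = (third - third.min()) / (third.max() - third.min() + eps)  # linear
    # Multiplication with the slice features yields the fourth feature repr.
    return (fourth.unsqueeze(-1) * features).sum(dim=0, keepdim=True)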
7. The method of claim 5, wherein after performing the attention analysis on the third feature representation to obtain a second slice attention representation and a second slice feature representation, further comprising:
selecting, based on the values of the second slice attention representation, the K sample image slices with the highest values to form the candidate sample slice packet;
obtaining a fifth feature representation of a sample image slice in the candidate sample slice packet based on the second slice feature representation of a sample image slice in the candidate sample slice packet;
performing feature analysis on the fifth feature representation to obtain a third slice analysis result of the sample image slice in the candidate sample slice packet;
determining the third loss value based on a difference between the slice label and the third slice analysis result.
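The top-K candidate selection of claim 7 might look as follows; `slice_head` (a per-slice classifier) and the default K are assumptions:

```python
import torch.nn.functional as F

def third_loss(attention, slice_features, slice_head, packet_label, k=8):
    """Take the K highest-attention slices as the candidate sample slice
    packet, give each the packet label, and supervise at slice level."""
    topk = attention.topk(min(k, attention.numel())).indices
    fifth_repr = slice_features[topk]              # fifth feature representation
    logits = slice_head(fifth_repr)                # third slice analysis result
    pseudo = packet_label.expand(topk.numel())     # inherited slice labels
    return F.cross_entropy(logits, pseudo)
```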
8. The method of claim 1, wherein training the image recognition model based on the first loss value, the second loss value, and the third loss value comprises:
calculating a weighted sum of the first loss value, the second loss value and the third loss value to obtain a loss value sum;
and training the image recognition model based on the loss value sum.
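The weighted sum of claim 8 in one line; the weight values shown are assumed hyperparameters, not figures given by the patent:

```python
def loss_sum(loss1, loss2, loss3, weights=(1.0, 1.0, 0.5)):
    """Weighted sum of the three loss values used to train the model."""
    w1, w2, w3 = weights
    return w1 * loss1 + w2 * loss2 + w3 * loss3
```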
9. The method of claim 2 or 3, wherein the method further comprises:
obtaining a fifth slice attention representation of the sample image slice in the first sample slice packet by performing normalization processing on the first slice attention representation;
performing multiplication processing on the fifth slice attention representation and the first slice analysis result to obtain a fourth slice analysis result of the sample image slice in the first sample slice packet;
determining a fourth loss value based on a difference between the slice label and the fourth slice analysis result;
training the image recognition model based on the fourth loss value.
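A sketch of claim 9's fourth loss, again reading "normalization" as softmax (an assumption) and reweighting each slice's analysis result by its attention:

```python
import torch
import torch.nn.functional as F

def fourth_loss(first_attention, slice_logits, slice_labels):
    """Reweight each slice analysis result by its normalized attention
    before scoring it against the slice label."""
    fifth_attn = torch.softmax(first_attention.squeeze(-1), dim=0)  # (N,)
    weighted = fifth_attn.unsqueeze(-1) * slice_logits   # fourth slice analysis result
    return F.cross_entropy(weighted, slice_labels)
```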
10. The method of claim 6, further comprising:
performing feature analysis on the second slice feature representation to obtain a slice abnormality probability, wherein the slice abnormality probability represents the probability that a sample image slice in the second sample slice packet is abnormal;
determining a fifth slice analysis result of the sample image slice in the second sample slice packet based on the third slice attention representation and the slice abnormality probability;
and displaying identification content based on the fifth slice analysis result, wherein the identification content is used for representing the abnormal condition of the sample image slices in the second sample slice packet.
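Claim 10's display step could combine the normalized attention with the per-slice abnormality probability as below; the scoring rule and threshold are assumptions, and printing stands in for whatever rendering the device uses:

```python
def identification_content(third_attention, slice_probs, threshold=0.5):
    """Combine normalized attention with each slice's abnormality
    probability and surface the slices to flag."""
    scores = third_attention * slice_probs                 # fifth slice analysis result
    flagged = (scores >= threshold * scores.max()).nonzero().flatten()
    for idx in flagged.tolist():
        print(f"slice {idx}: abnormal probability {slice_probs[idx]:.3f}, "
              f"attention {third_attention[idx]:.3f}")
    return flagged
```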
11. An apparatus for training an image recognition model, the apparatus comprising:
an acquisition module, configured to acquire a sample image set, wherein the sample image set comprises sample images annotated with sample labels, and the sample label is used for indicating whether the target image content is contained in the sample image;
a generating module, configured to perform region segmentation on the sample images in the sample image set to generate a first sample slice packet and a second sample slice packet, wherein the sample image slices in the first sample slice packet are acquired from different sample images and are annotated with slice labels corresponding to the sample labels, the sample image slices in the second sample slice packet are acquired from the same sample image, and the second sample slice packet has a packet label corresponding to that sample image;
a determining module, configured to perform feature analysis on the sample image slices in the first sample slice packet through an image recognition model and determine a first loss value based on a difference between the slice label and a slice analysis result;
the determining module being further configured to perform feature analysis on the second sample slice packet through the image recognition model and determine a second loss value corresponding to the sample image based on the difference between the packet label and the slice packet analysis result;
the device further comprises: a selecting module, configured to select K sample image slices from the second sample slice packet to form a candidate sample slice packet, wherein each sample image slice in the candidate sample slice packet is assigned a slice label corresponding to the packet label of the second sample slice packet, and K is a preset value; and a module configured to determine a third loss value based on a difference between the slice label assigned to each sample image slice in the candidate sample slice packet and the corresponding slice analysis result;
and a training module, configured to train the image recognition model based on the first loss value, the second loss value, and the third loss value, wherein the trained image recognition model is used for recognizing the target image content in an image.
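A structural sketch of how the apparatus of claim 11 might compose its modules; every name and signature here is an assumption for illustration:

```python
from typing import Callable

class ImageRecognitionTrainingDevice:
    """Each claimed module is held as a callable and wired in claim order."""
    def __init__(self, acquire: Callable, generate: Callable,
                 determine: Callable, train: Callable):
        self.acquire = acquire        # acquisition module
        self.generate = generate      # generating module (region segmentation)
        self.determine = determine    # determining module (three loss values)
        self.train = train            # training module

    def run(self, source):
        sample_set = self.acquire(source)
        first_packet, second_packet = self.generate(sample_set)
        loss1, loss2, loss3 = self.determine(first_packet, second_packet)
        self.train(loss1, loss2, loss3)
```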
12. A computer device comprising a processor and a memory, wherein at least one program is stored in the memory, and the at least one program is loaded and executed by the processor to implement the method for training an image recognition model according to any one of claims 1 to 10.
13. A computer-readable storage medium, wherein at least one program is stored in the storage medium, and the at least one program is loaded and executed by a processor to implement the method for training an image recognition model according to any one of claims 1 to 10.
CN202211277547.7A 2022-10-19 2022-10-19 Training method, device, equipment and medium for image recognition model Active CN115359325B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211277547.7A CN115359325B (en) 2022-10-19 2022-10-19 Training method, device, equipment and medium for image recognition model


Publications (2)

Publication Number Publication Date
CN115359325A CN115359325A (en) 2022-11-18
CN115359325B (en) 2023-01-10

Family

ID=84008426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211277547.7A Active CN115359325B (en) 2022-10-19 2022-10-19 Training method, device, equipment and medium for image recognition model

Country Status (1)

Country Link
CN (1) CN115359325B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109493346A (en) * 2018-10-31 2019-03-19 浙江大学 Multi-loss-based gastric cancer pathological slice image segmentation method and device
CN109785300A (en) * 2018-12-27 2019-05-21 华南理工大学 Cancer medical image processing method, system, device and storage medium
CN111369537A (en) * 2020-03-05 2020-07-03 上海市肺科医院(上海市职业病防治院) Automatic segmentation system and method for pulmonary ground-glass nodules
CN112017189A (en) * 2020-10-26 2020-12-01 腾讯科技(深圳)有限公司 Image segmentation method and device, computer equipment and storage medium
CN114648680A (en) * 2022-05-17 2022-06-21 腾讯科技(深圳)有限公司 Training method, device, equipment, medium and program product of image recognition model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114503159A (en) * 2019-08-14 2022-05-13 豪夫迈·罗氏有限公司 Three-dimensional object segmentation of medical images localized by object detection
EP3821892A1 (en) * 2019-11-12 2021-05-19 University of Leeds (s)-2-(1-(5-(cyclohexylcarbamoyl)-6-(propylthio)pyridin-2-yl)piperidin-3-yl) acetic acid for use in treating wounds
CN115063637A (en) * 2022-07-18 2022-09-16 上海联影智能医疗科技有限公司 Image classification method, storage medium, and program product


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Feasibility analysis of deep learning image segmentation algorithms on gastric cancer pathological slices; Liang Qiaokang et al.; Academic Journal of Second Military Medical University; 2018-08-20 (Issue 08); full text *

Also Published As

Publication number Publication date
CN115359325A (en) 2022-11-18

Similar Documents

Publication Publication Date Title
EP3961484B1 (en) Medical image segmentation method and device, electronic device and storage medium
Mercan et al. Localization of diagnostically relevant regions of interest in whole slide images: a comparative study
CN110853022B (en) Pathological section image processing method, device and system and storage medium
Tomari et al. Computer aided system for red blood cell classification in blood smear image
Ramesh et al. Isolation and two-step classification of normal white blood cells in peripheral blood smears
US10121245B2 (en) Identification of inflammation in tissue images
CN110705403A (en) Cell sorting method, cell sorting device, cell sorting medium, and electronic apparatus
CN114445670B (en) Training method, device and equipment of image processing model and storage medium
US20230186659A1 (en) Machine learning models for cell localization and classification learned using repel coding
CN113658174B (en) Microkernel histology image detection method based on deep learning and image processing algorithm
CN114648680B (en) Training method, device, equipment and medium of image recognition model
CN116580394A (en) White blood cell detection method based on multi-scale fusion and deformable self-attention
EP3611695A1 (en) Generating annotation data of tissue images
Mikołajczyk et al. Towards explainable classifiers using the counterfactual approach-global explanations for discovering bias in data
Romo et al. Learning regions of interest from low level maps in virtual microscopy
CN113705595A (en) Method, device and storage medium for predicting degree of abnormal cell metastasis
CN115497092A (en) Image processing method, device and equipment
CN113762262B (en) Image data screening and image segmentation model training method, device and storage medium
CN111651626B (en) Image classification method, device and readable storage medium
CN115359325B (en) Training method, device, equipment and medium for image recognition model
KR102482573B1 (en) System and Method for tree species detection through tree bark Image background removal using deep learning
CN113705318B (en) Image-based identification method, device, equipment and readable storage medium
CN115564750A (en) Intraoperative frozen slice image identification method, intraoperative frozen slice image identification device, intraoperative frozen slice image identification equipment and intraoperative frozen slice image storage medium
Schüffler et al. Computational TMA analysis and cell nucleus classification of renal cell carcinoma
CN111582404B (en) Content classification method, device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant