CN115222982A - Image classification model processing method - Google Patents

Image classification model processing method

Info

Publication number
CN115222982A
Authority
CN
China
Prior art keywords
sample data
type
processed
text
picture
Prior art date
Legal status
Pending
Application number
CN202210639771.XA
Other languages
Chinese (zh)
Inventor
许静
Current Assignee
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba China Co Ltd filed Critical Alibaba China Co Ltd
Priority to CN202210639771.XA
Publication of CN115222982A
Legal status: Pending


Classifications

    • G06V10/764 Image or video recognition using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06T7/0012 Image analysis; biomedical image inspection
    • G06V10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Image or video recognition using neural networks
    • G06T2207/20081 Training; learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30088 Skin; dermal
    • G06T2207/30096 Tumor; lesion


Abstract

Embodiments of the present specification provide an image classification model processing method, which includes: acquiring sample data of a picture type and/or a text type of an initial object, and a target classification result corresponding to the initial object; determining an object to be processed from the initial object; performing data processing on the sample data of the picture type or the text type of the object to be processed according to a preset sampling strategy, to obtain at least one group of sample data of the object to be processed that includes both the picture type and the text type; and processing an image classification model according to the sample data of the picture type and the text type of the object to be processed and other objects, and the target classification results corresponding to the object to be processed and the other objects, to obtain the image classification model. The resulting image classification model can be applied to skin lesion classification scenes, improving the accuracy of subsequent skin lesion classification.

Description

Image classification model processing method
Technical Field
The embodiment of the specification relates to the technical field of computers, in particular to an image classification model processing method.
Background
Accurate determination of early-stage skin lesions can significantly improve the survival rate of patients with malignant tumors such as melanoma and is critical for preventing skin tumors, yet accurate identification of skin tumors remains a challenging task even for experienced experts.
At present, dermatologists generally judge skin lesion conditions by visually observing and analyzing dermatoscope images and the like, which yields poor identification accuracy and low efficiency. Moreover, most studies find that relying on imaging data alone is often unreliable: non-image attribute information of the patient, such as the progression of the skin lesion and long sun-exposure time, can also greatly affect the identification of melanoma in the skin.
Disclosure of Invention
In view of this, embodiments of the present specification provide an image classification model processing method. One or more embodiments of the present disclosure also relate to an image classification model processing apparatus, an image classification model processing method and apparatus for a skin disease image, a target object classification method and apparatus, a computing device, a computer-readable storage medium, and a computer program, so as to solve technical defects in the prior art.
According to a first aspect of embodiments of the present specification, there is provided an image classification model processing method, including:
acquiring sample data of a picture type and/or a text type of an initial object and a target classification result corresponding to the initial object;
determining an object to be processed from the initial object, wherein the object to be processed is an object which only comprises sample data of a picture type or a text type in the initial object;
according to a preset sampling strategy, carrying out data processing on sample data of the picture type or the text type of the object to be processed to obtain at least one group of sample data of the object to be processed, wherein the group of sample data comprises the picture type and the text type;
processing an image classification model according to the sample data of the picture type and the text type of the object to be processed and other objects and the target classification result corresponding to the object to be processed and other objects to obtain the image classification model,
the other objects are other initial objects except the object to be processed in the initial objects, and the image classification model is a machine learning model.
According to a second aspect of embodiments of the present specification, there is provided an image classification model processing apparatus including:
the data acquisition module is configured to acquire sample data of a picture type and/or a text type of an initial object and a target classification result corresponding to the initial object;
the object determining module is configured to determine an object to be processed from the initial object, wherein the object to be processed is an object which only comprises sample data of a picture type or a text type in the initial object;
the data processing module is configured to perform data processing on the sample data of the picture type or the text type of the object to be processed according to a preset sampling strategy to obtain at least one group of sample data of the picture type and the text type of the object to be processed;
a model processing module configured to process an image classification model according to the sample data of the image type and the text type of the object to be processed and other objects and the target classification result corresponding to the object to be processed and other objects to obtain the image classification model,
the other objects are other initial objects except the object to be processed in the initial objects, and the image classification model is a machine learning model.
According to a third aspect of embodiments of the present specification, there is provided an image classification model processing method for a skin disease image, including:
acquiring sample data of a picture type and/or a text type of an initial body skin damage part and a target classification result corresponding to the initial body skin damage part;
determining an object to be processed from the initial body skin damage part, wherein the object to be processed is an object which only comprises sample data of a picture type or a text type in the initial body skin damage part;
according to a preset sampling strategy, carrying out data processing on sample data of the picture type or the text type of the object to be processed to obtain at least one group of sample data of the object to be processed, wherein the group of sample data comprises the picture type and the text type;
processing an image classification model according to the sample data of the image type and the text type of the object to be processed and other objects and the target classification result corresponding to the object to be processed and other objects to obtain the image classification model,
and the other objects are other initial body skin damage parts except the object to be processed in the initial body skin damage part, and the image classification model is a machine learning model.
According to a fourth aspect of embodiments of the present specification, there is provided an image classification model processing apparatus for a skin disease image, including:
the data acquisition module is configured to acquire sample data of a picture type and/or a text type of an initial body skin damage part and a target classification result corresponding to the initial body skin damage part;
an object determination module configured to determine an object to be processed from the initial body lesion part, wherein the object to be processed is an object in the initial body lesion part, which only includes sample data of a picture type or a text type;
the data processing module is configured to perform data processing on the sample data of the picture type or the text type of the object to be processed according to a preset sampling strategy to obtain at least one group of sample data of the picture type and the text type of the object to be processed;
a model processing module configured to process an image classification model according to the sample data of the image type and the text type of the object to be processed and other objects and the target classification result corresponding to the object to be processed and other objects to obtain the image classification model,
and the other objects are other initial body skin damage parts except the object to be processed in the initial body skin damage part, and the image classification model is a machine learning model.
According to a fifth aspect of embodiments of the present specification, there is provided a target object classification method including:
acquiring picture data of a first picture type, picture data of a second picture type and/or text data of a text type of a target object;
and inputting the picture data of the first picture type, the picture data of the second picture type and/or the text data of the text type into an image classification model to obtain a target classification result corresponding to the target object, wherein the image classification model is obtained by the image classification model processing method.
According to a sixth aspect of embodiments herein, there is provided a target object classification apparatus comprising:
the data acquisition module is configured to acquire picture data of a first picture type, picture data of a second picture type and/or text data of a text type of the target object;
and the classification module is configured to input the image data of the first image type, the image data of the second image type and/or the text data of the text type into an image classification model to obtain a target classification result corresponding to the target object, wherein the image classification model is obtained by the image classification model processing method.
According to a seventh aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is for storing computer-executable instructions, and the processor is for executing the computer-executable instructions, which when executed by the processor, implement the steps of the image classification model processing method, the target object classification method, or the user skin classification method described above.
According to an eighth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the image classification model processing method, the target object classification method or the user skin classification method described above.
According to a ninth aspect of embodiments herein, there is provided a computer program, wherein the computer program, when executed in a computer, causes the computer to perform the steps of the image classification model processing method, the target object classification method or the user skin classification method described above.
One embodiment of the present specification implements an image classification model processing method, including: acquiring sample data of a picture type and/or a text type of an initial object, and a target classification result corresponding to the initial object; determining an object to be processed from the initial object, wherein the object to be processed is an object in the initial object that only comprises sample data of a picture type or a text type; performing data processing on the sample data of the picture type or the text type of the object to be processed according to a preset sampling strategy, to obtain at least one group of sample data of the object to be processed, wherein a group of sample data comprises the picture type and the text type; and processing an image classification model according to the sample data of the picture type and the text type of the object to be processed and other objects and the target classification results corresponding to the object to be processed and the other objects, to obtain the image classification model, wherein the other objects are the initial objects other than the object to be processed, and the image classification model is a machine learning model.
Specifically, the image classification model processing method can train the image classification model on multi-modal sample data covering a picture modality and a text modality, so that objects to be recognized can subsequently be classified efficiently and accurately by the image classification model. Meanwhile, to improve the generalization performance of the image classification model and to solve the problem of missing modalities in the sample data of the initial objects when training with multi-modal sample data, the missing modality data in each sample can be supplemented according to a preset sampling strategy, thereby improving the accuracy and effectiveness of training the image classification model on multi-modal sample data and the classification accuracy of the resulting model.
Drawings
Fig. 1 is a schematic view of a specific scene where a target object classification method provided in an embodiment of the present specification is applied to human skin lesion classification;
FIG. 2 is a flowchart of a method for processing an image classification model according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram illustrating feature fusion in an image classification model processing method according to an embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a processing procedure of a method for processing an image classification model according to an embodiment of the present disclosure;
fig. 5 is a flowchart illustrating a specific processing procedure of an image classification model processing method according to an embodiment of the present specification;
fig. 6 is a schematic structural diagram of an image classification model processing apparatus according to an embodiment of the present specification;
fig. 7 is a flowchart illustrating an image classification model processing method for skin disease images according to an embodiment of the present disclosure;
FIG. 8 is a flowchart illustrating a method for classifying a target object according to an embodiment of the present disclosure;
fig. 9 is a block diagram of a computing device according to an embodiment of the present disclosure.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be implemented in many ways other than those specifically set forth herein, and those skilled in the art will appreciate that the present description is susceptible to similar generalizations without departing from the scope of the description, and thus is not limited to the specific implementations disclosed below.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first can be termed a second and, similarly, a second can be termed a first without departing from the scope of one or more embodiments of the present description. The word "if," as used herein, may be interpreted as "when" or "upon" or "in response to determining," depending on the context.
First, the noun terms to which one or more embodiments of the present specification relate are explained.
Multimodal data: in the embodiments of the present specification, multimodal data may be understood as clinical-modality images, dermatoscope-modality images, medical history text information, and the like.
Transformer model: a deep learning network based on the self-attention mechanism, commonly applied to natural language processing and computer vision tasks.
Swin Transformer: a self-attention model with shifted windows, built on the Transformer deep learning model. Through window-based multi-head self-attention (W-MSA) and shifted-window multi-head self-attention (SW-MSA) applied in series, the Swin Transformer obtains nearly global attention while reducing the computational cost from quadratic to linear in the image size, greatly reducing computation and improving model inference speed (a window-partition sketch follows these term definitions).
CMF algorithm: Cross-Modality Fusion algorithm, a multimodal fusion algorithm.
MD5: short for Message-Digest Algorithm 5, also called a hash algorithm; it converts data of any length into a fixed-length string through a function, so hashing a file with the MD5 algorithm yields a unique MD5 value for that file. In the embodiments of the present specification, pictures can be deduplicated by computing their MD5 values: if several pictures are exactly the same, their MD5 values are also exactly the same.
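To make the W-MSA term above concrete, here is a minimal PyTorch sketch of the window-partition step: attention is then computed inside each fixed-size window, which is what makes the cost linear rather than quadratic in image size. The shapes follow common Swin implementations and are not code from this patent.

```python
import torch

def window_partition(x: torch.Tensor, window_size: int) -> torch.Tensor:
    """Split a (B, H, W, C) feature map into non-overlapping windows.

    Self-attention is computed inside each (window_size x window_size)
    window, so cost grows with the number of windows (image area)
    instead of quadratically with the total token count.
    """
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    # -> (num_windows * B, window_size * window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)

# Example: a 56x56 feature map with 7x7 windows yields 64 windows of 49 tokens.
feat = torch.randn(1, 56, 56, 96)
windows = window_partition(feat, 7)
print(windows.shape)  # torch.Size([64, 49, 96])
```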
The image classification model processing method provided by the embodiments of this specification can be applied to recognition scenes that classify skin lesion parts of the human body, to skin lesion classification scenes of other individuals (such as animals), or to other scenes beyond skin lesion recognition, such as applicable recognition scenes in the industrial vision field. Different recognition scenes use different processing data in the image classification model. For ease of understanding, the embodiments of this specification take the application of the image classification model processing method to a recognition scene that classifies human skin lesion parts as an example and describe it in detail.
In the present specification, an image classification model processing method is provided. One or more embodiments of the present specification also relate to an image classification model processing apparatus, an image classification model processing method and apparatus for skin disease images, a target object classification method and apparatus, a computing device, a computer-readable storage medium, and a computer program, which are described in detail one by one in the following embodiments.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating a specific scene in which a method for classifying a target object is applied to classifying skin lesion parts of a human body according to an embodiment of the present disclosure.
Fig. 1 includes a terminal 102 and a server 104, wherein the terminal 102 may be understood as a terminal embedded with a shooting device, such as a mobile phone or a tablet computer embedded with a camera.
In specific implementation, a user sends a clinical picture, a dermatoscope picture and medical history information of a human skin lesion part to be identified to the server 104 through the terminal 102, where the user can be understood as a doctor or another skin researcher. The clinical picture can be understood as a picture of the human skin lesion part to be identified taken by the user through the shooting device embedded in the terminal 102, or as a picture of that part collected by the user from other channels.
The server 104 inputs the clinical picture, the dermatoscope picture and the medical history information of the human skin lesion part to be identified into a pre-trained image classification model, outputs a classification result corresponding to the part (for example, that the lesion is melanoma, dermatofibroma, pigmented nevus, solar keratosis, and the like), returns the classification result to the terminal 102, and displays it to the user through the terminal 102. The image classification model can be understood as a model trained on historical clinical pictures, dermatoscope pictures and medical history information of skin disease patients as training data.
In practical application, because the image classification model is trained on multi-modal training data such as historical clinical pictures, dermatoscope pictures and medical history information of skin disease patients, a corresponding classification result can be obtained at application time even when data of only one modality is input; for example, given only the clinical picture, the dermatoscope picture and/or the medical history information of the human skin lesion part to be recognized, the image classification model can still output a classification result for that part.
The target object classification method provided by the embodiment of the specification is applied to a specific scene of human skin lesion part classification, and a classification result of a human skin lesion part to be recognized can be quickly and accurately obtained through a pre-trained image classification model.
Referring to fig. 2, fig. 2 shows a flowchart of an image classification model processing method provided in an embodiment of the present specification, which specifically includes the following steps.
Step 202: acquiring sample data of an image type and/or a text type of an initial object and a target classification result corresponding to the initial object.
Specifically, the specific application scenes of the image classification model processing method are different, and the objects are also different; for example, if the image classification model processing method is applied to a classification scene of a skin lesion of a human body, an object can be understood as a skin lesion of a human body (such as the skin of a historical skin patient); if the image classification model processing method is applied to a classification scene of animal skin diseases, an object can be understood as animal skin and the like.
For convenience of understanding, the following embodiments all use the example that the image classification model processing method is applied to a classification scene of a human skin lesion part, and an object is understood as a human skin lesion part, and detailed descriptions are provided for specific implementations of the image classification model processing method.
Taking an object as a skin damage part of a human body as an example, the sample data of the picture type of the initial object can be understood as a clinical picture, a skin mirror picture and the like of the skin damage part of the human body obtained from historical skin patients; the sample data of the text type of the initial object can be understood as case information of a skin damage part of a human body acquired from historical skin patients; the target classification result corresponding to each initial object can be understood as a pathological classification result of a skin lesion of each skin patient, for example, the target classification result is melanoma, dermatofibroma, pigmented nevus, solar keratosis, and the like.
The image classification model processing method can be understood as an image classification model training method. When it is applied to a scene of classifying human skin lesion parts, multi-modal sample data of skin patients are obtained; in principle the image classification model could be trained with sample data of any single picture type combined with the text type, but to improve training accuracy, in the embodiments of this specification sample data of at least two picture types are combined with sample data of the text type, so that the comprehensive multi-modal sample data enable the trained image classification model to achieve higher accuracy.
Acquiring sample data of a picture type and/or a text type of an initial object and a target classification result corresponding to the initial object can be understood as acquiring sample data of at least two picture types and/or of one or more text types of the initial object, together with the corresponding target classification result. Where the image classification model processing method is applied to a scene of classifying human skin lesion parts, the sample data of the at least two picture types can be understood as a clinical picture and a dermatoscope picture, and the sample data of the text type as medical history information.
In practical application, in order to improve the accuracy of subsequent use of the image classification model, a large amount of training sample data is adopted to pre-train the image classification model when the image classification model is trained, and then sample data of the picture type and/or the text type of each initial object in a plurality of initial objects and a target classification result corresponding to each initial object are obtained before the image classification model is trained; sample data such as clinical pictures, dermatoscope pictures and/or medical history information of a plurality of historical skin patients are acquired, and the skin pathology results (melanoma, dermatofibroma, pigmented nevi, solar keratosis, etc.) of each skin patient.
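For concreteness, such a multi-modal training record could be represented as in the following minimal Python sketch; the field names are illustrative assumptions rather than structures defined by this specification, and a missing modality is simply recorded as None:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SkinSample:
    clinical_picture: Optional[str]    # path to the clinical photo, or None if missing
    dermoscopy_picture: Optional[str]  # path to the dermatoscope image, or None if missing
    history_text: Optional[str]        # medical-history text, or None if missing
    label: str                         # target classification result, e.g. "melanoma"

samples = [
    SkinSample("c1.jpg", "d1.jpg", "itchy lesion, 2 years", "melanoma"),
    SkinSample("c2.jpg", None, None, "melanoma"),                # picture-only object
    SkinSample(None, None, "growing mole, sun exposure", "pigmented nevus"),
]
```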
In addition, in order to ensure the quality of the training samples, in the embodiment of the present specification, after the sample data of the picture type and/or the text type of the initial object is obtained, data preprocessing is performed on the sample data of the picture type and/or the text type of each initial object. The specific implementation mode is as follows:
after the obtaining of the sample data of the picture type and/or the text type of the initial object, the method further includes:
and carrying out data cleaning on the sample data of the picture type and/or the text type of the initial object.
Data cleaning of the picture-type sample data can be understood as deleting blurry shots and post-operative skin lesion clinical pictures, deduplicating all pictures using the MD5 algorithm, and the like; data cleaning of the text-type data can be understood as structuring the medical history information data, and the like.
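As an illustration of the MD5 deduplication step, here is a minimal Python sketch (paths and file layout hypothetical); identical pictures hash to identical digests, so only the first copy of each is kept:

```python
import hashlib
from pathlib import Path

def md5_of_file(path: Path) -> str:
    """Return the hex MD5 digest of a file's bytes."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def deduplicate(picture_dir: str) -> list[Path]:
    """Keep one picture per distinct MD5 value; byte-identical duplicates are dropped."""
    seen: set[str] = set()
    kept: list[Path] = []
    for path in sorted(Path(picture_dir).glob("*.jpg")):
        digest = md5_of_file(path)
        if digest not in seen:
            seen.add(digest)
            kept.append(path)
    return kept
```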
Step 204: and determining an object to be processed from the initial objects.
And the object to be processed is an object which only comprises sample data of a picture type or a text type in the initial object.
And under the condition that the picture types are at least two or the text types are at least two, the object to be processed can be understood as an initial object which lacks sample data of any one picture type or any one text type.
In specific implementation, not every initial object contains sample data of both the picture type and the text type. If only the initial objects containing sample data of both types were used to train the image classification model, the amount of sample data would inevitably be reduced, and with insufficient sample data the classification precision of the trained image classification model would inevitably suffer.
Therefore, to avoid having too few training samples because some initial objects lack certain modalities of sample data, when training samples (sample data of initial objects) are obtained, the acquisition is not limited to initial objects containing sample data of both the picture type and the text type; initial objects containing only single-modality data, such as sample data of only the picture type or only the text type, or of only one picture type or one text type, are obtained as well. Initial objects that contain only sample data of a picture type or a text type are then screened out of the initial objects as objects to be processed, and their picture-type or text-type sample data are processed to complete and/or augment the training samples.
For example, 100 initial objects are obtained, where there are 30 initial objects that lack sample data of any modality (a certain picture type or a certain text type), and then these 30 initial objects are the objects to be processed.
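Following this example, determining the objects to be processed amounts to filtering out every initial object whose record lacks any modality. A minimal sketch under an illustrative dict-based record layout:

```python
# Each initial object is a dict whose values are None when the modality is missing.
MODALITIES = ("clinical_picture", "dermoscopy_picture", "history_text")

def split_objects(initial_objects: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (objects_to_process, complete_objects)."""
    to_process = [o for o in initial_objects if any(o[m] is None for m in MODALITIES)]
    complete = [o for o in initial_objects if all(o[m] is not None for m in MODALITIES)]
    return to_process, complete

objects = [
    {"clinical_picture": "c1.jpg", "dermoscopy_picture": "d1.jpg",
     "history_text": "itchy lesion", "label": "melanoma"},
    {"clinical_picture": "c2.jpg", "dermoscopy_picture": None,
     "history_text": None, "label": "melanoma"},
]
to_process, complete = split_objects(objects)
print(len(to_process))  # 1
```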
Step 206: and according to a preset sampling strategy, carrying out data processing on the sample data of the picture type or the text type of the object to be processed to obtain at least one group of sample data of the object to be processed, wherein the group of sample data comprises the picture type and the text type.
The preset sampling strategy can be understood as an inter-class random pairing sampling strategy; that is, sample data of the missing modality in an object to be processed is supplemented class by class according to the target classification result corresponding to the initial object. Here, completion is understood to mean completion, or completion and augmentation.
In specific implementation, after the objects to be processed with missing-modality sample data are determined, in order to ensure multi-modal sample training of the image classification model, the sample data of any missing modality in the object to be processed can be supplemented according to the inter-class random pairing sampling strategy, so as to ensure the training precision of the image classification model. The specific implementation is as follows:
the data processing is carried out on the sample data of the picture type or the text type of the object to be processed according to a preset sampling strategy, and at least one group of sample data including the picture type and the text type of the object to be processed is obtained, wherein the method comprises the following steps:
and according to the inter-class random pairing sampling strategy, completing the sample data of the text type or the image type of the object to be processed only containing the sample data of the image type or the text type to obtain at least one group of sample data of the object to be processed containing the image type and the text type.
An object to be processed that only includes sample data of a picture type or a text type can be understood as an object to be processed lacking sample data of some picture type or text type. For example, if the multi-modal sample data of the other objects is <clinical picture, dermatoscope picture, medical history information>, an object to be processed can be understood as an object missing any one of the clinical picture, the dermatoscope picture and the medical history information.
Specifically, after the objects to be processed are determined, according to the inter-class random pairing strategy, sample data of the picture type is completed for an object to be processed that lacks picture-type sample data, sample data of the text type is completed for an object that lacks text-type sample data, and both modalities are completed for an object that lacks sample data of both a picture type and a text type. At least one group of sample data of the object to be processed, including the picture type and the text type, is thus obtained. The specific implementation is as follows:
the method for completing the sample data of the text type or the image type of the object to be processed which only comprises the sample data of the image type or the text type according to the inter-class random pairing sampling strategy to obtain at least one group of sample data of the object to be processed which comprises the image type and the text type comprises the following steps:
according to the inter-class random pairing sampling strategy, completing sample data of a text type for an object to be processed only comprising sample data of a picture type, and obtaining at least one group of sample data comprising the picture type and the text type of the object to be processed; and
and according to the inter-class random pairing sampling strategy, completing sample data of the picture type for the object to be processed only containing the sample data of the text type, and obtaining at least one group of sample data of the object to be processed containing the picture type and the text type.
According to the inter-class random pairing sampling strategy, sample data of the picture type is completed for objects to be processed lacking picture-type sample data; sample data of the text type is completed for objects lacking text-type sample data; and objects lacking sample data of both a certain picture type and a certain text type have both completed. Each object to be processed thus, like the other objects, contains complete multi-modal sample data of the picture type and the text type, which further ensures the training precision of the subsequently trained image classification model.
Following the above example, in this manner the multi-modal sample data of the object to be processed also comprises <clinical picture, dermatoscope picture, medical history information>.
In specific implementation, the inter-class random pairing sampling strategy can be understood as supplementing the sample data of the missing modality in the object to be processed class by class according to the target classification result corresponding to the initial object. In this way, the sample data of the missing modality can be supplemented and the sample amount can be increased, further improving the training effect of the image classification model. The specific implementation is as follows:
the method for completing the sample data of the text type or the picture type of the object to be processed, which only comprises the sample data of the picture type or the text type, according to the inter-class random pairing sampling strategy to obtain at least one group of sample data of the object to be processed, which comprises the following steps:
classifying the object to be processed and the other objects according to a target classification result corresponding to the initial object;
sequentially determining a target object to be processed from objects to be processed with the same category, and completing sample data of a text type or a picture type of the target object to be processed according to sample data of the picture type and the text type of other objects with the same category and sample data of the picture type or the text type of other objects to be processed to obtain at least one group of sample data of the target object to be processed, wherein the group of sample data comprises the picture type and the text type.
And the other objects to be processed are objects to be processed except the target object to be processed in all the objects to be processed with the same category.
In practical application, the initial objects are different, and the target classification results corresponding to the initial objects are also different; along the above example, in the case that the object is a skin lesion of a human body, the target classification result corresponding to the initial object can be understood as the name of a skin disease, such as melanoma, dermatofibroma, pigmented nevus, solar keratosis, and the like.
Specifically, the objects to be processed and the other objects are classified according to the target classification result corresponding to each initial object; for example, the objects to be processed and other objects of melanoma are grouped into one class, those of dermatofibroma into another class, and so on. A target object to be processed is then determined in turn from the objects to be processed of the same category; for example, objects to be processed in the melanoma category are selected one at a time as the target object to be processed. According to the picture-type and text-type sample data of the other objects of the same category (for example, objects in the melanoma category) and the picture-type or text-type sample data of the other objects to be processed apart from the currently selected target, the sample data of the text type or the picture type of the currently selected target object to be processed is completed. Finally, at least one group of sample data of the target object to be processed, including the picture type and the text type, is obtained.
In the embodiment of the present specification, the picture types include a first picture type and a second picture type, and therefore, in the case that the object to be processed lacks sample data of any picture type or sample data of a text type, the sample data needs to be supplemented, so as to ensure the number of training samples and improve the training effect of the image classification model. The specific implementation mode is as follows:
the picture types comprise a first picture type and a second picture type;
correspondingly, the sequentially determining a target object to be processed from objects to be processed with the same category, completing sample data of the text type or the picture type of the target object to be processed according to sample data of the picture type and the text type of other objects with the same category and sample data of the picture type or the text type of other objects to be processed, and obtaining at least one group of sample data of the target object to be processed, including the picture type and the text type, includes:
sequentially determining a target object to be processed from objects to be processed with the same category;
under the condition that the target object to be processed only comprises sample data of the first picture type, according to sample data of the second picture type and the text type of other objects with the same category and sample data of the second picture type and/or the text type of other objects to be processed, sample data of the second picture type and the text type of the target object to be processed is supplemented, and at least one group of sample data of the first picture type, the second picture type and the text type of the target object to be processed is obtained;
under the condition that the target object to be processed only comprises sample data of the second picture type, completing the sample data of the first picture type and the text type of the target object to be processed according to the sample data of the first picture type and the text type of other objects of the same category and the sample data of the first picture type and/or the text type of other objects to be processed, to obtain at least one group of sample data of the target object to be processed comprising the first picture type, the second picture type and the text type; or
under the condition that the target object to be processed only comprises sample data of a text type, completing the sample data of the first picture type and the second picture type of the target object to be processed according to the sample data of the first picture type and the second picture type of other objects of the same category and the sample data of the first picture type and/or the second picture type of other objects to be processed, to obtain at least one group of sample data of the target object to be processed comprising the first picture type, the second picture type and the text type.
Following the above example, still taking the objects to be processed of the melanoma category as an example: one object to be processed is selected from the melanoma-category objects as the target object to be processed. If the target contains only sample data of the first picture type, it lacks sample data of the second picture type and of the text type. In that case it can be completed from the second-picture-type and text-type sample data of the other objects in the melanoma category, and the second-picture-type and/or text-type sample data of the other objects to be processed in that category apart from the target, so that the target object to be processed contains at least one group of sample data including the first picture type, the second picture type and the text type. Completing the modalities of a target that has only first-picture-type sample data can be understood as randomly selecting, from the second-picture-type and text-type sample data of the other objects in the category excluding the target itself, at least one group of second-picture-type and text-type sample data, and using it as the target's second-picture-type and text-type sample data. In practical application, to augment the sample amount and further improve the training effect of the image classification model, the sample amount of the target object can also be augmented while its missing modalities are completed: the first-picture-type sample data of the target can be combined with multiple randomly selected groups of second-picture-type and text-type sample data, yielding multiple groups of sample data containing the first picture type, the second picture type and the text type.
Similarly, when the target object to be processed only includes sample data of the second picture type or the text type, the method for supplementing the sample data of the missing modality of the target object to be processed is the same as that of the target object to be processed only including the sample data of the first picture type, and details are not repeated here.
Specifically, taking the application of the image classification model processing method to a scene of human skin lesion classification as an example, the data enhancement method of the inter-class random pairing sampling strategy, provided to solve the common modality-missing problem in multi-modal data, is introduced in detail below.
First, define the training data as $S$ with a total sample count of $I$, i.e. $S = \{s_i \mid i \in \{1, \dots, I\}\}$, and let $C_k$, $D_k$, $M_k$ respectively denote the clinical-picture, dermatoscope-picture and medical-history modality data of patient skin corresponding to disease $k$ (i.e. the target classification result). The inter-class random pairing sampling strategy is implemented during the training of the image classification model. For a training sample $s_i = \{c_i, d_i, m_i\}$, where $c_i$, $d_i$, $m_i$ respectively denote the clinical-picture, dermatoscope-picture and medical-history modality data in the sample, if $c_i$ is missing during training, then $c_i$ can be randomly sampled from all the other clinical pictures $C_k$ of category $k$:

$c_i \sim \mathrm{Uniform}(C_k)$

Likewise, $d_i$ is randomly sampled from $D_k$:

$d_i \sim \mathrm{Uniform}(D_k)$

and $m_i$ is randomly sampled from $M_k$:

$m_i \sim \mathrm{Uniform}(M_k)$
By this data enhancement means, the inter-class random pairing sampling strategy can fuse multi-modal data from different patients with the same disease, improving the generalization performance of the image classification model.
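The following minimal Python sketch mirrors this strategy: per disease class it pools the available $C_k$, $D_k$, $M_k$ data and fills each missing modality by a random intra-class draw. The record layout and the n_pairs augmentation parameter are illustrative assumptions rather than the patent's exact procedure:

```python
import random
from collections import defaultdict

MODALITIES = ("clinical_picture", "dermoscopy_picture", "history_text")

def pair_within_class(samples: list[dict], n_pairs: int = 1, seed: int = 0) -> list[dict]:
    """Complete missing modalities by sampling, per disease class, from other
    samples of the same class; n_pairs > 1 also augments the sample count.
    Assumes each class has at least one sample of every modality."""
    rng = random.Random(seed)

    # Pool the available data of each modality per class (C_k, D_k, M_k).
    pools: dict[str, dict[str, list]] = defaultdict(lambda: {m: [] for m in MODALITIES})
    for s in samples:
        for m in MODALITIES:
            if s[m] is not None:
                pools[s["label"]][m].append(s[m])

    completed: list[dict] = []
    for s in samples:
        missing = [m for m in MODALITIES if s[m] is None]
        if not missing:
            completed.append(s)
            continue
        for _ in range(n_pairs):  # one or more completed groups per object
            group = dict(s)
            for m in missing:
                group[m] = rng.choice(pools[s["label"]][m])  # random intra-class draw
            completed.append(group)
    return completed
```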
In practical application, because the image classification model is obtained by multiple iterations of training, completing and augmenting the training samples by the above method in every iteration would reduce the training speed of the image classification model. To improve training efficiency, a switch can therefore be set so that the training samples are completed and augmented by the above method in some iterations of training and not in others. For example, a probability threshold $T_p$ can be set: when a probability $p \in [0, 1]$ drawn for the image classification model is greater than $T_p$, the inter-class random pairing method is used for data enhancement to complete and augment the training sample amount; otherwise, it is not used.
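The switch described above can be realized as a per-iteration probability gate; a minimal sketch, with the threshold value chosen arbitrarily for illustration:

```python
import random
from typing import Callable

T_P = 0.5  # probability threshold; value illustrative, tune per training setup

def maybe_augment(batch: list, augment: Callable[[list], list]) -> list:
    """Gate the (relatively costly) inter-class pairing augmentation so that
    it only runs in a fraction of training iterations."""
    p = random.random()  # p drawn uniformly from [0, 1]
    return augment(batch) if p > T_P else batch

# Usage: maybe_augment(batch, pair_within_class), with the function sketched above.
```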
Step 208: and processing an image classification model according to the sample data of the image types and the text types of the object to be processed and other objects and target classification results corresponding to the object to be processed and other objects to obtain the image classification model.
The other objects are other initial objects except the object to be processed in the initial objects, and the image classification model is a machine learning model.
Specifically, once the objects to be processed and the other objects all contain complete sample data of every modality, the image classification model can be trained according to the picture-type and text-type sample data of the objects to be processed and the other objects, together with their corresponding target classification results, to obtain an image classification model of higher precision. Where the picture types include a first picture type and a second picture type, processing the image classification model according to the picture-type and text-type sample data of the objects to be processed and the other objects and their corresponding target classification results means processing it according to the first-picture-type, second-picture-type and text-type sample data of the object to be processed, the first-picture-type, second-picture-type and text-type sample data of the other objects, the target classification result corresponding to the object to be processed, and the target classification results corresponding to the other objects.
Specifically, the processing an image classification model according to the sample data of the image type and the text type of the object to be processed and the other objects and the target classification result corresponding to the object to be processed and the other objects to obtain the image classification model includes:
determining a target object according to the object to be processed and the other objects;
determining sample data of the picture type and the text type of the target object according to the sample data of the picture type and the text type of the object to be processed and the sample data of the picture type and the text type of the other objects;
determining a target classification result corresponding to the target object according to the target classification result corresponding to the object to be processed and the target classification results corresponding to the other objects;
acquiring image feature coding vectors and character feature coding vectors of sample data of the picture type and the text type of the target object;
and processing an image classification model according to the image feature coding vector, the character feature coding vector and a target classification result corresponding to the target object to obtain the image classification model.
In specific implementation, the objects to be processed and the other objects are taken as target objects, and the picture-type and text-type sample data of the target objects and their corresponding target classification results are determined from the picture-type and text-type sample data of the objects to be processed and the other objects and their target classification results. Image feature coding vectors of the picture-type sample data of each target object (for example, the picture feature coding vectors of the first-picture-type sample data and of the second-picture-type sample data) and text feature coding vectors of the text-type sample data are then obtained. Finally, the image classification model is processed (for example, trained) on this multi-modal data according to the image feature coding vectors, the text feature coding vectors and the target classification result corresponding to the target object, to obtain the trained image classification model, thereby improving its training precision.
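As a rough illustration of such a training step, the following PyTorch sketch fuses pre-extracted feature coding vectors and trains a classification head on them. The module names and dimensions are assumptions, and the simple concatenation here merely stands in for whatever fusion (for example CMF) the model actually uses:

```python
import torch
import torch.nn as nn

class MultiModalClassifier(nn.Module):
    """Fuse per-modality feature coding vectors and classify."""
    def __init__(self, img_dim: int, txt_dim: int, n_classes: int):
        super().__init__()
        self.head = nn.Linear(2 * img_dim + txt_dim, n_classes)

    def forward(self, clinical_feat, dermoscopy_feat, text_feat):
        fused = torch.cat([clinical_feat, dermoscopy_feat, text_feat], dim=-1)
        return self.head(fused)

model = MultiModalClassifier(img_dim=1024, txt_dim=768, n_classes=4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One training step on pre-extracted feature vectors (batch of 8, labels 0..3,
# e.g. melanoma / dermatofibroma / pigmented nevus / solar keratosis).
clin = torch.randn(8, 1024)
derm = torch.randn(8, 1024)
text = torch.randn(8, 768)
labels = torch.randint(0, 4, (8,))

logits = model(clin, derm, text)
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```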
In practical application, in order to improve the training precision of the image classification model during its training, the embodiment of this specification fuses the global features and the local features of sample data of different types when training the image classification model. Therefore, in the process of obtaining the feature coding vectors of the sample data of the target object, the local image feature coding vectors and global image feature coding vectors of the sample data of the picture type of the target object are extracted, and the initial text feature coding vector and global text feature coding vector of the sample data of the text type of the target object are extracted. The specific implementation is as follows:
the obtaining of the image feature coding vector and the character feature coding vector of the sample data of the picture type and the text type of the target object includes:
obtaining local image feature coding vectors and global image feature coding vectors of the sample data of the picture type of the target object through an image feature extraction network of an image classification model;
and obtaining the character feature coding vector of the sample data of the text type through the text feature extraction network of the image classification model by using the sample data of the text type of the target object.
The image feature extraction network of the image classification model can adopt the original swin transformer-base framework as the image feature extractor.
According to the above example, under the condition that the sample data of the picture type comprises a clinical picture and a dermatoscope picture and the sample data of the text type comprises medical history information, the local image feature coding vector and the global image feature coding vector of the sample data of the picture type are obtained by the sample data of the picture type of the target object through an image feature extraction network of an image classification model; obtaining character feature coding vectors of the sample data of the text type through a text feature extraction network of the image classification model by using the sample data of the text type of the target object; the method can be understood as that a clinical picture of a target object is subjected to an image feature extraction network of an image classification model to obtain a local image feature coding vector and a global image feature coding vector of the clinical picture; similarly, the dermatoscope picture of the target object passes through an image feature extraction network of an image classification model to obtain a local image feature coding vector and a global image feature coding vector of the dermatoscope picture; and (3) extracting the medical history information of the target object through a text feature extraction network of the image classification model to obtain a character feature coding vector of the medical history information.
In specific implementation, the manner of acquiring the local image feature coding vector and the global image feature coding vector of the sample data of the image modality by using the image feature extraction network of the image classification model is as follows:
the obtaining, by the image feature extraction network of an image classification model, a local image feature coding vector and a global image feature coding vector of the sample data of the picture type of the target object includes:
obtaining local image feature coding vectors of the sample data of the picture type of the target object through an image feature extraction network of an image classification model;
and coding the local feature coding vector of the sample data of the picture type by using a global average pooling method to obtain a global image feature coding vector of the sample data of the picture type.
According to the above example, still taking the swin transformer model as the image feature extraction network: the sample data of the picture type of the target object passes through the image feature extraction network of the image classification model to obtain the local image feature coding vectors of the sample data of the picture type; the local feature coding vectors of the sample data of the picture type are then encoded with a global average pooling method to obtain the global image feature coding vectors of the sample data of the picture type. That is, after the clinical picture and the dermatoscope picture of the target object are determined, the swin transformer model is used as the image feature extraction network to perform image coding on the clinical picture and the dermatoscope picture respectively, obtaining the local image feature coding vector of the clinical picture and the local image feature coding vector of the dermatoscope picture; the global image feature coding vector of the clinical picture and the global image feature coding vector of the dermatoscope picture are then obtained by applying the global average pooling method to the respective local image feature coding vectors.
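As a minimal, non-limiting sketch of this step (not the claimed implementation), the relationship between the patch-level local image feature coding vectors and global average pooling can be written in PyTorch as follows; the patch-projection stand-in, tensor shapes and feature dimension are illustrative assumptions rather than the swin transformer itself.

```python
import torch
import torch.nn as nn

# Stand-in for the swin transformer image feature extractor: a patch
# projection that maps an image batch to patch-level local features of
# shape (batch, num_patches, feature_dim).
class PatchEncoder(nn.Module):
    def __init__(self, dim: int = 1024, patch: int = 32):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.proj(images)                 # (batch, dim, H/patch, W/patch)
        return feats.flatten(2).transpose(1, 2)   # (batch, num_patches, dim)

encoder = PatchEncoder()
pictures = torch.randn(2, 3, 224, 224)   # e.g. clinical or dermatoscope pictures
local_feats = encoder(pictures)          # local image feature coding vectors (2, 49, 1024)
global_feats = local_feats.mean(dim=1)   # global average pooling -> (2, 1024)
```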
In the image classification model processing method provided in the embodiment of the present specification, a local image feature coding vector and a global image feature coding vector of sample data of a picture type of a target object are respectively obtained through an image feature extraction network; and then, the local image feature coding vector, the global image feature coding vector and the character feature coding vector can be subjected to cross fusion, so that the training precision of the image classification model is improved.
In addition, the method for acquiring the initial text feature encoding vector and the global text feature encoding vector of the text mode sample data by adopting the text feature extraction network of the image classification model is as follows:
the obtaining of the character feature coding vector of the sample data of the text type by the sample data of the text type of the target object through the text feature extraction network of the image classification model comprises:
obtaining initial character feature coding vectors of the sample data of the text type through a text feature extraction network of the image classification model by using the sample data of the text type of the target object;
and encoding the initial character feature encoding vector of the sample data of the text type by utilizing the linear network of the image classification model to obtain the global character feature encoding vector of the sample data of the text type.
Still continuing the above example, taking sample data of a text type as medical history information as an example, extracting a network from the text feature of the image classification model by using the sample data of the text type of the target object, and obtaining an initial character feature coding vector of the sample data of the text type; encoding the initial character feature encoding vector of the sample data of the text type by using the linear network of the image classification model to obtain a global character feature encoding vector of the sample data of the text type; the medical history information of the target object is extracted through a text feature extraction network of an image classification model to obtain an initial character feature coding vector of the medical history information; and (4) coding the initial character feature coding vector of the medical history information by utilizing a linear network of the image classification model to obtain a global character feature coding vector of the medical history information.
In practical application, after the sample data of the picture type and the text type of the target object are determined, the swin transformer model is used as the image feature extractor to perform image coding on the clinical picture and the dermatoscope picture respectively, obtaining the local image feature coding vectors of the clinical picture and the dermatoscope picture; then, on the basis of these local image feature coding vectors, the global image feature coding vectors of the clinical picture and the dermatoscope picture are obtained with a global average pooling method. Meanwhile, the discrete medical history information is converted into the corresponding initial character feature coding vector by one-hot coding, and a linear layer further encodes the one-hot initial character feature coding vector into a higher-dimensional representation, thereby obtaining the global character feature coding vector of the medical history information.
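A hedged sketch of the text branch just described might look as follows in PyTorch; the vocabulary size, output dimension and field encoding are assumptions made for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of the text branch: a discrete medical-history field is
# one-hot encoded, then lifted to a higher-dimensional global character
# feature by a linear layer followed by batch normalization and ReLU.
# The vocabulary size and output dimension are illustrative assumptions.
class TextEncoder(nn.Module):
    def __init__(self, vocab_size: int = 64, out_dim: int = 1024):
        super().__init__()
        self.linear = nn.Linear(vocab_size, out_dim)   # Linear
        self.bn = nn.BatchNorm1d(out_dim)              # BatchNorm

    def forward(self, field_ids: torch.Tensor) -> torch.Tensor:
        one_hot = F.one_hot(field_ids, self.linear.in_features).float()
        return F.relu(self.bn(self.linear(one_hot)))   # ReLU -> global text feature

history_ids = torch.tensor([3, 17])   # two patients' encoded medical-history fields
g_m = TextEncoder()(history_ids)      # global character feature coding vector (2, 1024)
```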
In the image classification model processing method provided in the embodiment of the present specification, a local image feature coding vector, a global image feature coding vector, and a global text feature coding vector of a target object are obtained through an image feature extraction network and a text feature extraction network; and subsequently, the local image feature coding vector, the global image feature coding vector and the global text feature coding vector can be fused to improve the training accuracy of the image classification model.
After the local image feature coding vector, the global image feature coding vector and the global text feature coding vector of the target object are obtained, the specific implementation of cross-fusing them is as follows:
the processing an image classification model according to the image feature coding vector, the text feature coding vector and a target classification result corresponding to the target object to obtain the image classification model comprises:
performing feature fusion on the local image feature coding vector, the global image feature coding vector and the character feature coding vector according to a preset multi-mode fusion algorithm to obtain a target feature coding vector of the target object;
and processing an image classification model according to the target feature coding vector of the target object and a target classification result corresponding to the target object to obtain the image classification model.
The preset multi-modal fusion algorithm can be understood as the CMF algorithm (Cross-Modal Fusion algorithm).
In specific implementation, after a local image feature coding vector, a global image feature coding vector and a global text feature coding vector of a target object are determined, firstly, feature fusion is carried out on the local image feature coding vector, the global image feature coding vector and the text feature coding vector according to a preset multi-modal fusion algorithm to obtain a target feature coding vector of the target object; and processing the image classification model according to the target feature coding vector of the target object and a target classification result corresponding to the target object to obtain the image classification model.
In the image classification model processing method provided in the embodiment of the present specification, a preset multi-modal fusion algorithm performs cross fusion between the global feature of each modality and the local features of the other modalities, so that each modality learns not only the features of its own modality but also the features of the other modalities, global knowledge, and the like, thereby improving the training precision of the image classification model.
In practical application, a specific implementation manner of performing cross fusion on the global features of each modality and the local features of another modality by using a preset multi-modality fusion algorithm is as follows:
the performing feature fusion on the local image feature coding vector, the global image feature coding vector and the text feature coding vector according to a preset multi-modal fusion algorithm to obtain a target feature coding vector of the target object includes:
and carrying out cross fusion on the global image feature coding vector of the sample data of the picture type of the target object or the global character feature coding vector of the sample data of the text type of the target object and the local image feature coding vector of the picture type data of the target object according to a preset multi-mode fusion algorithm to obtain the target feature coding vector of the target object.
Still continuing the above example, taking as the feature coding vectors of the target object the local image feature coding vector and global image feature coding vector of a skin disease patient's clinical picture, the local image feature coding vector and global image feature coding vector of the dermatoscope picture, and the global text feature coding vector of the medical history information, the following describes in detail a specific implementation of cross-fusing, according to a preset multi-modal fusion algorithm, the global image feature coding vector of the picture-type sample data of the target object or the global character feature coding vector of the text-type sample data of the target object with the local image feature coding vectors of the picture-type data of the target object. See in particular fig. 3.
Referring to fig. 3, fig. 3 is a schematic diagram illustrating feature fusion in an image classification model processing method according to an embodiment of the present disclosure.
Still taking the preset multi-modal fusion algorithm as the CMF algorithm as an example: after the global image feature coding vector g_D and local image feature coding vector l_D of the dermatoscope picture, the global image feature coding vector g_C and local image feature coding vector l_C of the clinical picture, and the global character feature coding vector g_M of the medical history information are obtained, the global feature of each modality (i.e. the dermatoscope picture, the clinical picture and the medical history information) is fused with the local features of the other modalities through multi-head attention, where the local features serve as K and V and the global features serve as Q.
Part A of fig. 3 illustrates feature fusion with three modalities, the dermatoscope modality, the clinical modality and the medical history modality, as examples: g_M and l_D are fused through Attention module 2, g_M and l_C through Attention module 3, g_D and l_C through Attention module 1, and g_C and l_D through Attention module 4. Part B of fig. 3 illustrates feature fusion with two modalities denoted A and B, where g_A represents the global feature of modality A and l_B represents a local feature of modality B.
Specifically, assuming that X and X' represent different modalities, respectively, l represents a patch local feature extracted by the swin transformer network. The algorithm implementation is as follows:
$$l_X = \mathrm{LN}\big(\mathrm{SwinT}(X)\big), \qquad g_X = \mathrm{GAP}(l_X) \tag{1}$$

$$Q = g_X W_Q, \qquad K = l_{X'} W_K, \qquad V = l_{X'} W_V \tag{2}$$

$$\mathrm{MultiHead}(Q,K,V) = \mathrm{Concat}(\mathrm{head}_1,\ldots,\mathrm{head}_h)\,W_O, \qquad \mathrm{head}_i = \mathrm{softmax}\!\left(\frac{Q_i K_i^{\top}}{\sqrt{F/h}}\right) V_i \tag{3}$$

$$\hat{g}_X = \mathrm{LN}\big(g_X + \mathrm{MultiHead}(Q,K,V)\big) \tag{4}$$

where equation 1 is used to generate the local feature and the global feature, LN represents layer normalization, and GAP represents a global pooling layer; K and V in equation 2 are the local features, and Q is the global feature; equation 3 is an implementation of the multi-head attention method; the purpose of equation 4 is to obtain the fused feature vector, X and X' represent two different modalities, and LN in equation 4 represents a linear layer. In the above formulas, $W_Q, W_K, W_V \in \mathbb{R}^{F \times F}$ are learnable parameters, $F$ is the feature dimension, and $h$ represents the number of heads in the attention mechanism.
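As a hedged illustration only, the following PyTorch sketch shows how one such cross-modal attention block could be realized under the above equations; the class name, feature dimension and head count are assumptions, and nn.MultiheadAttention stands in for the multi-head attention described here.

```python
import torch
import torch.nn as nn

# Hedged sketch of one cross-modal attention block under equations (1)-(4):
# the global feature of modality X (as Q) attends to the local patch
# features of modality X' (as K and V); the attended result is added back
# to the global feature and passed through a linear layer.
class CrossModalFusion(nn.Module):
    def __init__(self, dim: int = 1024, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)     # layer normalization applied to Q
        self.out = nn.Linear(dim, dim)    # the linear layer of equation 4

    def forward(self, g_x: torch.Tensor, l_xp: torch.Tensor) -> torch.Tensor:
        q = self.norm(g_x).unsqueeze(1)           # global feature of X as Q
        fused, _ = self.attn(q, l_xp, l_xp)       # local features of X' as K, V
        return self.out(g_x + fused.squeeze(1))   # fused global feature

cmf = CrossModalFusion()
g_d = torch.randn(2, 1024)       # global feature of the dermatoscope modality
l_c = torch.randn(2, 49, 1024)   # local patch features of the clinical modality
g_d_fused = cmf(g_d, l_c)        # corresponds to Attention module 1 in fig. 3
```

Instantiating one such block per arrow in part A of fig. 3 (four attention modules in total) reproduces the fusion pattern described above.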
The image classification model processing method provided by the embodiment of the specification can train the image classification model through multi-mode sample data of a picture mode and a text mode, so that the object to be recognized can be efficiently and accurately classified subsequently according to the image classification model; meanwhile, in order to improve the generalization performance of the image classification model, when the image classification model is trained through multi-mode sample data, in order to solve the problem of mode loss in the sample data of the initial object, the missing mode data in each sample data can be supplemented according to a preset sampling strategy, so that the training accuracy and effectiveness of the multi-mode sample data on the image classification model are improved, and the classification accuracy of the subsequent image classification model is improved.
In the following, with reference to fig. 4, the application of the image classification model processing method provided in this specification to a skin disease classification scene is taken as an example to further describe the image classification model processing method. Fig. 4 shows a processing flow chart of an image classification model processing method provided in an embodiment of the present specification, which specifically includes the following steps.
Step one: acquiring a dermatoscope picture, a clinical picture and medical history information of historical skin disease patients, where each dermatoscope picture, each clinical picture and each piece of medical history information corresponds to a skin disease name, such as diseases 1 and 2 (e.g., Disease1 and Disease2) carried in the dermatoscope pictures, the clinical pictures and the medical history information of fig. 4.
Step two: random Sampling (RS) is performed on the dermatoscope picture, the clinical picture, and the medical history information, for example, the inter-class Random pairing method in the above embodiment is used to enhance the data of the dermatoscope picture, the clinical picture, and the medical history information.
Step three: respectively processing a skin mirror picture and a clinical picture obtained after random sampling through an image block coding layer (Patch Embedding) to obtain image blocks; meanwhile, the medical history information obtained after random sampling is processed by a One-hot encoding layer (One-hot Embedding) to obtain One-hot feature encoding vectors.
Step four: respectively inputting image blocks obtained by a Patch Embedding layer of a skin mirror picture and a clinical picture into an image feature extractor (switch vector model) to obtain a local image feature coding vector of each image block of the skin mirror picture and a local image feature coding vector of each image block of the clinical picture; meanwhile, the One-hot feature encoding vector passes through a Linear layer (Linear), batch normalization (BatchNorm) and an activation function (Linear rectification activation function, reLU), so as to obtain a global text feature encoding vector.
Step five: the local image feature coding vectors of the dermatoscope picture and the clinical picture are processed through a Pooling (GAP, global average Pooling layer) respectively to obtain global image feature coding vectors of the dermatoscope picture and the clinical picture.
Step six: inputting the local image feature coding vector of the dermatoscope picture, the clinical picture, the global image feature coding vector of the dermatoscope picture, the clinical picture and the global text feature coding vector of the medical history information into a cross-modal fusion model (CMF) to carry out cross fusion of local features and global features, and obtaining a global image feature coding vector g after the fusion of the dermatoscope picture D And a global image feature coding vector g after clinical picture fusion C And global text feature vector g after medical history information fusion M
In fig. 4, the symbol joining the fused feature vectors represents Concatenate, i.e. splicing.
Step seven: inputting the global image feature coding vector after fusing the dermatoscope picture and the clinical picture and the global text feature vector after fusing the medical history information into a Classifier (Classifier) to carry out image classification model training.
According to the image classification model processing method provided by the embodiment of the specification, for the problem of modality loss that frequently occurs in multi-modal data, a method of inter-class random pairing sampling is provided to make up for the missing modalities in the multi-modal data, which can improve the generalization performance of the image classification model; for the problem of effectively fusing multi-modal information, a new modal cross fusion module is adopted, so that the global features and the local features of different modalities can be effectively fused, improving the classification precision of the image classification model.
In the following, with reference to fig. 5, the image classification model processing method provided in this specification is further described by taking an application of the image classification model processing method in a skin disease classification scene as an example. Fig. 5 shows a flowchart of a specific processing procedure of an image classification model processing method provided in an embodiment of the present specification, which specifically includes the following steps.
Step 502: and acquiring multi-modal sample data of the historical skin disease patient, such as modes of a skin mirror picture, a clinical picture, medical history information and the like.
In specific implementation, the multi-modal sample data is stored in the format of a dermatoscope picture, a clinical picture and medical history information according to the category dimension of the skin disease, obtaining a multi-modal sample data set. In practice, the data set may be randomly divided into multiple parts according to the patient dimension and then split into a training set, a validation set and a test set according to a preset ratio (e.g. 3: 1), ensuring that data from the same patient exists in only one of the sets.
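A hedged sketch of such a patient-level split is given below; the record layout and the 3:1:1 ratio are illustrative assumptions, the only property taken from the embodiment being that all records of one patient stay in a single set.

```python
import random
from collections import defaultdict

# Sketch of a patient-level split: all records of one patient land in
# exactly one of the training/validation/test sets.
def split_by_patient(records, ratios=(3, 1, 1), seed=0):
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient_id"]].append(rec)
    patients = list(by_patient)
    random.Random(seed).shuffle(patients)
    total = sum(ratios)
    n_train = len(patients) * ratios[0] // total
    n_val = len(patients) * ratios[1] // total
    groups = (patients[:n_train],
              patients[n_train:n_train + n_val],
              patients[n_train + n_val:])
    return [[r for p in g for r in by_patient[p]] for g in groups]

train_set, val_set, test_set = split_by_patient(
    [{"patient_id": i % 5, "label": "disease1"} for i in range(20)])
```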
Step 504: and for the sample data with the missing mode, randomly sampling by adopting an inter-class random pairing sampling strategy (DWP).
Step 506: and determining multi-modal sample data which are obtained after random sampling and respectively comprise a skin mirror picture, a clinical picture, medical history information and the like.
Step 508: inputting the clinical picture and the dermatoscope picture of each sample data into the image feature extraction module to obtain the local feature and the global feature of the picture, and inputting the medical history information of each sample data into the text feature extraction module to obtain the global feature of the text.
Step 510: and inputting the picture local features, the picture global features and the text global features into a multi-mode fusion module to perform cross fusion of the global features and the local features.
Step 512: and inputting the target features obtained after the cross fusion into a classifier to obtain a classification result.
Specifically, the image classification model can subsequently be trained and optimized according to the predicted classification result and the real classification result: an optimizer and a loss function are defined to iteratively update the network parameters and complete model training. The better-performing image classification model can then be selected and packaged; in actual application, multi-modal data is used as input to obtain a classification result.
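A minimal training-loop sketch consistent with this description is shown below; the optimizer choice, learning rate and data-loader layout are assumptions.

```python
import torch
import torch.nn as nn

# Minimal training-loop sketch: an optimizer and a loss function
# iteratively update the network parameters. model and train_loader are
# assumed to exist with the multi-modal signature used in this example.
def train(model, train_loader, epochs: int = 10, lr: float = 1e-4):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for derm, clin, hist, labels in train_loader:
            logits = model(derm, clin, hist)   # multi-modal forward pass
            loss = criterion(logits, labels)   # compare with the real labels
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```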
The image classification model processing method provided by the embodiment of the specification provides a multi-modal skin tumor classification method based on a transformer model, which can effectively realize classification and identification of skin tumor diseases; an inter-class random pairing sampling strategy expands the data samples, solving the problem of modality loss in multi-modal data sets and improving the generalization performance and precision of the model; meanwhile, a novel multi-modal cross fusion module is provided, so that the global features and the local features of different modalities can be effectively fused, effectively improving the classification precision.
Corresponding to the above method embodiment, the present specification further provides an embodiment of an image classification model processing apparatus, and fig. 6 shows a schematic structural diagram of an image classification model processing apparatus provided in an embodiment of the present specification. As shown in fig. 6, the apparatus includes:
a data obtaining module 602, configured to obtain sample data of a picture type and/or a text type of an initial object and a target classification result corresponding to the initial object;
an object determination module 604, configured to determine an object to be processed from the initial object, where the object to be processed is an object that only includes sample data of a picture type or a text type in the initial object;
the data processing module 606 is configured to perform data processing on sample data of the picture type or the text type of the object to be processed according to a preset sampling strategy, so as to obtain at least one group of sample data of the picture type and the text type of the object to be processed;
a model processing module 608 configured to process the image classification model according to the sample data of the picture type and the text type of the object to be processed and the other objects, and the target classification result corresponding to the object to be processed and the other objects, to obtain the image classification model,
the other objects are other initial objects except the object to be processed in the initial objects, and the image classification model is a machine learning model.
Optionally, the data processing module 606 is further configured to:
and according to the inter-class random pairing sampling strategy, completing the sample data of the text type or the image type of the object to be processed only containing the sample data of the image type or the text type to obtain at least one group of sample data of the object to be processed containing the image type and the text type.
Optionally, the data processing module 606 is further configured to:
according to an inter-class random pairing sampling strategy, completing sample data of a text type for an object to be processed only containing sample data of an image type to obtain at least one group of sample data of the object to be processed containing the image type and the text type; and
and according to the inter-class random pairing sampling strategy, completing sample data of the picture type for the object to be processed only containing the sample data of the text type, and obtaining at least one group of sample data of the object to be processed containing the picture type and the text type.
Optionally, the data processing module 606 is further configured to:
classifying the object to be processed and the other objects according to a target classification result corresponding to the initial object;
sequentially determining a target object to be processed from objects to be processed with the same category, and completing sample data of a text type or an image type of the target object to be processed according to sample data of an image type and a text type of other objects with the same category and sample data of the image type or the text type of other objects to be processed to obtain at least one group of sample data of the target object to be processed, wherein the group of sample data comprises the image type and the text type.
And the other objects to be processed are objects to be processed except the target object to be processed in all the objects to be processed with the same category.
Optionally, the picture types include a first picture type and a second picture type;
accordingly, the data processing module 606 is further configured to:
sequentially determining a target object to be processed from objects to be processed with the same category;
under the condition that the target object to be processed only comprises sample data of a first picture type, according to the sample data of a second picture type and a text type of other objects with the same category and the sample data of the second picture type and/or the text type of other objects to be processed, completing the sample data of the second picture type and the text type of the target object to be processed, and obtaining at least one group of sample data of the first picture type, the second picture type and the text type of the target object to be processed;
under the condition that the target object to be processed only comprises sample data of a second picture type, according to the sample data of the first picture type and the text type of other objects with the same category and the sample data of the first picture type and/or the text type of other objects to be processed, sample data of the first picture type and the text type of the target object to be processed is supplemented, and at least one group of sample data of the first picture type, the second picture type and the text type of the target object to be processed is obtained; or
Under the condition that the target object to be processed only comprises sample data of a text type, according to the sample data of the first picture type and the second picture type of other objects with the same category and the sample data of the first picture type and/or the second picture type of other objects to be processed, the sample data of the first picture type and the second picture type of the target object to be processed is supplemented, and at least one group of sample data comprising the first picture type, the second picture type and the text type of the target object to be processed is obtained.
Optionally, the model processing module 608 is further configured to:
determining a target object according to the object to be processed and the other objects;
determining sample data of the picture type and the text type of the target object according to the sample data of the picture type and the text type of the object to be processed and the sample data of the picture type and the text type of the other objects;
determining a target classification result corresponding to the target object according to the target classification result corresponding to the object to be processed and the target classification results corresponding to the other objects;
acquiring image feature coding vectors and character feature coding vectors of sample data of the picture type and the text type of the target object;
and processing an image classification model according to the image feature coding vector, the character feature coding vector and a target classification result corresponding to the target object to obtain the image classification model.
Optionally, the model processing module 608 is further configured to:
obtaining local image feature coding vectors and global image feature coding vectors of the sample data of the picture type of the target object through an image feature extraction network of an image classification model;
and obtaining the character feature coding vector of the sample data of the text type through the text feature extraction network of the image classification model by using the sample data of the text type of the target object.
Optionally, the model processing module 608 is further configured to:
obtaining local image feature coding vectors of the sample data of the picture type of the target object through an image feature extraction network of an image classification model;
and coding the local feature coding vector of the sample data of the picture type by using a global average pooling method to obtain a global image feature coding vector of the sample data of the picture type.
Optionally, the model processing module 608 is further configured to:
obtaining initial character feature coding vectors of the sample data of the text type through a text feature extraction network of the image classification model for the sample data of the text type of the target object;
and encoding the initial character feature encoding vector of the sample data of the text type by utilizing the linear network of the image classification model to obtain the global character feature encoding vector of the sample data of the text type.
Optionally, the model processing module 608 is further configured to:
performing feature fusion on the local image feature coding vector, the global image feature coding vector and the text feature coding vector according to a preset multi-modal fusion algorithm to obtain a target feature coding vector of the target object;
and processing an image classification model according to the target feature coding vector of the target object and a target classification result corresponding to the target object to obtain the image classification model.
Optionally, the model processing module 608 is further configured to:
and carrying out cross fusion on the global image feature coding vector of the sample data of the picture type of the target object or the global character feature coding vector of the sample data of the text type of the target object and the local image feature coding vector of the picture type data of the target object according to a preset multi-mode fusion algorithm to obtain the target feature coding vector of the target object.
Optionally, the apparatus further comprises:
and the data cleaning module is configured to perform data cleaning on the sample data of the picture type and/or the text type of the initial object.
The image classification model processing device provided in the embodiment of the present specification can train an image classification model through multi-modal sample data of a picture mode and a text mode, so that an object to be recognized can be efficiently and accurately classified subsequently according to the image classification model; meanwhile, in order to improve the generalization performance of the image classification model, when the image classification model is trained through multi-mode sample data, in order to solve the problem of mode loss in the sample data of the initial object, the missing mode data in each sample data can be supplemented according to a preset sampling strategy, so that the training accuracy and effectiveness of the multi-mode sample data on the image classification model are improved, and the classification accuracy of the subsequent image classification model is improved.
The above is a schematic scheme of an image classification model processing apparatus of the present embodiment. It should be noted that the technical solution of the image classification model processing apparatus and the technical solution of the image classification model processing method belong to the same concept, and details that are not described in detail in the technical solution of the image classification model processing apparatus can be referred to the description of the technical solution of the image classification model processing method.
Referring to fig. 7, fig. 7 shows a flowchart of an image classification model processing method for a skin disease image according to an embodiment of the present specification, which specifically includes the following steps.
Step 702: acquiring sample data of a picture type and/or a text type of an initial body skin damage part and a target classification result corresponding to the initial body skin damage part;
in the embodiment of the present specification, the initial body skin lesion part may be understood as a skin to be identified of a human body, such as a skin lesion part of a skin lesion category to be identified.
Step 704: determining an object to be processed from the initial body skin damage part, wherein the object to be processed is an object which only comprises sample data of a picture type or a text type in the initial body skin damage part;
step 706: according to a preset sampling strategy, carrying out data processing on sample data of the picture type or the text type of the object to be processed to obtain at least one group of sample data of the object to be processed, wherein the sample data comprises the picture type and the text type;
step 708: processing an image classification model according to the sample data of the image type and the text type of the object to be processed and other objects and the target classification result corresponding to the object to be processed and other objects to obtain the image classification model,
and the other objects are other initial body skin damage parts except the object to be processed in the initial body skin damage part, and the image classification model is a machine learning model.
The image classification model processing method for the skin disease image provided by the embodiment of the specification can train the image classification model through multi-mode sample data of a picture mode and a text mode, so that the object to be recognized can be efficiently and accurately classified subsequently according to the image classification model; meanwhile, in order to improve the generalization performance of the image classification model, when the image classification model is trained by multi-mode sample data, in order to solve the problem of mode loss in the sample data of the initial object, the missing mode data in each sample data can be supplemented according to a preset sampling strategy, so that the training accuracy and effectiveness of the multi-mode sample data on the image classification model are improved, and the classification accuracy of the subsequent image classification model is improved.
The foregoing is a schematic diagram of an image classification model processing method for skin disease images according to this embodiment. It should be noted that the technical solution of the image classification model processing method for skin disease images belongs to the same concept as the technical solution of the image classification model processing method, and details of the technical solution of the image classification model processing method for skin disease images, which are not described in detail, can be referred to the description of the technical solution of the image classification model processing method.
An embodiment of the present specification further provides an image classification model processing apparatus for a skin disease image, including:
the data acquisition module is configured to acquire sample data of a picture type and/or a text type of an initial body skin damage part and a target classification result corresponding to the initial body skin damage part;
an object determination module configured to determine an object to be processed from the initial body lesion part, wherein the object to be processed is an object in the initial body lesion part, and only includes sample data of a picture type or a text type;
the data processing module is configured to perform data processing on sample data of the picture type or the text type of the object to be processed according to a preset sampling strategy to obtain at least one group of sample data of the picture type and the text type of the object to be processed;
a model processing module configured to process an image classification model according to the sample data of the image type and the text type of the object to be processed and other objects and the target classification result corresponding to the object to be processed and other objects to obtain the image classification model,
wherein the other objects are other initial body skin damage parts except the object to be processed in the initial body skin damage parts.
The image classification model processing device for skin disease images provided by the embodiment of the specification can train the image classification model through multi-mode sample data of a picture mode and a text mode, so that objects to be recognized can be efficiently and accurately classified subsequently according to the image classification model; meanwhile, in order to improve the generalization performance of the image classification model, when the image classification model is trained by multi-mode sample data, in order to solve the problem of mode loss in the sample data of the initial object, the missing mode data in each sample data can be supplemented according to a preset sampling strategy, so that the training accuracy and effectiveness of the multi-mode sample data on the image classification model are improved, and the classification accuracy of the subsequent image classification model is improved.
The above is an exemplary scheme of the image classification model processing model of the skin disease image according to the embodiment. It should be noted that the technical solution of the image classification model processing method for skin disease images and the technical solution of the image classification model processing method for skin disease images belong to the same concept, and details of the technical solution of the image classification model processing model for skin disease images, which are not described in detail, can be referred to the description of the technical solution of the image classification model processing method for skin disease images.
Referring to fig. 8, fig. 8 is a flowchart illustrating a target object classification method according to an embodiment of the present disclosure, which specifically includes the following steps.
Step 802: acquiring picture data of a first picture type, picture data of a second picture type and/or text data of a text type of a target object;
step 804: and inputting the picture data of the first picture type, the picture data of the second picture type and/or the text data of the text type into an image classification model to obtain a target classification result corresponding to the target object.
The image classification model is obtained by the image classification model processing method.
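As an illustrative usage sketch of steps 802 to 804 (assuming a trained model with the three-modality signature used earlier):

```python
import torch

# Illustrative inference sketch: multi-modal data of the target object
# goes into the trained image classification model and the predicted
# class index comes out. The model signature is an assumption.
@torch.no_grad()
def classify(model, derm_picture, clinical_picture, history_vector):
    model.eval()
    logits = model(derm_picture, clinical_picture, history_vector)
    return logits.argmax(dim=1)   # target classification result
```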
In the target object classification method provided in the embodiment of the present specification, the image classification model based on the Transformer model framework, trained on multi-modal data extended by the inter-class random pairing sampling strategy, classifies the target object to obtain its target classification result, so that the classification accuracy is greatly improved.
The foregoing is a schematic scheme of a target object classification method according to this embodiment. It should be noted that the technical solution of the target object classification method for skin disease images and the technical solution of the image classification model processing method described above belong to the same concept, and details that are not described in detail in the technical solution of the target object classification method can be referred to the description of the technical solution of the image classification model processing method described above.
An embodiment of the present specification further provides a target object classification apparatus, including:
the data acquisition module is configured to acquire picture data of a first picture type, picture data of a second picture type and/or text data of a text type of the target object;
and the classification module is configured to input the picture data of the first picture type, the picture data of the second picture type and/or the text data of the text type into an image classification model to obtain a target classification result corresponding to the target object, wherein the image classification model is obtained by the image classification model processing method.
The target object classification device provided in the embodiment of the present specification classifies the target classification result of the target object through the image classification model based on the Transformer model framework, which is trained by the multi-modal data extended by the inter-class random pairing sampling strategy, so as to greatly improve the classification accuracy.
The foregoing is a schematic diagram of a target object classification apparatus of this embodiment. It should be noted that the technical solution of the target object classification apparatus and the technical solution of the target object classification method belong to the same concept, and details that are not described in detail in the technical solution of the target object classification apparatus can be referred to the description of the technical solution of the target object classification method.
FIG. 9 illustrates a block diagram of a computing device 900 provided in accordance with one embodiment of the present specification. Components of the computing device 900 include, but are not limited to, a memory 910 and a processor 920. The processor 920 is coupled to the memory 910 via a bus 930, and a database 950 is used to store data.
Computing device 900 also includes access device 940, access device 940 enabling computing device 900 to communicate via one or more networks 960. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. Access device 940 may include one or more of any type of network interface (e.g., a Network Interface Card (NIC)) whether wired or wireless, such as an IEEE802.11 Wireless Local Area Network (WLAN) wireless interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 900, as well as other components not shown in FIG. 9, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 9 is for purposes of example only and is not limiting as to the scope of the description. Those skilled in the art may add or replace other components as desired.
Computing device 900 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 900 may also be a mobile or stationary server.
Wherein the processor 920 is configured to execute computer-executable instructions which, when executed by the processor, implement the steps of the image classification model processing method, the image classification model processing method of skin disease images and the target object classification method described above.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device is the same as the technical solutions of the image classification model processing method, the image classification model processing method of the skin disease image, and the target object classification method, and details of the technical solutions of the computing device, which are not described in detail, can be referred to the descriptions of the technical solutions of the image classification model processing method, the image classification model processing method of the skin disease image, and the target object classification method.
An embodiment of the present specification further provides a computer-readable storage medium, which stores computer-executable instructions, which when executed by a processor, implement the steps of the image classification model processing method, the image classification model processing method of a skin disease image, and the target object classification method described above.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium is the same as the technical solutions of the image classification model processing method, the image classification model processing method of the skin disease image, and the target object classification method, and details of the technical solution of the storage medium, which are not described in detail, can be referred to the descriptions of the technical solutions of the image classification model processing method, the image classification model processing method of the skin disease image, and the target object classification method.
An embodiment of the present specification further provides a computer program, wherein when the computer program is executed in a computer, the computer program causes the computer to execute the steps of the image classification model processing method, the image classification model processing method of a skin disease image, and the target object classification method described above.
The above is an illustrative scheme of a computer program of the present embodiment. It should be noted that the technical solution of the computer program is the same as the technical solutions of the image classification model processing method, the image classification model processing method of the skin disease image, and the target object classification method, and details of the technical solution of the computer program, which are not described in detail, can be referred to the descriptions of the technical solutions of the image classification model processing method, the image classification model processing method of the skin disease image, and the target object classification method.
The foregoing description of specific embodiments has been presented for purposes of illustration and description. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code which may be in source code form, object code form, an executable file or some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, read-Only Memory (ROM), random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer-readable medium may contain suitable additions or subtractions depending on the requirements of legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer-readable media may not include electrical carrier signals or telecommunication signals in accordance with legislation and patent practice.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of combinations of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the embodiments. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for an embodiment of the specification.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the teaching of the embodiments of the present disclosure. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the embodiments. The specification is limited only by the claims and their full scope and equivalents.

Claims (14)

1. An image classification model processing method, comprising:
acquiring sample data of an image type and/or a text type of an initial object and a target classification result corresponding to the initial object;
determining an object to be processed from the initial object, wherein the object to be processed is an object which only comprises sample data of a picture type or a text type in the initial object;
according to a preset sampling strategy, carrying out data processing on sample data of the picture type or the text type of the object to be processed to obtain at least one group of sample data of the object to be processed, wherein the sample data comprises the picture type and the text type;
processing an image classification model according to the sample data of the image type and the text type of the object to be processed and other objects and the target classification result corresponding to the object to be processed and other objects to obtain the image classification model,
the other objects are other initial objects except the object to be processed in the initial objects, and the image classification model is a machine learning model.
2. The image classification model processing method according to claim 1, wherein the data processing is performed on sample data of a picture type or a text type of the object to be processed according to a preset sampling policy, so as to obtain at least one group of sample data of the object to be processed, the group of sample data including the picture type and the text type, and the method includes:
according to the inter-class random pairing sampling strategy, completing sample data of the text type or the picture type for the object to be processed only including the sample data of the picture type or the text type, and obtaining at least one group of sample data of the object to be processed including the picture type and the text type.
3. The image classification model processing method according to claim 2, wherein the completing sample data of a text type or a picture type for the object to be processed only including sample data of a picture type or a text type according to the inter-class random pairing sampling strategy to obtain at least one group of sample data of the object to be processed including the picture type and the text type comprises:
according to the inter-class random pairing sampling strategy, completing sample data of a text type for an object to be processed only comprising sample data of a picture type, and obtaining at least one group of sample data comprising the picture type and the text type of the object to be processed; and
and according to the inter-class random pairing sampling strategy, completing sample data of the picture type for the object to be processed only containing the sample data of the text type, and obtaining at least one group of sample data of the object to be processed containing the picture type and the text type.
4. The image classification model processing method according to claim 2, wherein the completing the missing text-type or picture-type sample data for the object to be processed according to the inter-class random pairing sampling strategy comprises:
classifying the objects to be processed and the other objects according to the target classification results corresponding to the initial objects;
sequentially determining a target object to be processed from among objects to be processed of the same category, and completing the missing text-type or picture-type sample data of the target object to be processed according to the picture-type and text-type sample data of other objects of the same category and the picture-type or text-type sample data of other objects to be processed, to obtain at least one group of sample data of the target object to be processed comprising the picture type and the text type,
wherein the other objects to be processed are the objects to be processed of the same category other than the target object to be processed.
5. The image classification model processing method according to claim 4, wherein the picture types comprise a first picture type and a second picture type;
correspondingly, the sequentially determining a target object to be processed from among objects to be processed of the same category and completing its missing sample data to obtain at least one group of sample data comprising the picture types and the text type comprises:
sequentially determining a target object to be processed from among the objects to be processed of the same category;
in the case that the target object to be processed comprises only sample data of the first picture type, completing the second-picture-type and text-type sample data of the target object to be processed according to the second-picture-type and text-type sample data of other objects of the same category and the second-picture-type and/or text-type sample data of other objects to be processed, to obtain at least one group of sample data of the target object to be processed comprising the first picture type, the second picture type and the text type;
in the case that the target object to be processed comprises only sample data of the second picture type, completing the first-picture-type and text-type sample data of the target object to be processed according to the first-picture-type and text-type sample data of other objects of the same category and the first-picture-type and/or text-type sample data of other objects to be processed, to obtain at least one group of sample data of the target object to be processed comprising the first picture type, the second picture type and the text type; or
in the case that the target object to be processed comprises only sample data of the text type, completing the first-picture-type and second-picture-type sample data of the target object to be processed according to the first-picture-type and second-picture-type sample data of other objects of the same category and the first-picture-type and/or second-picture-type sample data of other objects to be processed, to obtain at least one group of sample data of the target object to be processed comprising the first picture type, the second picture type and the text type.
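Claims 4 and 5 constrain the pairing to objects of the same category; a compact sketch under the same hypothetical `Sample` assumptions (supporting two picture types would add a second picture field and the three cases of claim 5):

    import random
    from collections import defaultdict

    def complete_within_category(objects):
        """Group objects by target classification result and complete each
        single-modality object using donors of the same category.
        Assumes each category contains a donor with the missing modality."""
        rng = random.Random(0)
        by_label = defaultdict(list)
        for s in objects:
            by_label[s.label].append(s)
        completed = []
        for group in by_label.values():
            for s in group:
                donors = [d for d in group if d is not s]
                completed.append(complete_by_random_pairing(s, donors, rng))
        return completed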
6. The image classification model processing method according to claim 1, wherein the processing an image classification model according to the picture-type and text-type sample data of the object to be processed and the other objects and the corresponding target classification results to obtain the image classification model comprises:
determining a target object according to the object to be processed and the other objects;
determining the picture-type and text-type sample data of the target object according to the picture-type and text-type sample data of the object to be processed and of the other objects;
determining a target classification result corresponding to the target object according to the target classification results corresponding to the object to be processed and to the other objects;
acquiring image feature coding vectors and text feature coding vectors of the picture-type and text-type sample data of the target object; and
processing an image classification model according to the image feature coding vectors, the text feature coding vectors and the target classification result corresponding to the target object, to obtain the image classification model.
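A generic supervised training step matching the shape of claim 6, written with PyTorch; `image_encoder`, `text_encoder` and `head` are assumed attributes of the model, not names taken from the patent:

    import torch
    import torch.nn as nn

    def train_step(model, pictures, texts, labels, optimizer):
        """Encode both modalities, classify, and update the model."""
        img_vec = model.image_encoder(pictures)  # image feature coding vectors
        txt_vec = model.text_encoder(texts)      # text feature coding vectors
        logits = model.head(torch.cat([img_vec, txt_vec], dim=-1))
        loss = nn.functional.cross_entropy(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()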
7. The image classification model processing method according to claim 6, wherein the acquiring image feature coding vectors and text feature coding vectors of the picture-type and text-type sample data of the target object comprises:
obtaining local image feature coding vectors and a global image feature coding vector of the picture-type sample data of the target object through an image feature extraction network of the image classification model; and
obtaining text feature coding vectors of the text-type sample data of the target object through a text feature extraction network of the image classification model.
8. The image classification model processing method according to claim 7, wherein the obtaining local image feature coding vectors and a global image feature coding vector of the picture-type sample data of the target object through the image feature extraction network comprises:
obtaining the local image feature coding vectors of the picture-type sample data of the target object through the image feature extraction network of the image classification model; and
encoding the local image feature coding vectors of the picture-type sample data by global average pooling to obtain the global image feature coding vector of the picture-type sample data.
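The global average pooling of claim 8 reduces the local feature maps to a single global vector; a one-line PyTorch sketch (the batch × channels × height × width tensor layout is an assumption):

    import torch

    def global_from_local(local_features: torch.Tensor) -> torch.Tensor:
        """Global image feature coding vector from local feature maps
        via global average pooling over the spatial dimensions."""
        return local_features.mean(dim=(2, 3))  # -> (batch, channels)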
9. The image classification model processing method according to claim 8, wherein the obtaining text feature coding vectors of the text-type sample data of the target object through the text feature extraction network of the image classification model comprises:
obtaining initial text feature coding vectors of the text-type sample data of the target object through the text feature extraction network of the image classification model; and
encoding the initial text feature coding vectors of the text-type sample data through a linear network of the image classification model to obtain a global text feature coding vector of the text-type sample data.
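A sketch of the text branch in claim 9: an initial text encoding followed by a linear network; the encoder choice and the dimensions are illustrative assumptions:

    import torch.nn as nn

    class TextBranch(nn.Module):
        """Text feature extraction network followed by a linear layer
        that produces the global text feature coding vector."""
        def __init__(self, text_encoder: nn.Module, in_dim=768, out_dim=512):
            super().__init__()
            self.text_encoder = text_encoder  # e.g. a BERT-style encoder
            self.linear = nn.Linear(in_dim, out_dim)

        def forward(self, text_tokens):
            initial = self.text_encoder(text_tokens)  # initial text feature coding vectors
            return self.linear(initial)               # global text feature coding vector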
10. The image classification model processing method according to claim 9, wherein the processing an image classification model according to the image feature coding vectors, the text feature coding vectors and the target classification result corresponding to the target object to obtain the image classification model comprises:
performing feature fusion on the local image feature coding vectors, the global image feature coding vector and the text feature coding vectors according to a preset multi-modal fusion algorithm to obtain a target feature coding vector of the target object; and
processing an image classification model according to the target feature coding vector of the target object and the target classification result corresponding to the target object, to obtain the image classification model.
11. The image classification model processing method according to claim 10, wherein the performing feature fusion according to the preset multi-modal fusion algorithm to obtain the target feature coding vector of the target object comprises:
cross-fusing, according to the preset multi-modal fusion algorithm, the global image feature coding vector of the picture-type sample data of the target object or the global text feature coding vector of the text-type sample data of the target object with the local image feature coding vectors of the picture-type sample data of the target object, to obtain the target feature coding vector of the target object.
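One possible instantiation of the cross fusion in claim 11, using the global vector (image or text) to attention-pool the local image features; the claims do not specify the fusion algorithm, so this is a sketch only:

    import torch

    def cross_fuse(global_vec: torch.Tensor, local_maps: torch.Tensor) -> torch.Tensor:
        """Cross-fuse a global feature coding vector with the local image
        feature coding vectors to obtain the target feature coding vector.
        global_vec: (batch, channels); local_maps: (batch, channels, h, w)."""
        locals_flat = local_maps.flatten(2).transpose(1, 2)                   # (b, h*w, c)
        attn = torch.softmax(locals_flat @ global_vec.unsqueeze(-1), dim=1)  # (b, h*w, 1)
        pooled = (attn * locals_flat).sum(dim=1)                             # (b, c)
        return torch.cat([global_vec, pooled], dim=-1)                       # (b, 2c)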
12. The image classification model processing method according to claim 1, further comprising, after the acquiring sample data of the picture type and/or the text type of the initial object:
performing data cleaning on the picture-type and/or text-type sample data of the initial object.
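Claim 12 does not enumerate cleaning rules; a trivial illustrative pass over the hypothetical `Sample` objects might look like:

    def clean(objects):
        """Drop samples whose present modalities are unusable
        (empty text, zero-byte pictures); rules are illustrative only."""
        kept = []
        for s in objects:
            if s.text is not None and not s.text.strip():
                continue
            if s.picture is not None and len(s.picture) == 0:
                continue
            kept.append(s)
        return kept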
13. An image classification model processing method for skin disease images, comprising:
acquiring sample data of a picture type and/or a text type of an initial body skin lesion site and a target classification result corresponding to the initial body skin lesion site;
determining an object to be processed from among the initial body skin lesion sites, wherein the object to be processed is an initial body skin lesion site that comprises only picture-type or only text-type sample data;
performing, according to a preset sampling strategy, data processing on the picture-type or text-type sample data of the object to be processed to obtain at least one group of sample data of the object to be processed comprising both the picture type and the text type; and
processing an image classification model according to the picture-type and text-type sample data of the object to be processed and of other objects, and according to the corresponding target classification results, to obtain the image classification model,
wherein the other objects are the initial body skin lesion sites other than the object to be processed, and the image classification model is a machine learning model.
14. A target object classification method, comprising:
acquiring picture data of a first picture type, picture data of a second picture type, and/or text data of a text type of a target object; and
inputting the picture data of the first picture type, the picture data of the second picture type and/or the text data of the text type into an image classification model to obtain a target classification result corresponding to the target object, wherein the image classification model is obtained by the image classification model processing method according to any one of claims 1 to 12.
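Finally, an inference-side sketch of claim 14; the model's call signature is an assumption:

    import torch

    @torch.no_grad()
    def classify(model, first_picture, second_picture=None, text=None):
        """Feed the available picture/text data to the trained image
        classification model and return the predicted class index."""
        logits = model(first_picture, second_picture, text)
        return logits.argmax(dim=-1)  # target classification result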
CN202210639771.XA 2022-06-08 2022-06-08 Image classification model processing method Pending CN115222982A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210639771.XA CN115222982A (en) 2022-06-08 2022-06-08 Image classification model processing method

Publications (1)

Publication Number Publication Date
CN115222982A 2022-10-21

Family

ID=83607132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210639771.XA Pending CN115222982A (en) 2022-06-08 2022-06-08 Image classification model processing method

Country Status (1)

Country Link
CN (1) CN115222982A (en)

Similar Documents

Publication Publication Date Title
EP3553742B1 (en) Method and device for identifying pathological picture
EP3507743B1 (en) System and method of otoscopy image analysis to diagnose ear pathology
Rathod et al. Diagnosis of skin diseases using Convolutional Neural Networks
CN109166130B (en) Image processing method and image processing device
Zeng et al. Hand-crafted feature guided deep learning for facial expression recognition
CN111489324B (en) Cervical image classification method fusing multi-mode prior pathological depth features
CN104887183A (en) Intelligent skin health monitoring and pre-diagnosis method based on optics
CN111091536B (en) Medical image processing method, apparatus, device, medium, and endoscope
Kumar et al. Deep barcodes for fast retrieval of histopathology scans
JP2010108494A (en) Method and system for determining characteristic of face within image
CN113205449A (en) Expression migration model training method and device and expression migration method and device
Yang et al. A robust iris segmentation using fully convolutional network with dilated convolutions
Han et al. Learning generative models of tissue organization with supervised GANs
CN116563572A (en) Inference model training method and device
Huang et al. Evaluations of deep learning methods for pathology image classification
CN113409329A (en) Image processing method, image processing apparatus, terminal, and readable storage medium
Aktürk et al. Classification of eye images by personal details with transfer learning algorithms
CN115222982A (en) Image classification model processing method
CN115761371A (en) Medical image classification method and device, storage medium and electronic equipment
CN114445632A (en) Picture processing method and device
CN114120008A (en) Method and system for determining key frame of capsule endoscopy image sequence
Sharma et al. Solving image processing critical problems using machine learning
CN112184611A (en) Image generation model training method and device
Xu et al. Feature fusion capsule network for cow face recognition
Dai et al. Dilated convolutional neural networks for panoramic image saliency prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination