CN116883765B - Image classification method, device, electronic equipment and storage medium - Google Patents

Image classification method, device, electronic equipment and storage medium

Info

Publication number
CN116883765B
Authority
CN
China
Prior art keywords
scene type
service scene
information corresponding
sample
module
Prior art date
Legal status
Active
Application number
CN202311151897.3A
Other languages
Chinese (zh)
Other versions
CN116883765A (en
Inventor
杜俊珑
鄢科
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202311151897.3A
Publication of CN116883765A
Application granted
Publication of CN116883765B
Active legal status


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to an image classification method and device, an electronic device, and a storage medium in the technical field of artificial intelligence. The method comprises the following steps: acquiring an image to be classified; inputting the image to be classified into a target extraction sub-model corresponding to each of a plurality of service scene types in a target image classification model for feature extraction processing, to obtain target image feature information corresponding to each service scene type, wherein the target image classification model is obtained by training sample scene adaptation modules based on mutual exclusion loss information, the mutual exclusion loss information characterizes the degree of similarity of target adaptation feature information, and the target adaptation feature information is the adaptation feature information output by the sample scene adaptation modules corresponding to any two service scene types; and determining target image category information corresponding to the image to be classified based on the target image feature information and the target classification sub-model corresponding to each service scene type. Embodiments of the disclosure can better cope with complex and changeable application scenes and improve the model effect.

Description

Image classification method, device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, and in particular relates to an image classification method, an image classification device, electronic equipment and a storage medium.
Background
Image classification is an image processing technique that distinguishes images of different categories according to the features reflected in the image information, and is one of the core problems in the field of computer vision. Image classification has many practical applications, including, for example, sensitive image recognition, face recognition, and automatic driving. In an application scene of sensitive image recognition, whether an image belongs to a sensitive category can be judged by classifying the image, where a sensitive image refers to an image containing violating or illegal content. For sensitive image recognition, existing image classification models mainly adopt deep learning classification methods to assist manual secondary auditing. At present, in content auditing application scenes, there are many sensitive image classification tasks and many labels, the service scenes are complex, and the operational timeliness requirements are high. When a single model is used directly for classification, its effect can hardly cope with complex and changeable application scenes, and it is also difficult to meet the operational requirement of rapid optimization and iteration; therefore, sensitive image classification can be realized through ensemble learning.
Specifically, ensemble learning refers to a technique that uses multiple compatible learning algorithms or learning models to perform a single task in order to achieve better predictive performance. The main methods of ensemble learning fall into three categories: stacking, boosting, and bagging. The stacking method constructs several first-stage learners of different types, uses them to obtain first-stage prediction results, and then constructs a second-stage learner based on those results to obtain the final prediction. However, the stacking method needs multiple heterogeneous models to obtain a good integration result; it also demands a larger training set, its training complexity is high, and parameter tuning is difficult. Moreover, if the different types of learners lack good diversity, the effect of the stacking method is difficult to guarantee.
Disclosure of Invention
In view of the above-mentioned technical problems, the present disclosure provides an image classification method, an apparatus, an electronic device, and a storage medium.
According to an aspect of the embodiments of the present disclosure, there is provided an image classification method including:
acquiring an image to be classified;
inputting the image to be classified into a target extraction sub-model corresponding to each of a plurality of service scene types in a target image classification model to perform feature extraction processing to obtain target image feature information corresponding to each service scene type; the target image classification model is obtained by training a sample scene adaptation module in a sample extraction sub-model based on mutual exclusion loss information, the mutual exclusion loss information represents the similarity degree of target adaptation characteristic information, and the target adaptation characteristic information is adaptation characteristic information output by the sample scene adaptation module corresponding to any two business scene types;
and determining the target image category information corresponding to the image to be classified based on the target image characteristic information corresponding to each service scene type and the target classification sub-model corresponding to each service scene type.
According to another aspect of the embodiments of the present disclosure, there is provided an image classification apparatus including:
The image acquisition module is used for acquiring images to be classified;
the first feature extraction module is used for inputting the image to be classified into a target extraction sub-model corresponding to each of a plurality of service scene types in a target image classification model to perform feature extraction processing to obtain target image feature information corresponding to each service scene type; the target image classification model is obtained by training a sample scene adaptation module in a sample extraction sub-model based on mutual exclusion loss information, the mutual exclusion loss information represents the similarity degree of target adaptation characteristic information, and the target adaptation characteristic information is adaptation characteristic information output by the sample scene adaptation module corresponding to any two business scene types;
and the image category determining module is used for determining the target image category information corresponding to the image to be classified based on the target image characteristic information corresponding to each service scene type and the target classification sub-model corresponding to each service scene type.
According to another aspect of the embodiments of the present disclosure, there is provided an electronic device including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement the image classification method described above.
According to another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium; when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is caused to perform the above image classification method.
According to another aspect of the disclosed embodiments, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the above-described image classification method.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
the method comprises the steps of obtaining an image to be classified, inputting the image to be classified into target extraction sub-models corresponding to a plurality of service scene types in a target image classification model, and carrying out feature extraction processing on the target extraction sub-models corresponding to the service scene types to obtain target image feature information corresponding to each service scene type, wherein the target image classification model is obtained by training a sample scene adaptation module in the sample extraction sub-model based on mutual exclusion loss information, the mutual exclusion loss information characterizes the similarity degree between adaptation feature information output by the sample scene adaptation module corresponding to any two service scene types in the plurality of service scene types, the similarity redundancy between the target extraction sub-models corresponding to different service scene types can be reduced through mutual exclusion loss information training, the complementarity is increased, the relevance of feature information output by different target extraction sub-models can be reduced as much as possible, the orthogonality of the feature information output by different target extraction sub-models can be improved, the integration effect among the plurality of sub-models can be improved, the training quantity is small by training the sample scene adaptation module, the training cost can be greatly reduced, the image classification effect can be improved by combining the target scene adaptation information corresponding to the target image models corresponding to the service scene types in the plurality of service scene types, and the image classification information can be more accurate, and the image classification effect can be more improved for the target classification model corresponding to the target image types can be more than the target classification model.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
FIG. 1 is a schematic diagram of an application system shown in accordance with an exemplary embodiment;
FIG. 2 is a flow chart illustrating a method of image classification according to an exemplary embodiment;
FIG. 3 is a block diagram of a sample extraction sub-model, shown in accordance with an exemplary embodiment;
FIG. 4 is a block diagram of another sample extraction sub-model, shown in accordance with an exemplary embodiment;
FIG. 5 is a schematic diagram of a training process for a target image classification model, according to an exemplary embodiment;
FIG. 6 is a block diagram of an image classification device according to an exemplary embodiment;
FIG. 7 is a block diagram of an electronic device for classifying images to be classified, according to an exemplary embodiment;
fig. 8 is a block diagram of another electronic device for classifying images to be classified, shown according to an exemplary embodiment.
Detailed Description
Various exemplary embodiments, features and aspects of the present application will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In addition, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present application. It will be understood by those skilled in the art that the present application may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits have not been described in detail as not to unnecessarily obscure the present application.
Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. Artificial intelligence software technology mainly comprises computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
In recent years, with the research and progress of artificial intelligence technology, artificial intelligence has been widely applied in many fields. The scheme provided by the embodiments of the present application relates to technologies such as computer vision, which are specifically described by the following embodiments:
referring to fig. 1, fig. 1 is a schematic diagram illustrating an application system according to an exemplary embodiment. The application system can be used for the image classification method of the application. As shown in fig. 1, the application system may include at least a server 01 and a terminal 02.
In the embodiment of the present application, the server 01 may be used for classifying the image to be classified. Specifically, the server 01 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, big data, and artificial intelligence platforms.
In the embodiment of the present application, the terminal 02 may be configured to generate the image to be classified. The terminal 02 may include a smart phone, a desktop computer, a tablet computer, a notebook computer, a smart speaker, an in-vehicle terminal, a digital assistant, an augmented reality (AR)/virtual reality (VR) device, an intelligent wearable device, or another type of physical device, or may include software running on a physical device, such as an application program. The operating system running on the terminal 02 in the embodiment of the present application may include, but is not limited to, the Android system, the iOS system, Linux, Windows, and the like.
In addition, it should be noted that, fig. 1 is only an application environment provided by the disclosure, and in practical application, other application environments may also be included, for example, a classification process for an image to be classified may also be implemented on the terminal 02.
In the embodiment of the present disclosure, the terminal 02 and the server 01 may be directly or indirectly connected through a wired or wireless communication method, which is not limited in this application.
It should be noted that the drawings show one possible sequence of steps; the order is not strictly required, and steps without mutual dependencies may be performed in parallel.
Specifically, fig. 2 is a flow chart illustrating a method of image classification according to an exemplary embodiment. As shown in fig. 2, the image classification method may be used in an electronic device such as a terminal or a server, and specifically may include the following steps:
s201: and acquiring an image to be classified.
In a specific embodiment, the image to be classified may refer to an image whose image category information is to be identified. By way of example, taking an application scene of sensitive image recognition as an example, the image category information may include sensitive category information or non-sensitive category information.
In a specific embodiment, taking an application scenario of sensitive image recognition as an example, the user terminal may upload a multimedia resource to be processed to the service platform by triggering a resource uploading operation, so that the service platform performs corresponding service processing on the multimedia resource. Further, after receiving the multimedia resource, the service platform can process the multimedia resource according to the resource type under the condition that the sensitive class classification is required to be performed on the image to be processed; under the condition that the resource type of the multimedia resource is a video type or a moving picture type, frame extraction and long picture segmentation can be sequentially carried out to obtain an image sequence to be classified, and correspondingly, the image sequence to be classified can be sequentially input into a target image classification model for classification; in the case that the resource type of the multimedia resource is an image type, the multimedia resource may be used as an image to be classified, so as to perform classification processing on the image to be classified.
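As an illustration of this dispatch by resource type, the following minimal Python sketch routes an uploaded resource to frame extraction and long-image segmentation, or passes an image through directly; the helper functions `extract_frames` and `split_long_image` are hypothetical stand-ins, not part of the disclosure:

```python
# Hypothetical preprocessing dispatch for a multimedia resource; the helpers
# extract_frames and split_long_image are assumed, not defined by the patent.
def images_to_classify(resource, resource_type):
    if resource_type in ("video", "animated_image"):
        frames = extract_frames(resource)             # frame extraction
        return [crop for frame in frames
                for crop in split_long_image(frame)]  # long-image segmentation
    if resource_type == "image":
        return [resource]                             # classify the image directly
    raise ValueError(f"unsupported resource type: {resource_type}")
```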
S203: and inputting the image to be classified into a target extraction sub-model corresponding to each of a plurality of service scene types in the target image classification model, and performing feature extraction processing to obtain target image feature information corresponding to each service scene type.
In a specific embodiment, the target image classification model may be used to classify the image to be classified. The target image classification model can be obtained by training a sample scene adaptation module in the sample extraction sub-model based on mutual exclusion loss information. The mutual exclusion loss information can represent the similarity degree of the target adaptation characteristic information; the target adaptation characteristic information may be adaptation characteristic information output by a sample scene adaptation module corresponding to any two of the plurality of service scene types. It can be understood that the higher the similarity degree between the adaptation characteristic information output by the sample scene adaptation modules corresponding to any two service scene types, the higher the mutual exclusion loss information correspondingly. By way of example, taking an application scenario of sensitive image recognition as an example, the service scenario type may include a color drawing type, a simple drawing type, and the like.
In a specific embodiment, the target image classification model may include a target extraction sub-model corresponding to each of the plurality of service scene types and a target classification sub-model corresponding to each of the plurality of service scene types. The target extraction sub-model corresponding to any business scene type can be used for carrying out feature extraction processing aiming at any business scene type on the image to be classified. The target classification sub-model corresponding to any business scene type can be used for classifying the image to be classified based on the target image characteristic information corresponding to any business scene type.
In a specific embodiment, the target image classification model may be obtained by:
acquiring a first sample image set;
inputting the first sample image set into a sample extraction sub-model corresponding to each of a plurality of service scene types in a preset machine learning model to perform feature extraction processing to obtain sample adaptation feature information corresponding to each service scene type;
determining mutual exclusion loss information corresponding to the target scene type based on sample adaptation characteristic information corresponding to the target scene type;
training the sample scene adaptation modules corresponding to the target scene type based on the mutual exclusion loss information corresponding to the target scene type, to obtain the target image classification model (a sketch of these training steps is given below).
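The following PyTorch sketch illustrates one possible reading of these steps, assuming each sample extraction sub-model exposes the output of its sample scene adaptation module and that `mutex_loss_fn` implements the mutual exclusion loss defined later; the attribute names are illustrative only:

```python
import itertools
import torch

def train_step(model, images, optimizer, mutex_loss_fn):
    # Feature extraction per service scene type; each sub-model returns the
    # adaptation feature information of its sample scene adaptation module.
    adapt_feats = {scene: sub(images)
                   for scene, sub in model.extract_submodels.items()}
    # Mutual exclusion loss over every pair of service scene types
    # (every target scene type pair).
    loss = sum(mutex_loss_fn(adapt_feats[a], adapt_feats[b])
               for a, b in itertools.combinations(adapt_feats, 2))
    optimizer.zero_grad()
    loss.backward()   # gradients flow only into the scene adaptation modules
    optimizer.step()
    return float(loss)
```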
In a particular embodiment, the first set of sample images may be used to train a pre-set machine learning model. The first set of sample images may comprise at least one first sample image.
In a specific embodiment, the preset machine learning model may refer to the image classification model to be trained. The preset machine learning model may include a sample extraction sub-model corresponding to each of the plurality of service scene types and a sample classification sub-model corresponding to each of the plurality of service scene types.
In a specific embodiment, the sample adaptation feature information corresponding to any service scene type may refer to feature information output by the sample scene adaptation module corresponding to any service scene type in the preset machine learning model. In particular, the representation of the sample adaptation characteristic information may comprise a vector or a matrix, etc.
In a specific embodiment, sample extraction sub-models of different structures may be selected according to different business characteristics or final effects; specifically, a serial structure or a hybrid serial-parallel structure embedded between layers may be included.
In a particular embodiment, FIG. 3 is a block diagram of a sample extraction sub-model, shown in accordance with an exemplary embodiment. As shown in fig. 3, the sample extraction sub-model corresponding to each service scene type may include a sample encoding module and a sample feature extraction module; the sample feature extraction module may include a first attention module, a first non-linear module, a sample scene adaptation module, and a first fusion module.
In a specific embodiment, in a case that the sample feature extraction module includes a first attention module, a first nonlinear module, a sample scene adaptation module, and a first fusion module, inputting a first sample image set to a sample extraction sub-model corresponding to each of a plurality of service scene types in a preset machine learning model to perform feature extraction processing, to obtain sample adaptation feature information corresponding to each service scene type, the method may include:
Inputting the first sample image set to a sample coding module corresponding to each service scene type for coding processing to obtain sample image coding information corresponding to each service scene type;
inputting sample image coding information corresponding to each service scene type into a first attention module corresponding to each service scene type for attention weighting processing to obtain first weighting characteristic information corresponding to each service scene type;
inputting the first weighted characteristic information corresponding to each service scene type and the sample image coding information corresponding to each service scene type into a first nonlinear module corresponding to each service scene type for nonlinear transformation processing to obtain first sample characteristic information corresponding to each service scene type;
inputting the first weighted feature information corresponding to each service scene type and the sample image coding information corresponding to each service scene type into a sample scene adaptation module corresponding to each service scene type for feature adaptation processing to obtain first adaptation feature information corresponding to each service scene type;
and inputting the first sample characteristic information corresponding to each service scene type, the first adapting characteristic information corresponding to each service scene type, the first weighting characteristic information corresponding to each service scene type and the sample image coding information corresponding to each service scene type into a first fusion module corresponding to each service scene type to perform characteristic fusion processing to obtain second sample characteristic information corresponding to each service scene type.
In a particular embodiment, the sample encoding module may be configured to encode each first sample image in the first set of sample images. The sample encoding module may comprise a sample linear projection module.
In a specific embodiment, the sample image coding information corresponding to any service scene type may refer to image coding information of each first sample image in the first sample image set under any service scene type. The representation of the sample image coding information corresponding to any service scene type can comprise vectors or matrixes and the like.
In a specific embodiment, the segmentation processing may be performed on any one of the first sample images to obtain a plurality of first segmented images, and then the plurality of first segmented images are input to a sample linear projection module corresponding to any one of the service scene types to perform linear projection processing, so that coding information corresponding to each of the plurality of first segmented images in any one of the service scene types may be obtained, and correspondingly, the coding information corresponding to each of the plurality of first segmented images may be used as the sample image coding information corresponding to any one of the first sample images in any one of the service scene types.
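A common way to realize this segmentation plus linear projection is a ViT-style patch embedding, sketched below under assumed sizes (16x16 patches, 768-dimensional coding information); this is an illustration, not the disclosure's mandated structure:

```python
import torch.nn as nn

class SampleEncoder(nn.Module):
    """Splits an image into patches and linearly projects each patch."""
    def __init__(self, patch_size=16, in_channels=3, dim=768):
        super().__init__()
        # A strided convolution performs segmentation and linear projection
        # in one step: each patch becomes one projected token.
        self.proj = nn.Conv2d(in_channels, dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                    # x: (B, 3, H, W)
        x = self.proj(x)                     # (B, dim, H/patch, W/patch)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, dim) coding info
```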
In a specific embodiment, the first attention module corresponding to any business scenario type may include a first attention layer and a first normalization layer.
In a specific embodiment, the first weighted feature information corresponding to any one of the service scene types may refer to feature information of each of the first sample images in the first sample image set output by the first attention module corresponding to any one of the service scene types. The first weighted feature information corresponding to any one of the service scene types may include feature information corresponding to a plurality of first sample images in the first sample image set under any one of the service scene types.
In a specific embodiment, sample image coding information corresponding to any first sample image under any service scene type is input to a first normalization layer corresponding to any service scene type for normalization processing, so that first standard feature information corresponding to any first sample image under any service scene type can be obtained, correspondingly, the first standard feature information is input to a first attention layer corresponding to any service scene type for weighting processing, so that first weighted feature information corresponding to any first sample image under any service scene type can be obtained.
In a specific embodiment, the first sample feature information corresponding to any one of the service scene types may refer to feature information of each of the first sample images in the first sample image set output by the first nonlinear module corresponding to any one of the service scene types. The first sample feature information corresponding to any one of the service scene types may include feature information corresponding to a plurality of first sample images in the first sample image set under any one of the service scene types.
In a specific embodiment, the first nonlinear module corresponding to any service scene type may include a second normalization layer and a first feature processing layer, wherein the first feature processing layer may include an MLP (Multi-Layer Perceptron).
In a specific embodiment, first weighted feature information corresponding to any first sample image under any service scene type is input to the second normalization layer corresponding to any service scene type for normalization processing, so that second standard feature information corresponding to any first sample image under any service scene type can be obtained, correspondingly, the second standard feature information is input to the first feature processing layer corresponding to any service scene type for nonlinear transformation processing, and first sample feature information corresponding to any first sample image under any service scene type can be obtained.
In a specific embodiment, the first adapting feature information corresponding to any service scene type may refer to feature information of each first sample image in the first sample image set output by the sample scene adapting module corresponding to any service scene type under any service scene type. The first adaptive feature information corresponding to any one of the service scene types may include feature information corresponding to each of the first sample images in the first sample image set under any one of the service scene types.
In a specific embodiment, the sample scene adaptation module corresponding to any one of the service scene types may be used to extract the feature information corresponding to any one of the service scene types. The sample scene adaptation module corresponding to any service scene type may include two full connection layers and a linear rectification layer.
In a specific embodiment, the first adapting feature information corresponding to any service scenario type may be obtained by the following formula:
$$\mathrm{out} = W_{up}^{T}\,\phi\!\left(W_{down}^{T}\,x\right)$$

where out is the first adaptation feature information corresponding to any first sample image under any service scene type; $W_{up}$ and $W_{down}$ are the weights of the two fully connected layers; $\phi$ is the activation response function; $x$ is the feature information obtained by superposing the first weighted feature information corresponding to any service scene type and the sample image coding information corresponding to any service scene type; and $T$ denotes the transpose operation. Specifically, the activation response function may include a ReLU (Rectified Linear Unit, linear rectification function).
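Read this way, the sample scene adaptation module is a bottleneck adapter: a down-projection, a linear rectification, and an up-projection. A minimal sketch, with the bottleneck width as an assumed hyperparameter:

```python
import torch.nn as nn

class SceneAdapter(nn.Module):
    """Two fully connected layers around a linear rectification layer."""
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # W_down
        self.act = nn.ReLU()                    # activation response function
        self.up = nn.Linear(bottleneck, dim)    # W_up

    def forward(self, x):
        # out = W_up^T * ReLU(W_down^T * x)
        return self.up(self.act(self.down(x)))
```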
In a specific embodiment, the second sample feature information corresponding to any service scene type may refer to feature information of each first sample image in the first sample image set output by the first fusion module in the sample extraction sub-model corresponding to any service scene type under any service scene type. The second sample characteristic information corresponding to any one of the service scene types may include characteristic information corresponding to each of the first sample images in the first sample image set under any one of the service scene types.
In a specific embodiment, the second sample feature information corresponding to any service scene type may be obtained by the following formula:

$$V_2 = V_A + V_1 + V_W + V_C$$

where $V_2$ is the second sample feature information corresponding to any service scene type; $V_A$ is the first adaptation feature information corresponding to any service scene type; $V_1$ is the first sample feature information corresponding to any service scene type; $V_W$ is the first weighted feature information corresponding to any service scene type; and $V_C$ is the sample image coding information corresponding to any service scene type.
In a particular embodiment, the target scene type pair may be formed by any two of the plurality of service scene types.
In a specific embodiment, the sample adaptation feature information corresponding to the target scene type may refer to the adaptation feature information output by the sample scene adaptation modules corresponding to the two service scene types in the target scene type pair. The target scene type pair may include a first service scene type and a second service scene type. The sample adaptation feature information corresponding to the target scene type may include the adaptation feature information corresponding to the first service scene type and the adaptation feature information corresponding to the second service scene type.
In a specific embodiment, the mutual exclusion loss information corresponding to the target scene type may represent the degree of similarity between the adaptation feature information output by the sample scene adaptation modules corresponding to the target scene type pair. Specifically, the mutual exclusion loss information corresponding to the target scene type can be obtained by the following formula:

$$\mathrm{Loss}_{mutex} = \frac{\sum_{i=1}^{N} A_i B_i}{\sqrt{\sum_{i=1}^{N} A_i^{2}}\,\sqrt{\sum_{i=1}^{N} B_i^{2}}}$$

where $\mathrm{Loss}_{mutex}$ is the mutual exclusion loss information corresponding to the target scene type; A and B are the sample adaptation feature information corresponding to the target scene type; $N$ is the number of elements in the adaptation feature information; $A_i$ is the $i$-th element in the adaptation feature information A; and $B_i$ is the $i$-th element in the adaptation feature information B. Specifically, the numbers of elements in the adaptation feature information A and the adaptation feature information B may be the same.
In a specific embodiment, based on the mutual exclusion loss information corresponding to each pair of two service scene types, the sample scene adaptation modules corresponding to those two service scene types may be trained to obtain the target image classification model. Specifically, two different update gradients may be determined based on the mutual exclusion loss information corresponding to the target scene type, and the corresponding sample scene adaptation modules may then be trained for the target scene type based on the two different update gradients. Further, the module parameters in the sample scene adaptation modules may be updated by stochastic gradient descent.
In a specific embodiment, in the process of training to obtain the target image classification model, only the sample scene adaptation modules in the preset machine learning model may be trained, while the other module parameters are not updated; after training of the sample scene adaptation modules is completed, the trained preset machine learning model may be used as the target image classification model.
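A sketch of this adapter-only training setup, assuming the module names used in the earlier sketches (`extract_submodels` and `scene_adapter` are illustrative, and `model` is assumed to be the preset machine learning model):

```python
import torch

# Freeze every parameter, then re-enable gradients only for the
# sample scene adaptation modules.
for p in model.parameters():
    p.requires_grad = False
for submodel in model.extract_submodels.values():
    for p in submodel.scene_adapter.parameters():
        p.requires_grad = True

# Stochastic gradient descent over the adapter parameters alone.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3)
```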
In the above embodiment, training with mutual exclusion loss information can reduce the redundancy between the target extraction sub-models corresponding to different service scene types and increase their complementarity; the correlation of the feature information output by different target extraction sub-models is reduced as much as possible and its orthogonality is improved, which helps to improve the integration effect among the multiple sub-models. Moreover, because the target image classification model is obtained by training only the sample scene adaptation modules, there are fewer training parameters and the training cost can be greatly reduced.
In a specific embodiment, the method may further include:
acquiring first label category information corresponding to each sample image in a first sample image set;
inputting second sample characteristic information corresponding to each service scene type into a sample classification sub-model corresponding to each service scene type in a preset machine learning model for classification processing to obtain third category prediction information corresponding to each service scene type;
inputting the first sample image set into a sample weight learning module in a preset machine learning model for weight analysis processing to obtain sample weight information corresponding to each sample image in the first sample image set;
based on the sample weight information, carrying out fusion processing on third category prediction information corresponding to each of a plurality of service scene types to obtain fourth category prediction information corresponding to each sample image in the first sample image set;
determining first weight loss information based on the fourth category prediction information and the first tag category information;
correspondingly, training the sample scene adaptation module corresponding to the target scene type based on the mutual exclusion loss information corresponding to the target scene type to obtain the target image classification model may include:
Training a corresponding sample scene adaptation module of the target scene type based on mutual exclusion loss information corresponding to the target scene type, and training a sample weight learning module based on the first weight loss information to obtain a target image classification model.
In a specific embodiment, the first label class information corresponding to any first sample image may be used to provide a reference for training of a preset machine learning model. For example, taking an application scenario for sensitive image recognition as an example, the first tag class information may include sensitive class information or non-sensitive class information.
In a specific embodiment, the sample classification sub-model corresponding to any service scene type may be used to perform category prediction on the sample images based on the sample feature information corresponding to that service scene type. The sample classification sub-model corresponding to any service scene type may include an MLP for classification.
In a specific embodiment, the third category prediction information corresponding to any one of the service scene types may characterize a probability that each of the first sample images in the first sample image set belongs to a plurality of image category information under any one of the service scene types. The third category prediction information corresponding to any one of the service scene types may include third prediction probabilities corresponding to each of the plurality of first sample images under any one of the service scene types. For example, taking an application scenario of sensitive image recognition as an example, the third prediction probability corresponding to any first sample image under any service scenario type may represent the probability that any first sample image belongs to sensitive category information under any service scenario type.
In a specific embodiment, second sample feature information corresponding to any one of the first sample images in any one of the service scene types is input into a sample classification sub-model corresponding to any one of the service scene types in a preset machine learning model to perform classification processing, so that third category prediction information corresponding to any one of the service scene types of any one of the first sample images can be obtained.
In a particular embodiment, the sample weight learning module may be configured to adaptively learn the weight of the prediction information output by each of the plurality of different sample classification sub-models. The sample weight learning module may include a sample convolution module, a sample multi-layer perceptron, and a sample logistic regression module connected in sequence.
In a specific embodiment, the sample weight information may represent respective importance degrees of a plurality of third category prediction information corresponding to a plurality of service scene types.
In a specific embodiment, any one of the first sample images is input to the sample convolution module in the sample weight learning module for feature extraction processing, to obtain convolution feature information corresponding to that first sample image; the convolution feature information is then input to the sample multi-layer perceptron in the sample weight learning module for nonlinear transformation processing, to obtain transformed feature information corresponding to that first sample image; finally, the transformed feature information is input to the sample logistic regression module in the sample weight learning module for mapping processing, to obtain the sample weight information corresponding to that first sample image.
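One plausible shape for this module, with illustrative layer sizes, is sketched below; the softmax plays the role of the logistic regression module by normalizing the per-scene weights:

```python
import torch.nn as nn

class SampleWeightLearner(nn.Module):
    def __init__(self, num_scenes: int):
        super().__init__()
        self.conv = nn.Sequential(            # sample convolution module
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.mlp = nn.Sequential(             # sample multi-layer perceptron
            nn.Linear(16, 64), nn.ReLU(),
            nn.Linear(64, num_scenes))
        self.out = nn.Softmax(dim=-1)         # sample logistic regression module

    def forward(self, images):                # images: (B, 3, H, W)
        feats = self.conv(images).flatten(1)  # (B, 16) convolution features
        return self.out(self.mlp(feats))      # (B, num_scenes) sample weights
```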
In a specific embodiment, the fourth category prediction information corresponding to any sample image may be used to characterize the probability that any sample image belongs to a plurality of image category information. For example, taking an application scenario of sensitive image recognition as an example, the fourth category prediction information corresponding to any sample image may include a fourth prediction probability corresponding to any sample image; the fourth predictive probability may characterize a probability that any of the sample images belongs to sensitive class information.
In a specific embodiment, the fourth type of prediction information corresponding to any sample image may be obtained by the following formula:
$$Z = \sum_{i=1}^{N} \alpha_i Y_i$$

where $Z$ is the fourth category prediction information corresponding to any sample image; $\alpha_i$ is the weight information corresponding to the $i$-th service scene type in the sample weight information corresponding to that sample image; $Y_i$ is the third category prediction information corresponding to the first sample image under the $i$-th service scene type; and $N$ is the number of service scene types.
In a specific embodiment, the first weight loss information may characterize a degree of difference between the predicted category result corresponding to the fourth category predicted information and the first tag category information.
In a specific embodiment, the first weight loss information may be obtained by the following formula:
$$L_W = -\frac{1}{n}\sum_{i=1}^{n} \log q(x_i)$$

where $L_W$ is the first weight loss information; $q(x_i)$ is the prediction probability corresponding to the first label category information of the $i$-th sample image in the first sample image set; and $n$ is the number of sample images contained in the first sample image set. By way of example, taking the application scenario of sensitive image recognition: in the case that the first label category information of the $i$-th sample image is sensitive category information, the prediction probability corresponding to the first label category information may refer to the probability that the $i$-th sample image belongs to a sensitive image; in the case that the first label category information of the $i$-th sample image is non-sensitive category information, it may refer to the probability that the $i$-th sample image belongs to a non-sensitive image.
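Combining the two formulas above (the weighted fusion and the first weight loss), a minimal PyTorch sketch, assuming the per-scene predictions are already normalized probabilities:

```python
import torch

def weight_loss(scene_preds: torch.Tensor,  # (B, N, C) third category predictions
                alphas: torch.Tensor,       # (B, N) sample weight information
                labels: torch.Tensor) -> torch.Tensor:  # (B,) label class indices
    # Z = sum_i alpha_i * Y_i -> fourth category prediction information
    fused = (alphas.unsqueeze(-1) * scene_preds).sum(dim=1)  # (B, C)
    # q(x_i): predicted probability of each sample's label category
    q = fused.gather(1, labels.unsqueeze(1)).squeeze(1)
    # L_W = -(1/n) * sum_i log q(x_i)
    return -torch.log(q.clamp_min(1e-8)).mean()
```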
In a specific embodiment, training may be performed on the corresponding sample scene adaptation modules based on the mutual exclusion loss information corresponding to the target scene type in the preset machine learning model, and on the sample weight learning module in the preset machine learning model based on the first weight loss information; accordingly, the preset machine learning model obtained after training may be used as the target image classification model. The module parameters in the sample weight learning module may be updated by stochastic gradient descent.
In the above embodiment, the second sample feature information corresponding to each service scene type is input to the sample classification sub-model corresponding to each service scene type in the preset machine learning model for classification processing, to obtain the third category prediction information corresponding to each service scene type. The first sample image set is input to the sample weight learning module in the preset machine learning model for weight analysis processing, to obtain the sample weight information corresponding to each sample image in the first sample image set. Based on the sample weight information, the third category prediction information corresponding to each of the plurality of service scene types is fused to obtain the fourth category prediction information corresponding to each sample image in the first sample image set. Through the sample weight learning module, the weight information is no longer a manually designed sub-optimal solution; the learned weights more closely approximate the globally optimal solution.
In a particular embodiment, FIG. 4 is a block diagram of another sample extraction sub-model, shown in accordance with an exemplary embodiment. As shown in fig. 4, the sample extraction sub-model corresponding to each service scene type may include a sample encoding module and a sample feature extraction module; the sample feature extraction module may include a second attention module, a second nonlinear module, a sample scene adaptation module, and a second fusion module; the sample scene adaptation module may include a first sample adaptation module and a second sample adaptation module.
In a specific embodiment, the inputting the first sample image set into the sample extraction sub-model corresponding to each of the plurality of service scene types in the preset machine learning model to perform feature extraction processing, to obtain sample adaptive feature information corresponding to each service scene type may include:
inputting the first sample image set to a sample coding module corresponding to each service scene type for coding processing to obtain sample image coding information corresponding to each service scene type;
inputting sample image coding information corresponding to each service scene type into a second attention module corresponding to each service scene type for attention weighting processing to obtain second weighting characteristic information corresponding to each service scene type;
inputting second weighted characteristic information corresponding to each service scene type into a first sample adaptation module corresponding to each service scene type for characteristic adaptation processing to obtain second adaptation characteristic information corresponding to each service scene type;
inputting second adaptive feature information corresponding to each service scene type and sample image coding information corresponding to each service scene type into a second nonlinear module corresponding to each service scene type to perform nonlinear transformation processing to obtain third sample feature information corresponding to each service scene type;
Inputting the third sample characteristic information corresponding to each service scene type into a second sample adaptation module corresponding to each service scene type to perform characteristic adaptation processing to obtain third adaptation characteristic information corresponding to each service scene type;
and inputting the third adaptation feature information corresponding to each service scene type, the second adaptation feature information corresponding to each service scene type, and the sample image coding information corresponding to each service scene type into the second fusion module corresponding to each service scene type for feature fusion processing, to obtain the fourth sample feature information corresponding to each service scene type (a structural sketch of this flow is given below).
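The sketch below illustrates this serial-adapter flow, reusing the `SceneAdapter` class from the earlier sketch; the attention and feed-forward internals are simplified stand-ins for the modules named above, and the summation in the fusion step is an assumption:

```python
import torch.nn as nn

class SerialAdapterBlock(nn.Module):
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.adapter1 = SceneAdapter(dim)   # first sample adaptation module
        self.ffn = nn.Sequential(           # second nonlinear module (stand-in)
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.adapter2 = SceneAdapter(dim)   # second sample adaptation module

    def forward(self, enc):                 # enc: sample image coding information
        w, _ = self.attn(enc, enc, enc)     # second weighted feature information
        a1 = self.adapter1(w)               # second adaptation feature information
        h = self.ffn(a1 + enc)              # third sample feature information
        a2 = self.adapter2(h)               # third adaptation feature information
        return a2 + a1 + enc                # second fusion module (summation assumed)
```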
In a specific embodiment, the second weighted feature information corresponding to any one of the service scene types may refer to feature information of each of the first sample images in the first sample image set output by the second attention module corresponding to any one of the service scene types. The second weighted feature information corresponding to any one of the service scene types may include feature information corresponding to a plurality of first sample images in the first sample image set under any one of the service scene types.
In a specific embodiment, the second attention module corresponding to any service scene type may include a third attention layer and a first feed-forward layer. Specifically, the sample image coding information corresponding to any service scene type can be input to the third attention layer corresponding to that service scene type for weighting processing, to obtain fifth weighted feature information corresponding to any first sample image under that service scene type; correspondingly, the fifth weighted feature information is input to the first feed-forward layer corresponding to that service scene type for feature processing, to obtain the second weighted feature information corresponding to any first sample image under that service scene type.
In a specific embodiment, the second adapting feature information corresponding to any service scene type may include feature information corresponding to a plurality of first sample images in the first sample image set under any service scene type.
In a specific embodiment, the first sample adaptation module corresponding to any service scene type may include two fully connected layers and a linear rectification layer. Specifically, the method for acquiring the second adaptation feature information corresponding to any service scene type may refer to the method for acquiring the first adaptation feature information corresponding to any service scene type, which is not described herein again.
In a specific embodiment, the sample feature extraction module corresponding to each service scene type may further include a third normalization layer. Specifically, the second adapting feature information corresponding to each service scene type and the sample image coding information corresponding to each service scene type may be input to a third normalization layer corresponding to each service scene type for normalization processing, so as to obtain third standard feature information corresponding to any service scene type. Correspondingly, the third standard characteristic information corresponding to each service scene type can be input into a second nonlinear module corresponding to each service scene type to perform nonlinear transformation processing, so as to obtain third sample characteristic information corresponding to each service scene type.
In a specific embodiment, the second nonlinear module corresponding to any service scene type may include a second feed-forward layer and a third feed-forward layer. Specifically, the second adaptation feature information corresponding to each service scene type and the sample image coding information corresponding to each service scene type are input to the second feed-forward layer corresponding to each service scene type for feature processing, so that the first feed-forward feature information corresponding to each service scene type can be obtained; correspondingly, the first feed-forward feature information corresponding to each service scene type can be input to the third feed-forward layer corresponding to each service scene type for feature processing, to obtain the third sample feature information corresponding to each service scene type.
In a specific embodiment, the second sample adaptation module corresponding to any service scene type may include two fully connected layers and a ReLU layer. Specifically, the method for acquiring the third adapting feature information corresponding to any service scene type may refer to the method for acquiring the first adapting feature information corresponding to any service scene type, which is not described herein again.
In a specific embodiment, the third standard feature information corresponding to any service scene type and the third adapting feature information corresponding to any service scene type may be input to the second fusion module corresponding to any service scene type to perform feature fusion processing, so as to obtain fourth sample feature information corresponding to any service scene type.
In a specific embodiment, the sample feature extraction module corresponding to each service scene type may further include a fourth normalization layer. Specifically, the fourth sample characteristic information corresponding to any service scene type can be input into a fourth normalization layer corresponding to any service scene type for normalization processing, so as to obtain fourth standard characteristic information. Correspondingly, fourth standard characteristic information corresponding to each service scene type can be input into a sample classification sub-model corresponding to each service scene type in a preset machine learning model to be classified, so that fifth category prediction information corresponding to each service scene type is obtained.
In a specific embodiment, the mutual exclusion loss information corresponding to the target scene type may be obtained by the following formula:

$$\mathrm{RepulsiveLoss} = \frac{\sum_{i=1}^{N} A_i B_i}{\sqrt{\sum_{i=1}^{N} A_i^{2}}\,\sqrt{\sum_{i=1}^{N} B_i^{2}}} + \frac{\sum_{i=1}^{N} C_i D_i}{\sqrt{\sum_{i=1}^{N} C_i^{2}}\,\sqrt{\sum_{i=1}^{N} D_i^{2}}}$$

where RepulsiveLoss is the mutual exclusion loss information corresponding to the target scene type; A and B are the second adapting feature information corresponding to the two service scene types of the target scene type; C and D are the corresponding third adapting feature information; N is the number of elements in the adapting feature information; and $A_i$, $B_i$, $C_i$, and $D_i$ are the i-th elements of A, B, C, and D, respectively. Specifically, the number of elements in the second adapting feature information A and the second adapting feature information B may be the same, and correspondingly, the number of elements in the third adapting feature information C and the third adapting feature information D may be the same.
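Reading the reconstructed formula above as a sum of cosine similarities between the adapter outputs of the two scene branches, a minimal PyTorch sketch of the mutual exclusion loss might look as follows; the function name and the flattening of the adapting feature information to N-element vectors are assumptions.

```python
import torch
import torch.nn.functional as F

def repulsive_loss(a: torch.Tensor, b: torch.Tensor,
                   c: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    """Mutual exclusion loss for one target scene type pair (sketch).
    a, b: second adapting feature information of the two scene types;
    c, d: third adapting feature information of the two scene types."""
    a, b, c, d = (t.reshape(-1) for t in (a, b, c, d))  # flatten to N elements
    # Sum of cosine similarities; minimizing it pushes the two scene
    # branches' adapter outputs toward orthogonality.
    return F.cosine_similarity(a, b, dim=0) + F.cosine_similarity(c, d, dim=0)
```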
In a specific embodiment, the method may further include:
acquiring first label category information corresponding to each sample image in a first sample image set;
inputting fourth sample characteristic information corresponding to each service scene type into a sample classification sub-model corresponding to each service scene type in a preset machine learning model for classification processing to obtain fifth category prediction information corresponding to each service scene type;
Inputting the first sample image set into a sample weight learning module in a preset machine learning model for weight analysis processing to obtain sample weight information corresponding to each sample image in the first sample image set;
based on the sample weight information, carrying out fusion processing on fifth category prediction information corresponding to each of a plurality of service scene types to obtain sixth category prediction information corresponding to each sample image in the first sample image set;
determining second weight loss information based on the sixth category prediction information and the first tag category information;
correspondingly, training the sample scene adaptation module corresponding to the target scene type based on the mutual exclusion loss information corresponding to the target scene type to obtain the target image classification model may include:
training a corresponding sample scene adaptation module of the target scene type based on mutual exclusion loss information corresponding to the target scene type, and training a sample weight learning module based on second weight loss information to obtain a target image classification model.
In a specific embodiment, the fifth category prediction information corresponding to any one of the service scene types may characterize a probability that each first sample image in the first sample image set belongs to a plurality of image category information under any one of the service scene types. The fifth category prediction information corresponding to any one of the service scene types may include fifth prediction probabilities corresponding to each of the plurality of first sample images under any one of the service scene types. For example, taking an application scenario of sensitive image recognition as an example, the fifth prediction probability corresponding to any first sample image under any service scenario type may represent the probability that any first sample image belongs to sensitive category information under any service scenario type.
In a specific embodiment, the process of obtaining the fifth category prediction information by performing the classification processing through the sample classification sub-model corresponding to any service scene type may refer to the process of obtaining the third category prediction information by performing the classification processing through the sample classification sub-model corresponding to any service scene type, which is not described herein.
In a specific embodiment, the sixth category prediction information corresponding to any sample image may be used to characterize the probability that any sample image belongs to a plurality of image category information. For example, taking an application scenario of sensitive image recognition as an example, the sixth category of prediction information corresponding to any sample image may include a sixth prediction probability corresponding to any sample image; the sixth predictive probability may characterize a probability that any of the sample images belongs to sensitive class information.
In a specific embodiment, the above-mentioned process of obtaining the sixth category prediction information corresponding to any one sample image through the fusion process may refer to the above-mentioned process of obtaining the fourth category prediction information corresponding to any one sample image through the fusion process, which is not described herein.
In a specific embodiment, the second weight loss information may characterize the degree of difference between the prediction category result corresponding to the sixth category prediction information and the first tag category information. Specifically, the determining process of the second weight loss information may refer to the determining process of the first weight loss information, which is not described herein again.
In a specific embodiment, the sample scene adaptation module corresponding to the target scene type in the preset machine learning model may be trained based on the mutual exclusion loss information corresponding to the target scene type, and the sample weight learning module in the preset machine learning model may be trained based on the second weight loss information; correspondingly, the preset machine learning model obtained after training may be used as the target image classification model. The module parameters in the sample weight learning module may be updated by stochastic gradient descent.
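A minimal sketch of the weight-learning path described in this embodiment: a sample weight learning module scores each scene branch, the per-scene predictions are fused with those scores, and the resulting weight loss drives a stochastic-gradient update. The convolution width, the softmax as the logistic-regression step, binary labels, and binary cross-entropy as the weight loss are all assumptions.

```python
import torch
import torch.nn as nn

class SampleWeightLearner(nn.Module):
    """Sample weight learning module (sketch): convolution -> MLP ->
    softmax, producing one weight per service scene type."""
    def __init__(self, num_scenes: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.mlp = nn.Sequential(nn.Flatten(), nn.Linear(16, num_scenes))
        self.softmax = nn.Softmax(dim=-1)  # logistic-regression-style normalization

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (batch, 3, H, W) -> weights: (batch, num_scenes)
        return self.softmax(self.mlp(self.conv(images)))

def weight_training_step(weight_module, optimizer, images, scene_probs, labels):
    """One training step for the weight learning module (sketch).
    scene_probs: (batch, num_scenes) fifth prediction probabilities, one
    per scene branch; labels: (batch,) first label category info in {0, 1}."""
    weights = weight_module(images)
    fused = (weights * scene_probs).sum(dim=-1)  # sixth category prediction info
    loss = nn.functional.binary_cross_entropy(fused, labels.float())
    optimizer.zero_grad()
    loss.backward()   # stochastic gradient descent update
    optimizer.step()
    return loss.item()
```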
In a specific embodiment, the method may further include:
acquiring a second sample image set and second label category information corresponding to each sample image in the second sample image set;
inputting the second sample image set into a sample extraction sub-model corresponding to each of a plurality of service scene types in a preset machine learning model to perform feature extraction processing to obtain fifth sample feature information corresponding to each service scene type;
Inputting fifth sample characteristic information corresponding to each service scene type into a sample classification sub-model corresponding to each service scene type for classification processing to obtain seventh category prediction information corresponding to each service scene type;
determining service loss information corresponding to each service scene type based on the seventh category prediction information and the second label category information corresponding to each service scene type;
correspondingly, training the sample scene adaptation module corresponding to the target scene type based on the mutual exclusion loss information corresponding to the target scene type to obtain the target image classification model may include:
training a sample scene adaptation module corresponding to the target scene type based on mutual exclusion loss information corresponding to the target scene type, and training a sample basic module corresponding to each service scene type in a preset machine learning model based on service loss information corresponding to each service scene type to obtain a target image classification model.
In a particular embodiment, the second sample image set may be used to train a pre-set machine learning model. The second set of sample images may comprise at least one second sample image. Specifically, the second sample image set may be used to train a sample base module corresponding to each service scene type in the preset machine learning model. The sample basic module corresponding to any service scene type may refer to a module in the sample extraction sub-model corresponding to any service scene type except the sample scene adaptation module corresponding to any service scene type.
In a specific embodiment, the second label category information corresponding to any one of the second sample images may be used to provide a reference for training of a preset machine learning model. For example, taking an application scenario for sensitive image recognition as an example, the second tag class information may include sensitive class information or non-sensitive class information.
In a specific embodiment, the fifth sample feature information corresponding to any service scene type may refer to feature information of each second sample image in the second sample image set output by the sample extraction sub-model corresponding to any service scene type under any service scene type. The fifth sample feature information corresponding to any one of the service scene types may include feature information corresponding to each of the second sample images in the second sample image set under any one of the service scene types.
In a specific embodiment, the feature extraction process of the sample extraction sub-model may refer to the feature extraction process described above, and will not be described herein.
In a specific embodiment, the seventh category prediction information corresponding to any one of the service scene types may characterize a probability that each of the second sample images in the second sample image set belongs to the plurality of image category information under any one of the service scene types. The seventh category prediction information corresponding to any one of the service scene types may include a seventh prediction probability corresponding to each of the plurality of second sample images under any one of the service scene types. For example, taking an application scenario of sensitive image recognition as an example, the seventh prediction probability corresponding to any second sample image under any service scenario type may represent the probability that any second sample image belongs to sensitive category information under any service scenario type.
In a specific embodiment, the processing procedure of classifying the sample classification sub-model corresponding to any service scene type to obtain the seventh category prediction information may refer to the processing procedure of classifying the sample classification sub-model corresponding to any service scene type to obtain the third category prediction information, which is not described herein.
In a specific embodiment, the service loss information corresponding to any service scene type may represent a degree of difference between a prediction category result corresponding to the seventh category prediction information corresponding to any service scene type and the second tag category information.
In a specific embodiment, the service loss information corresponding to any service scene type may be obtained by the following formula:

$$L_m = -\frac{1}{n}\sum_{i=1}^{n}\log q_m(x_i)$$

where $L_m$ is the service loss information corresponding to the m-th service scene type among the plurality of service scene types; $q_m(x_i)$ is the prediction probability corresponding to the second label category information of the i-th second sample image under the m-th service scene type; and n is the number of sample images contained in the second sample image set. For example, taking the application scenario of sensitive image recognition as an example, in the case where the second label category information corresponding to the i-th second sample image is sensitive category information, $q_m(x_i)$ may be the probability, determined based on the seventh category prediction information corresponding to the m-th service scene type, that the i-th second sample image belongs to a sensitive image; in the case where the second label category information of the i-th second sample image is non-sensitive category information, $q_m(x_i)$ may be the probability, determined based on the seventh category prediction information corresponding to the m-th service scene type, that the i-th second sample image belongs to a non-sensitive image.
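The reconstructed service loss above is an averaged negative log-likelihood, which can be sketched directly; the helper name and the clamp guard are assumptions.

```python
import torch

def service_loss(q_m: torch.Tensor) -> torch.Tensor:
    """Service loss for the m-th service scene type (sketch).
    q_m: (n,) probabilities q_m(x_i) that each second sample image
    belongs to its labeled category under scene m."""
    return -torch.log(q_m.clamp_min(1e-12)).mean()  # clamp guards log(0)

# Given per-class probabilities probs of shape (n, num_classes) and integer
# labels of shape (n,), q_m(x_i) is the probability of the true class:
#   q_m = probs.gather(1, labels.unsqueeze(1)).squeeze(1)
```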
In a specific embodiment, the sample scene adaptation module corresponding to the target scene type in the preset machine learning model may be trained based on the mutual exclusion loss information corresponding to the target scene type, and the sample basic module corresponding to each service scene type in the preset machine learning model may be trained based on the service loss information corresponding to each service scene type, so as to obtain the target image classification model. The module parameters in the sample basic module may be updated by stochastic gradient descent.
In a specific embodiment, the sample basic module corresponding to each service scene type in the preset machine learning model may be trained first based on the service loss information corresponding to each service scene type; then the sample scene adaptation module corresponding to the target scene type in the preset machine learning model may be trained based on the mutual exclusion loss information corresponding to the target scene type, and the sample weight learning module in the preset machine learning model may be trained based on the weight loss information, so as to obtain the target image classification model.
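The staged order described above might be orchestrated roughly as follows; freezing the trained sample basic modules during the second stage is an assumption consistent with the small trainable-parameter count emphasized later, and the zero-argument loss closures stand in for the loss computations described in the preceding embodiments.

```python
import itertools
import torch

def train_two_stage(base_modules, adapters, weight_module,
                    service_loss_fn, repulsive_loss_fn, weight_loss_fn,
                    stage1_steps: int = 1000, stage2_steps: int = 1000):
    """Two-stage schedule (sketch): stage 1 fits the per-scene sample
    basic modules; stage 2 freezes them and fits the scene adapters and
    the sample weight learning module."""
    opt1 = torch.optim.SGD(
        itertools.chain(*(m.parameters() for m in base_modules)), lr=1e-3)
    for _ in range(stage1_steps):
        loss = service_loss_fn()  # service loss over the second sample image set
        opt1.zero_grad(); loss.backward(); opt1.step()

    for m in base_modules:        # freeze trained basic modules (assumption)
        for p in m.parameters():
            p.requires_grad = False

    opt2 = torch.optim.SGD(
        itertools.chain(*(a.parameters() for a in adapters),
                        weight_module.parameters()), lr=1e-3)
    for _ in range(stage2_steps):
        # Mutual exclusion loss for the adapters plus weight loss for the
        # sample weight learning module.
        loss = repulsive_loss_fn() + weight_loss_fn()
        opt2.zero_grad(); loss.backward(); opt2.step()
```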
In a specific embodiment, the target image feature information may refer to feature information of the image to be classified. Further, the target image feature information corresponding to any one of the service scene types may refer to feature information obtained by performing feature extraction processing based on the target extraction sub-model corresponding to any one of the service scene types. The representation of the target image characteristic information may include a vector or matrix, or the like.
In a specific embodiment, the step S203 may include:
inputting the images to be classified into a target coding module in a target extraction sub-model corresponding to each service scene type to carry out coding processing to obtain target image coding information corresponding to each service scene type;
and inputting the target image coding information corresponding to each service scene type into a target feature extraction module in a target extraction sub-model corresponding to each service scene type for extraction processing to obtain target image feature information corresponding to each service scene type.
In a specific embodiment, the target image coding information corresponding to any service scene type may refer to image coding information of an image to be classified under any service scene type. The representation form of the target image coding information corresponding to any business scene type can comprise vectors, matrixes or the like.
In a particular embodiment, the target encoding module may be a trained sample encoding module. The target coding module corresponding to any service scene type may include a target linear projection module.
In a specific embodiment, the image to be classified may be subjected to segmentation processing to obtain a plurality of second segmented images; inputting the plurality of second divided images to a target linear projection module corresponding to any service scene type for linear projection processing, so that coding information corresponding to each of the plurality of second divided images can be obtained; then, the target image coding information corresponding to any service scene type can be generated based on the coding information corresponding to each of the plurality of second divided images.
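The segmentation-plus-linear-projection step corresponds to a standard patch embedding. The sketch below assumes a 16-pixel patch and a 768-dimensional embedding; the patent does not fix either value.

```python
import torch
import torch.nn as nn

class TargetLinearProjection(nn.Module):
    """Splits an image into patches and linearly projects each patch,
    producing the target image coding information (patch size and
    embedding dimension are illustrative assumptions)."""
    def __init__(self, patch: int = 16, dim: int = 768, channels: int = 3):
        super().__init__()
        self.patch = patch
        self.proj = nn.Linear(channels * patch * patch, dim)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (batch, C, H, W); H and W assumed divisible by patch size.
        b, c, h, w = image.shape
        # Segmentation: cut into non-overlapping patches (second segmented images).
        patches = image.unfold(2, self.patch, self.patch).unfold(3, self.patch, self.patch)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * self.patch * self.patch)
        # Linear projection: coding information per patch.
        return self.proj(patches)
```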
In a specific embodiment, the target feature extraction module corresponding to each service scene type may include a third attention module, a third nonlinear module, a target scene adaptation module, and a third fusion module.
In a specific embodiment, the inputting the target image coding information corresponding to each service scene type into the target feature extraction module in the target extraction sub-model corresponding to each service scene type to perform extraction processing to obtain the target image feature information corresponding to each service scene type may include:
Inputting target image coding information corresponding to each service scene type into a third attention module corresponding to each service scene type for attention weighting processing to obtain third weighting characteristic information corresponding to each service scene type;
inputting target image coding information corresponding to each service scene type and third weighting characteristic information corresponding to each service scene type into a third nonlinear module corresponding to each service scene type to perform nonlinear transformation processing to obtain first target characteristic information corresponding to each service scene type;
inputting the target image coding information corresponding to each service scene type and the third weighted characteristic information corresponding to each service scene type into a target scene adaptation module corresponding to each service scene type to perform characteristic adaptation processing to obtain fourth adaptation characteristic information corresponding to each service scene type;
and inputting fourth adapting feature information corresponding to each service scene type, first target feature information corresponding to each service scene type, target image coding information corresponding to each service scene type and third weighting feature information corresponding to each service scene type into a third fusion module corresponding to each service scene type to perform feature fusion processing to obtain target image feature information corresponding to each service scene type.
In a specific embodiment, the third attention module corresponding to each business scene type may include a fourth attention layer and a fifth normalization layer.
In a specific embodiment, the target image coding information is input to a fifth normalization layer corresponding to any one of the service scene types for normalization processing, so that fifth standard characteristic information corresponding to any one of the service scene types of the image to be classified can be obtained; and then inputting the fifth standard characteristic information corresponding to any business scene type into a fourth attention layer corresponding to any business scene type for weighting treatment, so as to obtain the third weighted characteristic information corresponding to any business scene type.
In a specific embodiment, the third nonlinear module corresponding to any service scene type may include a sixth normalization layer and a second feature processing layer, where the second feature processing layer may comprise a multi-layer perceptron (MLP).
In a specific embodiment, inputting target image coding information corresponding to any service scene type and third weighted feature information corresponding to any service scene type into a sixth normalization layer corresponding to any service scene type for normalization processing, so as to obtain sixth standard feature information corresponding to any service scene type of an image to be classified; and then, inputting the sixth standard characteristic information corresponding to any business scene type into the second characteristic processing layer corresponding to any business scene type, so as to obtain the first target characteristic information corresponding to any business scene type.
In a specific embodiment, the target image coding information corresponding to each service scene type and the third weighted feature information corresponding to each service scene type may be fused to obtain first fused feature information corresponding to each service scene type; and inputting the first fusion characteristic information corresponding to each service scene type into a target scene adaptation module corresponding to each service scene type to perform characteristic adaptation processing to obtain fourth adaptation characteristic information corresponding to each service scene type. Specifically, the process of obtaining the fourth adapting feature information through the feature adapting process performed by the target scene adapting module corresponding to any service scene type may refer to the process of obtaining the first adapting feature information corresponding to any service scene type, which is not described herein again.
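Assembling this variant, one target feature extraction block could be sketched as below. The head count, the MLP expansion ratio, the GELU activation, the inlined bottleneck adapter, and summation as the fusion operation are all assumptions; the patent only fixes which inputs each module receives.

```python
import torch
import torch.nn as nn

class SceneBlockParallel(nn.Module):
    """Sketch of one target feature extraction block of this variant:
    pre-norm attention, pre-norm MLP, and a scene adapter applied to the
    fused attention output, with a four-term fusion at the end."""
    def __init__(self, dim: int = 768, heads: int = 12):
        super().__init__()
        self.norm5 = nn.LayerNorm(dim)  # fifth normalization layer
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)  # fourth attention layer
        self.norm6 = nn.LayerNorm(dim)  # sixth normalization layer
        self.mlp = nn.Sequential(       # second feature processing layer (MLP)
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.adapter = nn.Sequential(   # target scene adaptation module
            nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: target image coding information, (batch, tokens, dim)
        h = self.norm5(x)                    # fifth standard feature information
        w3, _ = self.attn(h, h, h)           # third weighted feature information
        fused_in = x + w3                    # first fusion feature information
        t1 = self.mlp(self.norm6(fused_in))  # first target feature information
        a4 = self.adapter(fused_in)          # fourth adapting feature information
        # Third fusion module: fuse all four inputs (summation assumed).
        return x + w3 + t1 + a4
```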
In a specific embodiment, the target feature extraction module corresponding to each service scene type may include a fourth attention module, a fourth nonlinear module, a target scene adaptation module, and a fourth fusion module. The target scene adaptation module corresponding to each service scene type may include a first target adaptation module and a second target adaptation module.
In a specific embodiment, the inputting the target image coding information corresponding to each service scene type into the target feature extraction module in the target extraction sub-model corresponding to each service scene type to perform extraction processing to obtain the target image feature information corresponding to each service scene type may include:
inputting target image coding information corresponding to each service scene type into a fourth attention module corresponding to each service scene type for attention weighting processing to obtain fourth weighting characteristic information corresponding to each service scene type;
inputting fourth weighted characteristic information corresponding to each service scene type into a first target adaptation module corresponding to each service scene type for characteristic adaptation processing to obtain fifth adaptation characteristic information corresponding to each service scene type;
inputting fifth adaptive feature information corresponding to each service scene type and target image coding information corresponding to each service scene type into a fourth nonlinear module corresponding to each service scene type for nonlinear transformation processing to obtain second target feature information corresponding to each service scene type;
inputting second target characteristic information corresponding to each service scene type into a second target adaptation module corresponding to each service scene type to perform characteristic adaptation processing to obtain sixth adaptation characteristic information corresponding to each service scene type;
And inputting the sixth adapting characteristic information corresponding to each service scene type, the fifth adapting characteristic information corresponding to each service scene type and the target image coding information corresponding to each service scene type into a fourth fusion module corresponding to each service scene type to perform characteristic fusion processing to obtain the target image characteristic information corresponding to each service scene type.
In a specific embodiment, the fourth attention module corresponding to any service scene type may include a fifth attention layer and a fourth feed-forward layer. Specifically, the target image coding information corresponding to any service scene type may be input to the fifth attention layer corresponding to any service scene type for weighting processing, so as to obtain sixth weighted feature information corresponding to any service scene type; the sixth weighted feature information corresponding to any service scene type may then be input to the fourth feed-forward layer corresponding to any service scene type for feature processing, so as to obtain the fourth weighted feature information corresponding to any service scene type.
In a specific embodiment, the first target adaptation module corresponding to any service scene type may include two fully connected layers and a ReLU layer. Specifically, the process of obtaining the fifth adapting feature information by performing feature adaptation processing through the first target adaptation module corresponding to any service scene type may refer to the manner of obtaining the first adapting feature information corresponding to any service scene type, which is not described herein again.
In a specific embodiment, the target feature extraction module corresponding to any service scene type may further include a seventh normalization layer. Specifically, the fifth adapting feature information corresponding to each service scene type and the target image coding information corresponding to each service scene type may be input to a seventh normalization layer corresponding to each service scene type for normalization processing, so as to obtain seventh standard feature information corresponding to each service scene type. Correspondingly, the seventh standard characteristic information corresponding to each service scene type can be input to a fourth nonlinear module corresponding to each service scene type to perform nonlinear transformation processing, so as to obtain second target characteristic information corresponding to each service scene type.
In a specific embodiment, the second target adaptation module corresponding to any service scene type may include two fully connected layers and a ReLU layer. Specifically, the process of obtaining the sixth adapting feature information by performing feature adaptation processing through the second target adaptation module corresponding to any service scene type may refer to the manner of obtaining the first adapting feature information corresponding to any service scene type, which is not described herein again.
In a specific embodiment, the fourth nonlinear module corresponding to any service scene type may include a fifth feed-forward layer and a sixth feed-forward layer. Specifically, the fifth adapting feature information corresponding to each service scene type and the target image coding information corresponding to each service scene type may be input to the fifth feed-forward layer corresponding to each service scene type for feature processing, so as to obtain second feed-forward feature information corresponding to each service scene type; correspondingly, the second feed-forward feature information corresponding to each service scene type may be input to the sixth feed-forward layer corresponding to each service scene type for feature processing, so as to obtain the second target feature information corresponding to each service scene type.
In a specific embodiment, the seventh standard feature information corresponding to any service scene type and the sixth adapting feature information corresponding to any service scene type may be input into the fourth fusion module corresponding to any service scene type to perform feature fusion processing, so as to obtain the target image feature information corresponding to any service scene type.
In a specific embodiment, the target feature extraction module corresponding to any service scene type may further include an eighth normalization layer. Specifically, the seventh standard feature information corresponding to any service scene type and the sixth adapting feature information corresponding to any service scene type may be input into the fourth fusion module corresponding to any service scene type to perform feature fusion processing, so as to obtain second fusion feature information corresponding to any service scene type; and inputting the second fusion characteristic information corresponding to any business scene type into an eighth normalization layer corresponding to any business scene type for normalization processing, so as to obtain the target image characteristic information corresponding to any business scene type.
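This second variant, with its serial adapters, could be sketched as follows. The ReLU between the two feed-forward layers, the bottleneck adapter shape, and summation as the fourth fusion operation are assumptions; the fusion inputs follow the enumeration above.

```python
import torch
import torch.nn as nn

def bottleneck_adapter(dim: int, hidden: int = 64) -> nn.Sequential:
    # Two fully connected layers with a ReLU (shape per the embodiments above).
    return nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

class SceneBlockSerial(nn.Module):
    """Sketch of one target feature extraction block of this variant:
    attention module, first adapter, normalized feed-forward pair,
    second adapter, then fusion and a final normalization."""
    def __init__(self, dim: int = 768, heads: int = 12):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)  # fifth attention layer
        self.ffn4 = nn.Linear(dim, dim)          # fourth feed-forward layer
        self.adapter1 = bottleneck_adapter(dim)  # first target adaptation module
        self.norm7 = nn.LayerNorm(dim)           # seventh normalization layer
        self.ffn5 = nn.Linear(dim, 4 * dim)      # fifth feed-forward layer
        self.ffn6 = nn.Linear(4 * dim, dim)      # sixth feed-forward layer
        self.adapter2 = bottleneck_adapter(dim)  # second target adaptation module
        self.norm8 = nn.LayerNorm(dim)           # eighth normalization layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w6, _ = self.attn(x, x, x)                # sixth weighted feature information
        w4 = self.ffn4(w6)                        # fourth weighted feature information
        a5 = self.adapter1(w4)                    # fifth adapting feature information
        h = self.norm7(a5 + x)                    # seventh standard feature information
        t2 = self.ffn6(torch.relu(self.ffn5(h)))  # second target feature information
        a6 = self.adapter2(t2)                    # sixth adapting feature information
        # Fourth fusion module (summation assumed), then eighth normalization.
        return self.norm8(a6 + a5 + x)
```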
S205: and determining target image category information corresponding to the image to be classified based on the target image characteristic information corresponding to each service scene type and the target classification sub-model corresponding to each service scene type.
In a specific embodiment, the target image category information may characterize a category to which the image to be classified belongs. The target image category information may be one of a plurality of image category information.
In a specific embodiment, the step S205 may include:
Inputting the target image characteristic information corresponding to each service scene type into a target classification sub-model corresponding to each service scene type for classification processing to obtain first class prediction information corresponding to each service scene type;
and determining target image category information corresponding to the image to be classified based on the first category prediction information corresponding to each of the plurality of service scene types.
In a specific embodiment, the first category prediction information corresponding to any one of the service scene types may represent a probability that the image to be classified belongs to a plurality of image category information under any one of the service scene types. The first class prediction information corresponding to any one of the service scene types may include a first prediction probability corresponding to any one of the service scene types. For example, taking an application scenario of sensitive image recognition as an example, the first prediction probability corresponding to any one of the service scenario types may represent a probability that the image to be classified belongs to sensitive category information under any one of the service scenario types.
In a specific embodiment, the method may further include:
inputting the image to be classified into a target weight learning module in a target image classification model to perform weight analysis processing to obtain target weight information;
Correspondingly, the determining, based on the first category prediction information corresponding to each of the plurality of service scene types, the target image category information corresponding to the image to be classified may include:
based on the target weight information, carrying out fusion processing on the first class prediction information corresponding to each of the plurality of business scene types to obtain second class prediction information corresponding to the image to be classified;
the target image category information is determined based on the second category prediction information.
In a specific embodiment, the target weight learning module may refer to a trained weight learning module. The target weight learning module may include a target convolution module, a target multi-layer perceptron, and a target logistic regression module connected in sequence.
In a specific embodiment, the target weight information may represent respective importance degrees of a plurality of first class prediction information corresponding to a plurality of service scene types.
In a specific embodiment, the weight analysis processing procedure of the target weight learning module may refer to the weight analysis processing procedure of the sample weight learning module, which is not described herein.
In a specific embodiment, the second class prediction information may characterize a probability that the image to be classified belongs to a plurality of image class information. Specifically, the process of fusion processing of the first category prediction information may refer to the process of fusion processing of the third category prediction information, which is not described herein.
In a specific embodiment, the image type information corresponding to the largest prediction probability in the second type prediction information may be used as the target image type information.
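At inference time the weighted ensemble reduces to a few lines. The sketch below assumes each target classification sub-model outputs per-class probabilities and that the target weight learning module outputs one weight per scene branch; the function name and tensor shapes are illustrative.

```python
import torch

def classify(image, scene_models, classifiers, weight_module):
    """image: (1, 3, H, W); scene_models / classifiers: one per service
    scene type; weight_module: trained target weight learning module."""
    # First category prediction information, one per scene type.
    probs = torch.stack(
        [clf(model(image)) for model, clf in zip(scene_models, classifiers)],
        dim=1)                                    # (1, num_scenes, num_classes)
    weights = weight_module(image).unsqueeze(-1)  # (1, num_scenes, 1) target weights
    fused = (weights * probs).sum(dim=1)          # second category prediction info
    return fused.argmax(dim=-1)                   # target image category information
```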
In the above embodiment, an image to be classified is acquired and input into the target extraction sub-model corresponding to each of a plurality of service scene types in the target image classification model for feature extraction processing, so as to obtain target image feature information corresponding to each service scene type. The target image classification model is obtained by training the sample scene adaptation modules in the sample extraction sub-models based on mutual exclusion loss information, which characterizes the similarity between the adaptation feature information output by the sample scene adaptation modules corresponding to any two of the plurality of service scene types. Training with the mutual exclusion loss information reduces redundancy and increases complementarity between the target extraction sub-models corresponding to different service scene types: the correlation of the feature information output by different target extraction sub-models is reduced as much as possible and its orthogonality is improved, which in turn improves the integration effect among the plurality of sub-models. Because only the sample scene adaptation modules are trained, the number of trainable parameters is small and the training cost is greatly reduced. Determining the target image category information corresponding to the image to be classified based on the target image feature information corresponding to each service scene type and the target classification sub-model corresponding to each service scene type therefore yields a more accurate classification result and improves the image classification effect.
FIG. 5 is a diagram illustrating a training process for a target image classification model according to an exemplary embodiment. As shown in fig. 5, a second sample image set and second tag class information corresponding to each sample image in the second sample image set may be acquired first; inputting the second sample image set into a sample extraction sub-model corresponding to each of a plurality of service scene types in a preset machine learning model to perform feature extraction processing, so that fifth sample feature information corresponding to each service scene type can be obtained; inputting fifth sample characteristic information corresponding to each service scene type into a sample classification sub-model corresponding to each service scene type for classification processing, so as to obtain seventh category prediction information corresponding to each service scene type; based on the seventh category prediction information and the second label category information corresponding to each service scene type, service loss information corresponding to each service scene type can be determined; the sample basic module corresponding to each service scene type in the preset machine learning model can be trained based on the service loss information corresponding to each service scene type, and the trained sample basic module is obtained.
Then, on the basis of the trained sample basic module, a first sample image set is input into a sample extraction sub-model corresponding to each of a plurality of service scene types in a preset machine learning model to be subjected to feature extraction processing, and second sample feature information corresponding to each service scene type can be obtained; inputting second sample characteristic information corresponding to each service scene type into a sample classification sub-model corresponding to each service scene type in a preset machine learning model for classification processing to obtain third category prediction information corresponding to each service scene type; inputting the first sample image set into a sample weight learning module in a preset machine learning model for weight analysis processing to obtain sample weight information corresponding to each sample image in the first sample image set; based on the sample weight information, carrying out fusion processing on third category prediction information corresponding to each of a plurality of service scene types to obtain fourth category prediction information corresponding to each sample image in the first sample image set; determining first weight loss information based on the fourth category prediction information and the first tag category information; training a corresponding sample scene adaptation module of the target scene type based on mutual exclusion loss information corresponding to the target scene type, and training a sample weight learning module based on the first weight loss information, so that a target image classification model can be obtained.
Fig. 6 is a block diagram illustrating an image classification apparatus according to an exemplary embodiment. As shown in fig. 6, the apparatus may include:
an image acquisition module 610, configured to acquire an image to be classified;
the first feature extraction module 620 may be configured to input the image to be classified into the target extraction sub-model corresponding to each of a plurality of service scene types in the target image classification model to perform feature extraction processing, so as to obtain target image feature information corresponding to each service scene type; the target image classification model is obtained by training the sample scene adaptation module in the sample extraction sub-model based on mutual exclusion loss information, the mutual exclusion loss information characterizes the similarity degree of target adaptation feature information, and the target adaptation feature information is the adaptation feature information output by the sample scene adaptation modules corresponding to any two service scene types;
the image class determining module 630 may be configured to determine, based on the target image feature information corresponding to each service scene type and the target classification sub-model corresponding to each service scene type, target image class information corresponding to the image to be classified.
In a specific embodiment, the apparatus may further include:
The first sample acquisition module can be used for acquiring a first sample image set;
the second feature extraction module can be used for inputting the first sample image set into a sample extraction sub-model corresponding to each of a plurality of service scene types in a preset machine learning model to perform feature extraction processing to obtain sample adaptation feature information corresponding to each service scene type;
the first loss determination module may be configured to determine mutual exclusion loss information corresponding to a target scene type based on sample adaptation feature information corresponding to the target scene type, where the target scene type refers to any two of the plurality of service scene types;
the first training module can be used for training the corresponding sample scene adaptation module of the target scene type based on the mutual exclusion loss information corresponding to the target scene type to obtain a target image classification model.
In a specific embodiment, the second feature extraction module may include:
the first coding module can be used for inputting the first sample image set into the sample coding module corresponding to each service scene type for coding processing to obtain sample image coding information corresponding to each service scene type;
The first weighting processing module can be used for inputting sample image coding information corresponding to each service scene type into the first attention module corresponding to each service scene type to carry out attention weighting processing to obtain first weighting characteristic information corresponding to each service scene type;
the first transformation module can be used for inputting the first weighted characteristic information corresponding to each service scene type and the sample image coding information corresponding to each service scene type into the first nonlinear module corresponding to each service scene type to perform nonlinear transformation processing to obtain the first sample characteristic information corresponding to each service scene type;
the first adaptation module can be used for inputting the first weighted feature information corresponding to each service scene type and the sample image coding information corresponding to each service scene type into the sample scene adaptation module corresponding to each service scene type for feature adaptation processing to obtain first adaptation feature information corresponding to each service scene type;
the first feature fusion module can be used for inputting the first sample feature information corresponding to each service scene type, the first adapting feature information corresponding to each service scene type, the first weighting feature information corresponding to each service scene type and the sample image coding information corresponding to each service scene type into the first fusion module corresponding to each service scene type for feature fusion processing to obtain the second sample feature information corresponding to each service scene type.
In a specific embodiment, the apparatus may further include:
the first tag obtaining module can be used for obtaining first tag class information corresponding to each sample image in the first sample image set;
the first classification module can be used for inputting the second sample characteristic information corresponding to each service scene type into a sample classification sub-model corresponding to each service scene type in a preset machine learning model to carry out classification processing to obtain third category prediction information corresponding to each service scene type;
the first weight analysis module can be used for inputting the first sample image set into a sample weight learning module in a preset machine learning model to perform weight analysis processing to obtain sample weight information corresponding to each sample image in the first sample image set;
the first fusion processing module can be used for carrying out fusion processing on the third category prediction information corresponding to each of the plurality of service scene types based on the sample weight information to obtain fourth category prediction information corresponding to each sample image in the first sample image set;
the second loss determination module may be configured to determine first weight loss information based on the fourth category prediction information and the first tag category information;
Correspondingly, the first training module may include:
the second training module can be used for training the corresponding sample scene adaptation module of the target scene type based on the mutual exclusion loss information corresponding to the target scene type, and training the sample weight learning module based on the first weight loss information to obtain a target image classification model.
In a specific embodiment, the second feature extraction module may include:
the second coding module can be used for inputting the first sample image set into the sample coding module corresponding to each service scene type for coding processing to obtain sample image coding information corresponding to each service scene type;
the second weighting processing module can be used for inputting sample image coding information corresponding to each service scene type into the second attention module corresponding to each service scene type to carry out attention weighting processing to obtain second weighting characteristic information corresponding to each service scene type;
the second adapting module can be used for inputting second weighted characteristic information corresponding to each service scene type into the first sample adapting module corresponding to each service scene type for characteristic adapting processing to obtain second adapting characteristic information corresponding to each service scene type;
The second transformation module can be used for inputting second adaptive feature information corresponding to each service scene type and sample image coding information corresponding to each service scene type into the second nonlinear module corresponding to each service scene type to perform nonlinear transformation processing to obtain third sample feature information corresponding to each service scene type;
the third adapting module can be used for inputting the third sample characteristic information corresponding to each service scene type into the second sample adapting module corresponding to each service scene type for characteristic adapting processing to obtain the third adapting characteristic information corresponding to each service scene type;
the second feature fusion module can be used for inputting the third adapting feature information corresponding to each service scene type, the second adapting feature information corresponding to each service scene type and the sample image coding information corresponding to each service scene type into the second fusion module corresponding to each service scene type to perform feature fusion processing, so as to obtain fourth sample feature information corresponding to each service scene type.
In a specific embodiment, the apparatus may further include:
the second tag obtaining module can be used for obtaining first tag class information corresponding to each sample image in the first sample image set;
The second classification module can be used for inputting the fourth sample characteristic information corresponding to each service scene type into a sample classification sub-model corresponding to each service scene type in a preset machine learning model to carry out classification processing to obtain fifth category prediction information corresponding to each service scene type;
the second weight analysis module can be used for inputting the first sample image set into a sample weight learning module in a preset machine learning model to perform weight analysis processing to obtain sample weight information corresponding to each sample image in the first sample image set;
the second fusion processing module can be used for carrying out fusion processing on the fifth category prediction information corresponding to each of the plurality of service scene types based on the sample weight information to obtain sixth category prediction information corresponding to each sample image in the first sample image set;
a third loss determination module operable to determine second weight loss information based on the sixth category prediction information and the first tag category information;
correspondingly, the first training module may include:
the third training module can be used for training the corresponding sample scene adaptation module of the target scene type based on the mutual exclusion loss information corresponding to the target scene type, and training the sample weight learning module based on the second weight loss information to obtain a target image classification model.
In a specific embodiment, the apparatus may further include:
the third tag obtaining module may be configured to obtain a second sample image set and second tag class information corresponding to each sample image in the second sample image set;
the third feature extraction module can be used for inputting the second sample image set into a sample extraction sub-model corresponding to each of a plurality of service scene types in a preset machine learning model to perform feature extraction processing to obtain fifth sample feature information corresponding to each service scene type;
the third classification module can be used for inputting fifth sample characteristic information corresponding to each service scene type into a sample classification sub-model corresponding to each service scene type for classification processing to obtain seventh category prediction information corresponding to each service scene type;
the fourth loss determining module may be configured to determine service loss information corresponding to each service scene type based on the seventh category prediction information and the second tag category information corresponding to each service scene type;
correspondingly, the first training module may include:
the fourth training module may be configured to train the sample scene adaptation module corresponding to the target scene type based on the mutual exclusion loss information corresponding to the target scene type, and train the sample basic module corresponding to each service scene type in the preset machine learning model based on the service loss information corresponding to each service scene type, so as to obtain the target image classification model; the sample basic module corresponding to each service scene type refers to the modules in the sample extraction sub-model corresponding to each service scene type other than the sample scene adaptation module corresponding to each service scene type.
In a specific embodiment, the image category determining module 630 may include:
the fourth classification module can be used for inputting the target image characteristic information corresponding to each service scene type into a target classification sub-model corresponding to each service scene type for classification processing to obtain first class prediction information corresponding to each service scene type;
the first class determining module may be configured to determine target image class information corresponding to the image to be classified based on first class prediction information corresponding to each of the plurality of service scene types.
In a specific embodiment, the apparatus may further include:
the third weight analysis module can be used for inputting the images to be classified into a target weight learning module in the target image classification model to perform weight analysis processing to obtain target weight information;
accordingly, the first class determination module may include:
the third fusion processing module can be used for carrying out fusion processing on the first-class prediction information corresponding to each of the multiple service scene types based on the target weight information to obtain second-class prediction information corresponding to the image to be classified;
the second class determination module may be configured to determine the target image class information based on the second class prediction information.
In a specific embodiment, the first feature extraction module 620 may include:
the third coding module can be used for inputting the image to be classified into the target coding module in the target extraction sub-model corresponding to each service scene type to carry out coding processing to obtain target image coding information corresponding to each service scene type;
the fourth feature extraction module may be configured to input the target image coding information corresponding to each service scene type into the target feature extraction module in the target extraction sub-model corresponding to each service scene type to perform extraction processing, so as to obtain target image feature information corresponding to each service scene type.
In a specific embodiment, the fourth feature extraction module may include:
the third weighting processing module can be used for inputting the target image coding information corresponding to each service scene type into the third attention module corresponding to each service scene type to carry out attention weighting processing to obtain third weighting characteristic information corresponding to each service scene type;
the third transformation module can be used for inputting the target image coding information corresponding to each service scene type and the third weighting characteristic information corresponding to each service scene type into the third nonlinear module corresponding to each service scene type to perform nonlinear transformation processing to obtain the first target characteristic information corresponding to each service scene type;
The fourth adaptation module can be used for inputting the target image coding information corresponding to each service scene type and the third weighted characteristic information corresponding to each service scene type into the target scene adaptation module corresponding to each service scene type for characteristic adaptation processing to obtain fourth adaptation characteristic information corresponding to each service scene type;
the third feature fusion module may be configured to input fourth adaptive feature information corresponding to each service scene type, first target feature information corresponding to each service scene type, target image coding information corresponding to each service scene type, and third weighted feature information corresponding to each service scene type to the third fusion module corresponding to each service scene type to perform feature fusion processing, so as to obtain target image feature information corresponding to each service scene type.
In a specific embodiment, the fourth feature extraction module may include:
the fourth weighting processing module may be configured to input the target image coding information corresponding to each service scene type into the fourth attention module corresponding to each service scene type for attention weighting processing, so as to obtain fourth weighted feature information corresponding to each service scene type;
the fifth adaptation module may be configured to input the fourth weighted feature information corresponding to each service scene type into the first target adaptation module corresponding to each service scene type for feature adaptation processing, so as to obtain fifth adaptation feature information corresponding to each service scene type;
the fourth transformation module may be configured to input the fifth adaptation feature information corresponding to each service scene type and the target image coding information corresponding to each service scene type into the fourth nonlinear module corresponding to each service scene type for nonlinear transformation processing, so as to obtain second target feature information corresponding to each service scene type;
the sixth adaptation module may be configured to input the second target feature information corresponding to each service scene type into the second target adaptation module corresponding to each service scene type for feature adaptation processing, so as to obtain sixth adaptation feature information corresponding to each service scene type;
the fourth feature fusion module may be configured to input the sixth adaptation feature information corresponding to each service scene type, the fifth adaptation feature information corresponding to each service scene type, and the target image coding information corresponding to each service scene type into the fourth fusion module corresponding to each service scene type for feature fusion processing, so as to obtain target image feature information corresponding to each service scene type.
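The two-adapter variant admits a similar illustrative sketch, with the first target adaptation module placed after the attention module and the second after the nonlinear module; the residual fusion and all names are again assumptions rather than prescribed details.

```python
import torch
import torch.nn as nn

class SerialAdapterBlock(nn.Module):
    """Hypothetical two-adapter variant of the target feature extraction
    module: one bottleneck adapter after attention, one after the
    nonlinear (MLP) module."""
    def __init__(self, dim: int = 768, heads: int = 8, bottleneck: int = 64):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.adapter1 = nn.Sequential(            # first target adaptation module
            nn.Linear(dim, bottleneck), nn.GELU(), nn.Linear(bottleneck, dim))
        self.mlp = nn.Sequential(                 # fourth nonlinear module
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.adapter2 = nn.Sequential(            # second target adaptation module
            nn.Linear(dim, bottleneck), nn.GELU(), nn.Linear(bottleneck, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim) target image coding information
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)       # fourth attention module
        a1 = self.adapter1(attn_out)           # fifth adaptation feature information
        target = self.mlp(self.norm2(x + a1))  # second target feature information
        a2 = self.adapter2(target)             # sixth adaptation feature information
        return x + a1 + a2                     # fourth fusion module as residual sum
```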
The specific manner in which the individual modules and units of the apparatus in the above embodiments perform their operations has been described in detail in the embodiments of the corresponding method and is not repeated here.
Fig. 7 is a block diagram, shown according to an exemplary embodiment, of an electronic device for classifying an image to be classified. The electronic device may be a server, and its internal structure may be as shown in Fig. 7. The electronic device includes a processor, a memory, and a network interface connected by a system bus. The processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the electronic device is used to communicate with external terminals through a network connection. The computer program, when executed by the processor, implements an image classification method.
Fig. 8 is a block diagram, shown according to an exemplary embodiment, of another electronic device for classifying an image to be classified. This electronic device may be a terminal, and its internal structure may be as shown in Fig. 8. The electronic device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the electronic device is used to communicate with external terminals through a network connection. The computer program, when executed by the processor, implements an image classification method. The display screen of the electronic device may be a liquid crystal display or an electronic-ink display. The input device of the electronic device may be a touch layer covering the display screen, a key, a trackball, or a touchpad provided on the housing of the electronic device, or an external keyboard, touchpad, or mouse.
It will be appreciated by those skilled in the art that the structures shown in Fig. 7 and Fig. 8 are merely block diagrams of the portions of the structure related to the disclosed solution and do not limit the electronic devices to which the disclosed solution may be applied; a particular electronic device may include more or fewer components than shown, combine certain components, or arrange its components differently.
In an exemplary embodiment, an electronic device is also provided, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the image classification method in the embodiments of the present disclosure.
In an exemplary embodiment, a computer-readable storage medium is also provided, storing instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the image classification method in the embodiments of the present disclosure.
In an exemplary embodiment, a computer program product containing instructions is also provided; when the instructions are run on a computer, they cause the computer to perform the image classification method in the embodiments of the present disclosure.
Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware; the program may be stored on a non-transitory computer-readable storage medium and, when executed, may include the flows of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include Read-Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory may include Random Access Memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
It will be appreciated that the specific embodiments of the present application involve data related to users, such as user information. When the above embodiments of the present application are applied to specific products or technologies, user permission or consent must be obtained, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (22)

1. A method of classifying images, the method comprising:
Acquiring an image to be classified;
inputting the image to be classified into a target extraction sub-model corresponding to each of a plurality of service scene types in a target image classification model to perform feature extraction processing to obtain target image feature information corresponding to each service scene type; the target image classification model is obtained by training a sample scene adaptation module in a sample extraction sub-model based on mutual exclusion loss information, the mutual exclusion loss information represents the degree of similarity of target adaptation feature information, and the target adaptation feature information is the adaptation feature information output by the sample scene adaptation modules corresponding to any two of the service scene types;
determining target image category information corresponding to the image to be classified based on the target image feature information corresponding to each service scene type and the target classification sub-model corresponding to each service scene type;
wherein the target image classification model is obtained by the following steps:
acquiring a first sample image set;
inputting the first sample image set into a sample extraction sub-model corresponding to each of the plurality of service scene types in a preset machine learning model to perform feature extraction processing, so as to obtain sample adaptation feature information corresponding to each service scene type;
determining mutual exclusion loss information corresponding to a target scene type pair based on the sample adaptation feature information corresponding to the target scene type pair; the target scene type pair is any two of the plurality of service scene types;
acquiring a second sample image set and second label category information corresponding to each sample image in the second sample image set;
inputting the second sample image set into the sample extraction sub-models corresponding to the plurality of service scene types in the preset machine learning model to perform feature extraction processing to obtain fifth sample feature information corresponding to each service scene type;
inputting the fifth sample characteristic information corresponding to each service scene type into a sample classification sub-model corresponding to each service scene type for classification processing to obtain seventh category prediction information corresponding to each service scene type;
determining service loss information corresponding to each service scene type based on the seventh category prediction information corresponding to each service scene type and the second tag category information;
training the sample scene adaptation modules corresponding to the target scene type pair based on the mutual exclusion loss information corresponding to the target scene type pair to obtain the target image classification model; wherein the training the sample scene adaptation modules corresponding to the target scene type pair based on the mutual exclusion loss information corresponding to the target scene type pair to obtain the target image classification model comprises the following steps:
training the sample scene adaptation modules corresponding to the target scene type pair based on the mutual exclusion loss information corresponding to the target scene type pair, and training the sample basic module corresponding to each service scene type in the preset machine learning model based on the service loss information corresponding to each service scene type, to obtain the target image classification model; the sample basic module corresponding to each service scene type is a module in the sample extraction sub-model corresponding to each service scene type other than the sample scene adaptation module corresponding to each service scene type.
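The claim above only constrains the mutual exclusion loss to characterise the degree of similarity between the adaptation features of a scene type pair. As a purely illustrative sketch, one workable reading uses cosine similarity combined with per-scene classification (service) losses; the similarity measure, the weighting factor, and all names below are assumptions.

```python
import itertools
import torch
import torch.nn.functional as F

def mutual_exclusion_loss(adapt_feats: list[torch.Tensor]) -> torch.Tensor:
    """Mean pairwise similarity between the adaptation features output by
    the scene adaptation modules of every scene type pair; minimising it
    drives the adapters of different scene types apart. Assumes at least
    two scene types, each with features of shape (batch, ...); cosine
    similarity is an assumed choice of measure."""
    pair_sims = [
        F.cosine_similarity(a.flatten(1), b.flatten(1), dim=1).abs().mean()
        for a, b in itertools.combinations(adapt_feats, 2)
    ]
    return torch.stack(pair_sims).mean()

# Assumed joint objective (names hypothetical): the service losses train each
# scene's basic modules, the mutual exclusion term trains only the adapters.
# total = sum(F.cross_entropy(logits_s, labels) for logits_s in scene_logits) \
#         + lam * mutual_exclusion_loss(adapter_outputs)
```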
2. The method of claim 1, wherein the sample extraction sub-model corresponding to each service scene type comprises a sample coding module and a sample feature extraction module, the sample feature extraction module comprising a first attention module, a first nonlinear module, a sample scene adaptation module, and a first fusion module; the step of inputting the first sample image set to a sample extraction sub-model corresponding to each of the plurality of service scene types in a preset machine learning model to perform feature extraction processing, to obtain sample adaptation feature information corresponding to each service scene type, includes:
inputting the first sample image set to a sample coding module corresponding to each service scene type for coding processing to obtain sample image coding information corresponding to each service scene type;
inputting the sample image coding information corresponding to each service scene type into a first attention module corresponding to each service scene type for attention weighting processing to obtain first weighted feature information corresponding to each service scene type;
inputting the first weighted feature information corresponding to each service scene type and the sample image coding information corresponding to each service scene type into a first nonlinear module corresponding to each service scene type to perform nonlinear transformation processing to obtain the first sample feature information corresponding to each service scene type;
inputting the first weighted feature information corresponding to each service scene type and the sample image coding information corresponding to each service scene type into a sample scene adaptation module corresponding to each service scene type for feature adaptation processing to obtain first adaptation feature information corresponding to each service scene type;
and inputting the first sample feature information corresponding to each service scene type, the first adaptation feature information corresponding to each service scene type, the first weighted feature information corresponding to each service scene type, and the sample image coding information corresponding to each service scene type into a first fusion module corresponding to each service scene type to perform feature fusion processing to obtain second sample feature information corresponding to each service scene type.
3. The method according to claim 2, wherein the method further comprises:
acquiring first label category information corresponding to each sample image in the first sample image set;
inputting the second sample feature information corresponding to each service scene type into a sample classification sub-model corresponding to each service scene type in the preset machine learning model for classification processing to obtain third category prediction information corresponding to each service scene type;
inputting the first sample image set to a sample weight learning module in the preset machine learning model for weight analysis processing to obtain sample weight information corresponding to each sample image in the first sample image set;
Based on the sample weight information, carrying out fusion processing on the third category prediction information corresponding to each of the plurality of service scene types to obtain fourth category prediction information corresponding to each sample image in the first sample image set;
determining first weight loss information based on the fourth category prediction information and the first tag category information;
training the sample scene adaptation modules corresponding to the target scene type pair based on the mutual exclusion loss information corresponding to the target scene type pair to obtain the target image classification model, wherein the training comprises the following steps:
training the sample scene adaptation modules corresponding to the target scene type pair based on the mutual exclusion loss information corresponding to the target scene type pair, and training the sample weight learning module based on the first weight loss information to obtain the target image classification model.
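To illustrate the weight learning path in the preceding claim, a rough sketch follows: a weight learning module predicts one weight per service scene type for each image, the per-scene category predictions are fused with those weights, and the fused prediction is supervised against the first label category information. The softmax head and cross-entropy supervision are assumptions of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SampleWeightLearner(nn.Module):
    """Hypothetical sample weight learning module: maps an image
    representation to a softmax-normalised weight per service scene type."""
    def __init__(self, dim: int = 768, num_scenes: int = 3):
        super().__init__()
        self.head = nn.Linear(dim, num_scenes)

    def forward(self, image_repr: torch.Tensor) -> torch.Tensor:
        # image_repr: (B, dim) -> sample weight information: (B, S)
        return self.head(image_repr).softmax(dim=-1)

def first_weight_loss(scene_preds: torch.Tensor, weights: torch.Tensor,
                      labels: torch.Tensor) -> torch.Tensor:
    """Fuse per-scene category predictions (B, S, C) with sample weights
    (B, S) and compare with the first label category information (B,)."""
    fused = (weights.unsqueeze(-1) * scene_preds).sum(dim=1)  # (B, C)
    return F.cross_entropy(fused, labels)
```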
4. The method of claim 1, wherein the sample extraction sub-model corresponding to each service scene type comprises a sample coding module and a sample feature extraction module, the sample feature extraction module comprising a second attention module, a second nonlinear module, a sample scene adaptation module, and a second fusion module, the sample scene adaptation module comprising a first sample adaptation module and a second sample adaptation module; the step of inputting the first sample image set to a sample extraction sub-model corresponding to each of the plurality of service scene types in a preset machine learning model to perform feature extraction processing, to obtain sample adaptation feature information corresponding to each service scene type, includes:
inputting the first sample image set to a sample coding module corresponding to each service scene type for coding processing to obtain sample image coding information corresponding to each service scene type;
inputting the sample image coding information corresponding to each service scene type into a second attention module corresponding to each service scene type for attention weighting processing to obtain second weighted feature information corresponding to each service scene type;
inputting the second weighted feature information corresponding to each service scene type into a first sample adaptation module corresponding to each service scene type for feature adaptation processing to obtain second adaptation feature information corresponding to each service scene type;
inputting the second adaptation feature information corresponding to each service scene type and the sample image coding information corresponding to each service scene type into a second nonlinear module corresponding to each service scene type to perform nonlinear transformation processing to obtain third sample feature information corresponding to each service scene type;
inputting the third sample feature information corresponding to each service scene type into a second sample adaptation module corresponding to each service scene type for feature adaptation processing to obtain third adaptation feature information corresponding to each service scene type;
and inputting the third adaptation feature information corresponding to each service scene type, the second adaptation feature information corresponding to each service scene type, and the sample image coding information corresponding to each service scene type into a second fusion module corresponding to each service scene type to perform feature fusion processing to obtain fourth sample feature information corresponding to each service scene type.
5. The method according to claim 4, wherein the method further comprises:
acquiring first label category information corresponding to each sample image in the first sample image set;
inputting the fourth sample feature information corresponding to each service scene type into a sample classification sub-model corresponding to each service scene type in the preset machine learning model for classification processing to obtain fifth category prediction information corresponding to each service scene type;
inputting the first sample image set to a sample weight learning module in the preset machine learning model for weight analysis processing to obtain sample weight information corresponding to each sample image in the first sample image set;
based on the sample weight information, carrying out fusion processing on the fifth category prediction information corresponding to each of the plurality of service scene types to obtain sixth category prediction information corresponding to each sample image in the first sample image set;
Determining second weight loss information based on the sixth category prediction information and the first tag category information;
training the sample scene adaptation modules corresponding to the target scene type pair based on the mutual exclusion loss information corresponding to the target scene type pair to obtain the target image classification model, wherein the training comprises the following steps:
training the sample scene adaptation modules corresponding to the target scene type pair based on the mutual exclusion loss information corresponding to the target scene type pair, and training the sample weight learning module based on the second weight loss information to obtain the target image classification model.
6. The method according to claim 1, wherein the determining the target image category information corresponding to the image to be classified based on the target image feature information corresponding to each service scene type and the target classification sub-model corresponding to each service scene type includes:
inputting the target image feature information corresponding to each service scene type into a target classification sub-model corresponding to each service scene type for classification processing to obtain first category prediction information corresponding to each service scene type;
and determining the target image category information corresponding to the image to be classified based on the first category prediction information corresponding to each of the plurality of service scene types.
7. The method of claim 6, wherein the method further comprises:
inputting the image to be classified into a target weight learning module in the target image classification model to perform weight analysis processing to obtain target weight information;
the determining, based on the first category prediction information corresponding to each of the plurality of service scene types, the target image category information corresponding to the image to be classified includes:
based on the target weight information, performing fusion processing on the first category prediction information corresponding to each of the plurality of service scene types to obtain second category prediction information corresponding to the image to be classified;
and determining the target image category information based on the second category prediction information.
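A dummy-tensor sketch of this inference-time fusion, under assumed shapes (B images, S service scene types, C categories):

```python
import torch

B, S, C = 2, 3, 10                               # assumed shapes
first_preds = torch.randn(B, S, C).softmax(-1)   # first category prediction per scene
target_w = torch.randn(B, S).softmax(-1)         # target weight information
second_pred = (target_w.unsqueeze(-1) * first_preds).sum(dim=1)
category = second_pred.argmax(dim=-1)            # target image category information
```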
8. The method according to claim 1, wherein the step of inputting the image to be classified into the target extraction sub-model corresponding to each of the plurality of service scene types in the target image classification model to perform feature extraction processing to obtain target image feature information corresponding to each service scene type includes:
inputting the image to be classified into a target coding module in the target extraction sub-model corresponding to each service scene type to perform coding processing to obtain target image coding information corresponding to each service scene type;
and inputting the target image coding information corresponding to each service scene type into a target feature extraction module in a target extraction sub-model corresponding to each service scene type to perform extraction processing, so as to obtain target image feature information corresponding to each service scene type.
9. The method of claim 8, wherein the target feature extraction module corresponding to each service scene type comprises a third attention module, a third nonlinear module, a target scene adaptation module, and a third fusion module; the step of inputting the target image coding information corresponding to each service scene type into a target feature extraction module in a target extraction sub-model corresponding to each service scene type to perform extraction processing, to obtain target image feature information corresponding to each service scene type, includes:
inputting the target image coding information corresponding to each service scene type into a third attention module corresponding to each service scene type for attention weighting processing to obtain third weighted feature information corresponding to each service scene type;
inputting the target image coding information corresponding to each service scene type and the third weighted feature information corresponding to each service scene type into a third nonlinear module corresponding to each service scene type to perform nonlinear transformation processing to obtain first target feature information corresponding to each service scene type;
inputting the target image coding information corresponding to each service scene type and the third weighted feature information corresponding to each service scene type into a target scene adaptation module corresponding to each service scene type to perform feature adaptation processing to obtain fourth adaptation feature information corresponding to each service scene type;
and inputting the fourth adaptation feature information corresponding to each service scene type, the first target feature information corresponding to each service scene type, the target image coding information corresponding to each service scene type, and the third weighted feature information corresponding to each service scene type into a third fusion module corresponding to each service scene type to perform feature fusion processing to obtain target image feature information corresponding to each service scene type.
10. The method of claim 8, wherein the target feature extraction module corresponding to each service scene type comprises a fourth attention module, a fourth nonlinear module, a target scene adaptation module, and a fourth fusion module, and wherein the target scene adaptation module corresponding to each service scene type comprises a first target adaptation module and a second target adaptation module; the step of inputting the target image coding information corresponding to each service scene type into a target feature extraction module in a target extraction sub-model corresponding to each service scene type to perform extraction processing, to obtain target image feature information corresponding to each service scene type, includes:
inputting the target image coding information corresponding to each service scene type into a fourth attention module corresponding to each service scene type for attention weighting processing to obtain fourth weighted feature information corresponding to each service scene type;
inputting the fourth weighted feature information corresponding to each service scene type into a first target adaptation module corresponding to each service scene type for feature adaptation processing to obtain fifth adaptation feature information corresponding to each service scene type;
inputting the fifth adaptation feature information corresponding to each service scene type and the target image coding information corresponding to each service scene type into a fourth nonlinear module corresponding to each service scene type for nonlinear transformation processing to obtain second target feature information corresponding to each service scene type;
inputting the second target feature information corresponding to each service scene type into a second target adaptation module corresponding to each service scene type to perform feature adaptation processing to obtain sixth adaptation feature information corresponding to each service scene type;
and inputting the sixth adaptation feature information corresponding to each service scene type, the fifth adaptation feature information corresponding to each service scene type, and the target image coding information corresponding to each service scene type into a fourth fusion module corresponding to each service scene type for feature fusion processing to obtain the target image feature information corresponding to each service scene type.
11. An image classification apparatus, the apparatus comprising:
the image acquisition module is used for acquiring images to be classified;
The first feature extraction module is used for inputting the image to be classified into a target extraction sub-model corresponding to each of a plurality of service scene types in a target image classification model to perform feature extraction processing to obtain target image feature information corresponding to each service scene type; the target image classification model is obtained by training a sample scene adaptation module in a sample extraction sub-model based on mutual exclusion loss information, the mutual exclusion loss information represents the degree of similarity of target adaptation feature information, and the target adaptation feature information is the adaptation feature information output by the sample scene adaptation modules corresponding to any two of the service scene types;
the image category determining module is used for determining target image category information corresponding to the image to be classified based on the target image characteristic information corresponding to each service scene type and the target classification sub-model corresponding to each service scene type;
the first sample acquisition module is used for acquiring a first sample image set;
the second feature extraction module is used for inputting the first sample image set into a sample extraction sub-model corresponding to each of the plurality of service scene types in a preset machine learning model to perform feature extraction processing to obtain sample adaptation feature information corresponding to each service scene type;
The first loss determination module is used for determining mutual exclusion loss information corresponding to a target scene type pair based on the sample adaptation feature information corresponding to the target scene type pair; the target scene type pair is any two of the plurality of service scene types;
the third tag acquisition module is used for acquiring a second sample image set and second tag category information corresponding to each sample image in the second sample image set;
the third feature extraction module is used for inputting the second sample image set into the sample extraction sub-models corresponding to the plurality of service scene types in the preset machine learning model to perform feature extraction processing to obtain fifth sample feature information corresponding to each service scene type;
the third classification module is used for inputting the fifth sample characteristic information corresponding to each service scene type into the sample classification sub-model corresponding to each service scene type for classification processing to obtain seventh category prediction information corresponding to each service scene type;
a fourth loss determining module, configured to determine service loss information corresponding to each service scene type based on the seventh category prediction information corresponding to each service scene type and the second tag category information;
The first training module is used for training the sample scene adaptation modules corresponding to the target scene type pair based on the mutual exclusion loss information corresponding to the target scene type pair to obtain the target image classification model;
wherein, the first training module includes:
the fourth training module is used for training the sample scene adaptation modules corresponding to the target scene type pair based on the mutual exclusion loss information corresponding to the target scene type pair, and training the sample basic module corresponding to each service scene type in the preset machine learning model based on the service loss information corresponding to each service scene type, to obtain the target image classification model; the sample basic module corresponding to each service scene type is a module in the sample extraction sub-model corresponding to each service scene type other than the sample scene adaptation module corresponding to each service scene type.
12. The apparatus of claim 11, wherein the sample extraction sub-model corresponding to each service scene type comprises a sample coding module and a sample feature extraction module, the sample feature extraction module comprising a first attention module, a first nonlinear module, a sample scene adaptation module, and a first fusion module; the second feature extraction module includes:
The first coding module is used for inputting the first sample image set into the sample coding module corresponding to each service scene type for coding processing to obtain sample image coding information corresponding to each service scene type;
the first weighting processing module is used for inputting the sample image coding information corresponding to each service scene type into the first attention module corresponding to each service scene type for attention weighting processing to obtain first weighted feature information corresponding to each service scene type;
the first transformation module is used for inputting the first weighted feature information corresponding to each service scene type and the sample image coding information corresponding to each service scene type into the first nonlinear module corresponding to each service scene type to perform nonlinear transformation processing to obtain first sample feature information corresponding to each service scene type;
the first adaptation module is used for inputting the first weighted feature information corresponding to each service scene type and the sample image coding information corresponding to each service scene type into the sample scene adaptation module corresponding to each service scene type for feature adaptation processing to obtain first adaptation feature information corresponding to each service scene type;
and the first feature fusion module is used for inputting the first sample feature information corresponding to each service scene type, the first adaptation feature information corresponding to each service scene type, the first weighted feature information corresponding to each service scene type, and the sample image coding information corresponding to each service scene type into the first fusion module corresponding to each service scene type for feature fusion processing to obtain second sample feature information corresponding to each service scene type.
13. The apparatus of claim 12, wherein the apparatus further comprises:
the first tag acquisition module is used for acquiring first tag class information corresponding to each sample image in the first sample image set;
the first classification module is used for inputting the second sample feature information corresponding to each service scene type into a sample classification sub-model corresponding to each service scene type in the preset machine learning model for classification processing to obtain third category prediction information corresponding to each service scene type;
the first weight analysis module is used for inputting the first sample image set into a sample weight learning module in the preset machine learning model to perform weight analysis processing to obtain sample weight information corresponding to each sample image in the first sample image set;
The first fusion processing module is used for carrying out fusion processing on the third category prediction information corresponding to each of the plurality of service scene types based on the sample weight information to obtain fourth category prediction information corresponding to each sample image in the first sample image set;
a second loss determination module configured to determine first weight loss information based on the fourth category prediction information and the first tag category information;
the first training module includes:
the second training module is used for training the sample scene adaptation modules corresponding to the target scene type pair based on the mutual exclusion loss information corresponding to the target scene type pair, and training the sample weight learning module based on the first weight loss information to obtain the target image classification model.
14. The apparatus of claim 11, wherein the sample extraction sub-model corresponding to each service scene type comprises a sample coding module and a sample feature extraction module, the sample feature extraction module comprising a second attention module, a second nonlinear module, a sample scene adaptation module, and a second fusion module, the sample scene adaptation module comprising a first sample adaptation module and a second sample adaptation module; the second feature extraction module includes:
The second coding module is used for inputting the first sample image set into the sample coding module corresponding to each service scene type for coding processing to obtain sample image coding information corresponding to each service scene type;
the second weighting processing module is used for inputting the sample image coding information corresponding to each service scene type into the second attention module corresponding to each service scene type for attention weighting processing to obtain second weighted feature information corresponding to each service scene type;
the second adaptation module is used for inputting the second weighted feature information corresponding to each service scene type into the first sample adaptation module corresponding to each service scene type for feature adaptation processing to obtain second adaptation feature information corresponding to each service scene type;
the second transformation module is used for inputting the second adaptation feature information corresponding to each service scene type and the sample image coding information corresponding to each service scene type into the second nonlinear module corresponding to each service scene type to perform nonlinear transformation processing to obtain third sample feature information corresponding to each service scene type;
the third adaptation module is used for inputting the third sample feature information corresponding to each service scene type into the second sample adaptation module corresponding to each service scene type for feature adaptation processing to obtain third adaptation feature information corresponding to each service scene type;
and the second feature fusion module is used for inputting the third adaptation feature information corresponding to each service scene type, the second adaptation feature information corresponding to each service scene type, and the sample image coding information corresponding to each service scene type into the second fusion module corresponding to each service scene type for feature fusion processing to obtain fourth sample feature information corresponding to each service scene type.
15. The apparatus of claim 14, wherein the apparatus further comprises:
the second tag acquisition module is used for acquiring first tag class information corresponding to each sample image in the first sample image set;
the second classification module is used for inputting the fourth sample feature information corresponding to each service scene type into a sample classification sub-model corresponding to each service scene type in the preset machine learning model for classification processing to obtain fifth category prediction information corresponding to each service scene type;
The second weight analysis module is used for inputting the first sample image set into the sample weight learning module in the preset machine learning model to perform weight analysis processing to obtain sample weight information corresponding to each sample image in the first sample image set;
the second fusion processing module is used for carrying out fusion processing on the fifth category prediction information corresponding to each of the plurality of service scene types based on the sample weight information to obtain sixth category prediction information corresponding to each sample image in the first sample image set;
a third loss determination module configured to determine second weight loss information based on the sixth category prediction information and the first tag category information;
the first training module includes:
and the third training module is used for training the sample scene adaptation modules corresponding to the target scene type pair based on the mutual exclusion loss information corresponding to the target scene type pair, and training the sample weight learning module based on the second weight loss information to obtain the target image classification model.
16. The apparatus of claim 11, wherein the image category determination module comprises:
The fourth classification module is used for inputting the target image feature information corresponding to each service scene type into the target classification sub-model corresponding to each service scene type for classification processing to obtain first category prediction information corresponding to each service scene type;
and the first category determination module is used for determining target image category information corresponding to the image to be classified based on the first category prediction information corresponding to each of the plurality of service scene types.
17. The apparatus of claim 16, wherein the apparatus further comprises:
the third weight analysis module is used for inputting the image to be classified into the target weight learning module in the target image classification model to perform weight analysis processing to obtain target weight information;
the first category determination module includes:
the third fusion processing module is used for performing fusion processing on the first category prediction information corresponding to each of the plurality of service scene types based on the target weight information to obtain second category prediction information corresponding to the image to be classified;
and the second category determination module is used for determining the target image category information based on the second category prediction information.
18. The apparatus of claim 11, wherein the first feature extraction module comprises:
the third coding module is used for inputting the image to be classified into the target coding module in the target extraction sub-model corresponding to each service scene type to carry out coding processing to obtain target image coding information corresponding to each service scene type;
and the fourth feature extraction module is used for inputting the target image coding information corresponding to each service scene type into the target feature extraction module in the target extraction sub-model corresponding to each service scene type to perform extraction processing so as to obtain the target image feature information corresponding to each service scene type.
19. The apparatus of claim 18, wherein the target feature extraction module corresponding to each service scene type comprises a third attention module, a third nonlinear module, a target scene adaptation module, and a third fusion module; the fourth feature extraction module includes:
the third weighting processing module is used for inputting the target image coding information corresponding to each service scene type into the third attention module corresponding to each service scene type for attention weighting processing to obtain third weighted feature information corresponding to each service scene type;
the third transformation module is used for inputting the target image coding information corresponding to each service scene type and the third weighted feature information corresponding to each service scene type into the third nonlinear module corresponding to each service scene type to perform nonlinear transformation processing to obtain first target feature information corresponding to each service scene type;
the fourth adaptation module is used for inputting the target image coding information corresponding to each service scene type and the third weighted feature information corresponding to each service scene type into the target scene adaptation module corresponding to each service scene type for feature adaptation processing to obtain fourth adaptation feature information corresponding to each service scene type;
and the third feature fusion module is used for inputting the fourth adaptation feature information corresponding to each service scene type, the first target feature information corresponding to each service scene type, the target image coding information corresponding to each service scene type, and the third weighted feature information corresponding to each service scene type into the third fusion module corresponding to each service scene type for feature fusion processing to obtain the target image feature information corresponding to each service scene type.
20. The apparatus of claim 18, wherein the target feature extraction module corresponding to each service scene type comprises a fourth attention module, a fourth nonlinear module, a target scene adaptation module, and a fourth fusion module, and wherein the target scene adaptation module corresponding to each service scene type comprises a first target adaptation module and a second target adaptation module; the fourth feature extraction module includes:
the fourth weighting processing module is used for inputting the target image coding information corresponding to each service scene type into the fourth attention module corresponding to each service scene type for attention weighting processing to obtain fourth weighted feature information corresponding to each service scene type;
the fifth adaptation module is used for inputting the fourth weighted feature information corresponding to each service scene type into the first target adaptation module corresponding to each service scene type for feature adaptation processing to obtain fifth adaptation feature information corresponding to each service scene type;
the fourth transformation module is used for inputting the fifth adaptation feature information corresponding to each service scene type and the target image coding information corresponding to each service scene type into the fourth nonlinear module corresponding to each service scene type to perform nonlinear transformation processing to obtain second target feature information corresponding to each service scene type;
the sixth adaptation module is used for inputting the second target feature information corresponding to each service scene type into the second target adaptation module corresponding to each service scene type for feature adaptation processing to obtain sixth adaptation feature information corresponding to each service scene type;
and the fourth feature fusion module is used for inputting the sixth adaptation feature information corresponding to each service scene type, the fifth adaptation feature information corresponding to each service scene type, and the target image coding information corresponding to each service scene type into the fourth fusion module corresponding to each service scene type for feature fusion processing to obtain the target image feature information corresponding to each service scene type.
21. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the executable instructions to implement the image classification method of any one of claims 1 to 10.
22. A non-transitory computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the image classification method of any of claims 1 to 10.
CN202311151897.3A 2023-09-07 2023-09-07 Image classification method, device, electronic equipment and storage medium Active CN116883765B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311151897.3A CN116883765B (en) 2023-09-07 2023-09-07 Image classification method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311151897.3A CN116883765B (en) 2023-09-07 2023-09-07 Image classification method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116883765A CN116883765A (en) 2023-10-13
CN116883765B true CN116883765B (en) 2024-01-09

Family

ID=88272188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311151897.3A Active CN116883765B (en) 2023-09-07 2023-09-07 Image classification method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116883765B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110348523A (en) * 2019-07-15 2019-10-18 北京信息科技大学 A kind of malicious web pages based on Stacking integrate recognition methods and system
CN110866530A (en) * 2019-11-13 2020-03-06 云南大学 Character image recognition method and device and electronic equipment
CN111325291A (en) * 2020-05-15 2020-06-23 支付宝(杭州)信息技术有限公司 Entity object classification method for selectively integrating heterogeneous models and related equipment
CN112767431A (en) * 2021-01-12 2021-05-07 云南电网有限责任公司电力科学研究院 Power grid target detection method and device for power system
CN112949620A (en) * 2021-05-17 2021-06-11 腾讯科技(深圳)有限公司 Scene classification method and device based on artificial intelligence and electronic equipment
CN114491115A (en) * 2022-02-17 2022-05-13 重庆邮电大学 Integrated image retrieval method based on depth hash and multi-model fusion
CN114882372A (en) * 2022-07-11 2022-08-09 浙江大华技术股份有限公司 Target detection method and device
CN115393892A (en) * 2022-07-20 2022-11-25 东北电力大学 Crowd scene pedestrian detection method based on improved double-candidate-frame cross replacement strategy and loss function
WO2023107753A1 (en) * 2022-12-28 2023-06-15 Innopeak Technology, Inc. Pseudo-negative and mutual exclusive loss sampling for multi-label learning

Also Published As

Publication number Publication date
CN116883765A (en) 2023-10-13

Similar Documents

Publication Publication Date Title
CN110737801B (en) Content classification method, apparatus, computer device, and storage medium
WO2021139191A1 (en) Method for data labeling and apparatus for data labeling
CN113627447B (en) Label identification method, label identification device, computer equipment, storage medium and program product
CN112418292B (en) Image quality evaluation method, device, computer equipment and storage medium
CN113761153B (en) Picture-based question-answering processing method and device, readable medium and electronic equipment
CN113204660B (en) Multimedia data processing method, tag identification device and electronic equipment
CN114550053A (en) Traffic accident responsibility determination method, device, computer equipment and storage medium
CN116580257A (en) Feature fusion model training and sample retrieval method and device and computer equipment
CN112528029A (en) Text classification model processing method and device, computer equipment and storage medium
CN114580794B (en) Data processing method, apparatus, program product, computer device and medium
CN117036843A (en) Target detection model training method, target detection method and device
CN113128526B (en) Image recognition method and device, electronic equipment and computer-readable storage medium
CN114821736A (en) Multi-modal face recognition method, device, equipment and medium based on contrast learning
CN114299304A (en) Image processing method and related equipment
CN114359582A (en) Small sample feature extraction method based on neural network and related equipment
CN111783688B (en) Remote sensing image scene classification method based on convolutional neural network
CN115292439A (en) Data processing method and related equipment
CN116883765B (en) Image classification method, device, electronic equipment and storage medium
CN116955797A (en) Resource recommendation method and device, electronic equipment and storage medium
CN115115910A (en) Training method, using method, device, equipment and medium of image processing model
CN114692715A (en) Sample labeling method and device
CN116645700B (en) Feature extraction model processing method and device and feature extraction method and device
CN114328797B (en) Content search method, device, electronic apparatus, storage medium, and program product
CN117688233A (en) Resource recommendation method and device, electronic equipment and storage medium
CN116956909A (en) Class identification model generation method and class identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant