CN116258911A - Training method, device, equipment and storage medium for image classification model - Google Patents

Training method, device, equipment and storage medium for image classification model

Info

Publication number
CN116258911A
CN116258911A CN202310266453.8A
Authority
CN
China
Prior art keywords
classification model
similarity
image classification
image
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310266453.8A
Other languages
Chinese (zh)
Inventor
杨志雄
雷鑫华
杨延展
李永会
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Douyin Vision Co Ltd
Original Assignee
Douyin Vision Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Douyin Vision Co Ltd filed Critical Douyin Vision Co Ltd
Priority to CN202310266453.8A priority Critical patent/CN116258911A/en
Publication of CN116258911A publication Critical patent/CN116258911A/en
Pending legal-status Critical Current

Classifications

    • G06V10/764 — Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/761 — Image or video pattern matching; proximity, similarity or dissimilarity measures in feature spaces
    • G06V10/82 — Image or video recognition or understanding using neural networks
    • Y02T10/40 — Engine management systems (climate-change mitigation tagging)


Abstract

An embodiment of the present disclosure provides a training method, device and equipment for an image classification model, and a storage medium. A current sample image is input into an image classification model, which outputs a similarity set; probability distribution information is determined according to the similarity set; a label information set corresponding to the current sample image is determined; the first image classification model is trained based on the label information set, the first probability distribution information and the second probability distribution information, and the second image classification model is trained based on the trained first image classification model. By performing contrastive-learning training of the first and second image classification models on the basis of the label information set, the first probability distribution information and the second probability distribution information, the training method can overcome the long-tail problem of the sample set and improve the recognition accuracy of the trained image classification model.

Description

Training method, device, equipment and storage medium for image classification model
Technical Field
The embodiment of the disclosure relates to the technical field of image processing, in particular to a training method, device and equipment of an image classification model and a storage medium.
Background
Training a neural network model requires a large number of samples. For example, an image classification model requires image samples of various different types; when the numbers of samples of the different types are unbalanced, the trained model will identify image types inaccurately.
Disclosure of Invention
The embodiment of the disclosure provides a training method, device and equipment for an image classification model and a storage medium, which can overcome the long tail problem of a sample set, thereby improving the recognition accuracy of the image classification model.
In a first aspect, an embodiment of the present disclosure provides a training method for an image classification model, including:
inputting the current sample image into an image classification model, and outputting a similarity set; the image classification model is a first image classification model or a second image classification model, and the first and second image classification models have the same structure but different internal parameters; the similarity set consists of the similarities between the current sample image and the set image categories and the similarities between the current sample image and the historical sample images, and is a first similarity set or a second similarity set; the first similarity set is output by the first image classification model, and the second similarity set is output by the second image classification model;
determining probability distribution information according to the similarity set; the probability distribution information is first probability distribution information or second probability distribution information; the first probability distribution information is determined according to the first similarity set, and the second probability distribution information is determined according to the second similarity set;
determining a label information set corresponding to the current sample image; the label information set is determined by the category of the current sample image, the categories of the historical sample images, and the set image categories;
training the first image classification model based on the label information set, the first probability distribution information and the second probability distribution information, and training the second image classification model based on the trained first image classification model.
In a second aspect, an embodiment of the present disclosure further provides a training apparatus for an image classification model, including:
the similarity set acquisition module is used for inputting the current sample image into the image classification model and outputting a similarity set; the image classification model is a first image classification model or a second image classification model, and the first and second image classification models have the same structure but different internal parameters; the similarity set consists of the similarities between the current sample image and the set image categories and the similarities between the current sample image and the historical sample images, and is a first similarity set or a second similarity set; the first similarity set is output by the first image classification model, and the second similarity set is output by the second image classification model;
the probability distribution information determining module is used for determining probability distribution information according to the similarity set; the probability distribution information is first probability distribution information or second probability distribution information; the first probability distribution information is determined according to the first similarity set, and the second probability distribution information is determined according to the second similarity set;
the tag information set determining module is used for determining a label information set corresponding to the current sample image; the label information set is determined by the category of the current sample image, the categories of the historical sample images, and the set image categories;
the model training module is used for training the first image classification model based on the label information set, the first probability distribution information and the second probability distribution information, and training the second image classification model based on the trained first image classification model.
in a third aspect, embodiments of the present disclosure further provide an electronic device, including:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a method of training an image classification model as described in embodiments of the present disclosure.
In a fourth aspect, the disclosed embodiments also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are for performing a training method of an image classification model as described in the disclosed embodiments.
The embodiment of the disclosure discloses a training method, device, equipment and storage medium for an image classification model. A current sample image is input into an image classification model, which outputs a similarity set; probability distribution information is determined according to the similarity set, where the probability distribution information is first probability distribution information or second probability distribution information; a label information set corresponding to the current sample image is determined, where the label information set is determined by the category of the current sample image, the categories of the historical sample images and the set image categories; the first image classification model is trained based on the label information set, the first probability distribution information and the second probability distribution information, and the second image classification model is trained based on the trained first image classification model. By performing contrastive-learning training of the first and second image classification models on the basis of the label information set, the first probability distribution information and the second probability distribution information, this training method can overcome the long-tail problem of the sample set and improve the recognition accuracy of the trained image classification model.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a schematic flow chart of a training method of an image classification model according to an embodiment of the disclosure;
FIG. 2 is an exemplary diagram of an image classification model to be trained provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a training device for an image classification model according to an embodiment of the disclosure;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "a" and "an" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
It will be appreciated that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user should be informed, in an appropriate manner and in accordance with the relevant laws and regulations, of the type, scope of use, usage scenarios, etc. of the personal information involved, and the user's authorization should be obtained.
For example, in response to receiving an active request from a user, a prompt is sent to the user to explicitly inform the user that the operation requested will require the acquisition and use of the user's personal information. The user can then autonomously choose, according to the prompt, whether to provide personal information to the software or hardware, such as an electronic device, application program, server or storage medium, that executes the operations of the technical solution of the present disclosure.
As an alternative but non-limiting implementation, in response to receiving an active request from the user, the prompt information may be sent to the user by way of, for example, a popup window, in which the prompt information may be presented as text. In addition, the popup window may carry a selection control allowing the user to choose "consent" or "decline" to providing personal information to the electronic device.
It will be appreciated that the above-described notification and user authorization process is merely illustrative and not limiting of the implementations of the present disclosure, and that other ways of satisfying relevant legal regulations may be applied to the implementations of the present disclosure.
It will be appreciated that the data (including but not limited to the data itself, the acquisition or use of the data) involved in the present technical solution should comply with the corresponding legal regulations and the requirements of the relevant regulations.
Fig. 1 is a flow chart of a training method for an image classification model according to an embodiment of the present disclosure. The embodiment is applicable to the case of training an image classification model. The method may be performed by a training apparatus for an image classification model, which may be implemented in the form of software and/or hardware and, optionally, by an electronic device such as a mobile terminal, a PC, or a server.
As shown in fig. 1, the method includes:
s110, inputting the current sample graph into an image classification model, and outputting a similarity set.
The image classification model is the first image classification model or the second image classification model; the similarity set consists of the similarities between the current sample image and the set image categories and the similarities between the current sample image and the historical sample images, and is a first similarity set or a second similarity set. In this embodiment, the first and second image classification models have the same structure but different parameters; the first similarity set is output by the first image classification model, and the second similarity set is output by the second image classification model. A historical sample image is a sample image that has already been input into the image classification model; both the current sample image and the historical sample images belong to the sample image set. The set image categories may be categories preset according to the recognition requirement. For example, if animal categories in images need to be identified, the set image categories may include: cat, dog, pig, sheep, calf and the like; if flower categories need to be identified, the set image categories may include: rose, carnation, sunflower, daisy, magnolia and the like. Accordingly, a sample image set containing these animals or plants needs to be acquired to train the image classification model so that it can identify these animal or plant categories.
Fig. 2 is an exemplary diagram of an image classification model to be trained in this embodiment. As shown in Fig. 2, the first image classification model and the second image classification model each include an encoding sub-network, a classification sub-network, a fully connected sub-network, a fusion sub-network, and a splicing sub-network. In the initial state, the parameters of the first and second image classification models are identical; as training progresses, the parameters of the two models gradually diverge because the models are trained in different ways.
Specifically, the process of inputting the current sample image into the image classification model and outputting the similarity set may be: inputting the current sample image into the encoding sub-network, which outputs first image feature information; inputting the first image feature information into the classification sub-network, which outputs a category feature similarity subset; inputting the first image feature information into the fully connected sub-network, which outputs second image feature information; inputting the second image feature information and the feature information of the historical sample images into the fusion sub-network, which outputs a historical sample feature similarity subset; and inputting the category feature similarity subset and the historical sample feature similarity subset into the splicing sub-network, which outputs the similarity set.
The category feature similarity subset consists of the similarities between the current sample image and the set image categories. The historical sample feature similarity subset consists of the similarities between the current sample image and the historical sample images.
The first image feature information may be represented by a feature vector f. The encoding sub-network may be understood as an encoder, which performs feature extraction on the input sample image and outputs the first image feature information. The classification sub-network may be understood as a classifier composed of at least one linear layer, which performs linear processing on the first image feature information to obtain the similarity between the sample image and each set image category. In this embodiment, the parameters of the classification sub-network consist of feature vectors representing the set image categories; the classification sub-network linearly fuses these feature vectors with the first image feature information to obtain the similarities between the sample image and the set image categories, and thereby outputs the category feature similarity subset. Assuming there are M set image categories and c_t denotes the feature vector of the t-th image category, the similarity between the sample image and the t-th set image category can be expressed as f·c_t. The fully connected sub-network may be understood as a multi-layer perceptron (MLP) comprising at least two linear layers, which performs a linear transformation on the input first image feature information to obtain the second image feature information, represented by a feature vector z.
The feature information of a historical sample image is the feature information output by the MLP when that historical sample image was input into the image classification model. The historical feature information is stored in the form of a queue (following the first-in-first-out principle); when the queue is full, the feature information of the newest sample image pushes out the feature information of the earliest sample image in the queue. The fusion sub-network computes vector inner products between the input second image feature information and the historical sample feature information, and outputs the historical sample feature similarity subset. Let z_k denote the vector of the feature information of the k-th historical sample image in the queue; then the similarity between the sample image and the k-th historical sample image is expressed as z·z_k. The splicing sub-network concatenates the two input subsets. Assuming the category feature similarity subset has M elements and the historical sample feature similarity subset has N elements, the spliced similarity set contains M+N elements and can be expressed as s = {s_1, s_2, …, s_{M+N}} = {f·c_1, f·c_2, …, f·c_M, z·z_1, z·z_2, …, z·z_N}.
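As an illustrative sketch (not part of the patent text), the similarity-set construction above can be expressed in a few lines of Python with NumPy; the dimensions, random placeholder features, and variable names are assumptions for the example only:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, d = 5, 8, 16   # assumed: M set categories, N queued historical samples, feature dim d

f = rng.standard_normal(d)        # first image feature information (encoder output)
C = rng.standard_normal((M, d))   # class feature vectors c_1..c_M (classifier parameters)
z = rng.standard_normal(d)        # second image feature information (MLP output)
Q = rng.standard_normal((N, d))   # FIFO queue of historical features z_1..z_N

class_sims = C @ f                           # f·c_t for each set image category
hist_sims = Q @ z                            # z·z_k for each queued historical sample
s = np.concatenate([class_sims, hist_sims])  # spliced similarity set, M+N elements
print(s.shape)                               # (13,)
```

In practice the queue would be maintained across iterations, with the oldest feature vector evicted once the queue holds N entries.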
S120, determining probability distribution information according to the similarity set.
The probability distribution information is first probability distribution information or second probability distribution information. The first probability distribution information is determined by a first set of similarities and the second probability distribution information is determined by a second set of similarities. The probability distribution information is used to represent probability information for each degree of similarity in the set of degrees of similarity.
Specifically, the probability distribution information may be determined according to the similarity set as follows: performing an exponential operation on the similarity set to obtain an exponential similarity set; then determining the proportion of each exponential similarity within the exponential similarity set, and taking that proportion as the probability information of each similarity.
Performing an exponential operation on the similarity set means applying a base-e exponential to each similarity in the set, thereby obtaining the exponential similarity set. In this embodiment, the exponential operation may first determine the ratio between each similarity and a set coefficient, and then exponentiate that ratio: exp(s) = {exp(s_1/τ), exp(s_2/τ), …, exp(s_{M+N}/τ)}, where τ is the set coefficient. The proportion of each exponential similarity within the exponential similarity set may then be determined by computing the sum of all exponential similarities and dividing each exponential similarity by this sum; the resulting proportion is the probability information of that similarity. The probability distribution information can be expressed as:

p_i = exp(s_i/τ) / Σ_{j=1}^{M+N} exp(s_j/τ)

where s_i is the i-th similarity in the similarity set.
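The normalization just described is a temperature-scaled softmax. The following Python sketch (an illustration, not patent text) computes p_i = exp(s_i/τ) / Σ_j exp(s_j/τ); the default τ = 0.07 is an arbitrary placeholder, since the patent only calls τ a "set coefficient":

```python
import numpy as np

def probability_distribution(s, tau=0.07):
    """p_i = exp(s_i / tau) / sum_j exp(s_j / tau), computed in a
    numerically stable way (shifting by the max leaves the ratios unchanged)."""
    s = np.asarray(s, dtype=float)
    e = np.exp((s - s.max()) / tau)
    return e / e.sum()

p = probability_distribution([1.0, 2.0, 3.0], tau=1.0)
# probabilities sum to 1 and preserve the ordering of the similarities
assert abs(p.sum() - 1.0) < 1e-12
assert p[2] > p[1] > p[0]
```

Smaller τ sharpens the distribution toward the largest similarity; larger τ flattens it.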
Optionally, after the exponential similarity set is obtained, the method further includes: determining the count ratio of each set image category in the sample set; and updating the category feature similarity subset within the exponential similarity set according to these count ratios to obtain an updated exponential similarity set. Correspondingly, the proportion determination becomes: determining the proportion of each exponential similarity in the updated exponential similarity set, and taking that proportion as the probability information of each similarity.
The count ratio of each set image category in the sample set is determined by computing the ratio of the number of images belonging to that category to the total number of images in the sample set; the count ratio of the t-th image category is denoted q_t. The category feature similarity subset within the exponential similarity set may be updated by fusing each count ratio with the corresponding similarity in the category feature similarity subset (for example, by multiplication), yielding the updated exponential similarity set. Assuming the first M elements of the exponential similarity set belong to the category feature similarity subset, the updated exponential similarity set may be expressed as: {exp(s_1/τ)·q_1, …, exp(s_M/τ)·q_M, exp(s_{M+1}/τ), …, exp(s_{M+N}/τ)}.
In this embodiment, the proportion of each exponential similarity in the updated exponential similarity set may be determined by computing the sum of all updated exponential similarities and dividing each updated exponential similarity by this sum; the resulting proportion is the probability information of that similarity. Denoting the i-th element of the updated exponential similarity set by ŝ_i, the updated probability distribution information can be expressed as:

p_i = ŝ_i / Σ_{j=1}^{M+N} ŝ_j
in this embodiment, the probability distribution information is updated based on the number of image categories in the sample set to balance the number of image categories in the sample set, thereby alleviating the model learning problem under the long tail condition.
S130, determining a label information set corresponding to the current sample graph.
The label information set is determined by the category of the current sample image, the categories of the historical sample images, and the set image categories. It represents whether the category of the current sample image matches each set image category and whether it matches the category of each historical sample image.
Specifically, the label information set corresponding to the current sample image may be determined as follows: determining a first tag information subset from the comparison between the category of the current sample image and the set image categories; determining a second tag information subset from the comparison between the category of the current sample image and the categories of the historical sample images; and splicing the first tag information subset and the second tag information subset to obtain the label information set.
In this embodiment, the first tag information subset may be determined from the comparison between the category of the current sample image and the set image categories as follows: compare the category of the current sample image with the M set image categories in sequence; if the categories are the same, set the tag information to 1, otherwise to 0, thereby obtaining a first tag information subset l_1 composed of M tag information entries. The second tag information subset may be determined analogously: compare the category of the current sample image with the categories of the N historical sample images in the queue; if the categories are the same, set the tag information to 1, otherwise to 0, thereby obtaining a second tag information subset l_2 composed of N tag information entries. Splicing the first and second tag information subsets yields a tag information set l of M+N entries, which can be expressed as: l = {l_1, l_2}.
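As a sketch of the label construction described above (the category names are arbitrary examples, not from the patent):

```python
def label_information_set(current_category, set_categories, history_categories):
    """l_1: compare against the M set image categories; l_2: compare against
    the categories of the N historical sample images in the queue; splice."""
    l1 = [1 if current_category == c else 0 for c in set_categories]
    l2 = [1 if current_category == c else 0 for c in history_categories]
    return l1 + l2

labels = label_information_set(
    "cat",
    set_categories=["cat", "dog", "pig"],            # M = 3
    history_categories=["dog", "cat", "cat", "pig"]  # N = 4
)
print(labels)  # [1, 0, 0, 0, 1, 1, 0]
```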
Optionally, the second tag information subset may further be adjusted: determining an adjustment coefficient according to the number of historical sample images, and adjusting the second tag information subset according to the adjustment coefficient. Correspondingly, splicing the first tag information subset and the adjusted second tag information subset yields the tag information set.
The number of historical sample images may characterize the number of iterations of the image classification model during training. The adjustment coefficient may be determined from the number of historical sample images by performing an exponential operation: a set value is raised to the power of that number, so the coefficient can be expressed as α^n, where α is the set value and n represents the number of historical sample images, i.e., the number of iterations. The second tag information subset may be adjusted by fusing the adjustment coefficient with each tag information entry in the subset (multiplication may be performed). The adjusted second tag information subset can be expressed as α^n·l2, and the adjusted tag information set as l = {l1, α^n·l2}. In this embodiment, the tag information set is adjusted based on the number of historical sample images, that is, dynamically adjusted, which can improve the accuracy of subsequent model training.
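The dynamic adjustment can be sketched as follows, treating alpha and n as the set value and iteration count described above (the names are assumptions for illustration):

```python
def adjusted_label_set(l1, l2, alpha, n):
    """Scale each entry of the second tag information subset by the
    adjustment coefficient alpha**n before splicing, giving
    l = {l1, alpha**n * l2}."""
    coeff = alpha ** n          # exponential operation on the set value
    return l1 + [coeff * x for x in l2]
```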
And S140, training the first image classification model based on the label information set, the first probability distribution information and the second probability distribution information, and training the second image classification model based on the trained first image classification model.
The training of the first image classification model based on the tag information set, the first probability distribution information and the second probability distribution information may be: determining loss information based on the tag information set, the first probability distribution information and the second probability distribution information, and then performing inverse gradient parameter adjustment on the first image classification model based on the loss information, so as to train the first image classification model.
Specifically, training the first image classification model based on the tag information set, the first probability distribution information and the second probability distribution information may be: linearly superposing the tag information set and the second probability distribution information to obtain intermediate probability distribution information; fusing the intermediate probability distribution information and the first probability distribution information to obtain loss information; and performing inverse gradient parameter adjustment on the first image classification model according to the loss information, so as to train the first image classification model.
Wherein, the linear superposition of the tag information set and the second probability distribution information can be understood as: first determining a superposition coefficient, and then linearly superposing the corresponding elements of the tag information set and the second probability distribution information based on that coefficient to obtain the intermediate probability distribution information. The calculation can be expressed as: ω = λ·l + (1 − λ)·p2, where ω represents the intermediate probability distribution information, l the tag information set, and p2 the second probability distribution information (for its calculation, see the above embodiments).
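Element-wise, the superposition ω = λ·l + (1 − λ)·p2 can be sketched as (names are illustrative):

```python
def intermediate_distribution(labels, p2, lam):
    """Linear superposition of the tag information set and the second
    probability distribution: omega_i = lam * l_i + (1 - lam) * p2_i."""
    return [lam * l + (1.0 - lam) * p for l, p in zip(labels, p2)]
```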
The method for fusing the intermediate probability distribution information and the first probability distribution information may be: and carrying out inner product calculation on the vector formed by the intermediate probability distribution information and the vector formed by the first probability distribution information to obtain the loss information of the current sample graph.
In this embodiment, the loss information of all the sample graphs in the sample set is averaged to obtain the final loss information. And finally, performing inverse gradient parameter adjustment on the first image classification model according to the final loss information so as to train the first image classification model.
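Taking the description literally (a per-sample inner product, then an average over the sample set), the loss computation might be sketched as below; in practice the first probability distribution would typically enter such a loss as log-probabilities, which the disclosure leaves unspecified:

```python
def sample_loss(omega, p1):
    """Inner product of the intermediate distribution vector and the
    first model's probability distribution vector for one sample graph."""
    return sum(w * p for w, p in zip(omega, p1))

def final_loss(omegas, p1s):
    """Average the per-sample losses over all sample graphs in the set."""
    losses = [sample_loss(w, p) for w, p in zip(omegas, p1s)]
    return sum(losses) / len(losses)
```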
Alternatively, the process of training the second image classification model based on the trained first image classification model may be: acquiring first model parameters in the trained first image classification model and second model parameters of the second image classification model; superposing the first model parameter and the second model parameter to obtain a target model parameter; and updating parameters of the second image classification model according to the target model parameters so as to train the second image classification model.
Wherein the superposition of the first model parameter and the second model parameter can be understood as: firstly, determining a weighting coefficient, and then carrying out weighted summation on the first model parameter and the second model parameter according to the weighting coefficient to obtain the target model parameter. The updating of parameters of the second image classification model based on the target model parameters can be understood as: and replacing the parameters in the second image classification model with the target model parameters. In this embodiment, after the parameter adjustment of the first image classification model and the second image classification model is completed, if the training is not completed, the iterative training of the next round of the adjusted first image classification model and second image classification model is continued until the iteration is terminated.
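The weighted summation above resembles a momentum (exponential moving average) update; a minimal sketch follows, where the weighting coefficient m and which model receives which weight are assumptions:

```python
def update_second_model(first_params, second_params, m):
    """Target parameters as a weighted sum of the trained first model's
    parameters and the second model's current parameters:
    theta_target = m * theta_second + (1 - m) * theta_first.
    The second model's parameters are then replaced by the result."""
    return {name: m * second_params[name] + (1.0 - m) * first_params[name]
            for name in second_params}
```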
Optionally, after training the first image classification model based on the tag information set, the first probability distribution information and the second probability distribution information, the method further includes: deleting the fully-connected sub-network, the fusion sub-network and the splicing sub-network from the trained first image classification model to obtain a target image classification model; and classifying images to be classified according to the target image classification model.
In this embodiment, the trained target image classification model retains only the coding sub-network and the classification sub-network. The image to be classified is input into the coding sub-network for feature extraction, which outputs feature information; the feature information is then input into the classification sub-network, which outputs class information. Deleting part of the sub-networks of the first image classification model simplifies the model structure, greatly reducing the amount of computation and increasing recognition speed when classifying images.
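The pruned inference path (encode, then classify) can be sketched as follows; `encoder` and `classifier` are stand-ins for the retained coding and classification sub-networks:

```python
class TargetImageClassifier:
    """Target image classification model: only the coding sub-network and
    the classification sub-network are retained after training."""
    def __init__(self, encoder, classifier):
        self.encoder = encoder        # coding sub-network
        self.classifier = classifier  # classification sub-network

    def predict(self, image):
        features = self.encoder(image)    # feature extraction
        return self.classifier(features)  # class information
```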
According to the technical scheme of the embodiment of the present disclosure, a current sample image is input into an image classification model and a similarity set is output; probability distribution information is determined according to the similarity set, the probability distribution information being first probability distribution information or second probability distribution information; a label information set corresponding to the current sample graph is determined, the label information set being determined by the category of the current sample image, the categories of the historical sample images and the set image categories; the first image classification model is trained based on the label information set, the first probability distribution information and the second probability distribution information, and the second image classification model is trained based on the trained first image classification model. In the embodiment of the present disclosure, the first image classification model and the second image classification model undergo contrastive-learning training based on the label information set, the first probability distribution information and the second probability distribution information, which can alleviate the long-tail problem of the sample set and improve the recognition accuracy of the trained image classification model.
Fig. 3 is a schematic structural diagram of a training device for an image classification model according to an embodiment of the disclosure, where, as shown in fig. 3, the device includes:
A similarity set obtaining module 310, configured to input the current sample graph into the image classification model, and output a similarity set; the image classification model is a first image classification model or a second image classification model; the first image classification model and the second image classification model have the same structure and different internal parameters; the similarity set consists of the similarity between the current sample graph and the set image category and the similarity between the current sample graph and the historical sample graph, and is a first similarity set or a second similarity set; the first similarity set is output by the first image classification model, and the second similarity set is output by the second image classification model;
a probability distribution information determining module 320, configured to determine probability distribution information according to the similarity set; the probability distribution information is first probability distribution information or second probability distribution information; the first probability distribution information is determined according to the first similarity set, and the second probability distribution information is determined according to the second similarity set;
a tag information set determining module 330, configured to determine a tag information set corresponding to the current sample map; the tag information set is determined by the category of the current sample image, the categories of the historical sample images and the set image categories;
The model training module 340 is configured to train the first image classification model based on the tag information set, the first probability distribution information, and the second probability distribution information, and train the second image classification model based on the trained first image classification model.
Optionally, the image classification model comprises a coding sub-network, a classification sub-network, a full-connection sub-network, a fusion sub-network and a splicing sub-network; the similarity set acquisition module 310 is further configured to:
inputting the current sample graph into the coding sub-network, and outputting first image characteristic information;
inputting the first image feature information into the classification sub-network, and outputting a class feature similarity subset; the class feature similarity subset consists of the similarity between the current sample graph and a set image class;
inputting the first image characteristic information into the fully-connected sub-network, and outputting second image characteristic information;
inputting the second image characteristic information and the characteristic information of the historical sample graph into the fusion sub-network, and outputting a subset of the characteristic similarity of the historical sample; the historical sample feature similarity subset consists of similarities between a current sample graph and a historical sample graph;
and inputting the category feature similarity subset and the historical sample feature similarity subset into the splicing sub-network, and outputting the similarity set.
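The five-sub-network forward pass described above can be sketched as follows, with plain callables standing in for the sub-networks and a dot product standing in for the fusion sub-network's similarity computation (all names and the dot-product choice are assumptions for illustration):

```python
def compute_similarity_set(image, encoder, classifier, fc, history_feats):
    """Run one sample graph through coding -> {classification, fully-connected
    -> fusion} -> splicing, returning the concatenated similarity set."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    feat1 = encoder(image)               # coding sub-network: first image features
    class_sims = classifier(feat1)       # classification sub-network: M class similarities
    feat2 = fc(feat1)                    # fully-connected sub-network: second image features
    hist_sims = [dot(feat2, h) for h in history_feats]  # fusion sub-network: N similarities
    return class_sims + hist_sims        # splicing sub-network: M + N similarity entries
```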
Optionally, the probability distribution information determining module 320 is further configured to:
performing an exponential operation on the similarity set to obtain an exponential similarity set;
and determining the proportion of each exponential similarity in the exponential similarity set, and determining the proportion as the probability information of that exponential similarity.
Optionally, the probability distribution information determining module 320 is further configured to:
determining the number proportion of each set image category in the sample set;
updating the category feature similarity subset in the exponential similarity set according to the number proportion to obtain an updated exponential similarity set;
and determining the proportion of each exponential similarity in the updated exponential similarity set, and determining the proportion as the probability information of that exponential similarity.
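The two probability-determination variants can be sketched together: a plain softmax over the exponential similarities, plus the optional reweighting of the class-feature entries by each set category's sample-count ratio (the disclosure does not specify whether the ratio multiplies or divides the entries, so multiplication here is an assumption):

```python
import math

def probability_distribution(similarities, class_ratios=None):
    """Exponentiate each similarity; optionally update the first M entries
    (the category feature similarity subset) by the per-category number
    proportion; then take each entry's proportion of the total."""
    exps = [math.exp(s) for s in similarities]
    if class_ratios is not None:
        for i, ratio in enumerate(class_ratios):  # covers the first M entries
            exps[i] *= ratio
    total = sum(exps)
    return [e / total for e in exps]
```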
Optionally, the tag information set determining module 330 is further configured to:
determining a first tag information subset according to a comparison result of the category of the current sample image and the category of the set image;
determining a second tag information subset according to a comparison result of the category of the current sample image and the category of the historical sample image;
And splicing the first tag information subset and the second tag information subset to obtain a tag information set.
Optionally, the tag information set determining module 330 is further configured to:
determining an adjustment coefficient according to the number of the historical sample images;
adjusting the second tag information subset according to the adjustment coefficient;
and splicing the first tag information subset and the adjusted second tag information subset to obtain a tag information set.
Optionally, the model training module 340 is further configured to:
linearly superposing the tag information set and the second probability distribution information to obtain intermediate probability distribution information;
fusing the intermediate probability distribution information and the first probability distribution information to obtain loss information;
and performing inverse gradient parameter adjustment on the first image classification model according to the loss information so as to train the first image classification model.
Optionally, the model training module 340 is further configured to:
acquiring first model parameters in the trained first image classification model and second model parameters of the second image classification model;
superposing the first model parameter and the second model parameter to obtain a target model parameter;
And carrying out parameter updating on the second image classification model according to the target model parameters so as to train the second image classification model.
Optionally, the apparatus further includes an image classification module configured to:
deleting the fully-connected sub-network, the fusion sub-network and the splicing sub-network from the trained first image classification model to obtain a target image classification model;
classifying the images to be classified according to the target image classification model.
The training device for the image classification model provided by the embodiment of the disclosure can execute the training method for the image classification model provided by any embodiment of the disclosure, and has the corresponding functional modules and beneficial effects of the execution method.
It should be noted that each unit and module included in the above apparatus are only divided according to the functional logic, but not limited to the above division, so long as the corresponding functions can be implemented; in addition, the specific names of the functional units are also only for convenience of distinguishing from each other, and are not used to limit the protection scope of the embodiments of the present disclosure.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure. Referring now to fig. 4, a schematic diagram of an electronic device (e.g., a terminal device or server in fig. 4) 500 suitable for use in implementing embodiments of the present disclosure is shown. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 4 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 4, the electronic device 500 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 501, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the electronic apparatus 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
In general, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 507 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 508 including, for example, magnetic tape, hard disk, etc.; and communication means 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 4 shows an electronic device 500 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or from the storage means 508, or from the ROM 502. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 501.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The electronic device provided by the embodiment of the present disclosure and the training method of the image classification model provided by the foregoing embodiment belong to the same inventive concept, and technical details not described in detail in the present embodiment may be referred to the foregoing embodiment, and the present embodiment has the same beneficial effects as the foregoing embodiment.
The embodiment of the present disclosure provides a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the training method of the image classification model provided by the above embodiment.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some implementations, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to:
the computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: inputting the current sample graph into an image classification model, and outputting a similarity set; the image classification model is a first image classification model or a second image classification model; the similarity set consists of the similarity between the current sample graph and the set image category and the similarity between the current sample graph and the historical sample graph, and is a first similarity set or a second similarity set; determining probability distribution information according to the similarity set; the probability distribution information is first probability distribution information or second probability distribution information; determining a label information set corresponding to the current sample graph; the label information set is determined by the category of the current sample image, the category of the historical sample image and the set image category; training the first image classification model based on the label information set, the first probability distribution information and the second probability distribution information, and training the second image classification model based on the trained first image classification model.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including, but not limited to, object oriented programming languages such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The name of the unit does not in any way constitute a limitation of the unit itself, for example the first acquisition unit may also be described as "unit acquiring at least two internet protocol addresses".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, there is provided a training method of an image classification model, including:
inputting the current sample graph into an image classification model, and outputting a similarity set; the image classification model is a first image classification model or a second image classification model; the similarity set consists of the similarity between the current sample graph and the set image category and the similarity between the current sample graph and the historical sample graph, and is a first similarity set or a second similarity set;
determining probability distribution information according to the similarity set; the probability distribution information is first probability distribution information or second probability distribution information;
determining a label information set corresponding to the current sample graph; the label information set is determined by the category of the current sample image, the category of the historical sample image and the set image category;
training the first image classification model based on the label information set, the first probability distribution information and the second probability distribution information, and training the second image classification model based on the trained first image classification model.
Further, the image classification model comprises a coding sub-network, a classification sub-network, a full-connection sub-network, a fusion sub-network and a splicing sub-network; inputting the current sample graph into an image classification model, and outputting a similarity set, wherein the method comprises the following steps:
inputting the current sample graph into the coding sub-network, and outputting first image characteristic information;
inputting the first image feature information into the classification sub-network, and outputting a class feature similarity subset; the class feature similarity subset consists of the similarity between the current sample graph and a set image class;
inputting the first image characteristic information into the fully-connected sub-network, and outputting second image characteristic information;
inputting the second image characteristic information and the characteristic information of the historical sample graph into the fusion sub-network, and outputting a subset of the characteristic similarity of the historical sample; the historical sample feature similarity subset consists of similarities between a current sample graph and a historical sample graph;
and inputting the category feature similarity subset and the historical sample feature similarity subset into the splicing sub-network, and outputting the similarity set.
Further, determining probability distribution information according to the similarity set includes:
performing an exponential operation on the similarity set to obtain an exponential similarity set;
and determining the proportion of each exponential similarity in the exponential similarity set, and determining the proportion as the probability information of that exponential similarity.
Further, after obtaining the exponential similarity set, the method further comprises:
determining the number proportion of each set image category in the sample set;
updating the category feature similarity subset in the exponential similarity set according to the number proportion to obtain an updated exponential similarity set;
correspondingly, determining the proportion of each exponential similarity in the exponential similarity set and determining the proportion as the probability information of each exponential similarity comprises:
and determining the proportion of each exponential similarity in the updated exponential similarity set, and determining the proportion as the probability information of that exponential similarity.
Further, determining the label information set corresponding to the current sample graph includes:
determining a first tag information subset according to a comparison result of the category of the current sample image and the category of the set image;
Determining a second tag information subset according to a comparison result of the category of the current sample image and the category of the historical sample image;
and splicing the first tag information subset and the second tag information subset to obtain a tag information set.
Further, determining a second tag information subset according to a comparison result of the category of the current sample image and the category of the historical sample image, including:
determining an adjustment coefficient according to the number of the historical sample images;
adjusting the second tag information subset according to the adjustment coefficient;
correspondingly, splicing the first tag information subset and the second tag information subset to obtain a tag information set, including:
and splicing the first tag information subset and the adjusted second tag information subset to obtain a tag information set.
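The label construction above can be sketched as a one-hot comparison against the set image categories, spliced with an indicator comparison against the historical sample categories scaled by an adjustment coefficient. The 1/N choice of coefficient is an assumption for illustration:

```python
import numpy as np

def label_information_set(cur_class, set_classes, hist_classes):
    # First tag information subset: compare the current sample's class
    # with each set image category.
    first = np.array([cur_class == c for c in set_classes], dtype=float)
    # Second tag information subset: compare with each historical
    # sample graph's class.
    second = np.array([cur_class == c for c in hist_classes], dtype=float)
    # Adjustment coefficient from the number of historical sample
    # graphs; 1/N is one plausible choice, not fixed by the text.
    coef = 1.0 / max(len(hist_classes), 1)
    # Splice the first subset with the adjusted second subset.
    return np.concatenate([first, coef * second])

lab = label_information_set('cat', ['cat', 'dog', 'bird'],
                            ['dog', 'cat', 'cat', 'fish'])
```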
Further, training the first image classification model based on the tag information set, the first probability distribution information, and the second probability distribution information includes:
linearly superposing the tag information set and the second probability distribution information to obtain intermediate probability distribution information;
fusing the intermediate probability distribution information and the first probability distribution information to obtain loss information;
And performing inverse gradient parameter adjustment on the first image classification model according to the loss information so as to train the first image classification model.
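The loss construction above resembles a self-distillation objective: a linear superposition of the labels with the second model's distribution gives an intermediate target, which is fused with the first model's distribution (here via cross-entropy, an assumed choice, as is the mixing weight `alpha`):

```python
import numpy as np

def distillation_loss(labels, p_second, p_first, alpha=0.5):
    # Linear superposition of the tag information set and the second
    # model's probability distribution -> intermediate distribution.
    intermediate = alpha * labels + (1.0 - alpha) * p_second
    # Fuse with the first model's distribution; cross-entropy is one
    # plausible fusion, not fixed by the text.
    return -np.sum(intermediate * np.log(p_first + 1e-12))

loss = distillation_loss(np.array([1.0, 0.0]),
                         np.array([0.7, 0.3]),
                         np.array([0.6, 0.4]))
```

The resulting scalar would then drive an ordinary gradient back-propagation update of the first model's parameters.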
Further, training the second image classification model based on the trained first image classification model includes:
acquiring first model parameters in the trained first image classification model and second model parameters of the second image classification model;
superposing the first model parameter and the second model parameter to obtain a target model parameter;
and carrying out parameter updating on the second image classification model according to the target model parameters so as to train the second image classification model.
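Superposing the two models' parameters to update the second model matches an exponential-moving-average (momentum) update, as used for momentum encoders. The momentum value is an assumption:

```python
def ema_update(first_params, second_params, momentum=0.9):
    # Target parameters: a weighted superposition of the trained first
    # model's parameters and the second model's current parameters.
    return {name: momentum * second_params[name]
                  + (1.0 - momentum) * first_params[name]
            for name in second_params}

out = ema_update({'w': 1.0}, {'w': 0.0}, momentum=0.9)
```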
Further, after training the first image classification model based on the tag information set, the first probability distribution information, and the second probability distribution information, the method further includes:
deleting the fully connected sub-network, the fusion sub-network and the splicing sub-network from the trained first image classification model to obtain a target image classification model;
classifying the images to be classified according to the target image classification model.
The foregoing description is merely of the preferred embodiments of the present disclosure and of the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the disclosure is not limited to the specific combinations of the features described above, but also covers other embodiments formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, embodiments formed by substituting the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims (12)

1. A method for training an image classification model, comprising:
inputting the current sample graph into an image classification model, and outputting a similarity set; the image classification model is a first image classification model or a second image classification model, and the first image classification model and the second image classification model have the same structure and different internal parameters; the similarity set consists of the similarity between the current sample graph and the set image category and the similarity between the current sample graph and the historical sample graph, and is a first similarity set or a second similarity set; the first similarity set is output by the first image classification model, and the second similarity set is output by the second image classification model;
Determining probability distribution information according to the similarity set; the probability distribution information is first probability distribution information or second probability distribution information; the first probability distribution information is determined according to the first similarity set, and the second probability distribution information is determined according to the second similarity set;
determining a label information set corresponding to the current sample graph; the label information set is determined by the category of the current sample image, the category of the historical sample image and the set image category;
training the first image classification model based on the label information set, the first probability distribution information and the second probability distribution information, and training the second image classification model based on the trained first image classification model.
2. The method of claim 1, wherein the image classification model comprises a coding sub-network, a classification sub-network, a fully connected sub-network, a fusion sub-network, and a stitching sub-network; inputting the current sample graph into an image classification model, and outputting a similarity set, wherein the method comprises the following steps:
inputting the current sample graph into the coding sub-network, and outputting first image characteristic information;
Inputting the first image feature information into the classification sub-network, and outputting a class feature similarity subset; the class feature similarity subset consists of the similarity between the current sample graph and a set image class;
inputting the first image characteristic information into the fully-connected sub-network, and outputting second image characteristic information;
inputting the second image characteristic information and the characteristic information of the historical sample graphs into the fusion sub-network, and outputting a historical sample feature similarity subset; the historical sample feature similarity subset consists of the similarities between the current sample graph and each historical sample graph;
and inputting the class feature similarity subset and the historical sample feature similarity subset into the splicing sub-network, and outputting the similarity set.
3. The method of claim 2, wherein determining probability distribution information from the set of similarities comprises:
performing an exponential operation on the similarity set to obtain an exponential similarity set;
and determining the proportion of each exponential similarity in the exponential similarity set, and determining the proportion as the probability information of each exponential similarity.
4. The method according to claim 3, further comprising, after obtaining the exponential similarity set:
determining the number proportion of each set image category in the sample set;
updating the category characteristic similarity subset in the exponential similarity set according to the number proportion to obtain an updated exponential similarity set;
correspondingly, determining the proportion of each exponential similarity in the exponential similarity set and determining the proportion as the probability information of each exponential similarity comprises:
determining the proportion of each exponential similarity in the updated exponential similarity set, and determining the proportion as the probability information of each exponential similarity.
5. The method of claim 1, wherein determining the label information set corresponding to the current sample graph comprises:
determining a first tag information subset according to a comparison result of the category of the current sample image and the category of the set image;
determining a second tag information subset according to a comparison result of the category of the current sample image and the category of the historical sample image;
and splicing the first tag information subset and the second tag information subset to obtain a tag information set.
6. The method of claim 5, wherein determining a second subset of tag information based on a comparison of the category of the current sample image and the category of the historical sample image comprises:
determining an adjustment coefficient according to the number of the historical sample images;
adjusting the second tag information subset according to the adjustment coefficient;
correspondingly, splicing the first tag information subset and the second tag information subset to obtain a tag information set, including:
and splicing the first tag information subset and the adjusted second tag information subset to obtain a tag information set.
7. The method of claim 1, wherein training the first image classification model based on the tag information set, the first probability distribution information, and the second probability distribution information comprises:
linearly superposing the tag information set and the second probability distribution information to obtain intermediate probability distribution information;
fusing the intermediate probability distribution information and the first probability distribution information to obtain loss information;
and performing inverse gradient parameter adjustment on the first image classification model according to the loss information so as to train the first image classification model.
8. The method of claim 1 or 7, wherein training the second image classification model based on the trained first image classification model comprises:
acquiring first model parameters in the trained first image classification model and second model parameters of the second image classification model;
superposing the first model parameter and the second model parameter to obtain a target model parameter;
and carrying out parameter updating on the second image classification model according to the target model parameters so as to train the second image classification model.
9. The method of claim 2, further comprising, after training the first image classification model based on the set of tag information, the first probability distribution information, and the second probability distribution information:
deleting the fully connected sub-network, the fusion sub-network and the splicing sub-network from the trained first image classification model to obtain a target image classification model;
classifying the images to be classified according to the target image classification model.
10. A training device for an image classification model, comprising:
The similarity set acquisition module is used for inputting the current sample graph into the image classification model and outputting a similarity set; the image classification model is a first image classification model or a second image classification model, and the first image classification model and the second image classification model have the same structure and different internal parameters; the similarity set consists of the similarity between the current sample graph and the set image category and the similarity between the current sample graph and the historical sample graph, and is a first similarity set or a second similarity set; the first similarity set is output by the first image classification model, and the second similarity set is output by the second image classification model;
the probability distribution information determining module is used for determining probability distribution information according to the similarity set; the probability distribution information is first probability distribution information or second probability distribution information; the first probability distribution information is determined according to the first similarity set, and the second probability distribution information is determined according to the second similarity set;
the tag information set determining module is used for determining a tag information set corresponding to the current sample graph; the tag information set is determined by the category of the current sample image, the category of the historical sample image and the set image category;
The model training module is used for training the first image classification model based on the label information set, the first probability distribution information and the second probability distribution information, and training the second image classification model based on the trained first image classification model.
11. An electronic device, the electronic device comprising:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of training an image classification model as claimed in any one of claims 1-9.
12. A storage medium containing computer executable instructions for performing the training method of the image classification model of any of claims 1-9 when executed by a computer processor.
CN202310266453.8A 2023-03-13 2023-03-13 Training method, device, equipment and storage medium for image classification model Pending CN116258911A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310266453.8A CN116258911A (en) 2023-03-13 2023-03-13 Training method, device, equipment and storage medium for image classification model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310266453.8A CN116258911A (en) 2023-03-13 2023-03-13 Training method, device, equipment and storage medium for image classification model

Publications (1)

Publication Number Publication Date
CN116258911A true CN116258911A (en) 2023-06-13

Family

ID=86687882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310266453.8A Pending CN116258911A (en) 2023-03-13 2023-03-13 Training method, device, equipment and storage medium for image classification model

Country Status (1)

Country Link
CN (1) CN116258911A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117292174A (en) * 2023-09-06 2023-12-26 中化现代农业有限公司 Apple disease identification method, apple disease identification device, electronic equipment and storage medium
CN117292174B (en) * 2023-09-06 2024-04-19 中化现代农业有限公司 Apple disease identification method, apple disease identification device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110288049B (en) Method and apparatus for generating image recognition model
US20240233334A1 (en) Multi-modal data retrieval method and apparatus, medium, and electronic device
CN116310582A (en) Classification model training method, image classification method, device, medium and equipment
CN115578570A (en) Image processing method, device, readable medium and electronic equipment
CN115908640A (en) Method and device for generating image, readable medium and electronic equipment
CN116894188A (en) Service tag set updating method and device, medium and electronic equipment
CN113140012B (en) Image processing method, device, medium and electronic equipment
CN111738316A (en) Image classification method and device for zero sample learning and electronic equipment
CN116258911A (en) Training method, device, equipment and storage medium for image classification model
CN113191257B (en) Order of strokes detection method and device and electronic equipment
CN113033707B (en) Video classification method and device, readable medium and electronic equipment
CN111915689B (en) Method, apparatus, electronic device, and computer-readable medium for generating an objective function
CN117241092A (en) Video processing method and device, storage medium and electronic equipment
CN116483891A (en) Information prediction method, device, equipment and storage medium
CN114140723B (en) Multimedia data identification method and device, readable medium and electronic equipment
CN113222050B (en) Image classification method and device, readable medium and electronic equipment
CN113033680B (en) Video classification method and device, readable medium and electronic equipment
CN115270981A (en) Object processing method and device, readable medium and electronic equipment
CN115269978A (en) Video tag generation method, device, equipment and medium
CN115099323B (en) Content group determination method, device, medium and electronic equipment
CN114626551B (en) Training method of text recognition model, text recognition method and related device
CN114625876B (en) Method for generating author characteristic model, method and device for processing author information
CN111522887B (en) Method and device for outputting information
CN116503849B (en) Abnormal address identification method, device, electronic equipment and computer readable medium
CN115565607B (en) Method, device, readable medium and electronic equipment for determining protein information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination