CN111738351B - Model training method and device, storage medium and electronic equipment - Google Patents


Info

Publication number: CN111738351B
Application number: CN202010623929.5A
Authority: CN (China)
Prior art keywords: distribution, training, image, category, feature vector
Legal status: Active (assumed by Google; not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN111738351A
Inventors: Zhang Faen (张发恩), Song Liang (宋亮)
Current and original assignee: Ainnovation Chongqing Technology Co ltd
Events: application filed by Ainnovation Chongqing Technology Co ltd; priority to CN202010623929.5A; publication of CN111738351A; application granted; publication of CN111738351B; legal status active.

Classifications

    • G06F18/2321 — Pattern recognition; clustering; non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/214 — Pattern recognition; design of recognition systems; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N3/045 — Neural networks; architecture; combinations of networks
    • G06N3/08 — Neural networks; learning methods
    • Y02T10/40 — Climate change mitigation in road transport; engine management systems


Abstract

The present application relates to the technical field of image clustering, and provides a model training method and apparatus, a storage medium, and an electronic device. The model training method includes the following steps: acquiring a training image and inputting it to an encoder to obtain the distribution parameters of the feature vector output by the encoder; determining the category of the training image according to the distribution parameters of the feature vector; sampling from the distribution of the training image's category to obtain a sampling vector; inputting the sampling vector to a decoder to obtain a reconstructed image output by the decoder; inputting the training image and the reconstructed image separately to a discriminator to obtain the discrimination results output by the discriminator; and repeating the steps from acquiring the training image through obtaining the discrimination results, so as to train an image clustering model comprising the encoder, the decoder, and the discriminator. By introducing a discriminator and training the model adversarially, the method enables the encoder to extract image features effectively, so that using the trained encoder to perform image clustering tasks later can achieve better results.

Description

Model training method and device, storage medium and electronic equipment
Technical Field
The invention relates to the technical field of image clustering, and in particular to a model training method and apparatus, a storage medium, and an electronic device.
Background
The process of dividing a collection of objects into multiple categories consisting of similar objects is called clustering. Current unsupervised clustering methods mainly cluster on features extracted from the objects, but for some unstructured data, such as images, good features are hard to extract, so the clustering effect is poor.
Disclosure of Invention
An objective of the embodiments of the present application is to provide a model training method and apparatus, a storage medium, and an electronic device, so as to address the above technical problems.
In order to achieve the above purpose, the present application provides the following technical solutions:
In a first aspect, an embodiment of the present application provides a model training method, including: acquiring a training image, inputting the training image to an encoder, and obtaining distribution parameters of a feature vector output by the encoder; determining the category of the training image according to the distribution parameters of the feature vector; sampling from the distribution of the training image's category to obtain a sampling vector; inputting the sampling vector to a decoder to obtain a reconstructed image output by the decoder; inputting the training image and the reconstructed image separately to a discriminator to obtain a discrimination result output by the discriminator, the discrimination result including the authenticity and the category identity of the input image; and repeating the steps from acquiring the training image through obtaining the discrimination result, so as to train an image clustering model including the encoder, the decoder, and the discriminator. The training mode is adversarial training, and the training target includes that, according to the discrimination result output by the discriminator, the training image and the reconstructed image cannot be distinguished in authenticity or category.
The method is an unsupervised clustering method based on a variational self-encoder. By adversarially training an image clustering model comprising an encoder, a decoder, and a discriminator, after training the discriminator can hardly distinguish the reconstructed image output by the decoder from the real training image. Since the decoder reconstructs the image from the distribution parameters of the feature vector extracted by the encoder, this means that the encoder can by then effectively extract the features of the image, so that using the trained encoder to perform image clustering tasks later can achieve better results.
In addition, in this method, the degree of difference between the training image and the reconstructed image is evaluated by a discriminator (for example, a neural network) rather than by an image-difference loss (for example, the L2 distance between the training image and the reconstructed image), which avoids the problems of the loss being hard to converge and the model being hard to train.
In an implementation manner of the first aspect, the determining the class of the training image according to the distribution parameter of the feature vector includes: calculating the distance between the distribution of the feature vector and the distribution of each existing category according to the distribution parameters of the feature vector; determining the existing category with the smallest calculated distance as the category of the training image; and updating the distribution of the existing category by using the distribution parameters of the feature vector.
In the above implementation, the number of categories of the clusters is fixed, and by calculating the distance between the distribution of the feature vector and the distribution of each existing category, the current training image is necessarily divided into a certain existing category. Existing categories under such an implementation may be several categories that are preset before training begins.
In an implementation manner of the first aspect, determining the category of the training image according to the distribution parameters of the feature vector includes: calculating the distance between the distribution of the feature vector and the distribution of each existing category according to the distribution parameters of the feature vector; judging whether any of the calculated distances is smaller than a preset threshold; if so, determining the existing category corresponding to the smallest of those distances as the category of the training image, and updating the distribution of that existing category with the distribution parameters of the feature vector; if not, allocating a new category to the training image, and determining the distribution of the new category according to the distribution parameters of the feature vector.
In the above implementation, the number of cluster categories is not fixed: not only is the distance between the distribution of the feature vector and the distribution of each existing category calculated, but that distance is also compared against a preset threshold. Depending on this comparison, the current training image may be assigned to an existing category, or a new category may be created for it. In this way, several categories may, but need not, be preset as existing categories before training starts.
In an implementation manner of the first aspect, the calculating a distance between the distribution of the feature vector and the distribution of each existing class according to the distribution parameter of the feature vector includes: calculating the distance between the distribution parameter of the feature vector and the distribution parameter of each existing category as the distance between the distribution of the feature vector and the distribution of each existing category; or determining the distribution of the feature vector according to the distribution parameter of the feature vector, and calculating the KL divergence between the distribution of the feature vector and the distribution of each existing category as the distance between the distribution of the feature vector and the distribution of each existing category.
In the above implementation, two methods of calculating the distance between the distribution of the feature vector and the distribution of each existing category are provided: first, calculating the distance between the distribution parameters (since the distribution parameters can also take the form of vectors, this amounts to computing the distance between vectors); second, calculating the KL divergence, also called relative entropy, which evaluates the degree of difference between two distributions. Of course, other calculation methods are not excluded.
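As an illustrative sketch of the first method, the parameter-vector distance can be computed as follows (the function name and the choice of the Euclidean norm are assumptions for illustration; the patent does not fix a particular norm):

```python
import numpy as np

def param_distance(mu_a, var_a, mu_b, var_b):
    # Mode one: treat the distribution parameters (mean and variance,
    # concatenated) as ordinary vectors and take their Euclidean distance.
    pa = np.concatenate([mu_a, var_a])
    pb = np.concatenate([mu_b, var_b])
    return float(np.linalg.norm(pa - pb))
```

Any other vector distance (e.g., cosine or L1) could be substituted without changing the overall scheme.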
In one implementation of the first aspect, the distribution parameters include a mean and a variance.
The probability density function of some distributions can be completely determined from only the mean and variance, e.g., the Gaussian distribution. Optionally, the distribution of the feature vector and the distribution of each category can both be assumed to be Gaussian.
In one implementation manner of the first aspect, the encoder, the decoder, and the discriminator are all implemented with neural networks.
A neural network has good learning and generalization capability, and can extract deep features of an image for different purposes. For example, consider two images with essentially identical content, one of which is slightly shifted relative to the other. If the L2 distance is used to evaluate the difference between them, the shift may produce a large L2 distance, since the L2 distance only characterizes differences at the pixel-value level; the evaluation result is therefore distorted. A discriminator implemented with a neural network, by contrast, evaluates differences based on the deep features of the image (which characterize its actual content); since a slight shift barely changes the content of the image, the evaluation result it produces is accurate.
In one implementation manner of the first aspect, the method further includes: determining the category of an image to be processed by using the encoder in the trained image clustering model.
In a second aspect, an embodiment of the present application provides a model training apparatus, including: an encoding module for acquiring a training image and inputting it to an encoder to obtain distribution parameters of a feature vector output by the encoder; a clustering module for determining the category of the training image according to the distribution parameters of the feature vector; a sampling module for sampling from the distribution of the training image's category to obtain a sampling vector; a decoding module for inputting the sampling vector to a decoder to obtain a reconstructed image output by the decoder; a discrimination module for inputting the training image and the reconstructed image separately to a discriminator to obtain a discrimination result output by the discriminator, the discrimination result including the authenticity and the category identity of the input image; and an iteration module for repeating the steps from acquiring the training image through obtaining the discrimination result, so as to train an image clustering model including the encoder, the decoder, and the discriminator. The training mode is adversarial training, and the training target includes that, according to the discrimination result output by the discriminator, the training image and the reconstructed image cannot be distinguished in authenticity or category.
In a third aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon computer program instructions which, when read and executed by a processor, perform the method provided by the first aspect or any one of the possible implementations of the first aspect.
In a fourth aspect, embodiments of the present application provide an electronic device, including: a memory and a processor, the memory having stored therein computer program instructions which, when read and executed by the processor, perform the method of the first aspect or any one of the possible implementations of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting its scope; a person skilled in the art may obtain other related drawings from these drawings without inventive effort.
FIG. 1 shows a schematic diagram of a variational self-encoder;
FIG. 2 shows a working schematic diagram of a model training method according to an embodiment of the present application;
FIG. 3 shows a flowchart of a model training method provided by an embodiment of the present application;
FIG. 4 shows a functional block diagram of a model training apparatus according to an embodiment of the present application;
fig. 5 shows a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises it.
The model training method provided by the embodiments of the present application is an unsupervised clustering method based on a Variational self-Encoder (VAE); that is, the trained model can be used to cluster images without supervision. Therefore, before describing the solution of the present application, the concept of the variational self-encoder is briefly introduced; for implementation details not mentioned here, reference may be made to the prior art.
Fig. 1 shows a schematic structure of a variational self-encoder. Referring to fig. 1, the workflow of the variational self-encoder mainly includes three stages:
Encoding: the input data is encoded by the encoder to obtain the mean and variance of the hidden variable (referring to the mean and variance of the distribution that the hidden variable obeys).
Sampling: based on the mean and variance obtained in the encoding stage, a sample is drawn from the distribution that the hidden variable obeys, yielding a sampling variable.
Decoding: output data with the same dimensions as the input data is decoded from the sampling variable.
By training the variational self-encoder, the output data can be made as close as possible to the input data, i.e., reconstruction of the input data is achieved. Thus, in some application scenarios, a trained variational self-encoder may be used for data generation; in particular, in the field of image processing, it may be used for image generation.
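The three stages above can be sketched with toy linear maps (a hedged illustration: real encoders and decoders are deep networks, and all names and shapes here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_mu, W_logvar):
    # Encoding: produce the mean and log-variance of the hidden variable.
    return x @ W_mu, x @ W_logvar

def sample(mu, logvar):
    # Sampling: reparameterization trick, z = mu + sigma * eps with eps ~ N(0, I).
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z, W_dec):
    # Decoding: map the sampled variable back to the input dimension.
    return z @ W_dec
```

The reparameterization step is what lets gradients flow through the sampling stage during training.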
Fig. 2 shows a working schematic diagram of a model training method provided by an embodiment of the present application. The method is used for training an image clustering model capable of performing image clustering tasks (in practice, only the encoder may be needed to perform image clustering tasks; the remaining components of the model are used only during training, as will be described later). Referring to fig. 2, the image clustering model clearly also includes a variational self-encoder composed of an encoder and a decoder; the main difference from a plain variational self-encoder is the added discriminator. The other elements of fig. 2 will be described together with fig. 3.
Fig. 3 shows a flowchart of a model training method according to an embodiment of the present application. The method may be performed by an electronic device, one possible configuration of which is shown in fig. 5, and reference may be made in particular to the description of fig. 5 below. Referring to fig. 3, the method includes:
step S100: and acquiring a training image, inputting the training image into an encoder, and acquiring the distribution parameters of the feature vectors output by the encoder.
The manner of acquiring the training image is not limited; for example, it may be an unlabeled image collected from the network, an image in a certain data set, an image acquired in real time, and so on. The encoder encodes the training image and outputs the distribution parameters of the feature vector. The feature vector here corresponds to the hidden variable mentioned when the variational self-encoder was introduced. For an Auto-Encoder (AE), the encoder directly extracts the feature vector of the input image; for the variational self-encoder, however, the feature vector of the training image is not extracted directly, and instead the parameters of the distribution that the feature vector obeys are output, so in this sense the feature vector is implicit (hidden).
The encoder may be implemented with a neural network, which may comprise a number of convolutional layers (other layers, such as pooling layers, are not excluded) for encoding an image into the distribution parameters of the feature vector through feature extraction. For example, the encoder may adopt a network structure such as ResNet, VGG, LeNet, or GoogLeNet.
In step S100, which distribution parameters need to be output depends on the distribution form of the feature vector, which needs to be determined in advance, and in turn, if the specific values of the distribution parameters of the feature vector are known, the specific distribution can be determined.
For example, if the feature vector is assumed to follow a gaussian distribution, the distribution parameters may include a mean and a variance from which a probability density function of a particular gaussian distribution can be uniquely determined; for another example, if it is assumed that the feature vector obeys an exponential distribution, the distribution parameters may include a rate parameter from which a probability density function of a specific exponential distribution can be uniquely determined.
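As a small illustration of how the parameters fix the density (standard textbook formulas, not specific to the patent):

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    # A Gaussian density is uniquely determined by its mean and variance.
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def exponential_pdf(x, rate):
    # An exponential density is uniquely determined by its rate parameter.
    return rate * np.exp(-rate * x)
```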
Step S110: and determining the category of the training image according to the distribution parameters of the feature vectors.
The cluster categories already formed when step S110 is performed are called existing categories. Each existing category corresponds to its own probability distribution and contains zero or more training images, which may be regarded as samples drawn from the distribution of that category. Note that even a category containing no training image may have a probability distribution; such a distribution is pre-specified. The distribution form of the cluster categories needs to be determined in advance, so that during clustering only the values of the distribution parameters need to be updated according to how training images are assigned to categories, and the distribution of each category can be represented by its distribution parameters.
Thus, in step S110, determining the category of the training image according to the distribution parameters of the feature vector may mean: calculating the distance between the distribution of the feature vector and the distribution of each existing category according to the distribution parameters of the feature vector, and then determining from the calculated distances which existing category, if any, the current training image should belong to. The distance here broadly refers to any similarity measure between two distributions; its value may be inversely related to the similarity of the distributions (the smaller the distance, the more similar). Two distance calculation modes are listed below, and it will be appreciated that other modes exist:
Mode one: the distance between the distribution parameters of the feature vector and the distribution parameters of the existing category is calculated and used as the distance between the two distributions. Since the distribution parameters determine the probability density function, if the parameters of two distributions are close, the distributions are more similar and the distance between them is smaller. The distribution parameters may be expressed as vectors (e.g., mean vectors and variance vectors with the same dimensions as the feature vector), so computing the distance between distribution parameters amounts to computing the distance between vectors.
Mode two: and determining the distribution of the feature vectors according to the distribution parameters of the feature vectors, and calculating the KL divergence between the distribution of the feature vectors and the distribution of the existing category as the distance between the distribution of the feature vectors and the distribution of the existing category. The KL divergence is also called relative entropy, and the design purpose is to evaluate the difference degree between two distributions, and the larger the KL divergence value is, the larger the difference between two branches is, otherwise, the smaller the difference is.
In image clustering there are mainly two cases. In one, the categories contained in the clustering result are designated in advance, and during clustering the images to be clustered are only assigned to these preset categories; for example, some prior knowledge of the images has been obtained in advance and the full set of categories they contain is known. In the other, the categories are not specified in advance, and new categories are allowed to arise during clustering; for example, it is not known in advance how many categories the images to be clustered fall into. In the latter case, some preset categories may be designated as initial categories, or none at all, with all categories generated during the clustering process.
For the first case, step S110 may be implemented as follows: first, the distance between the distribution of the feature vector and the distribution of each existing category is calculated according to the distribution parameters of the feature vector; then, the existing category with the smallest calculated distance is determined as the category of the training image; finally, the distribution of that existing category is updated with the distribution parameters of the feature vector.
The existing categories are preset, and the distribution parameters of their initial distributions can be given randomly. After training images have been assigned to a category, the parameters of its initial distribution are updated with the distribution parameters of the corresponding feature vectors, and the updated distribution parameters then carry practical meaning. In this implementation, the current training image is necessarily assigned to some existing category.
For the second case, step S110 may be implemented as follows: first, the distance between the distribution of the feature vector and the distribution of each existing category is calculated according to the distribution parameters of the feature vector; then, it is judged whether any of the calculated distances is smaller than a preset threshold; if so, the existing category corresponding to the smallest of those distances is determined as the category of the training image, and the distribution of that existing category is updated with the distribution parameters of the feature vector; if not, a new category is allocated to the training image, and the distribution of the new category is determined according to the distribution parameters of the feature vector.
In the above implementation, not only is the distance between the distribution of the feature vector and the distribution of each existing category calculated, but that distance is also compared against a preset threshold. Depending on this comparison, the current training image may be assigned to an existing category, or a new category may be created for it. Specifically, if some distance is smaller than the preset threshold, the current training image is likely a sample from the distribution of that existing category and should be assigned to it; otherwise, a new category should be created for it. In this way, several categories may be preset as existing categories before training starts, or none at all, with categories generated during the clustering process.
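The assign-or-create logic can be sketched as follows (a simplification that tracks only each category's mean and uses a plain running average for the update; the threshold value and function names are hypothetical):

```python
import numpy as np

THRESHOLD = 2.0  # hypothetical preset distance threshold

def assign_category(mu, categories, threshold=THRESHOLD):
    """Assign a feature distribution (represented by its mean vector mu) to
    the nearest existing category, or create a new one if none is close
    enough. Returns the index of the chosen category."""
    if categories:
        dists = [np.linalg.norm(mu - c) for c in categories]
        i = int(np.argmin(dists))
        if dists[i] < threshold:
            # update the matched category's distribution (running average here)
            categories[i] = 0.5 * (categories[i] + mu)
            return i
    categories.append(mu.copy())  # new category seeded from this image
    return len(categories) - 1
```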
Of course, other image clustering methods are not excluded except the above two cases, for example, no preset category is designated for image clustering, but a threshold value of the maximum number of categories is designated, if the number of existing categories does not reach the threshold value in the clustering process, new categories are allowed to be generated, if the threshold value is reached, new categories are not allowed to be generated, and only the images are allowed to be divided into the existing categories.
Step S120: sampling from the distribution of the training image's category to obtain a sampling vector.
Since the category of the training image has been determined in step S110, a sampling vector is obtained simply by sampling from the distribution of that category; the sampling vector here corresponds to the sampling variable mentioned when the variational self-encoder was introduced. Sampling from a known distribution is established in the art and is not described in detail here; the vector generated by sampling may be a random vector.
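Assuming a Gaussian category distribution, the sampling step reduces to one call (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_from_category(mu, var):
    # Draw one sampling vector from the category's (assumed Gaussian)
    # distribution; the result is a random vector of the same dimension.
    return rng.normal(loc=mu, scale=np.sqrt(var))
```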
Step S130: the sample vector is input to a decoder to obtain a reconstructed image output by the decoder.
The decoder may be implemented with a neural network, which may comprise several deconvolution layers (other layers, such as pooling layers, are not excluded) for decoding (restoring) a vector into an image.
Step S140: inputting the training image and the reconstructed image respectively to a discriminator to obtain a discrimination result output by the discriminator, where the discrimination result includes the authenticity and the category identity of the input image.
The authenticity of an input image (which may correspond to a score or probability) indicates whether the input image is an original training image or a reconstructed image; the category identity of the input image (which may also correspond to a score or probability) indicates whether the input training image and the reconstructed image belong to the same category.
The discriminator may be implemented using a neural network, although performing the discrimination using fixed rules, such as computing the similarity between the training image and the reconstructed image, is not precluded. However, because a neural network has good learning and generalization capability, a trained neural network outputs more reliable discrimination results than those computed from preset rules alone.
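A discriminator with two output heads, one scoring authenticity and one scoring category identity, might look like the following sketch (PyTorch assumed; in practice the category-identity judgment compares a training/reconstructed image pair, which is simplified here to a per-image score, and all architecture details are illustrative):

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Sketch of a two-headed discriminator: one head scores whether the
    input is real or reconstructed, the other scores category identity."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1),  # 16x16 -> 8x8
            nn.LeakyReLU(0.2),
            nn.Flatten(),
        )
        self.real_head = nn.Linear(16 * 8 * 8, 1)  # authenticity score
        self.same_head = nn.Linear(16 * 8 * 8, 1)  # category-identity score

    def forward(self, x):
        h = self.features(x)
        return torch.sigmoid(self.real_head(h)), torch.sigmoid(self.same_head(h))

real_score, same_score = Discriminator()(torch.rand(2, 1, 16, 16))
```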
Step S150: adversarially training the image clustering model until a training end condition is met.
The image clustering model includes at least the encoder, the decoder, and the discriminator, though other structures are not excluded. The encoder and decoder (the codec) are regarded as one part and the discriminator as another, and adversarial training pits the training objectives of these two parts against each other. The objective of the codec is to have the decoder reconstruct an image as close as possible to the training image, enough to "fool" the discriminator; the objective of the discriminator is to distinguish the reconstructed image from the training image as reliably as possible and not be deceived by the codec. Overall, the training objectives of the image clustering model include: after training is completed, the authenticity and the category of the training image and the reconstructed image cannot be distinguished from the discrimination result output by the trained discriminator. That is, when a training image or a reconstructed image is input to the discriminator, it is hard to determine from the discrimination result whether it is a reconstructed image or an original training image; and when the training image and the reconstructed image are input to the discriminator, the discrimination result indicates that they belong to the same category, i.e., the decoder's reconstructed image is close to the real training image.
Regarding the principle of adversarial training, reference may be made to existing work on generative adversarial networks (Generative Adversarial Networks, GAN for short): the codec may be regarded as the generation network (Generator) in a GAN, and the discriminator as the discrimination network (Discriminator) in a GAN.
Steps S100 to S140 can be regarded as one iteration of the training process (the steps of computing the loss and updating the network parameters are omitted), and step S150 is the iteration control step: after each training round, it is determined whether the training end condition is satisfied; if so, training ends, otherwise the process returns to step S100 to start the next round. The training end condition may be set in various ways, for example, ending after a certain number of rounds, ending after training for a certain time, ending after the discriminator converges, or a combination of these conditions, and so on.
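One round of the adversarial scheme described above can be sketched with toy stand-in networks; the real models would be the convolutional encoder, decoder, and discriminator, and all sizes, losses, and optimizer settings here are illustrative (the encoder's variance output is omitted for brevity):

```python
import torch
import torch.nn as nn

# Toy stand-ins so the round is runnable end to end.
enc = nn.Linear(4, 2)   # image -> feature distribution mean (variance omitted)
dec = nn.Linear(2, 4)   # sampling vector -> reconstructed image
disc = nn.Linear(4, 1)  # image -> authenticity logit
opt_g = torch.optim.SGD(list(enc.parameters()) + list(dec.parameters()), lr=0.1)
opt_d = torch.optim.SGD(disc.parameters(), lr=0.1)
bce = nn.functional.binary_cross_entropy_with_logits

x = torch.rand(8, 4)    # a batch of "training images"

# Discriminator step: learn to tell real images from reconstructions.
recon = dec(enc(x)).detach()  # detach so codec is not updated here
d_loss = bce(disc(x), torch.ones(8, 1)) + bce(disc(recon), torch.zeros(8, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Codec (generator) step: make reconstructions fool the discriminator.
recon = dec(enc(x))
g_loss = bce(disc(recon), torch.ones(8, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```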
After the image clustering model is trained, the encoder can be used to perform image clustering tasks in a process similar to steps S100 and S110: acquire an image to be processed (i.e., an image to be clustered), input it to the trained encoder, obtain the distribution parameters of the feature vector output by the encoder, and determine the category of the image to be processed according to those distribution parameters. For specific implementations, reference may also be made to steps S100 and S110, which are not repeated here. The other components of the image clustering model need not be used when performing the actual clustering task.
In summary, in the model training method provided in the embodiments of the present application, adversarial training of the image clustering model makes it difficult for the discriminator, after training, to distinguish the reconstructed image output by the decoder from the real training image. Since the decoder's image reconstruction is based on the distribution parameters of the feature vector extracted by the encoder, this means the encoder can effectively extract image features (only if the extracted features are good enough is it difficult for the discriminator to distinguish the reconstructed image from the training image), so image clustering performed with the trained encoder can achieve better results.
In some comparative embodiments, no discriminator is used: clustering is performed with only the variational auto-encoder, and a reconstruction loss (e.g., the difference between the training image and the reconstructed image) is added to the loss function of the variational auto-encoder so that the reconstructed image is sufficiently close to the training image. Taking the L2 distance as the reconstruction loss as an example, the inventors' long-term research has found that, because the L2 distance characterizes image differences only at the pixel-value level, small pixel-level changes to an image (such as a translation) can produce a very large L2 distance; the problem is especially pronounced for large images, making the reconstruction loss difficult to converge and hindering model training. In the scheme of the present application, the degree of distinction between the training image and the reconstructed image is evaluated by a discriminator instead of computing a loss from the image difference, which avoids the problems of the comparative embodiments, makes the image clustering model easy to train, and allows it to be used for clustering tasks on large images.
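The translation sensitivity of the pixel-wise L2 distance is easy to demonstrate: shifting a sharp-edged image by a single pixel leaves its content unchanged but produces a large distance. The image content below is illustrative:

```python
import numpy as np

# A white vertical bar on a black background.
img = np.zeros((64, 64))
img[:, 20:40] = 1.0
shifted = np.roll(img, 1, axis=1)  # translate the image by one pixel

l2 = np.sqrt(((img - shifted) ** 2).sum())
# Only two columns of 64 pixels each differ, yet the distance is
# sqrt(128) ≈ 11.3 — as large as if 128 pixels had flipped between
# black and white, despite the content being identical.
```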
Further, in some implementations of the method, the encoder, the decoder, and the discriminator in the image clustering model are implemented using neural networks. Because neural networks have good learning and generalization capability, they can extract deep features of an image for different purposes. For example, consider two images with substantially the same content, one merely shifted slightly relative to the other: if the L2 distance is used to evaluate their distinction, the calculated value may be large because of the shift, giving an inaccurate result. A discriminator implemented as a neural network, however, evaluates distinction based on deep features that characterize the specific content of the image, and since translation barely changes that content, its evaluation is accurate.
Fig. 4 shows a functional block diagram of a model training apparatus 200 according to an embodiment of the present application. Referring to fig. 4, the model training apparatus 200 includes:
the encoding module 210 is configured to obtain a training image and input the training image to an encoder, so as to obtain a distribution parameter of a feature vector output by the encoder;
a clustering module 220, configured to determine a class of the training image according to the distribution parameter of the feature vector;
a sampling module 230, configured to sample from the distribution of the categories of the training image to obtain a sampling vector;
a decoding module 240, configured to input the sampling vector to a decoder, and obtain a reconstructed image output by the decoder;
the discrimination module 250 is configured to input the training image and the reconstructed image respectively to a discriminator to obtain a discrimination result output by the discriminator, where the discrimination result includes the authenticity and the category identity of the input image;
an iteration module 260, configured to repeat the steps from acquiring the training image to obtaining the discrimination result, so as to train an image clustering model including the encoder, the decoder, and the discriminator; the training mode is adversarial training, and the training objective includes that the authenticity and the category of the training image and the reconstructed image cannot be distinguished from the discrimination result output by the discriminator.
In one implementation of the model training apparatus 200, the clustering module 220 determines a class of the training image according to the distribution parameter of the feature vector, including: calculating the distance between the distribution of the feature vector and the distribution of each existing category according to the distribution parameters of the feature vector; determining the existing category with the smallest calculated distance as the category of the training image; and updating the distribution of the existing category by using the distribution parameters of the feature vector.
In one implementation of the model training apparatus 200, the clustering module 220 determines a class of the training image according to the distribution parameter of the feature vector, including: calculating the distance between the distribution of the feature vector and the distribution of each existing category according to the distribution parameters of the feature vector; judging whether a distance smaller than a preset threshold exists in the calculated distances or not; if the distance smaller than the preset threshold exists, determining the existing category corresponding to the minimum value in the distance smaller than the preset threshold as the category of the training image, and updating the distribution of the existing category by using the distribution parameters of the feature vector; if the distance smaller than the preset threshold value does not exist, a new category is allocated to the training image, and distribution of the new category is determined according to the distribution parameters of the feature vectors.
In one implementation of the model training apparatus 200, the clustering module 220 calculates a distance between the distribution of the feature vector and the distribution of each existing class according to the distribution parameter of the feature vector, including: calculating the distance between the distribution parameter of the feature vector and the distribution parameter of each existing category as the distance between the distribution of the feature vector and the distribution of each existing category; or determining the distribution of the feature vector according to the distribution parameter of the feature vector, and calculating the KL divergence between the distribution of the feature vector and the distribution of each existing category as the distance between the distribution of the feature vector and the distribution of each existing category.
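For the KL-divergence option mentioned above, the divergence between two diagonal Gaussians has a closed form in terms of their means and variances. A sketch (function name and interface are illustrative):

```python
import numpy as np

def kl_diag_gaussians(mu1, var1, mu2, var2):
    """Closed-form KL(N(mu1, diag(var1)) || N(mu2, diag(var2))),
    usable as the distance between a feature-vector distribution
    and an existing category's distribution."""
    return 0.5 * np.sum(
        np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0
    )

# KL divergence of a distribution from itself is zero; it grows
# as the means move apart.
d = kl_diag_gaussians(np.zeros(3), np.ones(3), np.zeros(3), np.ones(3))
```

Note that the KL divergence is asymmetric, so the order of arguments matters when using it as a "distance".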
In one implementation of model training apparatus 200, the distribution parameters include a mean and a variance.
In one implementation of model training apparatus 200, the encoder, the decoder, and the discriminator all employ neural networks.
In one implementation of model training apparatus 200, the apparatus further comprises: and the application module is used for determining the category of the image to be processed by utilizing the encoder in the trained image clustering model.
The model training apparatus 200 provided in the embodiments of the present application has been described in the foregoing method embodiments; for brevity, where the apparatus embodiments omit details, reference may be made to the corresponding content in the method embodiments.
Fig. 5 shows one possible structure of an electronic device 300 provided in an embodiment of the present application. Referring to fig. 5, the electronic device 300 includes: processor 310, memory 320, and communication interface 330, which are interconnected and communicate with each other by a communication bus 340 and/or other forms of connection mechanisms (not shown).
The memory 320 includes one or more memories (only one is shown in the figure), which may be, but are not limited to, a random access memory (Random Access Memory, RAM), a read-only memory (Read Only Memory, ROM), a programmable read-only memory (Programmable Read-Only Memory, PROM), an erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), an electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), and the like. The processor 310, as well as other possible components, may access the memory 320 to read and/or write data.
The processor 310 includes one or more processors (only one is shown), which may be an integrated circuit chip having signal processing capability. The processor 310 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a micro control unit (Micro Controller Unit, MCU), a network processor (Network Processor, NP), or another conventional processor; it may also be a special-purpose processor, including a digital signal processor (Digital Signal Processor, DSP for short), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field-programmable gate array (Field Programmable Gate Array, FPGA for short), or another programmable logic device, discrete gate or transistor logic device, or discrete hardware component.
The communication interface 330 includes one or more communication interfaces (only one is shown), which may be used to communicate directly or indirectly with other devices for data exchange. The communication interface 330 may include interfaces for wired and/or wireless communication.
One or more computer program instructions may be stored in memory 320 that may be read and executed by processor 310 to implement the model training methods and other desired functions provided by embodiments of the present application.
It is to be understood that the configuration shown in fig. 5 is merely illustrative, and that electronic device 300 may also include more or fewer components than those shown in fig. 5, or have a different configuration than that shown in fig. 5. The components shown in fig. 5 may be implemented in hardware, software, or a combination thereof. The electronic device 300 may be a physical device such as a server, a PC, a notebook, a tablet, a cell phone, a wearable device, an image capturing device, a vehicle-mounted device, a drone, a robot, etc., or may be a virtual device such as a virtual machine, a virtualized container, etc. The electronic device 300 is not limited to a single device, and may be a combination of a plurality of devices or one or more clusters formed by a large number of devices.
The embodiment of the application also provides a computer readable storage medium, and the computer readable storage medium stores computer program instructions, and when the computer program instructions are read and executed by a processor of a computer, the model training method provided by the embodiment of the application is executed. For example, the computer-readable storage medium may be implemented as memory 320 in electronic device 300 in FIG. 5.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the division of units is only a logical functional division, and there may be other divisions in actual implementation. For another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
Further, the units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Furthermore, functional modules in various embodiments of the present application may be integrated together to form a single portion, or each module may exist alone, or two or more modules may be integrated to form a single portion.
The foregoing is merely exemplary embodiments of the present application and is not intended to limit the scope of the present application, and various modifications and variations may be suggested to one skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.

Claims (9)

1. A method of model training, comprising:
acquiring a training image, inputting the training image to an encoder, and acquiring distribution parameters of feature vectors output by the encoder;
determining the category of the training image according to the distribution parameters of the feature vectors;
sampling from the distribution of the categories of the training image to obtain a sampling vector;
inputting the sampling vector to a decoder to obtain a reconstructed image output by the decoder;
respectively inputting the training image and the reconstruction image to a discriminator to obtain a discrimination result output by the discriminator, wherein the discrimination result comprises the authenticity and the category identity of the input image;
repeating the steps from acquiring the training image to obtaining the discrimination result to train an image clustering model including the encoder, the decoder, and the discriminator; the training mode is adversarial training, and the training objective includes that the authenticity and the category of the training image and the reconstructed image cannot be distinguished according to the discrimination result output by the discriminator; the adversarial training refers to the opposition of training objectives between a codec and the discriminator;
the determining the category of the training image according to the distribution parameters of the feature vectors comprises the following steps: calculating the distance between the distribution of the feature vector and the distribution of each existing category according to the distribution parameters of the feature vector; judging whether a distance smaller than a preset threshold exists in the calculated distances or not; if the distance smaller than the preset threshold exists, determining the existing category corresponding to the minimum value in the distance smaller than the preset threshold as the category of the training image, and updating the distribution of the existing category by using the distribution parameters of the feature vector; if the distance smaller than the preset threshold value does not exist, a new category is allocated to the training image, and distribution of the new category is determined according to the distribution parameters of the feature vectors.
2. The model training method according to claim 1, wherein the determining the category of the training image according to the distribution parameter of the feature vector includes:
calculating the distance between the distribution of the feature vector and the distribution of each existing category according to the distribution parameters of the feature vector;
determining the existing category with the smallest calculated distance as the category of the training image;
and updating the distribution of the existing category by using the distribution parameters of the feature vector.
3. The model training method according to claim 1 or 2, wherein the calculating the distance between the distribution of the feature vector and the distribution of each existing category according to the distribution parameters of the feature vector comprises:
calculating the distance between the distribution parameter of the feature vector and the distribution parameter of each existing category as the distance between the distribution of the feature vector and the distribution of each existing category; or,
and determining the distribution of the feature vector according to the distribution parameters of the feature vector, and calculating the KL divergence between the distribution of the feature vector and the distribution of each existing category as the distance between the distribution of the feature vector and the distribution of each existing category.
4. The model training method of claim 1, wherein the distribution parameters include a mean and a variance.
5. The model training method of claim 1, wherein the encoder, the decoder, and the arbiter each employ a neural network.
6. The model training method of claim 1, wherein the method further comprises:
and determining the category of the image to be processed by using an encoder in the trained image clustering model.
7. A model training device, comprising:
the coding module is used for acquiring a training image and inputting the training image to the coder to acquire the distribution parameters of the feature vectors output by the coder;
the clustering module is used for determining the category of the training image according to the distribution parameters of the feature vectors;
the sampling module is used for sampling from the distribution of the categories of the training images to obtain sampling vectors;
the decoding module is used for inputting the sampling vector to a decoder to obtain a reconstructed image output by the decoder;
the judging module is used for respectively inputting the training image and the reconstructed image to a discriminator to obtain a discrimination result output by the discriminator, wherein the discrimination result comprises the authenticity and the category identity of the input image;
the iteration module is used for repeating the steps from acquiring the training image to obtaining the discrimination result, so as to train an image clustering model comprising the encoder, the decoder and the discriminator; the training mode is adversarial training, and the training objective comprises that the authenticity and the category of the training image and the reconstructed image cannot be distinguished according to the discrimination result output by the discriminator; the adversarial training refers to the opposition of training objectives between a codec and the discriminator;
the clustering module is further used for calculating the distance between the distribution of the feature vector and the distribution of each existing category according to the distribution parameters of the feature vector; judging whether a distance smaller than a preset threshold exists in the calculated distances; if a distance smaller than the preset threshold exists, determining the existing category corresponding to the minimum of the distances smaller than the preset threshold as the category of the training image, and updating the distribution of the existing category by using the distribution parameters of the feature vector; if no distance smaller than the preset threshold exists, allocating a new category to the training image, and determining the distribution of the new category according to the distribution parameters of the feature vector.
8. A computer readable storage medium, having stored thereon computer program instructions which, when read and executed by a processor, perform the method of any of claims 1-6.
9. An electronic device, comprising: a memory and a processor, the memory having stored therein computer program instructions which, when read and executed by the processor, perform the method of any of claims 1-6.
CN202010623929.5A 2020-06-30 2020-06-30 Model training method and device, storage medium and electronic equipment Active CN111738351B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010623929.5A CN111738351B (en) 2020-06-30 2020-06-30 Model training method and device, storage medium and electronic equipment


Publications (2)

Publication Number Publication Date
CN111738351A CN111738351A (en) 2020-10-02
CN111738351B true CN111738351B (en) 2023-12-19

Family

ID=72652358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010623929.5A Active CN111738351B (en) 2020-06-30 2020-06-30 Model training method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111738351B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112016638B (en) * 2020-10-26 2021-04-06 广东博智林机器人有限公司 Method, device and equipment for identifying steel bar cluster and storage medium
CN114513279A (en) * 2020-10-28 2022-05-17 华为技术有限公司 Data transmission method and device
CN112465020B (en) * 2020-11-25 2023-04-07 创新奇智(合肥)科技有限公司 Training data set generation method and device, electronic equipment and storage medium
CN112328750A (en) * 2020-11-26 2021-02-05 上海天旦网络科技发展有限公司 Method and system for training text discrimination model
CN113361583A (en) * 2021-06-01 2021-09-07 珠海大横琴科技发展有限公司 Countermeasure sample detection method and device
CN113362403A (en) * 2021-07-20 2021-09-07 支付宝(杭州)信息技术有限公司 Training method and device of image processing model
CN113468820A (en) * 2021-07-21 2021-10-01 上海眼控科技股份有限公司 Data training method, device, equipment and storage medium
CN113936302B (en) * 2021-11-03 2023-04-07 厦门市美亚柏科信息股份有限公司 Training method and device for pedestrian re-recognition model, computing equipment and storage medium
CN115100717A (en) * 2022-06-29 2022-09-23 腾讯科技(深圳)有限公司 Training method of feature extraction model, and cartoon object recognition method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109615014A (en) * 2018-12-17 2019-04-12 清华大学 A kind of data sorting system and method based on the optimization of KL divergence
EP3477553A1 (en) * 2017-10-27 2019-05-01 Robert Bosch GmbH Method for detecting an anomalous image among a first dataset of images using an adversarial autoencoder
CN110009013A (en) * 2019-03-21 2019-07-12 腾讯科技(深圳)有限公司 Encoder training and characterization information extracting method and device
CN110309853A (en) * 2019-05-20 2019-10-08 湖南大学 Medical image clustering method based on variation self-encoding encoder
CN110458904A (en) * 2019-08-06 2019-11-15 苏州瑞派宁科技有限公司 Generation method, device and the computer storage medium of capsule endoscopic image
CN111079649A (en) * 2019-12-17 2020-04-28 西安电子科技大学 Remote sensing image ground feature classification method based on lightweight semantic segmentation network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201801627D0 (en) * 2018-02-01 2018-03-21 Siemens Healthcare Ltd Image autoencoding for quantum machine learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3477553A1 (en) * 2017-10-27 2019-05-01 Robert Bosch GmbH Method for detecting an anomalous image among a first dataset of images using an adversarial autoencoder
CN109615014A (en) * 2018-12-17 2019-04-12 清华大学 A kind of data sorting system and method based on the optimization of KL divergence
CN110009013A (en) * 2019-03-21 2019-07-12 腾讯科技(深圳)有限公司 Encoder training and characterization information extracting method and device
CN110309853A (en) * 2019-05-20 2019-10-08 湖南大学 Medical image clustering method based on variation self-encoding encoder
CN110458904A (en) * 2019-08-06 2019-11-15 苏州瑞派宁科技有限公司 Generation method, device and the computer storage medium of capsule endoscopic image
CN111079649A (en) * 2019-12-17 2020-04-28 西安电子科技大学 Remote sensing image ground feature classification method based on lightweight semantic segmentation network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Network Representation Learning Framework Based on Adversarial Graph Convolution; Chen Mengxue; Liu Yong; Pattern Recognition and Artificial Intelligence (No. 11); full text *
Discriminative Feature Learning Based on Class Encoding; Xu Derong; Chen Xiuhong; Tian Jin; Computer Engineering and Science (No. 03); full text *
Research Progress on Zero-Shot Learning Methods Based on Auto-encoders; Yang Chenxi; Zuo Jie; Sun Pinjie; Modern Computer (No. 01); full text *

Also Published As

Publication number Publication date
CN111738351A (en) 2020-10-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant