CN114005005A - Double-batch standardized zero-instance image classification method - Google Patents

Double-batch standardized zero-instance image classification method

Info

Publication number
CN114005005A
Authority
CN
China
Prior art keywords
image
vector
representation
vectors
batch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111643490.3A
Other languages
Chinese (zh)
Other versions
CN114005005B (en)
Inventor
刘国清
杨广
王启程
郑伟
张见阳
杨国武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Youjia Innovation Technology Co ltd
Original Assignee
Shenzhen Minieye Innovation Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Minieye Innovation Technology Co Ltd
Priority to CN202111643490.3A
Publication of CN114005005A
Application granted
Publication of CN114005005B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a double-batch standardized zero-instance image classification method, which comprises: inputting an image data set to be trained into a feature extraction module to obtain a feature vector for each image; mapping the semantic information of the categories into the visual space with a fully connected module to obtain category representation vectors; splicing the feature vectors of the labeled images and of the unlabeled images with the category representation vectors to obtain multiple pairs of first representation pair vectors and multiple pairs of second representation pair vectors; performing batch standardization on the first and second representation pair vectors to obtain first and second standard representation pair vectors; training the fully connected module on the first standard representation pair vectors, then inputting the second standard representation pair vectors to calculate the pairing probability of each second standard representation pair vector; and, for each unlabeled image, selecting the category of the second representation pair vector with the highest pairing probability and outputting it as the class of the unlabeled image. The method calibrates the distribution difference between the two batches, thereby realizing high-performance zero-instance image classification.

Description

Double-batch standardized zero-instance image classification method
Technical Field
The application relates to the technical field of machine learning, in particular to a double-batch standardized zero-instance image classification method, a classification model learning method and a terminal.
Background
In the field of image classification, a machine learning model must learn training images of various classes in order to correctly identify the class to which a test image belongs. When the classes that need to be identified lack training data, conventional machine learning models cannot train an effective image classifier. Zero-instance image classification aims to train a model that correctly identifies images of invisible classes, i.e., classes that do not appear in the training data set. Existing zero-instance image classification methods fall roughly into the following two families.
The first method: the zero-instance image classification method based on a generative adversarial network (GAN). Its general working principle is as follows: first, the features of the training images are extracted by a convolutional neural network; then, a generator network synthesizes training-image features from the description information of the categories, while an adversarial discriminator forces the generated features to imitate the real features learned by the convolutional neural network; description information of the invisible categories is then fed through the trained generator network to produce simulated invisible-class image features; finally, a zero-instance image classifier is trained on the simulated invisible-class image features together with the real visible-class image features, thereby realizing image recognition of the invisible classes. The disadvantages of this method are as follows: because the category description information is usually insufficient to fully describe the visual appearance of a category, the invisible-class image features generated by the adversarial network from the category description information are incomplete compared with the image features learned by the convolutional neural network, so the simulated features differ substantially from the real features, and a zero-instance image classifier trained on the simulated invisible-class image features cannot achieve good performance in a real zero-instance scene. Meanwhile, because the zero-instance image classifier is trained on simulated image features, its classification boundary is greatly influenced by the class proportions of the samples, and since the GAN-based method cannot obtain the distribution information of the invisible classes, the classification boundary is biased.
The second method: the zero-instance image classification method based on a discriminative network. Its general working principle is as follows: first, the features of the training images are extracted by a convolutional neural network; then, a metric model measures the distance between the image features and the category semantic representations; finally, the category with the smallest distance is selected as the category label of the image. The disadvantages of this method are as follows: because the category semantic representations are usually obtained by training on text that contains no invisible-class training samples, or from professional attributes collected by experts, their distribution differs significantly from the distribution of the category features in the visual information, which makes matching image features with category semantics very difficult. Second, because the training images belong to the visible categories while the test images belong to the invisible categories, a distribution difference exists between the training and test images, which degrades zero-instance image classification performance.
Disclosure of Invention
In view of the above, it is necessary to provide a double-batch standardized zero-instance image classification method, a classification model learning method, and a terminal that can realize high-performance zero-instance image classification.
In a first aspect, an embodiment of the present application provides a learning method of a double-batch standardized zero-instance image classification model, the method comprising: inputting an image data set to be trained into the feature extraction module to calculate a feature vector of each image in the image data set to be trained, wherein the image data set to be trained comprises labeled images with labels and unlabeled images without labels, the images belong to a plurality of preset classes, the classes comprise visible classes and invisible classes, the visible classes occur among the classes represented by the labels, and the invisible classes do not; inputting preset semantic information of a plurality of categories into the fully connected module, and mapping the preset semantic information of the plurality of categories into the visual space to obtain a plurality of category representation vectors; splicing the feature vector of each labeled image with the plurality of category representation vectors to obtain a plurality of pairs of first representation pair vectors for each labeled image; splicing the feature vector of each unlabeled image with the plurality of category representation vectors to obtain a plurality of pairs of second representation pair vectors for each unlabeled image; performing batch standardization on the first representation pair vectors and the second representation pair vectors to obtain first standard representation pair vectors and second standard representation pair vectors with the same dimension; inputting each first standard representation pair vector into the fully connected module for training, and then inputting the second standard representation pair vectors into the fully connected module to calculate the pairing probability of each second standard representation pair vector; and, for each unlabeled image, selecting the category corresponding to the category representation vector in the second representation pair vector with the highest pairing probability as the category of the unlabeled image and outputting it.
In a second aspect, an embodiment of the present application provides a double-batch standardized zero-instance image classification method, comprising: inputting an image to be classified into a double-batch standardized zero-instance image classification model, wherein the image to be classified is an unlabeled image and the model is obtained by the learning method of the double-batch standardized zero-instance image classification model described above; performing class analysis on the image to be classified with the double-batch standardized zero-instance image classification model to obtain the category of the image to be classified; and outputting, by the double-batch standardized zero-instance image classification model, the category of the image to be classified.
In a third aspect, an embodiment of the present application provides a terminal, the terminal comprising:
a computer readable storage medium for storing program instructions, and a processor for executing the program instructions to implement the learning method of the double-batch standardized zero-instance image classification model described above.
The learning method, the classification method, and the terminal of the double-batch standardized zero-instance image classification model map the preset semantic information of the multiple categories into the visual space to obtain the multiple category representation vectors, thereby building a bridge from the visible classes to the invisible classes and realizing knowledge migration from the visible classes to the invisible classes. In addition, the feature vectors of the labeled images and of the unlabeled images are each spliced with the plurality of category representation vectors to obtain the first representation pair vectors and the second representation pair vectors, and the distribution difference between them is calibrated, which markedly improves the performance of double-batch standardized zero-instance image classification: semantic classes that never appeared during training can be identified reliably, improving the zero-instance classification performance of the double-batch standardized zero-instance image classification model.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are merely some embodiments of the present application; for those skilled in the art, other drawings can be derived from these drawings without creative effort.
Fig. 1 is a flowchart of a learning method of a dual batch normalized zero-instance image classification model according to an embodiment of the present application.
Fig. 2 is a first sub-flowchart of a learning method of a dual batch normalized zero-instance image classification model according to an embodiment of the present application.
Fig. 3 is a second sub-flowchart of a learning method of a dual batch normalized zero-instance image classification model according to an embodiment of the present application.
Fig. 4 is a third sub-flowchart of a learning method of a dual batch normalized zero-instance image classification model according to an embodiment of the present application.
Fig. 5 is a schematic internal structural diagram of an initial neural network of a learning method of a double batch normalized zero-instance image classification model according to an embodiment of the present application.
Fig. 6 is a schematic diagram of a pairing method of a learning method of a double-batch normalized zero-instance image classification model according to an embodiment of the present application.
Fig. 7 is a schematic diagram illustrating a splicing of a labeled image feature vector and a plurality of class representation vectors in a learning method of a double-batch normalized zero-instance image classification model according to an embodiment of the present application.
Fig. 8 is a schematic diagram illustrating the splicing of an unlabeled image feature vector and a plurality of class representation vectors in a learning method of a double-batch normalized zero-instance image classification model according to an embodiment of the present application.
Fig. 9 is a flowchart of a double batch normalization zero instance classification method according to an embodiment of the present application.
Fig. 10 is a schematic internal structure diagram of a terminal according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar items and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances, in other words that the embodiments described are to be practiced in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and any other variation thereof, may also include other things, such as processes, methods, systems, articles, or apparatus that comprise a list of steps or elements is not necessarily limited to only those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such processes, methods, articles, or apparatus.
It should be noted that the descriptions referring to "first", "second", etc. in this application are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In addition, the technical solutions of the various embodiments may be combined with each other, provided that such a combination can be realized by a person skilled in the art; when a combination of technical solutions is contradictory or cannot be realized, the combination should be considered not to exist, and it does not fall within the protection scope of the present application.
Referring to fig. 1 and fig. 5 in combination, fig. 1 is a schematic flowchart of a learning method of a double-batch standardized zero-instance image classification model according to an embodiment of the present application, and fig. 5 is a schematic structural diagram of an initial neural network of the double-batch standardized zero-instance image classification method according to an embodiment of the present application. The initial neural network 502 includes a feature extraction module 504 and a fully connected module 506. In this embodiment, the initial neural network 502 is trained on the image data set to be trained to obtain the double-batch standardized zero-instance image classification model; that is, each module of the initial neural network 502 is trained on the training image data to determine the parameters of each module, yielding the trained neural network, i.e., the double-batch standardized zero-instance image classification model. The image data set to be trained comprises labeled images with labels and unlabeled images without labels; the images belong to a plurality of preset classes, the classes comprise visible classes and invisible classes, the visible classes occur among the classes represented by the labels, and the invisible classes do not. Specifically, the learning method of the double-batch standardized zero-instance image classification model includes steps S102-S114.
Step S102, inputting the image data set to be trained into a feature extraction module to calculate and obtain the feature vector of each image in the image data set to be trained. It can be understood that the present embodiment extracts image feature vectors of both the labeled image and the unlabeled image. The feature extraction module 504 adopts a Resnet101 convolutional neural network architecture. In some possible embodiments, the feature extraction module 504 may also adopt other convolutional neural network architectures that can extract image features, such as: resnet50 convolutional neural network architecture, Resnet34 convolutional neural network architecture, and the like.
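To make step S102 concrete, the following is a minimal sketch of feature extraction with a Resnet101 backbone. It assumes PyTorch/torchvision; the input resolution, weight initialization, and layer slicing are illustrative assumptions, since the patent specifies only the Resnet101 architecture.

```python
# Minimal sketch of step S102 (assumes PyTorch/torchvision; input size and
# weights are illustrative assumptions, not specified by the patent).
import torch
import torch.nn as nn
import torchvision.models as models

backbone = models.resnet101(weights=None)  # Resnet101 architecture, as named in the patent
# Drop the final classification layer; keep the convolutional stages and global pooling.
feature_extractor = nn.Sequential(*list(backbone.children())[:-1])
feature_extractor.eval()

with torch.no_grad():
    images = torch.randn(8, 3, 224, 224)          # a mini-batch of (un)labeled images
    feats = feature_extractor(images).flatten(1)  # (8, 2048) image feature vectors
```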
Step S104, inputting preset semantic information of multiple categories into the fully connected module for calculation, mapping the preset semantic information of the multiple categories into the visual space to obtain multiple category representation vectors. In this embodiment, the semantic information of the plurality of categories is preset and describes each category; it comprises semantic information of the visible categories and semantic information of the invisible categories. Mapping the preset semantic information of the multiple categories into the visual space with the fully connected module 506 to obtain the multiple category representation vectors builds a bridge from the visible classes to the invisible classes, thereby realizing knowledge migration from the visible classes to the invisible classes.
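A sketch of step S104 might look as follows. The semantic dimensionality (here 300, e.g. word vectors) and the single linear layer with an activation are assumptions; the patent states only that a fully connected module maps category semantics into the visual space.

```python
# Sketch of step S104: map preset category semantic vectors into the visual space.
# Dimensions are assumed for illustration (300-d semantics -> 2048-d visual space).
import torch
import torch.nn as nn

semantic_dim, visual_dim = 300, 2048
semantic_mapper = nn.Sequential(
    nn.Linear(semantic_dim, visual_dim),
    nn.ReLU(),  # activation, consistent with a fully connected module with activations
)

class_semantics = torch.randn(5, semantic_dim)  # preset semantic info for 5 categories
class_repr = semantic_mapper(class_semantics)   # (5, 2048) category representation vectors
```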
Step S106, splicing the feature vector of each labeled image and the plurality of category representation vectors to obtain a plurality of pairs of first representation pair vectors for each labeled image. Specifically, each labeled image has one feature vector, and each category has one representation vector. "A plurality of pairs of first representation pair vectors" means that each labeled image yields several paired combinations of its feature vector with the category representation vectors; that is, the feature vector of each labeled image is paired once with every category representation vector, producing multiple feature-vector/category-representation-vector pairs. For example, the pairs of first representation pair vectors corresponding to one labeled image are denoted Xs-Es1, Xs-Es2, ..., Xs-Esn, where Xs denotes the feature vector of the labeled image and Es1 to Esn denote the category representation vectors. How the feature vector of a labeled image is spliced with the category representation vectors to obtain the pairs of first representation pair vectors is described in detail below.
Step S108, splicing the feature vector of each unlabeled image and the plurality of category representation vectors to obtain a plurality of pairs of second representation pair vectors for each unlabeled image. Specifically, each unlabeled image has one feature vector, and each category has one representation vector. "A plurality of pairs of second representation pair vectors" means that each unlabeled image yields several paired combinations of its feature vector with the category representation vectors; that is, the feature vector of each unlabeled image is paired once with every category representation vector, producing multiple feature-vector/category-representation-vector pairs. For example, the pairs of second representation pair vectors corresponding to one unlabeled image are denoted Xu-Eu1, Xu-Eu2, ..., Xu-Eun, where Xu denotes the feature vector of the unlabeled image and Eu1 to Eun denote the category representation vectors. How the feature vector of an unlabeled image is spliced with the category representation vectors to obtain the pairs of second representation pair vectors is described in detail below.
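Steps S106 and S108 can both be sketched with a single copy-and-splice routine; `build_pair_vectors` is a hypothetical helper name, and the tensor shapes are illustrative.

```python
# Sketch of steps S106/S108: pair every image feature vector with every
# category representation vector by copying and concatenating ("splicing").
import torch

def build_pair_vectors(feats: torch.Tensor, class_repr: torch.Tensor) -> torch.Tensor:
    """feats: (B, Dv); class_repr: (C, Dc) -> (B*C, Dv+Dc) representation pair vectors."""
    B, C = feats.size(0), class_repr.size(0)
    feats_copied = feats.repeat_interleave(C, dim=0)  # copy each feature vector C times
    classes_tiled = class_repr.repeat(B, 1)           # one category vector per copy
    return torch.cat([feats_copied, classes_tiled], dim=1)

# Labeled features Xs with visible-class vectors -> first representation pair vectors;
# unlabeled features Xu with invisible-class vectors -> second representation pair vectors.
first_pairs = build_pair_vectors(torch.randn(4, 2048), torch.randn(3, 2048))   # (12, 4096)
second_pairs = build_pair_vectors(torch.randn(4, 2048), torch.randn(2, 2048))  # (8, 4096)
```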
Step S110, performing batch standardization on the first representation pair vectors and the second representation pair vectors to obtain first standard representation pair vectors and second standard representation pair vectors with the same dimension. In this embodiment, batch standardization is applied to the first representation pair vectors and to the second representation pair vectors separately, realizing double batch standardization of the labeled images and the unlabeled images; this normalizes the distributions of the first and second representation pair vectors and yields information that is easier to migrate. Specifically, this embodiment uses a mini-batch algorithm to batch-standardize the first representation pair vectors and the second representation pair vectors; how this batch standardization is implemented is described in detail below.
Step S112, inputting each first standard representation pair vector into the fully connected module for training, and then inputting the second standard representation pair vectors into the fully connected module to calculate the pairing probability of each second standard representation pair vector. Referring to fig. 6, suppose the image data set to be trained is an animal image data set, so the first standard representation pair vectors used for training describe animals. If the image feature vector in a first standard representation pair vector represents a cat while the category representation vector represents a dog, splicing them yields the first standard representation pair vector (cat image feature, dog category representation vector), i.e., an incorrectly paired vector. If instead the image feature vector represents a dog and the category representation vector also represents a dog, splicing them yields the first standard representation pair vector (dog image feature, dog category representation vector), i.e., a correctly paired vector. In this embodiment, the fully connected module 506 is trained on such labeled pairs so that it acquires the ability to predict pairing probabilities; it is then used to calculate the pairing probability of the second representation pair vectors, yielding accurate pairing probabilities between the feature vectors of the unlabeled images and the category representation vectors. For example, among the second standard representation pair vectors of animals, one pair whose unlabeled image feature vector is an elephant and whose category representation vector is also an elephant yields a first probability, while another pair whose unlabeled image feature vector is an elephant but whose category representation vector is a zebra yields a second probability; the first probability is greater than the second probability.
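The training in step S112 could be sketched as below, assuming a two-layer fully connected pairing head with a sigmoid output trained with binary cross-entropy on correct/incorrect pairings; the loss and optimizer choices are assumptions the patent does not spell out.

```python
# Sketch of step S112: train a fully connected pairing head on first standard
# representation pair vectors labeled 1 (correct pairing) or 0 (mismatch).
import torch
import torch.nn as nn

pair_dim = 4096  # feature dim + category-representation dim (assumed)
pairing_head = nn.Sequential(
    nn.Linear(pair_dim, 512), nn.ReLU(),
    nn.Linear(512, 1), nn.Sigmoid(),  # pairing probability in [0, 1]
)
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(pairing_head.parameters(), lr=1e-3)

first_std_pairs = torch.randn(32, pair_dim)      # e.g. (cat image, cat) vs (cat image, dog)
is_match = torch.randint(0, 2, (32, 1)).float()  # 1 for correct pairings, 0 otherwise
loss = criterion(pairing_head(first_std_pairs), is_match)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```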
Step S114, for each unlabeled image, selecting the category corresponding to the category representation vector in the second representation pair vector with the highest pairing probability as the category of that unlabeled image, and outputting it. It can be understood that the second representation pair vector with the highest pairing probability for an unlabeled image is the one pair, among the image's multiple second representation pair vectors, whose category representation vector matches the image features most closely; that category representation vector is therefore most likely the correct one for the unlabeled image, so its corresponding category is taken as the category of the unlabeled image. For example, let the pairs of second standard representation pair vectors of an unlabeled image be Xu-Eu1, ..., Xu-Eun; if the second standard representation pair vector with the highest pairing probability is Xu-Eu2, then the category corresponding to the category representation vector in Xu-Eu2 is output.
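Step S114 then reduces to an arg-max over each unlabeled image's pairing probabilities, roughly as in this sketch (shapes assumed for illustration):

```python
# Sketch of step S114: per unlabeled image, output the category whose second
# standard representation pair vector has the highest pairing probability.
import torch

num_images, num_invisible_classes = 4, 2
# probs[i, j]: pairing probability of unlabeled image i with invisible class j,
# as produced by the trained fully connected module.
probs = torch.rand(num_images, num_invisible_classes)
predicted = probs.argmax(dim=1)  # category index selected for each unlabeled image
```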
Referring to fig. 2 and fig. 7 in combination, fig. 2 is a schematic view of a first sub-flow of a dual batch normalized zero-instance image classification method according to an embodiment of the present application, and fig. 7 is a schematic view of a concatenation of a labeled image feature vector and a plurality of class representation vectors in the dual batch normalized zero-instance image classification method according to the embodiment of the present application. Step S106 specifically includes steps S202-S204.
Step S202, copying the feature vector of each labeled image N times, where N is the number of visible-class category representation vectors. For example, if there are 3 visible-class category representation vectors, the feature vector of each labeled image is copied 3 times. Suppose the visible-class category representation vectors are those of cat, dog, and pig, and the feature vector of the labeled image is that of cat A; then 3 copies of cat A's feature vector are made.
Step S204, splicing the N copies of the feature vector of each labeled image one by one with the plurality of category representation vectors to obtain a plurality of pairs of first representation pair vectors. Continuing the example, there are 3 copies of cat A's feature vector, and the visible-class category representation vectors are those of cat, dog, and pig. Splicing the cat category representation vector after the first copy of cat A's feature vector forms the (cat image, cat) representation pair vector; splicing the dog category representation vector after the second copy forms the (cat image, dog) representation pair vector; splicing the pig category representation vector after the third copy forms the (cat image, pig) representation pair vector.
Referring to fig. 3 and fig. 8 in combination, fig. 3 is a schematic diagram of a second sub-flow of a dual batch normalized zero-instance image classification method according to an embodiment of the present application, and fig. 8 is a schematic diagram of a concatenation of a non-label image feature vector and a plurality of class representation vectors in the dual batch normalized zero-instance image classification method according to the embodiment of the present application. Step S108 specifically includes steps S302-S304.
Step S302, copying the feature vector of each unlabeled image M times, where M is the number of invisible-class category representation vectors. For example, if there are 2 invisible-class category representation vectors, the feature vector of each unlabeled image is copied 2 times. Suppose the invisible-class category representation vectors are those of fox and wolf, and the unlabeled images have the feature vectors of wolf B and fox C; then 2 copies of wolf B's feature vector and 2 copies of fox C's feature vector are made.
Step S304, splicing the M copies of the feature vector of each unlabeled image one by one with the plurality of category representation vectors to obtain a plurality of pairs of second representation pair vectors. Continuing the example, there are 2 copies of wolf B's feature vector and 2 copies of fox C's feature vector, and the invisible-class category representation vectors are those of fox and wolf. Splicing the fox category representation vector after the first copy of wolf B's feature vector forms the (wolf image, fox) representation pair vector; splicing the wolf category representation vector after the second copy forms the (wolf image, wolf) representation pair vector; splicing the fox category representation vector after the first copy of fox C's feature vector forms the (fox image, fox) representation pair vector; splicing the wolf category representation vector after the second copy forms the (fox image, wolf) representation pair vector.
Please refer to fig. 4, which is a third sub-flowchart of a dual batch normalization zero-instance image classification method according to an embodiment of the present application. Step S110 specifically includes steps S402-S410.
Step S402, calculating the mean and variance of each dimension of the first representation pair vectors to obtain a first mean and a first variance. It can be understood that, in this embodiment, a mini-batch of first representation pair vectors is taken; the mean of each dimension over this mini-batch gives the first mean, and the variance of each dimension over this mini-batch gives the first variance.
In step S404, the normalized first representation pair vectors are obtained by subtracting the first mean from each first representation pair vector and dividing by the first variance. It can be understood that the first mean and the first variance are computed over the mini-batch of first representation pair vectors; subtracting this first mean from a first representation pair vector and dividing by this first variance yields the normalized first representation pair vector. The process can be formulated as

$$X_1 = \frac{x - \mu_1}{\sigma_1^2} \quad \text{(formula one)}$$

where $X_1$ is the normalized first representation pair vector, $x$ is a single first representation pair vector, $\mu_1$ is the first mean, and $\sigma_1^2$ is the first variance.
Step S406, calculating the mean and variance of each dimension of the second representation pair vectors to obtain a second mean and a second variance. It can be understood that, in this embodiment, a mini-batch of second representation pair vectors is taken; the mean of each dimension over this mini-batch gives the second mean, and the variance of each dimension over this mini-batch gives the second variance.
In step S408, the normalized second representation pair vectors are obtained by subtracting the second mean from each second representation pair vector and dividing by the second variance. It can be understood that the second mean and the second variance are computed over the mini-batch of second representation pair vectors; subtracting this second mean from a second representation pair vector and dividing by this second variance yields the normalized second representation pair vector. The process can be formulated as

$$X_2 = \frac{x - \mu_2}{\sigma_2^2} \quad \text{(formula two)}$$

where $X_2$ is the normalized second representation pair vector, $x$ is a single second representation pair vector, $\mu_2$ is the second mean, and $\sigma_2^2$ is the second variance.
In step S410, the normalized first representation pair vectors and the normalized second representation pair vectors are linearly transformed to obtain the batch-standardized first standard representation pair vectors and second standard representation pair vectors. It can be understood that, in this embodiment, the normalized first representation pair vectors and the normalized second representation pair vectors are passed through a linear transformation in the fully connected module 506 to obtain the batch-standardized first standard representation pair vectors and second standard representation pair vectors.
In the above embodiment, the first mean and first variance of each dimension of the first representation pair vectors and the second mean and second variance of each dimension of the second representation pair vectors are calculated, and the first and second representation pair vectors are normalized accordingly. The normalized first and second representation pair vectors are then linearly transformed to obtain the batch-standardized first standard representation pair vectors and second standard representation pair vectors. The distribution difference between the first representation pair vectors and the second representation pair vectors is thereby calibrated, which significantly improves the performance of double-batch standardized zero-instance image classification.
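A sketch of this double batch standardization follows. Note that formulas one and two, as written, divide by the variance itself, whereas conventional batch normalization divides by the standard deviation sqrt(variance + eps); the code follows the text, and the eps term and shared gamma/beta parameters are assumptions for illustration.

```python
# Sketch of steps S402-S410: standardize the first and second representation
# pair vectors separately over their mini-batches, then apply a learnable
# linear transform (gamma, beta). eps is an assumed numerical-stability term.
import torch

def batch_standardize(pairs, gamma, beta, eps=1e-5):
    mu = pairs.mean(dim=0)                   # per-dimension mini-batch mean
    var = pairs.var(dim=0, unbiased=False)   # per-dimension mini-batch variance
    normalized = (pairs - mu) / (var + eps)  # formulas one/two as written in the text
    return gamma * normalized + beta         # linear transform -> standard pair vectors

dim = 4096
gamma, beta = torch.ones(dim), torch.zeros(dim)
first_std = batch_standardize(torch.randn(12, dim), gamma, beta)   # labeled mini-batch
second_std = batch_standardize(torch.randn(8, dim), gamma, beta)   # unlabeled mini-batch
```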
Please refer to fig. 9, which is a flowchart illustrating a dual batch normalization zero-instance image classification method. Specifically, the dual batch normalized zero instance image classification method includes steps S902-S906.
Step S902, inputting the image to be classified into a double batch standardized zero-instance image classification model. The double-batch standardized zero-instance image classification model is obtained by training a preset neural network by adopting the learning method of the double-batch standardized zero-instance image classification model. The image to be classified is a label-free image.
Step S904, performing class analysis on the image to be classified with the double-batch standardized zero-instance image classification model to obtain the category of the image to be classified. It can be understood that the image to be classified is paired with the category representation vectors by the representation-pair method described above, and the category with the highest pairing probability is identified.
Step S906, the category of the image to be classified is output.
Please refer to fig. 10 in combination, which is a schematic diagram of the internal structure of a terminal for performing the double-batch standardized zero-instance image classification model learning method according to an embodiment of the present application. The terminal 10 includes a computer-readable storage medium 11, a processor 12, and a bus 13. The computer-readable storage medium 11 includes at least one type of readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a magnetic memory, a magnetic disk, an optical disk, and the like. The computer-readable storage medium 11 may, in some embodiments, be an internal storage unit of the terminal 10, such as a hard disk of the terminal 10. The computer-readable storage medium 11 may, in other embodiments, be an external storage device of the terminal 10, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, or a Flash memory Card (Flash Card) provided on the terminal 10. Further, the computer-readable storage medium 11 may also include both an internal storage unit and an external storage device of the terminal 10. The computer-readable storage medium 11 may be used not only to store the application software and various types of data installed in the terminal 10 but also to temporarily store data that has been output or will be output.
The bus 13 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 10, but this is not intended to represent only one bus or type of bus.
Further, the terminal 10 may also include a display assembly 14. The display component 14 may be a Light Emitting Diode (LED) display, a liquid crystal display, a touch-sensitive liquid crystal display, an Organic Light-Emitting Diode (OLED) touch panel, or the like. The display component 14 may also be referred to as a display device or display unit, as appropriate, for displaying information processed in the terminal 10 and for displaying a visual user interface, among other things.
Further, the terminal 10 may also include a communication component 15. The communication component 15 may optionally include a wired communication component and/or a wireless communication component, such as a WI-FI communication component, a bluetooth communication component, etc., typically used to establish a communication connection between the terminal 10 and other intelligent control devices.
The processor 12 may be, in some embodiments, a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip for executing program code stored in the computer-readable storage medium 11 or processing data. Specifically, the processor 12 executes the program instructions to control the terminal 10 to perform the double-batch standardized zero-instance image classification model learning method. It is to be understood that fig. 10 illustrates only a terminal 10 having components 11-15; those skilled in the art will appreciate that the configuration illustrated in fig. 10 does not limit the terminal 10, which may include fewer or more components than illustrated, combine certain components, or arrange the components differently.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, to the extent that such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, it is intended that the present application also encompass such modifications and variations.
The above-mentioned embodiments are only examples of the present application and do not limit the scope of its claims; equivalent variations made according to the claims of the present application still fall within the scope of the present application.

Claims (10)

1. A learning method of a double-batch standardized zero-instance image classification model is used for learning an initial neural network to obtain the double-batch standardized zero-instance image classification model, and is characterized in that the initial neural network comprises a feature extraction module and a full connection module, and the learning method of the double-batch standardized zero-instance image classification model comprises the following steps:
inputting an image data set to be trained into the feature extraction module to calculate a feature vector of each image in the image data set to be trained, wherein the image data set to be trained comprises labeled images with labels and unlabeled images without labels, the images in the image data set to be trained belong to a plurality of preset classes, the classes comprise visible classes and invisible classes, the visible classes exist in the classes represented by the labels, and the invisible classes do not exist in the classes represented by the labels;
inputting preset semantic information of multiple categories into the full-connection module for calculation, and mapping the preset semantic information of the multiple categories into a visual space to obtain multiple category expression vectors;
splicing the feature vector of each labeled image with the plurality of category representation vectors to obtain a plurality of pairs of first representation pair vectors of each labeled image;
splicing the feature vector of each unlabeled image with the plurality of category representation vectors to obtain a plurality of pairs of second representation pair vectors of each unlabeled image;
carrying out batch standardization processing on the first representation pair vector and the second representation pair vector to obtain a first standard representation pair vector and a second standard representation pair vector with the same dimension;
inputting each first standard representation pair vector into the full-connection module for training, and then inputting the second standard representation pair vector into the full-connection module for calculating to obtain the pairing probability of each second standard representation pair vector; and
selecting, for each unlabeled image, the category corresponding to the category representation vector in the second representation pair vector with the highest pairing probability as the category of the unlabeled image, and outputting the category.
2. The method for learning the dual batch normalized zero-instance image classification model according to claim 1, wherein the stitching the feature vector of each labeled image with the plurality of class representation vectors to obtain a plurality of pairs of first representation vectors of each labeled image comprises:
copying the feature vector of each labeled image N times, wherein N is the number of the visible-class category representation vectors; and
splicing the N copies of the feature vector of each labeled image one by one with the plurality of category representation vectors to obtain the plurality of pairs of first representation pair vectors.
3. The method for learning the dual batch normalized zero-instance image classification model according to claim 1, wherein the stitching the feature vector of each unlabeled image with the class representation vectors to obtain pairs of second representation vectors of each unlabeled image comprises:
copying the feature vector of each unlabeled image M times, wherein M is the number of the invisible-class category representation vectors; and
splicing the M copies of the feature vector of each unlabeled image one by one with the plurality of category representation vectors to obtain the plurality of pairs of second representation pair vectors.
4. The method of learning the dual batch normalized zero-instance image classification model as claimed in claim 1, further comprising:
carrying out batch standardization processing on the first representation pair vectors and the second representation pair vectors to obtain the first standard representation pair vectors and the second standard representation pair vectors with the same dimension.
5. The method for learning the dual-batch normalized zero-instance image classification model as claimed in claim 4, wherein the step of performing the batch normalization on the first representation pair vector and the second representation pair vector to obtain the first standard representation pair vector and the second standard representation pair vector with the same dimension specifically comprises:
calculating the mean value and the variance of each dimension characteristic of the first representation pair vector to obtain a first mean value and a first variance;
subtracting the first mean value from the first representation pair vectors and dividing by the first variance to obtain normalized first representation pair vectors;
calculating the mean value and the variance of each dimension characteristic of the second representation pair vector to obtain a second mean value and a second variance;
subtracting the second mean value from the second representation pair vectors and dividing by the second variance to obtain normalized second representation pair vectors; and
linearly transforming the normalized first representation pair vectors and the normalized second representation pair vectors to obtain the batch-standardized first standard representation pair vectors and second standard representation pair vectors.
6. The method as claimed in claim 1, wherein the feature extraction module includes a plurality of hidden layers, and the step of inputting the image data set to be trained into the feature extraction module to calculate the feature vector of each image in the image data set to be trained specifically includes:
obtaining an output of a last hidden layer of the plurality of hidden layers; and
obtaining the feature vector of each image in the image data set to be trained according to statistics of the output of the last hidden layer.
7. The method of learning the dual batch normalized zero-instance image classification model of claim 6, wherein the feature extraction module employs a Resnet101 convolutional neural network architecture.
8. The method of learning the dual batch normalized zero-instance image classification model of claim 1, wherein the fully-connected module comprises two fully-connected layers with activation functions.
9. A double-batch standardized zero-instance image classification method is characterized by comprising the following steps:
inputting an image to be classified into a double-batch standardized zero-instance image classification model, wherein the image to be classified is a label-free image, and the double-batch standardized zero-instance image classification model is obtained by the learning method of the double-batch standardized zero-instance image classification model according to any one of claims 1 to 8;
the double-batch standardized zero-instance image classification model performs class analysis on the image to be classified to obtain the class of the image to be classified; and
and the double batch standardized zero-instance image classification model outputs the category of the image to be classified.
10. A terminal, characterized in that the terminal comprises:
a computer readable storage medium for storing program instructions; and
a processor for executing the program instructions to implement the learning method of the dual batch normalized zero-instance image classification model of any one of claims 1 to 8.
CN202111643490.3A 2021-12-30 2021-12-30 Double-batch standardized zero-instance image classification method Active CN114005005B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111643490.3A CN114005005B (en) 2021-12-30 2021-12-30 Double-batch standardized zero-instance image classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111643490.3A CN114005005B (en) 2021-12-30 2021-12-30 Double-batch standardized zero-instance image classification method

Publications (2)

Publication Number Publication Date
CN114005005A true CN114005005A (en) 2022-02-01
CN114005005B CN114005005B (en) 2022-03-22

Family

ID=79932300

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111643490.3A Active CN114005005B (en) 2021-12-30 2021-12-30 Double-batch standardized zero-instance image classification method

Country Status (1)

Country Link
CN (1) CN114005005B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180225548A1 (en) * 2017-01-19 2018-08-09 Hrl Laboratories, Llc Multi-view embedding with soft-max based compatibility function for zero-shot learning
US20210365740A1 (en) * 2018-09-27 2021-11-25 Salesforce.Com, Inc. Prediction-correction approach to zero shot learning
CN110427967A (en) * 2019-06-27 2019-11-08 中国矿业大学 The zero sample image classification method based on embedded feature selecting semanteme self-encoding encoder
CN110826639A (en) * 2019-11-12 2020-02-21 福州大学 Zero sample image classification method by using full data training
CN111291787A (en) * 2020-01-19 2020-06-16 合肥工业大学 Image annotation method based on forward-multi-reverse cooperation sparse representation classifier
KR20210098082A (en) * 2020-01-31 2021-08-10 연세대학교 산학협력단 Zero Shot Recognition Apparatus Based on Self-Supervision and Method Thereof
CN111476294A (en) * 2020-04-07 2020-07-31 南昌航空大学 Zero sample image identification method and system based on generation countermeasure network
CN112163603A (en) * 2020-09-15 2021-01-01 郑州金惠计算机系统工程有限公司 Zero sample image identification method and device, electronic equipment and storage medium
CN112257808A (en) * 2020-11-02 2021-01-22 郑州大学 Integrated collaborative training method and device for zero sample classification and terminal equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
徐戈 (Xu Ge) et al.: "Zero-sample image classification based on visual errors and semantic attributes", 《计算机应用》 (Journal of Computer Applications) *

Also Published As

Publication number Publication date
CN114005005B (en) 2022-03-22

Similar Documents

Publication Publication Date Title
CN109002834B (en) Fine-grained image classification method based on multi-modal representation
CN112470160A (en) Apparatus and method for personalized natural language understanding
CN103299324A (en) Learning tags for video annotation using latent subtags
CN112257808B (en) Integrated collaborative training method and device for zero sample classification and terminal equipment
CN111523421A (en) Multi-user behavior detection method and system based on deep learning and fusion of various interaction information
CN113537070B (en) Detection method, detection device, electronic equipment and storage medium
CN111124863A (en) Intelligent equipment performance testing method and device and intelligent equipment
CN105740879B (en) The zero sample image classification method based on multi-modal discriminant analysis
CN113239227B (en) Image data structuring method, device, electronic equipment and computer readable medium
CN111898528B (en) Data processing method, device, computer readable medium and electronic equipment
CN114005005B (en) Double-batch standardized zero-instance image classification method
CN112036902A (en) Product authentication method and device based on deep learning, server and storage medium
CN115546824B (en) Taboo picture identification method, apparatus and storage medium
CN111144466A (en) Image sample self-adaptive depth measurement learning method
JP2023017759A (en) Training method and training apparatus for image recognition model based on semantic enhancement
CN113591881B (en) Intention recognition method and device based on model fusion, electronic equipment and medium
CN113033817B (en) OOD detection method and device based on hidden space, server and storage medium
WO2022222228A1 (en) Method and apparatus for recognizing bad textual information, and electronic device and storage medium
CN114510610A (en) Visual concept recognition method for multi-modal knowledge graph construction
CN114139658A (en) Method for training classification model and computer readable storage medium
CN114021670A (en) Classification model learning method and terminal
CN113223018A (en) Fine-grained image analysis processing method
Lin et al. A Pointer Meter Detection Method based on Optimal SSD network
CN116561814B (en) Textile chemical fiber supply chain information tamper-proof method and system thereof
CN113936141B (en) Image semantic segmentation method and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 518049 Floor 25, Block A, Zhongzhou Binhai Commercial Center Phase II, No. 9285, Binhe Boulevard, Shangsha Community, Shatou Street, Futian District, Shenzhen, Guangdong

Patentee after: Shenzhen Youjia Innovation Technology Co.,Ltd.

Address before: 518049 401, building 1, Shenzhen new generation industrial park, No. 136, Zhongkang Road, Meidu community, Meilin street, Futian District, Shenzhen, Guangdong Province

Patentee before: SHENZHEN MINIEYE INNOVATION TECHNOLOGY Co.,Ltd.
