CN112069981A - Image classification method and device, electronic equipment and storage medium

Image classification method and device, electronic equipment and storage medium

Info

Publication number
CN112069981A
CN112069981A
Authority
CN
China
Prior art keywords
feature vector
image
classification
group
vector
Prior art date
Legal status
Pending
Application number
CN202010916955.7A
Other languages
Chinese (zh)
Inventor
孔翰
程文龙
叶志凌
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010916955.7A
Publication of CN112069981A
Legal status: Pending

Classifications

    • G06V 20/10 - Scenes; scene-specific elements; terrestrial scenes
    • G06F 18/2135 - Feature extraction by transforming the feature space, based on approximation criteria, e.g. principal component analysis
    • G06F 18/214 - Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/23 - Clustering techniques
    • G06F 18/24155 - Bayesian classification
    • G06F 18/24323 - Tree-organised classifiers
    • G06F 18/253 - Fusion techniques of extracted features
    • G06N 3/045 - Neural networks; combinations of networks
    • G06N 3/08 - Neural networks; learning methods
    • G06V 10/40 - Extraction of image or video features

Abstract

The application discloses an image classification method and apparatus, an electronic device, and a storage medium, relating to the technical field of image processing. The method includes: acquiring an image group feature vector and an environment feature vector of an image group to be classified, where the image group includes a plurality of images, the image group feature vector is obtained by fusing the image feature vectors of the images in the group, and the environment feature vector is obtained by fusing the environment information recorded when the images in the group were shot; fusing the image group feature vector and the environment feature vector to obtain a fused feature vector; and determining the category of the image group according to the image group feature vector, the environment feature vector, and the fused feature vector, yielding a more accurate classification result.

Description

Image classification method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image classification method and apparatus, an electronic device, and a storage medium.
Background
When many images exist, some of them may belong to the same category because of certain similarities, so the images can be classified to make them convenient for the user to view. General classification methods do not make full use of the relevant features of the images, so their classification accuracy is not high.
Disclosure of Invention
In view of the above problems, the present application provides an image classification method and apparatus, an electronic device, and a storage medium to address them.

In a first aspect, an embodiment of the present application provides an image classification method, the method including: acquiring an image group feature vector and an environment feature vector of an image group to be classified, where the image group includes a plurality of images, the image group feature vector is obtained by fusing the image feature vectors of the images in the group, and the environment feature vector is obtained by fusing the environment information recorded when the images in the group were shot; fusing the image group feature vector and the environment feature vector to obtain a fused feature vector; and determining the category of the image group according to the image group feature vector, the environment feature vector, and the fused feature vector.

In a second aspect, an embodiment of the present application provides an image classification apparatus, including: a vector acquisition module configured to acquire an image group feature vector and an environment feature vector of an image group to be classified, where the image group includes a plurality of images, the image group feature vector is obtained by fusing the image feature vectors of the images in the group, and the environment feature vector is obtained by fusing the environment information recorded when the images in the group were shot; a fusion module configured to fuse the image group feature vector and the environment feature vector to obtain a fused feature vector; and a classification module configured to determine the category of the image group according to the image group feature vector, the environment feature vector, and the fused feature vector.

In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a memory; and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the method described above.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing program code that can be invoked by a processor to execute the method described above.

According to the image classification method and apparatus, electronic device, and storage medium, for an image group to be classified, an image group feature vector obtained by fusing the feature vectors of the images in the group and an environment feature vector obtained by fusing the environment information recorded when the images in the group were shot are acquired. The image group feature vector and the environment feature vector are fused into a fused feature vector, and the category of the group is determined from the environment feature vector, the image group feature vector, and the fused feature vector. The features of the images and the features of the environment in which they were shot are thus fully mined, and both these features and their fusion are used to determine the classification of the image group, making the classification result more accurate.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 shows a flowchart of an image classification method according to an embodiment of the present application.
Fig. 2 shows a flowchart of an image classification method according to another embodiment of the present application.
Fig. 3 illustrates an album display diagram provided in an embodiment of the present application.
Fig. 4 shows a network structure for implementing the image classification method provided in the embodiment of the present application.
Fig. 5 shows a flowchart of an image classification method according to another embodiment of the present application.
Fig. 6 shows a clustering diagram provided in an embodiment of the present application.
Fig. 7 is a functional block diagram of an image classification apparatus according to an embodiment of the present application.
Fig. 8 shows a block diagram of an electronic device according to an embodiment of the present application.
Fig. 9 shows a storage unit for storing or carrying program code that implements a method according to an embodiment of the present application.
Detailed Description
To help those skilled in the art better understand the technical solutions, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings in the embodiments.

Classifying images makes them easier to view when there are many of them. Classification means determining which images belong to each category and identifying the specific category the images of each group belong to. For example, the album of an electronic device such as a mobile phone or tablet computer may store a large number of photos, and if a user wants to view the photos of a particular event, it is inconvenient to hunt for them in a long, disordered photo list. If the stored photos are classified, the photos of each category distinguished, and the photos displayed by category, a user who wants to view the photos of a certain category can go directly to the classification result, which is convenient and fast. If photos of different events are treated as photos of different categories, a user who wants to view the photos of a certain event can directly view the photos corresponding to that classified event.

Generally, the photos of one category share common features across several aspects, although in any single aspect there may be some differences among photos of the same category. Therefore, when determining the specific category the photos of a certain category belong to, the relevant features of the images should be fully exploited; these should be features with high similarity within a category and large differences across categories.

The inventors found through research that captured images of the same category are similar not only in their image features but also in the environment information recorded at shooting time. The image classification method and apparatus, electronic device, and storage medium provided by the embodiments of the present application therefore identify the specific category of an image group to be classified from its image group feature vector, its environment feature vector, and the combination of the two, improving the accuracy of image classification. The image classification method, apparatus, electronic device, and storage medium provided by the embodiments of the present application are described in detail below through specific embodiments.

Referring to fig. 1, an image classification method provided by an embodiment of the present application is shown. The method can be applied to an electronic device, which may be a terminal device such as a mobile phone, computer, tablet computer, or smart wearable device, or a cloud device such as a server. A terminal device can classify images it has captured itself, and can also acquire images captured by other devices and classify them; a cloud device can acquire images captured by terminal devices and classify them. The embodiments of the present application mainly take a terminal device as an example for description. Specifically, the method may include the following steps.
Step S110: and acquiring a graph group feature vector and an environment feature vector of a graph group to be classified. The image group comprises a plurality of images, the image group feature vectors are obtained by fusing the image feature vectors of the images in the image group, and the environment feature vectors are obtained by fusing the environment information of the images in the image group when the images are shot.
In the embodiment of the application, one graph group comprises a plurality of images, and the images in the graph group are images of the same category. The image group to be classified is a group to which a specific category needs to be determined, and the specific category of the image group is also a specific category of all images in the image group. The specific category can be represented by the unique identifier of the category, for example, the unique identifier of the category is represented by the name of the category, and the specific category of the graph group is determined, that is, the name of the category of the graph group is determined. The embodiments of the present application take the unique id whose name represents a category as an example for explanation.
For the graph group to be classified, vectors obtained by fusing image feature vectors of the images in the graph group can be obtained, and the vectors obtained by fusing the image feature vectors of the images are defined as the graph group feature vectors.
In addition, in the embodiment of the present application, the type of the image obtained by shooting is determined, and each image in the group of images is the image obtained by shooting, and the environment information at the time of shooting each image is correspondingly recorded. For the graph groups to be classified, vectors obtained by fusing the environmental information of the images during shooting can be obtained, and the vectors obtained by fusing the environmental information of the images during shooting are defined as environmental feature vectors.
Step S120: and fusing the image group feature vector and the environment feature vector to obtain a fused feature vector.
And obtaining a graph group feature vector and an environment feature vector of the graph group, and obtaining the two vectors to obtain a new vector. In the embodiment of the present application, for convenience of description, a new vector obtained by fusing a feature vector of a set of images and an environment feature vector is defined as a fused feature vector.
Step S130: and determining the category of the graph group according to the graph group feature vector, the environment feature vector and the fusion feature vector.
In the embodiment of the application, the image group feature vector reminds the features of each image in the image group, the environment feature vector reflects the environment features of each image during shooting, and the fusion feature vector reflects the combined action of the image features and the environment features during shooting. The specific category of the image group can be determined through the three acquired vectors, so that the related information of the image is fully utilized, the basis for determining the category of the image group is enriched, and the accuracy of classification is improved.
An image classification method provided by another embodiment of the present application describes a specific way of determining the category. Referring to fig. 2, the image classification method provided by this embodiment includes the following steps.

Step S210: acquire an image group feature vector and an environment feature vector of an image group to be classified, where the image group includes a plurality of images, the image group feature vector is obtained by fusing the image feature vectors of the images in the group, and the environment feature vector is obtained by fusing the environment information recorded when the images in the group were shot.

In the embodiments of the present application, the image group feature vector may be obtained by extracting the image features of each image in the group to get an image feature vector for each image, and then fusing the obtained image feature vectors into the image group feature vector.

That is, image features are extracted from each image in the group and expressed as an image feature vector. Since the group contains multiple images, multiple image feature vectors are obtained; they are fused into one vector, which serves as the feature vector of the group and is named the image group feature vector.

Optionally, to make the classification more accurate, the image feature vectors of all images in the group may be fused to obtain the image group feature vector.

Optionally, in the embodiments of the present application, a subset of the images may instead be selected from the group for determining its category. Specifically, the feature vectors of the selected images are acquired and fused into one feature vector that serves as the image group feature vector.

In the embodiments of the present application, image features may be extracted by a feature extraction algorithm and represented as vectors, for example feature vectors described by histograms or their statistics, or feature vectors described by a gray-level co-occurrence matrix.

Image features may also be extracted by a neural network model, such as a convolutional neural network (CNN). The model acquires its feature extraction capability through training, for example on the images and labels of ImageNet, the large visual database used in visual object recognition research.

After the feature vectors of the images in the group are obtained, they are fused by a fusion algorithm.

Optionally, the feature vectors of the individual images may be fused by splicing them together.

Optionally, the feature vectors may be fused through a neural network. Since the image group feature vector output by this step serves as the input of later steps, a differentiable fusion method, such as max pooling, a recurrent neural network (RNN), or NetVLAD, can be adopted to enable end-to-end training and recognition.
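As an illustration of this step, the following is a minimal sketch of per-image feature extraction with a pretrained CNN followed by max-pooling fusion, one of the fusion methods named above. The choice of ResNet-18, the 512-dimensional feature size, and the function name are assumptions for illustration, not an architecture prescribed by this application.

```python
import torch
import torchvision.models as models

# Sketch: extract an image feature vector per image with a CNN pretrained on
# ImageNet, then fuse the vectors into one image group feature vector by max
# pooling. ResNet-18 and its 512-dim features are illustrative assumptions.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()          # keep the pooled 512-dim features
backbone.eval()

def image_group_feature_vector(images: torch.Tensor) -> torch.Tensor:
    """images: (N, 3, H, W) tensor holding all images of one group."""
    with torch.no_grad():
        per_image = backbone(images)       # (N, 512) image feature vectors
    fused, _ = per_image.max(dim=0)        # max pooling across the group
    return fused                           # (512,) image group feature vector
```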
In addition, the images in the group may be captured images, such as images in the album of the electronic device, or images captured by other devices and acquired by the electronic device.

The category of an image group may correspond to the event the group belongs to. For the images of one event, the environment information at shooting time is similar: the shooting times differ little, the shooting locations are close, and the climate conditions at shooting, such as humidity, temperature, and illumination intensity, are alike, as are the acoustic characteristics of the surrounding sound. Images belonging to the same category can therefore be identified through the similarity of their environment information.

Correspondingly, when an image is shot, the environment information at that moment can be recorded with it, such as one or more of the geographic position, the time, the ambient temperature, the humidity, the illumination intensity, and the sound characteristics. Each kind of environment information can be obtained through the corresponding detection component of the electronic device: the geographic position and time through its positioning system, such as GPS; the temperature through a temperature sensor; the ambient humidity through a humidity sensor; the ambient illumination intensity through an illumination sensor; and sound features extracted from sounds picked up by a microphone.

When determining the category of an image group, the environment information of the individual images in the group can be combined. In the embodiments of the present application, the environment information recorded when each image in the group was shot is acquired, and the acquired environment information is fused into one feature vector, the environment feature vector.

Optionally, the different types of environment information of each image used for obtaining the environment feature vector may be arranged in the same order, and the environment information of the images then spliced to obtain the environment feature vector.

Optionally, principal component analysis (PCA) may be applied to the environment information of the images used for obtaining the environment feature vector, compressing it, reducing its dimensionality, and removing useless noise, after which statistical features are extracted. For example, the environment information of the images is multi-dimensional, and statistics such as the entropy of the distribution across the dimensions, the range of each dimension, the product of the dimension axes, the ratios of the dimension axes, and the variance of each dimension can be computed and combined into a statistical feature vector that serves as the environment feature vector. For instance, if the environment information is the time and the geographic position represented by GPS information, and the GPS information includes longitude and latitude, the information is distributed over three dimensions corresponding to longitude, latitude, and time, and the statistics may include one or more of the entropy of the three-dimensional spatial distribution, the ranges of the three dimensions, the product of the three dimension axes, the ratios of the three dimension axes, and the variances of the three dimensions, or further features. A sketch of such a computation follows.
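The sketch below computes such a statistical environment feature over per-image (longitude, latitude, time) records, under stated assumptions: the PCA step is approximated by an SVD projection, the entropy is estimated from per-dimension histograms, and the particular set and order of statistics are illustrative.

```python
import numpy as np

def environment_feature_vector(env: np.ndarray) -> np.ndarray:
    """env: (N, 3) array, one (longitude, latitude, timestamp) row per image."""
    centered = env - env.mean(axis=0)
    # PCA-style decorrelation to compress the information and suppress noise.
    _, _, components = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ components.T

    rng = projected.max(axis=0) - projected.min(axis=0)  # range per dimension
    var = projected.var(axis=0)                          # variance per dimension
    ent = []                                             # histogram entropy per dimension
    for dim in projected.T:
        hist, _ = np.histogram(dim, bins=16)
        p = hist / max(hist.sum(), 1)
        p = p[p > 0]
        ent.append(float(-(p * np.log(p)).sum()))
    ratio = rng / (rng.max() + 1e-9)                     # ratios of the dimension axes
    return np.concatenate([rng, var, ent, ratio, [np.prod(rng)]])
```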
In the embodiments of the present application, all the recorded environment information of the images may be used for obtaining the environment feature vector, or only certain types of it; this is not limited by the embodiments of the present application.

All images in the group may be used for obtaining the environment feature vector, or a subset of images may be selected from the group for it. The images used for obtaining the environment feature vector are the same as the images used for obtaining the image group feature vector.
Step S220: and fusing the image group feature vector and the environment feature vector to obtain a fused feature vector.
The image feature vector and the environment feature vector may affect the determination of the category together, and in the embodiment of the present application, the image group feature vector and the environment feature vector may be fused to obtain a new feature vector.
In the embodiment of the present application, the specific fusion manner is not limited, for example, the feature vectors of the graph group and the feature vectors of the environment may be directly spliced together; or by other fusion algorithms or neural network algorithms.
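As a minimal illustration of the splicing option (the vector sizes are assumptions):

```python
import torch

group_vec = torch.randn(512)                 # illustrative image group feature vector
env_vec = torch.randn(13)                    # illustrative environment feature vector
fused_vec = torch.cat([group_vec, env_vec])  # fused feature vector by splicing
```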
Step S230: and inputting the atlas feature vector, the environment feature vector and the fusion feature vector into a classification model, wherein the classification model is used for determining the category of the image information input into the classification model.
Step S240: and acquiring the class output by the classification model as the class corresponding to the graph group.
In the embodiment of the application, when the class to which the graph group belongs is determined according to the graph group feature vector, the environment feature vector and the fusion feature vector, the class to which the similar graph group belongs may be determined through a classification model. The classification model may be a trained neural network model having a classification capability, and may determine a category to which the image information input thereto belongs, and use the category of the image information as a category of the image group corresponding to the image information. The image information input into the classification model may include a atlas feature vector obtained by fusing a plurality of image feature vectors, an environment feature vector obtained by fusing a plurality of environment information, and a fused feature vector obtained by fusing the image feature vector and the environment feature vector.
In this embodiment, when the classification model determines the class to which the graph group belongs, the image information input to the classification model may include a graph group feature vector, an environment feature vector, and a fusion feature vector in the graph group, and the classification model is used to determine the class to which the image information input thereto belongs. If the image group to be classified comprises an image A, an image B, an image C and an image D, extracting image features of the image A, the image B, the image C and the image D and fusing the image features to obtain a group feature vector; fusing environment information obtained when the image A, the image B, the image C and the image D are shot to obtain an environment characteristic vector; fusing the image group feature vector and the environment feature vector to obtain a fused feature vector; and inputting the image group feature vector, the environment feature vector and the fusion feature vector obtained according to the image A, the image B, the image C and the image D into a classification model, obtaining output corresponding to the classification model, and taking the class determined by the output as the class to which the image group comprising the image A, the image B, the image C and the image D belongs.
When the classification model determines a class to which image information input thereto belongs, the class of the image information is determined as one of a plurality of classes learned when the classification model is trained in advance, or as not belonging to any one of the classes, that is, classification is not achieved. For example, when the classification model is trained, the learned categories thereof include a category a, a category B, a category C, and a category D, and when the classification model is used to determine the category of the image information, the category of the image information is determined as one of the 4 categories or does not belong to any of the four categories.
That is, when the specific class of the atlas is determined by the trained classification model, the image information corresponding to the atlas may be input into the classification model. And then acquiring the class output by the classification model as the class corresponding to the graph group. If the classification model outputs a classification of wedding, the classification represents that the specific belonging classification of the map group is wedding; the classification model outputs a classification of "travel", which indicates that the classification to which the graph group specifically belongs is travel.
In the embodiment of the present application, training the classification model may be further included. Specifically, a graph group of known classes may be used as a training sample, the class name of the training sample is labeled, a plurality of training samples are labeled with corresponding class names, and the class of the image information of each training sample is the class of the training sample. When the classification model is trained, the training sample is used as a graph group to be classified, and a graph group characteristic vector and an environment characteristic vector of the training sample are obtained; and fusing the image group feature vector and the environment feature vector to obtain a fused feature vector, thereby obtaining the image information of the training sample, and inputting the image information into a classification model which is a classification model to be trained. The classification model outputs the category to which the image information belongs, and the category output by the classification model is obtained.
After the class output by the classification model is obtained, whether the class output by the classification model is the same as the labeled class can be judged, if the class output by the classification model is different from the class labeled by the training sample, the parameters of the classification model are adjusted, and the class output by the classification model is close to the class labeled by the training sample. After the parameters of the classification model are adjusted, the image information input classification model can be obtained again, whether the class output by the classification model is the same as the labeled class or not is judged, if the class output by the classification model is different from the labeled class, the parameters of the classification model are adjusted again, the class output by the classification model is close to the class labeled by the training sample, the image information input classification model of the training sample is obtained, and whether the class output by the classification model is the same as the labeled class or not is judged. And by analogy, the comparison between the output class and the labeled class and the parameter adjustment are carried out for many times until the class output by the classification model is the same as the class of the training sample.
In the embodiment of the application, the classification model can be trained through a plurality of training samples, and the parameters of the classification model are adjusted until the classification accuracy of the classification model reaches the preset accuracy, so that the classification model which can be used for classifying the graph group is obtained. The classification model inputs a first quantity of image information, the quantity of the image information which is accurately classified is a second quantity, and the accuracy of the classification model is represented by the proportion of the second quantity in the first quantity.
Optionally, in this embodiment of the application, in the training process, all parameters involved in the process of determining the class to which the training sample belongs may be adjusted synchronously with the parameters of the classification model. For example, when it is determined that the classification output by the classification model is different from the labeled classification, the parameters of the classification model are required to be adjusted, and the parameters involved in the process of acquiring the image information are adjusted at the same time, so that the acquired image information is more favorable for accurate classification, and the classification output by the classification model is close to the labeled classification.
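This is a minimal sketch of the training cycle, assuming hypothetical stand-ins `classifier` for the classification model and `build_image_info` for the feature-extraction and fusion pipeline; the optimizer, learning rate, and target accuracy are likewise illustrative.

```python
import torch
import torch.nn.functional as F

def train(classifier, build_image_info, samples, target_accuracy=0.95):
    # To adjust the upstream (feature-extraction) parameters in step with the
    # classification model, add them to this optimizer as well.
    opt = torch.optim.Adam(classifier.parameters(), lr=1e-4)
    accuracy = 0.0
    while accuracy < target_accuracy:
        correct = 0
        for images, env_info, label in samples:       # one labeled image group
            logits = classifier(build_image_info(images, env_info))
            loss = F.cross_entropy(logits.unsqueeze(0), label.unsqueeze(0))
            opt.zero_grad()
            loss.backward()                            # compare with label, then adjust
            opt.step()
            correct += int(logits.argmax().item() == label.item())
        accuracy = correct / len(samples)              # second quantity / first quantity
```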
Optionally, the output of the classification model may be a vector with one value per dimension, where each dimension represents a category learned by the model and its value represents the probability that the image group belongs to that category. If the largest value in the vector is greater than a first preset value and all other values are less than a second preset value, where the first preset value is greater than or equal to the second, the category output by the model is the one corresponding to the dimension of the largest value. For example, if the model outputs the vector (x, y, z), where the dimensions of x, y, and z correspond to categories A, B, and C respectively, and x is greater than the first preset value while y and z are less than the second preset value, the output category is determined to be category A. Adjusting the model's output toward a certain category means adjusting its parameters so that the value of that category's dimension in the output vector rises above the first preset value while the values of the other categories fall below the second preset value.
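A sketch of this decision rule; the two preset values are illustrative assumptions.

```python
import numpy as np

def decide_category(output: np.ndarray, first_preset=0.5, second_preset=0.3):
    # Return the arg-max category only if its value exceeds the first preset
    # value and every other value stays below the second preset value.
    top = int(np.argmax(output))
    others = np.delete(output, top)
    if output[top] > first_preset and np.all(others < second_preset):
        return top        # index of the category dimension
    return None           # does not belong to any learned category
```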
Optionally, determining the category corresponding to an image group means determining the category the group specifically belongs to, that is, its category name. For convenient viewing, the image groups may be displayed in separate folders, each folder storing the images of one group and named after the group it stores. For example, as shown in fig. 3, the folder corresponding to "event 1" in the category album stores the images of the group whose category is "event 1", and opening it shows the images of that category. Similarly, the folder corresponding to "event 2" in fig. 3 stores the images of the group whose category name is "event 2", and the folder corresponding to "event 3" stores the images of the group whose category name is "event 3".

In the embodiments of the present application, the category of the image group is determined by the trained classification model from the group's image group feature vector, environment feature vector, and fused feature vector, which improves the accuracy of category determination.
In the embodiments of the present application, the classification model may be a neural network model, or another model with classification capability such as a Bayesian model or a decision tree model.
In the embodiments of the present application, the classification model may also be formed by combining several models or networks with classification capability. Specifically, fig. 4 shows a network structure for identifying the category of an image group; as shown there, the classification model may include a first sub-classification model, a second sub-classification model, a third sub-classification model, and a proportion calculation model, which together classify the group. Referring to fig. 4, take a group containing N images as an example, where N is a positive integer greater than 1. To identify the group's category with this structure, image features are extracted from the N images, and the N extracted image feature vectors are fused into the image group feature vector. The N pieces of environment information of the group are fused into the environment feature vector. The image group feature vector and the environment feature vector are fused into the fused feature vector. The fused feature vector is input into the proportion calculation model, the image group feature vector into the first sub-classification model, the environment feature vector into the second sub-classification model, and the fused feature vector into the third sub-classification model. The output of each sub-classification model is multiplied by the corresponding proportion computed by the proportion calculation model, the products are added, and the category of the group is determined from the sum.

Optionally, in this network structure the images of the group are the input and the category of the group is the output, and the processing in between is continuous, so every algorithm and model along the way can be made differentiable; for example, a differentiable fusion method can be adopted when fusing the image feature vectors, enabling end-to-end processing.
As shown in fig. 5, a flowchart of the image classification method implemented by this network structure may include the following steps.

Step S310: acquire an image group feature vector and an environment feature vector of an image group to be classified, where the image group includes a plurality of images, the image group feature vector is obtained by fusing the image feature vectors of the images in the group, and the environment feature vector is obtained by fusing the environment information recorded when the images in the group were shot.

The embodiments of the present application do not limit the order in which the image group feature vector and the environment feature vector are acquired; it can be decided according to the execution situation. For example, as shown in fig. 4, the two vectors may be acquired simultaneously. For the specific manner of acquiring them, refer to the descriptions of the same or similar parts in other embodiments of the present application, which are not repeated here.

Step S320: fuse the image group feature vector and the environment feature vector to obtain a fused feature vector.
For specific description of this step, reference may be made to descriptions of the same or similar parts in other embodiments of the present application, which are not repeated herein.
Step S330: inputting the fusion characteristic vector into a proportion calculation model, and acquiring the proportion corresponding to the image group characteristic vector, the environment characteristic vector and the fusion characteristic vector output by the proportion calculation model respectively. The proportion calculation model is used for determining the proportion of the graph group feature vector, the environment feature vector and the fusion feature vector according to the fusion feature vector input into the proportion calculation model, and the sum of the proportion of the graph group feature vector, the environment feature vector and the fusion feature vector is 1.
The method comprises the steps of inputting a fusion feature vector of a proportion calculation model, and simultaneously having the characteristics of a graph group feature vector, an environment feature vector and the fusion feature vector, wherein the proportion of the graph group feature vector, the proportion of the environment feature vector and the proportion of the fusion feature vector can be calculated by the proportion calculation model according to the input fusion feature vector.
Specifically, the proportion calculation model can output three proportional values, wherein the three proportional values respectively represent the proportion of the graph group feature vectors and represent the correlation between the graph group feature vectors and the classification result; the proportion of the environment feature vector represents the correlation between the environment feature vector and the classification result; and the proportion of the fusion feature vector represents the correlation between the fusion feature vector and the classification result. The three proportion values are numbers between 0 and 1 respectively, the sum of the three proportion values is 1, and the larger the proportion value is, the higher the correlation between the feature vector corresponding to the proportion and the classification result is, and the larger the influence on the classification result is.
In the embodiment of the present application, the specific reason why the ratio calculation model is a neural network model is not limited in the embodiment of the present application, such as a gate (gate) network.
Step S340: inputting the characteristic vector of the graph group into a first sub-classification model to obtain a first classification vector output by the first sub-classification model; inputting the environment feature vector into a second sub-classification model to obtain a second classification vector output by the second sub-classification model; inputting the fusion feature vector into a third sub-classification model to obtain a third classification vector output by the third sub-classification model; the first sub-classification model, the second sub-classification model and the third sub-classification model are used for determining the probability that the image information belongs to different classes according to the image information input into the first sub-classification model, the second sub-classification model and the third sub-classification model.
As shown in fig. 4, the obtained atlas feature vector, the environment feature vector, and the fusion feature vector are respectively input into three sub-classification models, and the output of each sub-classification model is obtained. Each sub-classification model is used for determining the probability of the image information belonging to different categories according to the input image information, the output of the sub-classification model can be a vector, the vector comprises numerical values corresponding to different dimensions, each dimension represents a category, and the numerical value of each dimension represents the probability of the category corresponding to the dimension determined by the sub-classification model.
Defining a sub-classification model input by the characteristic vector of the graph group as a first sub-classification model, and defining a vector output by the first sub-classification model as a first classification vector; the sub-classification model input by the environment feature vector is a second sub-classification model, and the vector output by the second sub-classification model is a second classification vector; and the sub-classification model input by the fusion feature vector is a third sub-classification model, and the vector output by the third sub-classification model is a third classification vector.
In this embodiment, the three sub-classification models may be the same classification model, and may be a neural network model with classification capability, and a specific reason why the classification model is not limited in this embodiment is, for example, each sub-classification model may be a full connection layer.
Step S350: multiplying the proportion of the feature vector of the image group with the first classification vector to obtain a first product; multiplying the proportion of the environment feature vector by the second classification vector to obtain a second product; and multiplying the proportion of the fusion feature vector by the third classification vector to obtain a third product.
Since the proportion representation map group feature vectors, the environment feature vectors and the fusion feature vectors output by the proportion calculation model respectively represent the relevance with the classification result, as shown in fig. 4, the classification vectors output by each sub-classification model can be multiplied by the corresponding proportion, so that the influence of the feature vectors with high relevance with the classification result in the map group feature vectors, the environment feature vectors and the fusion feature vectors on the classification result is improved, and the influence of the feature vectors with low relevance with the classification result on the classification result is reduced.
Step S360: and adding the first product, the second product and the third product to obtain a classification vector.
Step S370: and taking the class with the highest probability in the classification vector as the class of the graph group.
The first product is a vector of the first classification vector reduced by the proportion of the feature vector of the graph group, the second product is a vector of the second classification vector reduced by the proportion of the feature vector of the environment, and the third product is a vector of the third classification vector reduced by the proportion of the fused feature vector. The result of adding the first product, the second product, and the third product may be used as a classification vector for determining a class to which the group of maps belongs.
And when the first product, the second product and the third product are added, adding the numerical values of the dimensionalities corresponding to the same category. If the first product, the second product and the third product are all three-dimensional vectors and are all the first dimension representing the class A, and the second dimension representing the class B and the third dimension representing the class C, the numerical values of the first dimension of the first product, the second product and the third product are added to be used as the first dimension of the classification vector to represent the probability of the class A; the sum of the numerical values of the second dimension of the first product, the second product and the third product is taken as the second dimension of the classification vector, and the probability of the class B is represented; the sum of the numerical values of the third dimension of the first product, the second product, and the third product is taken as the third dimension of the classification vector, representing the probability of the class C.
The class corresponding to the one-dimensional highest in numerical value in the classification vector represents the class with the highest probability, and may be the class of the graph group. As in the foregoing example, if the value of the third dimension is greater than the values of the first dimension and the second dimension, the category of the graph group may be determined to be category C.
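Steps S330 to S370 can be assembled into a short model. The following is a sketch under stated assumptions: each sub-classification model is a single fully connected layer, the proportion calculation model is a fully connected layer followed by softmax, and all dimensions are illustrative, not values fixed by this application.

```python
import torch
import torch.nn as nn

class GatedGroupClassifier(nn.Module):
    # Sketch of the fig. 4 structure; dimensions are illustrative assumptions.
    def __init__(self, group_dim=512, env_dim=13, num_classes=4):
        super().__init__()
        fused_dim = group_dim + env_dim                       # splicing fusion
        self.gate = nn.Sequential(nn.Linear(fused_dim, 3),
                                  nn.Softmax(dim=-1))         # proportion calculation model
        self.fc_group = nn.Linear(group_dim, num_classes)     # first sub-classification model
        self.fc_env = nn.Linear(env_dim, num_classes)         # second sub-classification model
        self.fc_fused = nn.Linear(fused_dim, num_classes)     # third sub-classification model

    def forward(self, group_vec, env_vec):
        fused = torch.cat([group_vec, env_vec], dim=-1)       # fused feature vector
        w = self.gate(fused)                                  # three proportions, summing to 1
        classification = (w[0] * self.fc_group(group_vec)     # first product
                          + w[1] * self.fc_env(env_vec)       # second product
                          + w[2] * self.fc_fused(fused))      # third product
        return classification.argmax(dim=-1), classification  # highest-probability category
```

Because every operation above is differentiable, a cross-entropy loss on `classification` back-propagates through the gate and all three fully connected layers at once, matching the end-to-end parameter adjustment described next.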
In the embodiment of the present application, when the network structure shown in fig. 4 is trained, a graph group with a class labeled is used as a training sample, and the network structure is input, that is, image features of image inputs in the graph group are extracted, and environment information inputs are fused. When adjusting parameters according to the labeled categories and the categories output by the network structure, all the related adjustable parameters in the network structure can be adjusted, such as adjusting parameters related to image feature extraction, adjusting parameters of a sub-classification model, adjusting parameters of a proportion calculation model, and the like, and the categories of which the final adjustment result is output are the same as the actual categories of the training samples. Therefore, each parameter in the network structure can be more accurate after being trained, and the classification result is more accurate when the classification is carried out through the network structure. For example, the adjustment of the parameters realizes that the proportion corresponding to each feature vector more accurately represents the correlation between different feature vectors and the classification result, so that the proportion of each feature vector in the classification result is more accurate.
In addition, end-to-end training can be realized in the training process, the training process is shortened, image information and environment information are extracted and fused more effectively, and the classification precision is improved.
In this embodiment of the application, when determining the category to which the graph group belongs according to the graph group feature vector, the environment feature vector, and the fusion feature vector, the classification model may include an occupation ratio calculation model and three sub-classification models, so that the same feature vector may be input into the sub-classification models to obtain classification results respectively determined according to each feature vector. And then, the proportion obtained by calculation of the proportion calculation model is used for carrying out scale reduction on the classification result respectively determined by each feature vector, and the final classification result is determined according to the reduced result, so that the proportion of each feature vector in the final classification result is adjusted according to different relativity of each feature vector and the classification result, and a more accurate classification vector is obtained.
The image classification method according to the embodiment of the present application is described with reference to fig. 4 through description of a specific implementation scenario. In the embodiment of the present application, as shown in fig. 4, N images are taken as a group of graphs to be classified, where N is a positive integer. Image feature extraction can be performed on the N images through a CNN network to obtain N image feature vectors. The CNN network may train the image feature extraction capability of the CNN network through the ImageNet data set in advance.
And fusing the extracted N image feature vectors into one vector, and defining the fused vector as a graph group feature vector. In which, a guided fusion method can be adopted for fusion to achieve end-to-end training, such as Max posing, RNN, NetVLAD, and the like.
In addition, the GPS information of each image in the image group to be classified is taken as the environment information, and N images correspond to N pieces of GPS information. The GPS statistical information may be extracted for the N images as environmental information corresponding to the N images, respectively. The method for extracting the GPS statistical information for the N images may be that the statistical information is extracted after the PCA processing is performed on the N GPS information in the image group to be classified. As mentioned above, the GPS information may include longitude and latitude, the longitude and the latitude form a binary vector, and the extracted statistical information includes, but is not limited to, entropy of two-dimensional spatial distribution, range of two dimensions, product of two dimensional axes, ratio of two dimensional axes, variance of two dimensions, and the like. And combining the extracted N pieces of GPS statistical information into a GPS statistical feature vector as the environmental feature vector in the embodiment of the application.
And splicing the obtained image group feature vector and the environment feature vector together, for example, realizing the fusion of the image group feature vector and the environment feature vector through a Concat function, and defining the vector after splicing the image group feature vector and the environment feature vector as a fusion feature vector.
As shown in fig. 4, a gate network may be composed of a fully connected layer and a softmax function, and is used as a proportion calculation model for controlling the proportion of the contribution of three branches to the final prediction result in the image classification method. The three branches are respectively: fusing picture characteristics; GPS statistical characteristics; and the fused picture features and the GPS statistical features are added, namely the image feature vector, the environment feature vector and a branch corresponding to the fused feature vector respectively. The fusion characteristic vector is used as the input of the gate network, the gate network outputs the occupation ratios corresponding to the three branches respectively, and the sum of the occupation ratios of the three branches is 1.
Referring to fig. 4, each branch can be classified through a full connectivity layer (FC), that is, each branch takes FC as a sub-classification model and takes a feature vector of each branch as an input of the corresponding FC, and each branch outputs a classification result of FC. If the corresponding FC is input by the graph group feature vector, the FC outputs the classification result of the graph group feature vector; inputting the corresponding FC by the fusion feature vector, and outputting a classification result of the fusion feature vector by the FC; and inputting the environment feature vector into a corresponding FC, and outputting a classification result of the environment feature vector by the FC.
The classification result output by each branch is multiplied by the proportion that the gate network assigns to that branch, and the three products are added together as the basis for the final category. Each branch outputs a vector through its FC; after the FC outputs are weighted by the corresponding proportions and summed, a single vector is obtained in which each dimension corresponds to a category, and the category corresponding to the dimension with the maximum value is the final classification result of the graph group formed by the N images.
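Putting the three branches together, the weighted combination of fig. 4 might be sketched as follows; the fc_* layers and the gate are assumed to be the per-branch FCs and the gate network above, so this is an illustration rather than the patent's reference implementation:

```python
import torch

def classify_graph_group(group_vec: torch.Tensor,
                         env_vec: torch.Tensor,
                         fc_group: torch.nn.Linear,
                         fc_env: torch.nn.Linear,
                         fc_fused: torch.nn.Linear,
                         gate: torch.nn.Module) -> torch.Tensor:
    fused = torch.cat([group_vec, env_vec], dim=-1)   # fusion feature vector
    w = gate(fused)                                   # three proportions summing to 1
    combined = (w[0] * fc_group(group_vec)            # each branch output x its proportion
                + w[1] * fc_env(env_vec)
                + w[2] * fc_fused(fused))
    return combined.argmax(dim=-1)                    # dimension with the maximum value
```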
The foregoing embodiments describe determining the category of a graph group, that is, identifying the specific category name of a set of images that have already been determined to belong to the same category. In practical applications, there may be a large number of images to be classified that belong to different categories, so it may first be determined which of the images to be classified belong to the same category; the images belonging to the same category are taken as the images of one graph group, and the specific category name of that graph group is then identified.
In the embodiment of the present application, the images belonging to the same group among the images to be classified may be determined by clustering the images to be classified and extracting the images that fall into the same category as a graph group to be classified.
In one implementation, the image features of the images to be classified can be extracted through an image feature extraction algorithm, the extracted features can then be clustered through a clustering algorithm, and the images whose features fall into the same cluster are determined to be images of the same category.
In another implementation, the categories of the graph groups may be event categories, where the images of one event category belong to one event. For the same event, the environment information at shooting time is similar, so the images to be classified can be grouped according to the environment information. Compared with grouping by image features, this reduces the dependence on the image features themselves, and images belonging to the same event can be accurately grouped into the same category even when their image features differ greatly.
Optionally, in this embodiment, the environment information of each image may be converted into a point in a coordinate system, the difference between pieces of environment information may be measured by the distance between the points, and points closer than a preset distance may be treated as points of the same category. Specifically, a coordinate system may be established according to the types of environment information. Taking environment information that includes a geographic position and a time as an example, as shown in fig. 6, the time serves as the abscissa and the geographic position as the ordinate, and each piece of environment information is plotted as a point accordingly. All points corresponding to the images to be classified are traversed once, without repetition. For each traversed point, if more than a preset number of points lie within the preset distance of it, the point is marked as a core point, and the points within that preset range are defined as the neighborhood of the core point. For each core point, the core point and the points in its neighborhood are added to a cluster; if a point in the cluster is itself a core point, the points in its neighborhood are also added to the cluster, and so on until the neighborhoods of all core points in the cluster have been absorbed. The environment information within one cluster is then environment information of the same category. For example, as shown in fig. 6, a cluster 101 and a cluster 102 can be obtained, where the environment information corresponding to the points in the cluster 101 belongs to one category and the environment information corresponding to the points in the cluster 102 belongs to another.
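This traversal is essentially the core-point/neighborhood expansion of DBSCAN, so a compact sketch can lean on scikit-learn, with eps and min_samples standing in for the preset distance and the preset number; the coordinate values below are invented purely for illustration:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# points[i] = (time, geographic position) of image i, in comparable units.
points = np.array([[0.0, 0.1], [0.1, 0.1], [0.2, 0.2],
                   [5.0, 3.0], [5.1, 3.1], [5.2, 3.0],
                   [9.0, 0.5]])
labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(points)
# labels >= 0 index clusters such as 101 and 102; label -1 marks points
# (like 103 and 104 in fig. 6) that belong to no graph group.
```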
The images corresponding to environment information of the same category are images of the same category and can serve as the images of one graph group. As shown in fig. 6, the images whose environment information falls in the cluster 101 form one graph group, and the images whose environment information falls in the cluster 102 form another, so two graph groups are obtained from the clustering shown in fig. 6. The images corresponding to the points 103 and 104 do not belong to any graph group.
Optionally, in this embodiment, the environment information may also be clustered with an existing clustering algorithm, such as the density-based HDBSCAN algorithm, the K-Means algorithm, or the mean shift algorithm. In each clustering algorithm, a minimum cluster size can be set so that the number of images in each graph group is greater than the preset number.
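For instance, with scikit-learn (version 1.3 or later ships an HDBSCAN estimator), the minimum cluster size can be passed directly; the points array below simply repeats the illustrative values of the previous sketch:

```python
import numpy as np
from sklearn.cluster import HDBSCAN

points = np.array([[0.0, 0.1], [0.1, 0.1], [0.2, 0.2],
                   [5.0, 3.0], [5.1, 3.1], [5.2, 3.0],
                   [9.0, 0.5]])
# min_cluster_size enforces that every graph group contains at least
# the preset number of images (here, 3).
labels = HDBSCAN(min_cluster_size=3).fit_predict(points)
```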
In the embodiment of the present application, the images to be classified can thus be clustered, the images belonging to the same category are extracted and taken as the images of one graph group, and irrelevant images that do not belong to any category are excluded from the resulting graph groups. Category identification is then performed on each graph group by combining its graph group feature vector and environment feature vector, that is, the specific category of the graph group is determined, which improves the classification accuracy.
The embodiment of the present application also provides an image classification apparatus 400. As shown in fig. 7, the apparatus 400 includes: a vector obtaining module 410, configured to obtain a graph group feature vector and an environment feature vector of a graph group to be classified, where the graph group includes multiple images, the graph group feature vector is obtained by fusing the image feature vectors of the images in the graph group, and the environment feature vector is obtained by fusing the environment information of the images in the graph group at shooting time; a fusion module 420, configured to fuse the graph group feature vector and the environment feature vector to obtain a fusion feature vector; and a classification module 430, configured to determine the category to which the graph group belongs according to the graph group feature vector, the environment feature vector, and the fusion feature vector.
Optionally, the classification module 430 may be configured to input the graph group feature vector, the environment feature vector, and the fusion feature vector into a classification model, where the classification model is configured to determine the category to which image information input into it belongs, and to obtain the category output by the classification model as the category corresponding to the graph group.
Optionally, the classification model may include a first sub-classification model, a second sub-classification model, a third sub-classification model, and a proportion calculation model. The classification module 430 may be configured to: input the fusion feature vector into the proportion calculation model and obtain the proportions output by it that correspond respectively to the graph group feature vector, the environment feature vector, and the fusion feature vector, where the proportion calculation model determines the three proportions according to the fusion feature vector input into it and the proportions sum to 1; input the graph group feature vector into the first sub-classification model to obtain a first classification vector; input the environment feature vector into the second sub-classification model to obtain a second classification vector; input the fusion feature vector into the third sub-classification model to obtain a third classification vector, where each sub-classification model determines the probabilities that the information input into it belongs to the different categories; multiply the proportion of the graph group feature vector by the first classification vector to obtain a first product; multiply the proportion of the environment feature vector by the second classification vector to obtain a second product; multiply the proportion of the fusion feature vector by the third classification vector to obtain a third product; add the first, second, and third products to obtain a classification vector; and take the category with the highest probability in the classification vector as the category of the graph group.
Optionally, the apparatus may further include a training module, configured to obtain the labeled category of the graph group; judge whether the category output by the classification model is the same as the labeled category; and, if they are different, adjust the parameters of the classification model and perform the step of obtaining the graph group feature vector and the environment feature vector of the graph group to be classified again, until the category output by the classification model is the same as the labeled category.
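A hedged sketch of that training step follows, assuming a PyTorch module that wraps the whole classification model of fig. 4; the cross-entropy loss and the optimizer interface are assumptions, since the patent fixes neither:

```python
import torch

def train_step(model: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               group_vec: torch.Tensor,
               env_vec: torch.Tensor,
               label: torch.Tensor) -> bool:
    """One parameter-adjustment step; returns True once the output
    category matches the labeled category."""
    logits = model(group_vec, env_vec)                # classification vector
    loss = torch.nn.functional.cross_entropy(logits.unsqueeze(0),
                                             label.unsqueeze(0))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return bool(logits.argmax(-1) == label)
```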
Optionally, the vector obtaining module 410 may include a graph group feature vector obtaining unit, configured to perform image feature extraction on each image in the graph group to obtain the image feature vector corresponding to each image, and to fuse the obtained image feature vectors to obtain the graph group feature vector.
Optionally, the vector obtaining module 410 may include an environment feature vector obtaining unit, configured to obtain the environment information of each image in the graph group at shooting time and to fuse the obtained environment information into one vector as the environment feature vector.
Optionally, the apparatus may further include a grouping module, configured to cluster the images to be classified, and extract images belonging to the same category as a group of images to be classified.
It will be clear to those skilled in the art that, for convenience and brevity of description, the method embodiments described above may refer to one another; for the specific working processes of the devices and modules described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the several embodiments provided in the present application, the coupling between modules may be electrical, mechanical, or of another form.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. The modules may be deployed in different electronic devices or in the same electronic device, and the embodiments of the present application are not limited in this respect.
Referring to fig. 8, a block diagram of an electronic device 500 according to an embodiment of the present disclosure is shown. The electronic device may include one or more processors 510 (only one shown), memory 520, and one or more programs. Wherein the one or more programs are stored in the memory 520 and configured to be executed by the one or more processors 510. The one or more programs are executed by the processor for performing the methods described in the foregoing embodiments.
Processor 510 may include one or more processing cores. The processor 510 connects the various components throughout the electronic device 500 using various interfaces and circuits, and performs the various functions of the electronic device 500 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 520 and by invoking the data stored in the memory 520. Optionally, the processor 510 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 510 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like, where the CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU renders and draws display content; and the modem handles wireless communication. It is understood that the modem may also not be integrated into the processor 510 but be implemented by a separate communication chip.
The memory 520 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 520 may be used to store instructions, programs, code sets, or instruction sets. The memory 520 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function, instructions for implementing the method embodiments described above, and the like, and the data storage area may store data created by the electronic device in use, and the like.
Referring to fig. 9, a block diagram of a computer-readable storage medium according to an embodiment of the present application is shown. The computer-readable storage medium 600 has stored therein program code that can be called by a processor to execute the method described in the above-described method embodiments.
The computer-readable storage medium 600 may be an electronic memory such as a flash memory, an Electrically Erasable Programmable Read-Only Memory (EEPROM), an EPROM, a hard disk, or a ROM. Optionally, the computer-readable storage medium 600 includes a non-volatile computer-readable storage medium. The computer-readable storage medium 600 has storage space for the program code 610 that performs any of the method steps described above. The program code can be read from or written to one or more computer program products. The program code 610 may be compressed, for example, in a suitable form.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications and replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A method of image classification, the method comprising:
acquiring a graph group feature vector and an environment feature vector of a graph group to be classified, wherein the graph group comprises a plurality of images, the graph group feature vector is obtained by fusing image feature vectors of all the images in the graph group, and the environment feature vector is obtained by fusing environment information of all the images in the graph group during shooting;
fusing the image group feature vector and the environment feature vector to obtain a fused feature vector;
and determining the category of the graph group according to the graph group feature vector, the environment feature vector and the fusion feature vector.
2. The method of claim 1, wherein the determining the category to which the graph group belongs according to the graph group feature vector, the environment feature vector, and the fusion feature vector comprises:
inputting the graph group feature vector, the environment feature vector, and the fusion feature vector into a classification model, wherein the classification model is used for determining the category to which image information input into the classification model belongs; and
acquiring the category output by the classification model as the category corresponding to the graph group.
3. The method according to claim 2, wherein the classification model comprises a first sub-classification model, a second sub-classification model, a third sub-classification model, and a proportion calculation model, and the inputting the graph group feature vector, the environment feature vector, and the fusion feature vector into a classification model and acquiring the category output by the classification model as the category corresponding to the graph group further comprises:
inputting the fusion feature vector into the proportion calculation model, and obtaining the proportions output by the proportion calculation model that correspond respectively to the graph group feature vector, the environment feature vector, and the fusion feature vector, wherein the proportion calculation model is used for determining the proportions of the graph group feature vector, the environment feature vector, and the fusion feature vector according to the fusion feature vector input into it, and the sum of the three proportions is 1;
inputting the graph group feature vector into the first sub-classification model to obtain a first classification vector output by the first sub-classification model; inputting the environment feature vector into the second sub-classification model to obtain a second classification vector output by the second sub-classification model; and inputting the fusion feature vector into the third sub-classification model to obtain a third classification vector output by the third sub-classification model, wherein the first, second, and third sub-classification models are used for determining the probabilities that the image information input into them belongs to different categories;
multiplying the proportion of the graph group feature vector by the first classification vector to obtain a first product; multiplying the proportion of the environment feature vector by the second classification vector to obtain a second product; and multiplying the proportion of the fusion feature vector by the third classification vector to obtain a third product;
adding the first product, the second product, and the third product to obtain a classification vector; and
taking the category with the highest probability in the classification vector as the category of the graph group.
4. The method of claim 2, further comprising:
acquiring the labeled category of the graph group;
judging whether the category output by the classification model is the same as the labeled category; and
if they are different, adjusting the parameters of the classification model and performing the step of acquiring the graph group feature vector and the environment feature vector of the graph group to be classified again, until the category output by the classification model is the same as the labeled category.
5. The method of claim 1, wherein the acquiring a graph group feature vector of a graph group to be classified comprises:
performing image feature extraction on each image in the graph group to obtain the image feature vector corresponding to each image; and
fusing the obtained image feature vectors to obtain the graph group feature vector.
6. The method of claim 1, wherein the acquiring an environment feature vector of a graph group to be classified comprises:
acquiring the environment information of each image in the graph group at shooting time; and
fusing the obtained environment information into one vector as the environment feature vector.
7. The method of claim 1, wherein, before the acquiring a graph group feature vector and an environment feature vector of a graph group to be classified, the method further comprises:
clustering images to be classified, and extracting the images belonging to the same category as the graph group to be classified.
8. An image classification apparatus, characterized in that the apparatus comprises:
the image classification device comprises a vector acquisition module, a classification module and a classification module, wherein the vector acquisition module is used for acquiring image group feature vectors and environment feature vectors of an image group to be classified, the image group comprises a plurality of images, the image group feature vectors are obtained by fusing image feature vectors of all the images in the image group, and the environment feature vectors are obtained by fusing environment information when all the images in the image group are shot;
the fusion module is used for fusing the image group feature vector and the environment feature vector to obtain a fusion feature vector;
and the classification module is used for determining the category of the graph group according to the graph group feature vector, the environment feature vector and the fusion feature vector.
9. An electronic device, comprising:
one or more processors;
a memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors for performing the method recited in any of claims 1-7.
10. A computer-readable storage medium, having stored thereon program code that can be invoked by a processor to perform the method according to any one of claims 1 to 7.
CN202010916955.7A 2020-09-03 2020-09-03 Image classification method and device, electronic equipment and storage medium Pending CN112069981A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010916955.7A CN112069981A (en) 2020-09-03 2020-09-03 Image classification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112069981A true CN112069981A (en) 2020-12-11

Family

ID=73665422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010916955.7A Pending CN112069981A (en) 2020-09-03 2020-09-03 Image classification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112069981A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004064700A (en) * 2002-07-31 2004-02-26 Sony Corp Image classification apparatus, image classification method, program and recording medium, and image classification system
WO2016110005A1 (en) * 2015-01-07 2016-07-14 深圳市唯特视科技有限公司 Gray level and depth information based multi-layer fusion multi-modal face recognition device and method
CN104820718A (en) * 2015-05-22 2015-08-05 哈尔滨工业大学 Image classification and searching method based on geographic position characteristics and overall situation vision characteristics
CN108090203A (en) * 2017-12-25 2018-05-29 上海七牛信息技术有限公司 Video classification methods, device, storage medium and electronic equipment
CN109063754A (en) * 2018-07-18 2018-12-21 武汉大学 A kind of remote sensing image multiple features combining classification method based on OpenStreetMap
CN109635789A (en) * 2019-01-23 2019-04-16 西安电子科技大学 Based on intensity than the High Resolution SAR image classification method that is extracted with spatial structure characteristic
CN110516586A (en) * 2019-08-23 2019-11-29 深圳力维智联技术有限公司 A kind of facial image clustering method, system, product and medium
CN111008651A (en) * 2019-11-13 2020-04-14 科大国创软件股份有限公司 Image reproduction detection method based on multi-feature fusion

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
ANSELMO FERREIRA et al.: "Eyes in the Skies: A Data-Driven Fusion Approach to Identifying Drug Crops From Remote Sensing Images", IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 12, no. 12, 31 December 2019, page 4773, XP011770027, DOI: 10.1109/JSTARS.2019.2917024 *
LIHUA YE et al.: "Parallel multi-stage features fusion of deep convolutional neural networks for aerial scene classification", Remote Sensing Letters, vol. 9, no. 3, pages 294-303, XP055736755, DOI: 10.1080/2150704X.2017.1415477 *
VINEETA DAS et al.: "Multi-scale deep feature fusion for automated classification of macular pathologies from OCT images", Biomedical Signal Processing and Control, vol. 54, pages 1-10 *
YE Zhiling: "A New Neighborhood Reconstruction Method and Its Application in Machine Learning", China Master's Theses Full-text Database, Information Science and Technology (monthly), no. 7, pages 140-46 *
TANG Pengjie; TAN Yunlan; LI Jinzhong: "Image caption generation model fusing image scene and object prior knowledge", Journal of Image and Graphics, no. 09, 16 September 2017, pages 862-866 *
ZHAO Qingqing: "Research on Person Re-identification Based on Feature Fusion and Siamese Networks", China Master's Theses Full-text Database, Information Science and Technology (monthly), no. 2, pages 138-1801 *
DENG Jianghong et al.: "Image classification model combining multi-feature selection and support vector machine", Journal of Jilin University (Science Edition), vol. 54, no. 4, pages 862-866 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113033682A (en) * 2021-03-31 2021-06-25 北京有竹居网络技术有限公司 Video classification method and device, readable medium and electronic equipment
CN113033682B (en) * 2021-03-31 2024-04-30 北京有竹居网络技术有限公司 Video classification method, device, readable medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN112434721B (en) Image classification method, system, storage medium and terminal based on small sample learning
CN109086811B (en) Multi-label image classification method and device and electronic equipment
CN105574550A (en) Vehicle identification method and device
CN110781911B (en) Image matching method, device, equipment and storage medium
Cheng et al. A data-driven point cloud simplification framework for city-scale image-based localization
US11562179B2 (en) Artificial intelligence system for inspecting image reliability
CN108897757B (en) Photo storage method, storage medium and server
CN111126514A (en) Image multi-label classification method, device, equipment and medium
WO2023201924A1 (en) Object defect detection method and apparatus, and computer device and storage medium
CN111931859B (en) Multi-label image recognition method and device
CN111179270A (en) Image co-segmentation method and device based on attention mechanism
CN113408570A (en) Image category identification method and device based on model distillation, storage medium and terminal
CN110942456B (en) Tamper image detection method, device, equipment and storage medium
CN110147460B (en) Three-dimensional model retrieval method and device based on convolutional neural network and multi-view map
CN110852263A (en) Mobile phone photographing garbage classification recognition method based on artificial intelligence
CN112668675B (en) Image processing method and device, computer equipment and storage medium
CN111078924B (en) Image retrieval method, device, terminal and storage medium
CN112069342A (en) Image classification method and device, electronic equipment and storage medium
CN112069335A (en) Image classification method and device, electronic equipment and storage medium
CN116310850B (en) Remote sensing image target detection method based on improved RetinaNet
CN113569933A (en) Trademark pattern matching method and corresponding device, equipment and medium
CN111860623A (en) Method and system for counting tree number based on improved SSD neural network
CN112287905A (en) Vehicle damage identification method, device, equipment and storage medium
CN112183303A (en) Transformer equipment image classification method and device, computer equipment and medium
CN112015937B (en) Picture geographic positioning method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination