CN110796184B - Image classification method and device, electronic equipment and storage medium


Info

Publication number
CN110796184B
Authority
CN
China
Prior art keywords
feature
model
training
picture
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910989790.3A
Other languages
Chinese (zh)
Other versions
CN110796184A
Inventor
贾玉虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201910989790.3A
Publication of CN110796184A
Application granted
Publication of CN110796184B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The embodiments of the invention belong to the technical field of image processing and provide an image classification method and device, an electronic device, and a storage medium. The image classification method comprises the following steps: extracting a first feature of a first picture, the first feature being a color histogram over the three primary-color RGB channels; extracting a second feature of the first picture through a first model, the first model being a convolutional neural network; determining a fusion feature of the first picture, the fusion feature being obtained by connecting the first feature and the second feature; and inputting the fusion feature into a classifier for processing to obtain a classification result corresponding to the first picture, the classifier being used for classifying pictures according to the objects in the pictures.

Description

Image classification method and device, electronic equipment and storage medium
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an image classification method and device, electronic equipment and a storage medium.
Background
Plant scenes involve a plurality of classification categories, such as flower clusters, bouquets, flowers, green plant clusters (including bushes) and green plant trees; the categories are diverse while the scene semantics are monotonous, so errors easily occur when classifying images of plant scenes, for example a flower cluster being recognized as a bouquet.
Disclosure of Invention
In view of the above, embodiments of the present invention provide an image classification method, an image classification device, an electronic device, and a storage medium, so as to at least solve the problem in the related art that errors easily occur when classifying images of plant scenes.
The technical scheme of the embodiment of the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides an image classification method, where the method includes:
extracting a first feature of the first picture; the first feature is characterized as a color histogram for three primary RGB channels;
extracting a second feature of the first picture through a first model; the first model is a convolutional neural network;
determining a fusion feature of the first picture, wherein the fusion feature is obtained by connecting the first feature and the second feature;
inputting the fusion features into a classifier for processing to obtain a classification result corresponding to the first picture; the classifier is used for classifying the pictures according to the objects in the pictures.
In the foregoing solution, the determining the fusion characteristic of the first picture includes:
reducing the vector dimension of the first feature to one dimension to obtain a third feature;
connecting the third feature with the second feature to obtain the fused feature; the vector dimension of the first feature is two-dimensional, and the vector dimension of the second feature is one-dimensional.
In the foregoing scheme, the reducing the vector dimension of the first feature to one dimension to obtain a third feature includes:
connecting all fourth features included in the first features to obtain fifth features; the first features comprise three groups of fourth features, and each group of fourth features respectively corresponds to a red channel, a green channel and a blue channel in an RGB channel;
and reducing the vector dimension of the fifth feature to one dimension to obtain the third feature.
In the foregoing solution, the connecting all fourth features included in the first feature includes:
and connecting the three groups of fourth characteristics according to a set channel sequence.
In the above solution, the extracting the second feature of the first picture through the first model includes:
down-sampling the first picture to obtain a second picture;
and extracting the second characteristic from the second picture through the first model.
In the above scheme, the training process of the classifier includes:
acquiring a first feature and a second feature of a sample picture;
training a second model by using second characteristics of the sample picture, and determining the trained second model as the first model;
connecting the first characteristic and the second characteristic of the sample picture to obtain a fusion characteristic of the sample picture;
and training a third model by using the fusion characteristics of the sample pictures, and determining the trained third model as the classifier.
In the above scheme, the training process of the classifier includes:
acquiring a first feature and a second feature of a sample picture;
training a second model by using second characteristics of the sample picture, and determining the trained second model as the first model;
connecting the first characteristic and the second characteristic of the sample picture to obtain a fusion characteristic of the sample picture;
and inputting the fusion characteristics of the sample pictures into a full-connection layer in the first model, and training the first model to obtain the classifier.
In a second aspect, an embodiment of the present invention provides a model training method, where the method includes:
acquiring a first feature and a second feature of a sample picture;
training a second model by using second characteristics of the sample picture, and determining the trained second model as the first model;
connecting the first characteristic and the second characteristic of the sample picture to obtain a fusion characteristic of the sample picture;
and training a third model by using the fusion characteristics of the sample pictures, and determining the trained third model as the classifier.
In a third aspect, an embodiment of the present invention provides another model training method, where the method includes:
acquiring a first feature and a second feature of a sample picture;
training a second model by using second characteristics of the sample picture, and determining the trained second model as the first model;
connecting the first characteristic and the second characteristic of the sample picture to obtain a fusion characteristic of the sample picture;
and inputting the fusion characteristics of the sample pictures into a full-connection layer in the first model, and training the first model to obtain the classifier.
In a fourth aspect, an embodiment of the present invention provides an image classification apparatus, including:
the first extraction module is used for extracting first features of the first picture; the first feature is characterized as a color histogram for three primary color RGB channels;
the second extraction module is used for extracting second characteristics of the first picture through the first model; the first model is a convolutional neural network;
the first connection module is used for determining a fusion feature of the first picture, wherein the fusion feature is obtained by connecting the first feature and the second feature;
the classification module is used for inputting the fusion features into a classifier for processing to obtain a classification result corresponding to the first picture; the classifier is used for classifying the pictures according to the objects in the pictures.
In a fifth aspect, an embodiment of the present invention provides a model training apparatus, where the apparatus includes:
the first acquisition module is used for acquiring a first characteristic and a second characteristic of the sample picture;
the first training module is used for training a second model by using the second characteristics of the sample picture and determining the trained second model as the first model;
the second connection module is used for connecting the first characteristic and the second characteristic of the sample picture to obtain a fusion characteristic of the sample picture;
and the second training module is used for training a third model by utilizing the fusion characteristics of the sample pictures and determining the trained third model as the classifier.
In a sixth aspect, an embodiment of the present invention provides another model training apparatus, including:
the second acquisition module is used for acquiring the first characteristic and the second characteristic of the sample picture;
the third training module is used for training a second model by using the second characteristics of the sample picture and determining the trained second model as the first model;
the third connecting module is used for connecting the first characteristic and the second characteristic of the sample picture to obtain a fusion characteristic of the sample picture;
and the fourth training module is used for inputting the fusion characteristics of the sample pictures into a full connection layer in the first model, training the first model and obtaining the classifier.
In a seventh aspect, an embodiment of the present invention provides an electronic device, which includes a processor and a memory, where the processor and the memory are connected to each other, where the memory is used to store a computer program, and the computer program includes program instructions, and the processor is configured to call the program instructions, to execute steps of the image classification method provided in the first aspect of the embodiment of the present invention, or to execute steps of the model training method provided in the second aspect and the third aspect of the embodiment of the present invention.
In an eighth aspect, an embodiment of the present invention provides a computer-readable storage medium, including: the computer-readable storage medium stores a computer program. The computer program when executed by a processor performs the steps of the image classification method as provided in the first aspect of the embodiment of the present invention or performs the steps of the model training method as provided in the second and third aspects of the embodiment of the present invention.
The method comprises the steps of extracting a first feature and a second feature of a picture, wherein the first feature is a color histogram of a three-primary-color RGB channel, and the second feature is a global feature extracted through a convolutional neural network; connecting the first feature and the second feature to obtain a fused feature; and inputting the fusion features into a classifier for processing to obtain a classification result corresponding to the picture. According to the embodiment of the invention, the first characteristic and the second characteristic are combined, so that the accuracy of image classification is improved, and particularly, when the image classification is carried out on the image containing the color-sensitive scene, the accuracy of image classification is higher.
Drawings
FIG. 1 is a schematic diagram of a plant scene provided by an embodiment of the invention;
fig. 2 is a schematic flow chart illustrating an implementation of a picture classification method according to an embodiment of the present invention;
FIG. 3 is a diagram of a color histogram for RGB channels according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart illustrating an implementation of another image classification method according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart illustrating an implementation of another image classification method according to an embodiment of the present invention;
FIG. 6 is a schematic flow chart illustrating an implementation of another image classification method according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating a second model training process provided by embodiments of the present invention;
FIG. 8 is a diagram illustrating a training process of a third model provided by an embodiment of the invention;
FIG. 9 is a diagram illustrating an image classification process according to an embodiment of the present invention;
FIG. 10 is a schematic flow chart illustrating an implementation of a model training method according to an embodiment of the present invention;
FIG. 11 is a schematic flow chart of another implementation of a model training method according to an embodiment of the present invention;
fig. 12 is a block diagram of an image classification apparatus according to an embodiment of the present invention;
FIG. 13 is a block diagram of a model training apparatus according to an embodiment of the present invention;
FIG. 14 is a block diagram of another model training apparatus according to an embodiment of the present invention;
fig. 15 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The technical means described in the embodiments of the present invention may be arbitrarily combined without conflict.
In addition, in the embodiments of the present invention, "first", "second", and the like are used for distinguishing similar objects, and are not necessarily used for describing a specific order or a sequential order.
Referring to fig. 1, fig. 1 is a schematic diagram of plant scenes provided by an embodiment of the present invention. As shown in fig. 1, the plant scenes are, from left to right, a green plant tree, a green plant cluster, a flower, a bouquet, and a flower cluster; an image object classification technique is generally adopted to distinguish the plant scene in a picture. Due to the diversity of plant scene categories and the monotony of scene semantics, the related art is prone to errors when classifying images of plant scenes. For example, when a flower cluster contains only a few scattered flowers, the related art may identify the plant scene as a green plant cluster or a bouquet.
Aiming at the problem of plant scene classification, the related art currently adopts two main schemes. One is to construct an image classification model based on a convolutional neural network architecture such as AlexNet, VGGNet, Google Inception Net or ResNet, and classify the plant scene through the image classification model. However, this scheme extracts the global features of the whole image for feature-fitting classification, while plant scenes have too many types and high semantic similarity and are difficult to distinguish well through global features alone. Therefore, this scheme is not suitable for classifying plant scenes in images.
The other scheme of the related art is a feature pyramid structure. However, this scheme has a long training time and a complex training process, and the multi-level, multi-pyramid network structure results in a long running time, which is not conducive to running on embedded devices.
Aiming at the defect that the related technology cannot be applied to image classification of plant scenes, the embodiment of the invention provides an image classification method which can accurately classify the plant scenes. In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Referring to fig. 2, fig. 2 is a schematic view illustrating an implementation flow of a picture classification method according to an embodiment of the present invention, where an execution subject of the method may be an electronic device such as a mobile phone, a tablet, a server, and the like. Referring to fig. 2, the image classification method includes:
s101, extracting first features of a first picture; the first feature is characterized as a color histogram for the three primary RGB channels.
Here, the first picture is a picture on which the electronic device is to perform image classification. In the embodiment of the present invention, the first feature of the first picture is a color histogram of the three primary-color RGB channels of the first picture; the color histogram is a color feature widely used in many image retrieval systems and describes the proportions of different colors in the whole image. The process of extracting the first feature is the process of extracting the color histograms of the three channels, i.e., the red channel (R channel), the green channel (G channel), and the blue channel (B channel), respectively. Equivalently, the first feature includes three sets of features, each set corresponding respectively to the R channel, G channel, and B channel of the RGB channels.
Extracting a color histogram requires dividing the color space into several small color intervals, each of which is a bin of the histogram; this process is called color quantization. There are many methods for color quantization, such as vector quantization, clustering, or neural-network-based methods. The color histogram is then obtained by counting the number of pixels whose color falls within each bin. In practical applications, the electronic device may extract the color histogram of the first picture through the Open Source Computer Vision Library (OpenCV).
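As an illustration of this step, the following is a minimal sketch of extracting the per-channel color histograms with OpenCV in Python; the function name, picture path, and bin count of 18 are assumptions made for the example rather than values fixed by the embodiment:

    # A minimal sketch of step S101, assuming OpenCV; the bin count of 18
    # per channel is an illustrative choice, not fixed by the embodiment.
    import cv2
    import numpy as np

    def extract_first_feature(picture_path, bins=18):
        """Return one color histogram per channel of the input picture."""
        image = cv2.imread(picture_path)   # OpenCV loads pictures in BGR order
        if image is None:
            raise FileNotFoundError(picture_path)
        histograms = []
        for channel in range(3):           # 0 = B, 1 = G, 2 = R in OpenCV
            hist = cv2.calcHist([image], [channel], None, [bins], [0, 256])
            histograms.append(hist)        # each hist is a (bins, 1) 2-D array
        return histograms

Note that each returned histogram is itself a two-dimensional array, consistent with the statement below that the per-channel first features are two-dimensional vectors.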
Referring to fig. 3, fig. 3 is a schematic diagram of color histograms of the RGB channels according to an embodiment of the present invention. As shown in fig. 3, the left side of fig. 3 is the color histogram of the RGB channels of a green plant cluster, and the right side of fig. 3 is the color histogram of the RGB channels of a flower cluster. As can be seen from fig. 3, the color histograms of the RGB channels of the green plant cluster and the flower cluster differ significantly. Therefore, in the embodiment of the present invention, the color histogram of the RGB channels is used as a picture feature for image recognition, so that a better classification effect can be obtained in the application scenario of plant scene recognition.
In the present embodiment, the features extracted from the R, G and B channels are all two-dimensional vectors; that is, the vector dimension of the first feature is two-dimensional.
S102, extracting second features of the first picture through a first model; the first model is a convolutional neural network.
Referring to fig. 4, which is a schematic flowchart illustrating another image classification method provided by an embodiment of the present invention, as shown in fig. 4, the extracting, by using a first model, a second feature of the first picture includes:
and S1021, downsampling the first picture to obtain a second picture.
Down-sampling a picture means reducing the size of the image. Down-sampling serves two purposes: fitting the image to the size of a display area, and generating a thumbnail of the image.
The principle of down-sampling a picture is as follows: an image of size M × N, down-sampled by a factor of s, yields an image of resolution (M/s) × (N/s), where s is a common divisor of M and N.
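A short sketch of this down-sampling step in Python follows; the factor s = 4 is an assumed example value and must divide both image dimensions:

    # A minimal sketch of step S1021, assuming OpenCV; s = 4 is an example
    # and must be a common divisor of the picture's height and width.
    import cv2

    def downsample(image, s=4):
        h, w = image.shape[:2]
        assert h % s == 0 and w % s == 0, "s must divide both dimensions"
        # Resize an M x N picture to (M/s) x (N/s); INTER_AREA suits shrinking.
        return cv2.resize(image, (w // s, h // s), interpolation=cv2.INTER_AREA)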
S1022, the second feature is extracted from the second picture through the first model.
In the embodiment of the invention, the first model is a convolutional neural network, and the second feature of the second picture is extracted through the convolutional neural network. The convolutional neural network comprises a plurality of convolutional layers: shape features can be extracted through the first convolutional layer, and texture features through the second. In practical applications, although features extracted from deeper convolutional layers can be reused, they are difficult to express subjectively or to visualize. Therefore, the second feature is typically extracted using only the first and second convolutional layers of the convolutional neural network.
Here, the extracted second feature is a global feature of the second picture. Global features are features that can represent the whole image; they stand in contrast to local features and describe overall characteristics such as the texture and shape of the image or target. Common global features include color features, texture features and shape features. In practice, texture and shape features are usually extracted by a convolutional neural network; color features can also be extracted this way, but a convolutional neural network is not sensitive to color, and texture and shape are of more practical interest.
The first model is used to extract the second feature from the second picture, i.e., the down-sampled picture. In practical applications, convolutional neural network architectures such as AlexNet, VGGNet, Google Inception Net or ResNet can be selected to construct the first model; the convolutional layers in the convolutional neural network extract the features of the picture.
In an embodiment of the invention, the vector dimension of the second feature extracted by the first model is one-dimensional.
In the embodiment of the present invention, the execution processes of S101 and S102 share one convolutional neural network. The network architecture is selected according to the type of electronic device; for example, a server may select a ResNet architecture, while devices such as mobile phones may select a MobileNet architecture.
It should be understood that the sequence numbers of S101 and S102 in the above embodiments do not imply an execution order.
S103, determining a fusion feature of the first picture, wherein the fusion feature is obtained by connecting the first feature and the second feature.
Referring to fig. 5, which shows a schematic flowchart of another image classification method provided by the embodiment of the present invention, as shown in fig. 5, the determining a fusion feature of the first picture includes:
and S1031, reducing the vector dimension of the first features to one dimension to obtain third features.
To connect two features, the vector dimensions of the two features must first be made the same. The vector dimension of the second feature is one-dimensional, while the vector dimension of the first feature is two-dimensional. The first feature is reduced from two dimensions to one, rather than raising the second feature from one dimension to two, because one-dimensional vectors involve less computation and higher processing efficiency; the vector dimensions of the two features are therefore unified as one-dimensional.
Referring to fig. 6, which is a schematic diagram illustrating an implementation flow of another image classification method provided by an embodiment of the present invention, as shown in fig. 6, the reducing the vector dimension of the first feature from two dimensions to one dimension to obtain a third feature includes:
s10311, connecting all fourth features contained in the first features to obtain fifth features; the first features comprise three groups of fourth features, and each group of fourth features respectively corresponds to a red channel, a green channel and a blue channel in an RGB channel.
In the embodiment of the present invention, the 3 groups of fourth features are two-dimensional vectors, and their numbers of rows and columns match; that is, if one group of fourth features has n rows and m columns, the other two groups also have n rows and m columns.
If the number of rows of the two-dimensional vectors is greater than the number of columns, the last row of one two-dimensional vector is connected to the first row of the other. For example, a vector A and a vector B (the matrices shown in the figures of the original publication) are connected end to end to obtain a vector C, whose rows are the rows of A followed by the rows of B.
If the number of columns of the two-dimensional vectors is greater than the number of rows, the last column of one two-dimensional vector is connected to the first column of the other. For example, a vector D and a vector E (likewise shown as matrices in the original figures) are connected end to end to obtain a vector F, whose columns are the columns of D followed by the columns of E.
If 3 features are to be connected, 2 of them are first connected end to end, and the result is then connected end to end with the remaining feature. For example, assume that the R, G and B channels each correspond to a 2 × 18 two-dimensional vector: to connect the 3 feature vectors end to end, first connect 2 of them to obtain a 2 × 36 two-dimensional vector, then connect that end to end with the remaining 2 × 18 two-dimensional vector to obtain a 2 × 54 two-dimensional vector.
Further, the connecting all fourth features included in the first feature includes:
and connecting the three groups of fourth characteristics according to a set channel sequence.
The set channel orders include R-G-B, R-B-G, G-R-B, G-B-R, B-G-R and B-R-G. For example, when connecting in the channel order R-G-B, the fourth features of the R channel and the G channel are connected first, and then the fourth feature of the B channel is connected.
Connecting the three groups of fourth features according to the set channel sequence yields the fifth feature, whose vector dimension is two-dimensional.
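As an illustration, the sketch below connects the three groups of fourth features end to end in the R-G-B channel order with NumPy; the 2 × 18 shape carries over from the example above and is illustrative only:

    # A minimal sketch of S10311, assuming each channel's fourth feature
    # is a 2 x 18 two-dimensional vector as in the example above.
    import numpy as np

    def connect_fourth_features(feat_r, feat_g, feat_b):
        """Connect the fourth features end to end in the R-G-B order."""
        # Columns outnumber rows, so the connection is column-wise:
        # (2, 18) + (2, 18) -> (2, 36), then + (2, 18) -> (2, 54).
        fifth = np.concatenate([feat_r, feat_g], axis=1)
        return np.concatenate([fifth, feat_b], axis=1)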
And S10312, reducing the vector dimension of the fifth feature to one dimension to obtain the third feature.
The vector dimension of the fifth feature is reduced from two dimensions to one dimension to obtain the third feature. There are several vector dimension reduction methods, including principal component analysis, multidimensional scaling, linear discriminant analysis, and the like. In practical application, the vector dimension of the fifth feature may be reduced to one dimension using principal component analysis to obtain the third feature.
S1032, connecting the third feature with the second feature to obtain the fusion feature; the vector dimension of the first feature is two-dimensional, and the vector dimension of the second feature is one-dimensional.
After the first feature has been reduced to the one-dimensional third feature, the second feature and the third feature are connected end to end to obtain the fusion feature. Because the vector dimensions of the connected features are all one-dimensional, the vector dimension of the fusion feature is also one-dimensional; one-dimensional vectors involve little computation, which improves processing efficiency.
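These two sub-steps can be sketched as follows; flattening is used here as the simplest reading of the reduction to one dimension (the embodiment also names principal component analysis as an option), so the sketch is illustrative rather than the mandated method:

    # A minimal sketch of S10312 and S1032: reduce the fifth feature to one
    # dimension and connect it to the second feature. Flattening stands in
    # for the dimension reduction; the text also permits PCA and the like.
    import numpy as np

    def fuse_features(fifth_feature, second_feature):
        third_feature = fifth_feature.ravel()   # e.g. (2, 54) -> (108,)
        # Both vectors are now one-dimensional and connect end to end.
        return np.concatenate([third_feature, second_feature])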
S104, inputting the fusion features into a classifier for processing to obtain a classification result corresponding to the first picture; the classifier is used for classifying the pictures according to the objects in the pictures.
Image object classification of a picture means confirming the objects in the picture and determining the names of those objects. Generally, the related art describes the whole picture globally through hand-crafted features or a feature learning method, and then uses a classifier to determine whether a certain type of object exists in the picture.
When image classification is performed on a plant scene, the name of the plant and the form in which the plant appears in the picture are confirmed, for example, whether the picture shows a bouquet of roses, a single rose, or a cluster of roses. The related art generally inputs the global features of the picture into a classifier and confirms the plant scene through the classifier, where the classifier is trained on global features. However, plant scenes have too many types and monotonous semantics and are difficult to distinguish through global features; the related art is thus prone to errors when classifying images of plant scenes, for example recognizing a flower cluster as a bouquet.
To overcome the difficulty of distinguishing plant scenes through global features in the related art, the embodiment of the invention extracts the more discriminative color histogram of the RGB channels as an auxiliary feature while retaining global feature extraction, which compensates for the color histogram's sensitivity to illumination changes. Combining the color histogram with the global features improves the accuracy of image recognition. Moreover, the color histogram requires little computation and is well suited to running on electronic devices such as mobile phones and tablet computers.
The fusion feature is input into the classifier to obtain the classification result of the first picture. In an embodiment of the present invention, the classifier may be a linear classifier or a nonlinear classifier. When the first picture contains a plant scene, the classifier performs image object classification based on the type of plant and/or the form in which the plant appears; plant types include green plants and flowers, among others.
For example, when the image classification method provided by the present invention is used to classify the plant scenes shown in fig. 1, first the first feature and the second feature of fig. 1 are extracted, where the first feature is a color histogram over the three primary-color RGB channels and the second feature is a global feature extracted by a convolutional neural network. The first feature and the second feature are then connected to obtain the fusion feature. Finally, the fusion feature is input into a classifier for processing to obtain the classification result corresponding to fig. 1. Combining the color histogram with the global feature improves the accuracy of image recognition: the image classification method provided by the embodiment of the invention can accurately identify that the plant scenes in fig. 1 are, from left to right, a green plant tree, a green plant cluster, a flower, a bouquet, and a flower cluster; it is not affected by the diversity of scene categories and the monotony of scene semantics, and it solves the problem that the related art is prone to errors when classifying images of plant scenes.
It should be understood that the embodiment of the present invention may classify not only pictures containing plant scenes but also pictures containing other scenes; the accuracy is simply higher when classifying pictures containing color-sensitive scenes.
Color sensitive scenes are scenes that rely heavily on color features to distinguish among image categories, such as plant scenes.
The method comprises the steps of extracting a first feature and a second feature of a picture, wherein the first feature is a color histogram related to RGB channels of three primary colors, and the second feature is a global feature extracted through a convolutional neural network; connecting the first feature and the second feature to obtain a fused feature; and inputting the fusion features into a classifier for processing to obtain a classification result corresponding to the picture. According to the embodiment of the invention, the first characteristic and the second characteristic are combined, so that the accuracy of image recognition is improved, and particularly, when the image classification is carried out on the image containing the color-sensitive scene, the accuracy of image classification is higher. The embodiment of the invention has small calculation amount and is suitable for running on electronic equipment such as mobile phones, tablets and the like.
Before the classifier is used to perform image object classification on pictures, it must be trained. The following is a training method of the classifier provided by an embodiment of the present invention; the training method includes:
s201, acquiring a first characteristic and a second characteristic of the sample picture.
For the description of the first feature and the second feature, reference may be made to steps S101 and S102 in the above embodiments, which are not repeated here. The second feature used for model training is extracted manually, because the first model has not yet been trained.
S202, training a second model by using the second characteristics of the sample picture, and determining the trained second model as the first model.
In the embodiment of the present invention, the second model may be obtained by training a convolutional neural network on the second features of the sample pictures through a machine learning method. After the training of the second model is finished, the trained second model is determined as the first model. In practical application, the second model can be constructed using convolutional neural network architectures such as AlexNet, VGGNet, Google Inception Net or ResNet.
Referring to fig. 7, fig. 7 is a schematic diagram of the second model training process according to an embodiment of the present invention. As shown in fig. 7, all sample pictures participating in the training of the second model are down-sampled, and the down-sampled sample pictures are input into the second model for iterative training. The value of the loss function is calculated after each training pass; if it does not reach the set value, the model parameters of the second model are adjusted according to the loss, and training continues until the loss function reaches the set value, at which point training stops and the trained second model is obtained.
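The loop of fig. 7 can be sketched in Python with PyTorch as follows; the tiny network, the loss threshold, the iteration cap and the random stand-in data are all assumptions made for illustration, not the embodiment's actual choices:

    # A minimal sketch of the fig. 7 loop: train the second model until the
    # loss function reaches a set value. All concrete values are assumed.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(8, 5))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    images = torch.randn(16, 3, 56, 56)   # stands in for down-sampled samples
    labels = torch.randint(0, 5, (16,))   # stands in for plant-scene labels

    set_value = 0.1
    for step in range(10_000):            # cap iterations so the sketch ends
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()                   # adjust parameters from the loss
        optimizer.step()
        if loss.item() <= set_value:      # stop once the set value is reached
            break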
S203, connecting the first characteristic and the second characteristic of the sample picture to obtain the fusion characteristic of the sample picture.
For the connection of features, reference may be made to step S103 in the foregoing method embodiment, which is not described herein again.
S204, training a third model by using the fusion characteristics of the sample pictures, and determining the trained third model as the classifier.
In an embodiment of the present invention, the third model is a nonlinear classifier; nonlinear classifiers include the decision tree classifier, random forest classifier, gradient boosting tree classifier, multilayer perceptron classifier, Gaussian-kernel support vector machine classifier, and so on. A nonlinear classifier has strong fitting capability and can expand the classification dimensionality, overcoming the weak nonlinear fitting capability of the related art.
Referring to fig. 8, fig. 8 is a schematic diagram of the third model training process according to an embodiment of the present invention. As shown in fig. 8, the fusion features of the sample pictures are taken as training samples and input into the third model for iterative training. After each training pass, the value of the objective function is calculated; if it does not reach the set value, the model parameters of the third model are adjusted according to the objective function, and training with the fusion features continues until the objective function reaches the set value, at which point training stops, the trained third model is obtained, and the trained third model is determined as the classifier.
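As one concrete instance of this step, the sketch below fits a random forest, one of the nonlinear classifiers listed above, on fusion features; the feature width, the label set and the data are stand-in assumptions, and scikit-learn's fit() takes the place of the manual objective-driven loop of fig. 8:

    # A minimal sketch of training the third model as a random forest on
    # fusion features; all data here is random stand-in data.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    fused_features = rng.normal(size=(200, 172))  # assumed fused width
    labels = rng.integers(0, 5, size=200)         # assumed five plant classes

    classifier = RandomForestClassifier(n_estimators=100, random_state=0)
    classifier.fit(fused_features, labels)
    prediction = classifier.predict(fused_features[:1])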
The classifier trained by fusing the characteristics has strong fitting capability, can accurately classify the pictures according to the objects in the pictures, and can improve the accuracy of image recognition particularly when the pictures containing color-sensitive scenes are subjected to image classification.
It should be understood that, in the above embodiment, step S203 only requires step S201 to have been executed; it does not have to wait for step S202. The sequence numbers of the steps in the above embodiment do not imply an execution order; the execution order of the processes is determined by their functions and internal logic, and does not limit the implementation of the embodiment of the present invention in any way.
The embodiment of the invention also provides another training method of the classifier, which comprises the following steps:
s301, acquiring a first feature and a second feature of the sample picture.
S302, training a second model by using the second characteristics of the sample picture, and determining the trained second model as the first model.
And S303, connecting the first characteristic and the second characteristic of the sample picture to obtain the fusion characteristic of the sample picture.
Step S301 to step S303 may refer to step S201 to step S203 in the above embodiments, which are not described herein again.
S304, inputting the fusion characteristics of the sample pictures into a full-connection layer in the first model, and training the first model to obtain the classifier.
Here, after the model is trained, the second feature may be extracted using the first model or the classifier during the image classification.
In the embodiment of the present invention, the first model is a linear classifier; common linear classifiers include the Bayes classifier, single-layer perceptron classifier, linear regression classifier, linear-kernel support vector machine classifier, and so on. Linear classifiers have the advantages of high processing speed, convenient programming and easy understanding.
The fusion features are input into the fully-connected layer of the first model, and the first model is trained to obtain the classifier. More precisely, the fully-connected layer in the trained first model serves as the classifier, because the fully-connected layer plays the role of a "classifier" in the whole convolutional neural network. Inputting the fusion features into the fully-connected layer yields the classification result of the sample picture.
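One hedged reading of this scheme in PyTorch is sketched below: a fully-connected layer sized to the fusion feature is trained as the classifier. The layer sizes and the stand-in data are assumptions for illustration:

    # A minimal sketch of S304, assuming a 172-dimensional fusion feature
    # and five classes; the fully-connected layer acts as the classifier.
    import torch
    import torch.nn as nn

    fused_dim, num_classes = 172, 5       # assumed sizes for illustration
    fc_classifier = nn.Linear(fused_dim, num_classes)
    optimizer = torch.optim.Adam(fc_classifier.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    fused = torch.randn(16, fused_dim)    # stands in for fusion features
    labels = torch.randint(0, num_classes, (16,))
    for _ in range(200):                  # train the fully-connected layer
        optimizer.zero_grad()
        loss = loss_fn(fc_classifier(fused), labels)
        loss.backward()
        optimizer.step()
    logits = fc_classifier(fused)         # class scores for the samples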
The embodiment of the invention only needs to train one convolutional neural network, and has simple training process and shorter training time.
It should be understood that, in the above embodiment, step S303 only requires step S301 to have been executed; it does not have to wait for step S302. The sequence numbers of the steps in the above embodiment do not imply an execution order; the execution order of the processes is determined by their functions and internal logic, and does not limit the implementation of the embodiment of the present invention in any way.
Referring to fig. 9, fig. 9 is a schematic diagram of an image classification process according to an embodiment of the present invention. As shown in fig. 9, the whole process from the input picture to the classification result is shown. Firstly, after the picture is input, the picture is processed in two aspects, namely, on one hand, the first feature of the picture is extracted, and on the other hand, the picture is downsampled and then input into the first model to extract the second feature of the picture. And after the first feature and the second feature of the picture are respectively extracted, connecting the extracted first feature and the extracted second feature to obtain a fusion feature. And finally, inputting the fusion characteristics into a classifier, and outputting a classification result by the classifier.
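Pulling the pieces together, the following sketch mirrors the fig. 9 flow end to end. It reuses the hypothetical helpers from the earlier sketches (extract_first_feature, downsample, fuse_features); every name and shape here is an assumption, not the embodiment's reference code:

    # A minimal end-to-end sketch of fig. 9, reusing the hypothetical
    # helpers defined in the earlier sketches.
    import cv2
    import numpy as np

    def classify_picture(picture_path, first_model, classifier, s=4):
        image = cv2.imread(picture_path)
        # First feature: per-channel histograms, connected end to end.
        histograms = extract_first_feature(picture_path)  # three (bins, 1)
        fifth = np.concatenate(histograms, axis=0)  # rows outnumber columns,
                                                    # so connect row-wise
        # Second feature: global feature from the down-sampled picture.
        second = np.asarray(first_model(downsample(image, s))).ravel()
        # Fuse the features and classify.
        fused = fuse_features(fifth, second)
        return classifier.predict(fused[None, :])   # classification result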
Referring to fig. 10, a schematic flow chart of a model training method provided by an embodiment of the present invention is shown; the execution subject of the method may be an electronic device such as a mobile phone, a tablet, or a server. It should be understood that the execution subject of this method may or may not coincide with the execution subject of the image classification method in the above embodiments; when the two do not coincide, the trained model needs to be migrated to the execution subject of the image classification method. Referring to fig. 10, the model training method includes:
S401, acquiring a first feature and a second feature of the sample picture.
For the description of the first feature and the second feature, reference may be made to steps S101 and S102 in the above embodiments, which are not repeated here. The second feature used for training is extracted manually, because the model has not yet been trained.
S402, training a second model by using the second characteristics of the sample picture, and determining the trained second model as the first model.
And S403, connecting the first feature and the second feature of the sample picture to obtain a fusion feature.
It should be understood that, as shown in fig. 10, step S403 only requires step S401 to have been executed; it does not have to wait for step S402. The sequence numbers of the steps in the above embodiment do not imply an execution order; the execution order of the processes is determined by their functions and internal logic, and does not limit the implementation of the embodiment of the present invention in any way.
S404, training a third model by utilizing the fusion characteristics of the sample pictures, and determining the trained third model as the classifier.
In an embodiment of the present invention, the third model is a nonlinear classifier; nonlinear classifiers include the decision tree classifier, random forest classifier, gradient boosting tree classifier, multilayer perceptron classifier, Gaussian-kernel support vector machine classifier, and so on. A nonlinear classifier has strong fitting capability and can expand the classification dimensionality, overcoming the weak nonlinear fitting capability of the related art.
The classifier trained by fusing the features has strong fitting capability, can accurately classify the pictures according to the objects in the pictures, and can improve the accuracy of image identification particularly when the pictures containing color-sensitive scenes are subjected to image classification.
Referring to fig. 11, a schematic flow chart of another model training method provided in the embodiment of the present invention is shown, where the execution subject of the method may be an electronic device such as a mobile phone, a tablet, a server, and so on. Referring to fig. 11, the model training method includes:
s501, acquiring a first feature and a second feature of a sample picture.
S502, training a second model by using the second characteristics of the sample picture, and determining the trained second model as the first model.
S503, connecting the first characteristic and the second characteristic of the sample picture to obtain a fusion characteristic of the sample picture.
Step S501 to step S503 may refer to step S401 to step S403 in the above embodiment, which is not described herein again.
S504, inputting the fusion characteristics of the sample pictures into a full-connection layer in the first model, and training the first model to obtain the classifier.
Here, after the model is trained, the second feature may be extracted using the first model or the classifier during the image classification process.
In the embodiment of the present invention, the first model is a linear classifier; common linear classifiers include the Bayes classifier, single-layer perceptron classifier, linear regression classifier, linear-kernel support vector machine classifier, and so on. Linear classifiers have the advantages of high processing speed, convenient programming and easy understanding.
The fusion features are input into the fully-connected layer of the first model, and the first model is trained to obtain the classifier. Specifically, the fully-connected layer in the trained first model serves as the classifier, because the fully-connected layer functions as a "classifier" in the whole convolutional neural network. Inputting the fusion features into the fully-connected layer yields the classification result of the sample picture.
The embodiment of the invention only needs to train one convolutional neural network, and has simple training process and shorter training time.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Referring to fig. 12, fig. 12 is a schematic diagram of an image classification apparatus according to an embodiment of the present invention, as shown in fig. 12, the apparatus includes: the device comprises a first extraction module, a second extraction module, a first connection module and a classification module.
The first extraction module is used for extracting first features of the first picture; the first feature is characterized as a color histogram for three primary RGB channels;
the second extraction module is used for extracting second characteristics of the first picture through the first model; the first model is a convolutional neural network;
the first connection module is used for determining a fusion feature of the first picture, wherein the fusion feature is obtained by connecting the first feature and the second feature;
the classification module is used for inputting the fusion features into a classifier for processing to obtain a classification result corresponding to the first picture; the classifier is used for classifying the pictures according to the objects in the pictures.
The first connection module is specifically configured to:
reducing the vector dimension of the first feature to one dimension to obtain a third feature;
connecting the third feature with the second feature to obtain the fused feature; the vector dimension of the first feature is two-dimensional, and the vector dimension of the second feature is one-dimensional.
The first connection module is further configured to: connect all fourth features included in the first features to obtain fifth features; the first features comprise three groups of fourth features, and each group of fourth features respectively corresponds to a red channel, a green channel and a blue channel in an RGB channel;
and reducing the vector dimension of the fifth feature to one dimension to obtain the third feature.
The first connection module is further configured to: connect the three groups of fourth features according to a set channel sequence.
The second extraction module is specifically configured to:
down-sampling the first picture to obtain a second picture;
and extracting the second characteristic from the second picture through the first model.
The apparatus further comprises a training module to:
acquiring a first feature and a second feature of a sample picture;
training a second model by using second characteristics of the sample picture, and determining the trained second model as the first model;
connecting the first characteristic and the second characteristic of the sample picture to obtain a fusion characteristic of the sample picture;
and training a third model by using the fusion characteristics of the sample pictures, and determining the trained third model as the classifier.
The training module is further configured to:
acquiring a first feature and a second feature of a sample picture;
training a second model by using second characteristics of the sample picture, and determining the trained second model as the first model;
connecting the first characteristic and the second characteristic of the sample picture to obtain a fusion characteristic of the sample picture;
and inputting the fusion characteristics of the sample pictures into a full-connection layer in the first model, and training the first model to obtain the classifier.
Referring to fig. 13, fig. 13 is a schematic diagram of a model training apparatus according to an embodiment of the present invention, as shown in fig. 13, the apparatus includes: the device comprises a first acquisition module, a first training module, a second connection module and a second training module.
The first acquisition module is used for acquiring a first characteristic and a second characteristic of the sample picture;
the first training module is used for training a second model by using the second characteristics of the sample picture and determining the trained second model as the first model;
the second connection module is used for connecting the first characteristic and the second characteristic of the sample picture to obtain a fusion characteristic of the sample picture;
and the second training module is used for training a third model by utilizing the fusion characteristics of the sample pictures and determining the trained third model as the classifier.
Referring to fig. 14, fig. 14 is a schematic diagram of another model training apparatus provided in the embodiment of the present invention, as shown in fig. 14, the apparatus includes: the training system comprises a second acquisition module, a third training module, a third connection module and a fourth training module.
The second acquisition module is used for acquiring the first characteristic and the second characteristic of the sample picture;
the third training module is used for training a second model by using the second characteristics of the sample picture and determining the trained second model as the first model;
the third connecting module is used for connecting the first characteristic and the second characteristic of the sample picture to obtain the fusion characteristic of the sample picture;
and the fourth training module is used for inputting the fusion characteristics of the sample pictures into a full connection layer in the first model, training the first model and obtaining the classifier.
It should be noted that: in the image classification apparatus provided in the foregoing embodiment, when performing image classification, only the division of the modules is illustrated, and in practical applications, the processing may be distributed by different modules as needed, that is, the internal structure of the apparatus may be divided into different modules to complete all or part of the processing described above. In addition, the image classification apparatus and the image classification method provided in the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments in detail, and are not described herein again.
Fig. 15 is a schematic diagram of an electronic device according to an embodiment of the invention. The electronic device may be, for example, a cell phone, a tablet, or a server. As shown in fig. 15, the electronic device of this embodiment includes a processor, a memory, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the steps in the method embodiments described above, such as steps 101 to 104 shown in fig. 1. Alternatively, when executing the computer program, the processor implements the functions of the modules in the apparatus embodiments described above, such as the functions of the first extraction module, the second extraction module, the first connection module, and the classification module shown in fig. 12.
Illustratively, the computer program may be partitioned into one or more modules that are stored in the memory and executed by the processor to implement the invention. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, and these segments are used to describe the execution process of the computer program in the electronic device.
The electronic device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that fig. 15 is merely an example of an electronic device and is not intended to be limiting; the electronic device may include more or fewer components than those shown, combine some components, or use different components. For example, the electronic device may also include input-output devices, network access devices, buses, and the like.
The Processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory may be an internal storage unit of the electronic device, such as a hard disk or a memory of the electronic device. The memory may also be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the electronic device. Further, the memory may also include both an internal storage unit and an external storage device of the electronic device. The memory is used for storing the computer program and other programs and data required by the electronic device. The memory may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described or recited in a given embodiment, reference may be made to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/electronic device and method may be implemented in other ways. For example, the above-described apparatus/electronic device embodiments are merely illustrative, and for example, the division of the modules or units is only one type of logical function division, and other division manners may exist in actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated module/unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods in the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, computer-readable media may not include electrical carrier signals or telecommunications signals in accordance with legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (11)

1. An image classification method, comprising:
extracting a first feature of a first picture; wherein the first feature is a color histogram over the three RGB primary-color channels;
extracting a second feature of the first picture through a first model; wherein the first model is a convolutional neural network comprising a first convolutional layer and a second convolutional layer, the second feature comprises a shape feature and a texture feature, the shape feature is extracted based on the first convolutional layer, and the texture feature is extracted based on the second convolutional layer;
determining a fusion feature of the first picture, wherein the fusion feature is obtained by connecting the first feature and the second feature;
inputting the fusion feature into a classifier for processing to obtain a classification result corresponding to the first picture; wherein the classifier is used for classifying pictures according to the objects in the pictures;
wherein the determining the fusion characteristics of the first picture comprises:
connecting three groups of fourth features contained in the first feature in a set channel order to obtain a fifth feature; reducing the vector dimension of the fifth feature to one dimension to obtain a third feature; wherein the three groups of fourth features correspond to the red channel, the green channel and the blue channel of the RGB channels, respectively;
connecting the third feature with the second feature to obtain the fusion feature; wherein the first feature is two-dimensional and the second feature is one-dimensional.
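In terms of array operations, the fusion recited in claim 1 amounts to the following sketch; the bin count and the R, G, B channel order are assumptions, since the claim fixes only the sequence of steps.

```python
# Illustrative sketch of the fusion step in claim 1. The bin count and the
# red-green-blue channel order are assumptions; the claim fixes the structure.
import numpy as np

def fuse(first, second):
    """first: (3, bins) histogram, one group of fourth features per channel;
    second: 1-D feature vector from the convolutional neural network."""
    r, g, b = first[0], first[1], first[2]   # three groups of fourth features
    fifth = np.stack([r, g, b])              # connect in the set channel order
    third = fifth.reshape(-1)                # reduce to one dimension
    return np.concatenate([third, second])   # fusion feature
```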
2. The image classification method according to claim 1, wherein the extracting the second feature of the first picture through the first model comprises:
down-sampling the first picture to obtain a second picture;
and extracting the second feature from the second picture through the first model.
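A minimal sketch of this downsampling, assuming a PIL-based pipeline and an arbitrary 224x224 target size (the claim specifies neither the resolution nor the resampling filter):

```python
# Sketch of claim 2: downsample the first picture, then extract the second
# feature from the result. The target size, filter, and path are assumptions.
from PIL import Image

first_picture = Image.open("photo.jpg")    # hypothetical input path
second_picture = first_picture.resize((224, 224), Image.BILINEAR)
# the second feature is then extracted from second_picture via the first model
```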
3. The image classification method according to claim 1, wherein the training process of the classifier comprises:
acquiring a first feature and a second feature of a sample picture;
training a second model using the second features of the sample picture, and determining the trained second model as the first model;
connecting the first feature and the second feature of the sample picture to obtain a fusion feature of the sample picture;
and training a third model using the fusion features of the sample pictures, and determining the trained third model as the classifier.
4. The image classification method according to claim 1, wherein the training process of the classifier comprises:
acquiring a first feature and a second feature of a sample picture;
training a second model using the second features of the sample picture, and determining the trained second model as the first model;
connecting the first feature and the second feature of the sample picture to obtain a fusion feature of the sample picture;
and inputting the fusion features of the sample pictures into a fully connected layer of the first model, and training the first model to obtain the classifier.
5. A model training method for training the classifier of any one of claims 1 to 2, the model training method comprising:
acquiring a first feature and a second feature of a sample picture;
training a second model using the second features of the sample picture, and determining the trained second model as a first model;
connecting the first feature and the second feature of the sample picture to obtain a fusion feature of the sample picture;
and training a third model using the fusion features of the sample pictures, and determining the trained third model as the classifier.
6. A model training method for training the classifier of any one of claims 1 to 2, the model training method comprising:
acquiring a first feature and a second feature of a sample picture;
training a second model using the second features of the sample picture, and determining the trained second model as a first model;
connecting the first feature and the second feature of the sample picture to obtain a fusion feature of the sample picture;
and inputting the fusion features of the sample pictures into a fully connected layer of the first model, and training the first model to obtain the classifier.
7. An image classification apparatus, comprising:
the first extraction module is used for extracting a first feature of a first picture; wherein the first feature is a color histogram over the three RGB primary-color channels;
the second extraction module is used for extracting a second feature of the first picture through a first model; wherein the first model is a convolutional neural network comprising a first convolutional layer and a second convolutional layer, the second feature comprises a shape feature and a texture feature, the shape feature is extracted based on the first convolutional layer, and the texture feature is extracted based on the second convolutional layer;
the first connection module is used for determining a fusion feature of the first picture, wherein the fusion feature is obtained by connecting the first feature and the second feature;
the classification module is used for inputting the fusion feature into a classifier for processing to obtain a classification result corresponding to the first picture; wherein the classifier is used for classifying pictures according to the objects in the pictures;
the first connection module is specifically configured to connect three groups of fourth features included in the first feature in a set channel order to obtain a fifth feature, and reduce the vector dimension of the fifth feature to one dimension to obtain a third feature; wherein the three groups of fourth features correspond to the red channel, the green channel and the blue channel of the RGB channels, respectively;
and connect the third feature with the second feature to obtain the fusion feature; wherein the first feature is two-dimensional and the second feature is one-dimensional.
8. A model training apparatus for training the classifier of any one of claims 1 to 2, the model training apparatus comprising:
the first acquisition module is used for acquiring a first feature and a second feature of a sample picture;
the first training module is used for training a second model using the second features of the sample picture and determining the trained second model as the first model;
the second connection module is used for connecting the first feature and the second feature of the sample picture to obtain a fusion feature of the sample picture;
and the second training module is used for training a third model using the fusion features of the sample pictures and determining the trained third model as the classifier.
9. A model training apparatus for training the classifier of any one of claims 1 to 2, the model training apparatus comprising:
the second acquisition module is used for acquiring a first feature and a second feature of a sample picture;
the third training module is used for training a second model using the second features of the sample picture and determining the trained second model as the first model;
the third connection module is used for connecting the first feature and the second feature of the sample picture to obtain a fusion feature of the sample picture;
and the fourth training module is used for inputting the fusion features of the sample pictures into a fully connected layer of the first model and training the first model to obtain the classifier.
10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the image classification method of any one of claims 1 to 4 or the model training method of any one of claims 5 to 6.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the image classification method according to any one of claims 1 to 4 or to carry out the model training method according to any one of claims 5 to 6.
CN201910989790.3A 2019-10-17 2019-10-17 Image classification method and device, electronic equipment and storage medium Active CN110796184B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910989790.3A CN110796184B (en) 2019-10-17 2019-10-17 Image classification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910989790.3A CN110796184B (en) 2019-10-17 2019-10-17 Image classification method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110796184A CN110796184A (en) 2020-02-14
CN110796184B true CN110796184B (en) 2022-09-06

Family

ID=69440390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910989790.3A Active CN110796184B (en) 2019-10-17 2019-10-17 Image classification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110796184B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111383081A (en) * 2020-03-24 2020-07-07 Donghua University Intelligent recommendation method for clothing matching

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778768A (en) * 2016-11-22 2017-05-31 Guangxi Normal University Image scene classification method based on multi-feature fusion
CN106845417A (en) * 2017-01-20 2017-06-13 Shanghai Jiao Tong University High-resolution remote sensing image classification method based on feature pooling and de-normalization representation

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104834933B (en) * 2014-02-10 2019-02-12 Huawei Technologies Co., Ltd. Detection method and device for an image saliency region
US10068171B2 (en) * 2015-11-12 2018-09-04 Conduent Business Services, Llc Multi-layer fusion in a convolutional neural network for image classification

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778768A (en) * 2016-11-22 2017-05-31 Guangxi Normal University Image scene classification method based on multi-feature fusion
CN106845417A (en) * 2017-01-20 2017-06-13 Shanghai Jiao Tong University High-resolution remote sensing image classification method based on feature pooling and de-normalization representation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Image Classification Based on Weighted Multi-Feature Fusion and SVM; Zhai Li; China Masters' Theses Full-text Database, Information Science and Technology; 2017-02-15; pp. 1, 3, 5-6, 33-34, 41 *

Also Published As

Publication number Publication date
CN110796184A (en) 2020-02-14

Similar Documents

Publication Publication Date Title
CN108830855B (en) Full convolution network semantic segmentation method based on multi-scale low-level feature fusion
CN111814574B (en) Face living body detection system, terminal and storage medium applying double-branch three-dimensional convolution model
Bodhwani et al. Deep residual networks for plant identification
CN108765278A (en) Image processing method, mobile terminal and computer readable storage medium
CN109117773B (en) Image feature point detection method, terminal device and storage medium
CN110991533B (en) Image recognition method, recognition device, terminal device and readable storage medium
CN107223242B (en) Method for searching for similar images in a plurality of stored images
CN110991349B (en) Lightweight vehicle attribute identification method based on metric learning
CN111860537B (en) Deep learning-based green citrus identification method, equipment and device
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
CN111444365A (en) Image classification method and device, electronic equipment and storage medium
CN110796184B (en) Image classification method and device, electronic equipment and storage medium
CN112949725A (en) Wheat seed classification method based on multi-scale feature extraction
Jasitha et al. Venation based plant leaves classification using GoogLeNet and VGG
Peng et al. Litchi detection in the field using an improved YOLOv3 model
CN114596274A (en) Natural background citrus greening disease detection method based on improved Cascade RCNN network
WO2021179822A1 (en) Human body feature point detection method and apparatus, electronic device, and storage medium
CN112241736B (en) Text detection method and device
CN117115632A (en) Underwater target detection method, device, equipment and medium
CN111881803A (en) Livestock face recognition method based on improved YOLOv3
CN111046730A (en) Plant data processing method and device, computer equipment and storage medium
CN111126250A (en) Pedestrian re-identification method and device based on PTGAN
Kang et al. Weak constraint leaf image recognition based on convolutional neural network
CN108229263B (en) Target object identification method and device and robot
CN112686314B (en) Target detection method and device based on long-distance shooting scene and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant