CN111325271A - Image classification method and device - Google Patents


Info

Publication number
CN111325271A
CN111325271A (application CN202010101515.6A; granted as CN111325271B)
Authority
CN
China
Prior art keywords
image
classified
clustering
local
global
Prior art date
Legal status
Granted
Application number
CN202010101515.6A
Other languages
Chinese (zh)
Other versions
CN111325271B (en)
Inventor
Sun Zhe (孙哲)
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010101515.6A priority Critical patent/CN111325271B/en
Publication of CN111325271A publication Critical patent/CN111325271A/en
Priority to PCT/CN2021/075045 priority patent/WO2021164550A1/en
Application granted granted Critical
Publication of CN111325271B publication Critical patent/CN111325271B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/23 Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the application disclose an image classification method and device applied to a terminal device. The method comprises: obtaining an image to be classified; dividing the image to be classified to obtain M local image blocks; clustering the M local image blocks to obtain N clustering results; determining a global feature of the image to be classified according to the M local image blocks and the N clustering results; and determining a classification result of the image to be classified according to the global feature. This reduces the redundant computation of image classification, simplifies and accelerates the algorithm, and provides a feasible way to run computation-heavy images on terminal hardware.

Description

Image classification method and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image classification method and apparatus.
Background
In recent years, image classification has attracted great research interest and has been successfully deployed in many products, such as mobile phones, personal computers and other terminal devices, where it intelligently solves many practical image-processing problems. With the rapid development of deep learning, deep learning has become the state of the art in image classification. However, existing deep learning models usually compute results end to end. For images requiring little computation, or images of low input resolution, existing terminal hardware may meet the performance requirement; but for computation-heavy or high-resolution images, existing deep learning models may not be able to run on terminal hardware.
Content of application
Embodiments of the application provide an image classification method and device that can reduce the computation of image classification and provide a feasible way to run computation-heavy images on terminal hardware.
In a first aspect, an embodiment of the present application provides an image classification method, where the method includes:
acquiring an image to be classified;
dividing the image to be classified to obtain M local image blocks, wherein M is a positive integer greater than 1;
clustering the M local image blocks to obtain N clustering results, wherein N is a positive integer greater than 1;
determining the global features of the image to be classified according to the M local image blocks and the N clustering results, wherein the global features are feature vectors of the image to be classified;
and determining the classification result of the image to be classified according to the global features.
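The five steps above can be sketched end to end in a few lines. This is a minimal illustration only: the mean-colour local feature, the nearest-seed grouping and the threshold classifier are all stand-ins chosen for this sketch, not operations fixed by the claims.

```python
import numpy as np

def classify_image(image: np.ndarray, grid=(3, 3), n_clusters=2) -> int:
    """Toy walk-through of the claimed pipeline; all concrete operators
    here are illustrative assumptions."""
    gh, gw = grid
    h, w = image.shape[0] // gh, image.shape[1] // gw
    # Step 2: divide the image into M = gh * gw local image blocks.
    blocks = [image[i*h:(i+1)*h, j*w:(j+1)*w]
              for i in range(gh) for j in range(gw)]
    # Toy local feature: per-channel mean of each block.
    feats = np.stack([b.reshape(-1, b.shape[-1]).mean(axis=0) for b in blocks])
    # Step 3: group the M blocks into N clusters (nearest of the first N feats).
    seeds = feats[:n_clusters]
    labels = ((feats[:, None] - seeds[None]) ** 2).sum(-1).argmin(axis=1)
    # Step 4: global feature built from the blocks and their clustering results.
    global_feat = np.concatenate([
        feats[labels == k].mean(axis=0) if (labels == k).any()
        else np.zeros(feats.shape[1]) for k in range(n_clusters)])
    # Step 5: stand-in classifier on the global feature.
    return int(global_feat.max() > 0.5)
```

Each stand-in can be swapped for the concrete choices discussed later in the description (colour/texture local features, K-means clustering, binary-coded global features).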
In a second aspect, an embodiment of the present application provides an image classification apparatus, including:
an acquisition unit for acquiring an image to be classified;
the dividing unit is used for dividing the image to be classified to obtain M local image blocks, wherein M is a positive integer greater than 1;
the clustering unit is used for clustering the M local image blocks to obtain N clustering results, wherein N is a positive integer greater than 1;
a first determining unit, configured to determine a global feature of the image to be classified according to the M local image blocks and the N clustering results, where the global feature is a feature vector of the image to be classified;
and the second determining unit is used for determining the classification result of the image to be classified according to the global features.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the program includes instructions for performing the steps in the method according to the first aspect of the embodiment of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, where the computer program enables a computer to perform some or all of the steps described in the first aspect of the embodiment of the present application.
In a fifth aspect, embodiments of the present application provide a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, where the computer program is operable to cause a computer to perform some or all of the steps as described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
The embodiment of the application has the following beneficial effects:
the image to be classified is divided to obtain M local image blocks; the M local image blocks are clustered to obtain N clustering results; the global feature of the image to be classified is determined from the M local image blocks and the N clustering results; and the classification result of the image to be classified is determined from the global feature. By combining a traditional algorithm with a deep learning algorithm to optimize the end-to-end deep learning model, the redundant computation of image classification is reduced, the algorithm is simplified and accelerated, and a feasible path is provided for running computation-heavy images on terminal hardware.
Drawings
To illustrate the technical solutions in the embodiments of the application more clearly, the drawings needed for the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an image classification method provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of image partitioning according to an embodiment of the present disclosure;
fig. 3 is a schematic flowchart of image clustering provided in an embodiment of the present application;
fig. 4 is a schematic flow chart of obtaining an image feature vector according to an embodiment of the present application;
FIG. 5 is a block diagram of functional units of another image classification apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
To make the technical solutions of the application better understood, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the application. All other embodiments derived by those skilled in the art from these embodiments without creative effort shall fall within the protection scope of the application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, system, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In particular implementations, the terminal devices described in embodiments of the present application include, but are not limited to, other portable devices such as mobile phones, laptop computers, or tablet computers having touch sensitive surfaces (e.g., touch screen displays and/or touch pads). It should also be understood that in some embodiments, the device is not a portable communication device, but is a desktop computer having a touch-sensitive surface (e.g., a touch screen display and/or touchpad).
In embodiments of the present application, a terminal device comprising a display and a touch-sensitive surface is described. However, it should be understood that the terminal device may include one or more other physical user interface devices such as a physical keyboard, mouse, and/or joystick.
The terminal device supports various applications, such as one or more of the following: a drawing application, a presentation application, a word processing application, a website creation application, a disc burning application, a spreadsheet application, a gaming application, a telephone application, a video conferencing application, an email application, an instant messaging application, an exercise support application, a photo management application, a digital camera application, a web browsing application, a digital music player application, and/or a digital video player application.
Various applications that may be executed on the terminal device may use at least one common physical user interface device, such as a touch-sensitive surface. One or more functions of the touch sensitive surface and corresponding information displayed on the terminal can be adjusted and/or changed between applications and/or within respective applications. In this way, a common physical architecture (e.g., touch-sensitive surface) of the terminal can support various applications with user interfaces that are intuitive and transparent to the user.
At present, deep learning models mostly compute results end to end. When a deep learning model is used for image classification, a simple task can meet the performance requirements of typical terminal hardware. But when an image is processed at its original resolution, for example in image enhancement, where users care about image detail and the input cannot be downscaled, the computation becomes very large and the algorithm cannot run on the terminal device. Existing image classification algorithms therefore suffice for simple tasks or low-resolution inputs, but cannot run on terminal hardware such as mobile phones when the computation is large or the input is a full-resolution image.
The image classification method of the application obtains an image to be classified, divides it to obtain M local image blocks, clusters the M local image blocks to obtain N clustering results, determines the global feature of the image from the M local image blocks and the N clustering results, and determines the classification result from the global feature. This reduces the computation of image classification and enables computation-heavy or high-resolution images to be classified on a terminal device.
In order to explain the technical solution described in the present application, the following detailed description is given by way of specific examples.
Referring to fig. 1, fig. 1 is a schematic flowchart of an image classification method provided in an embodiment of the present application, where the image classification method is applied to a terminal device, and as shown in the figure, the image classification method may include the following steps:
and S110, acquiring an image to be classified.
In this embodiment, the image to be classified may be obtained locally from the terminal device, or received from another device; this is not limited here. Obtaining the image locally may mean reading it from the memory of the terminal device, or capturing a photo that has not yet been stored in memory. When the image to be classified is a newly captured photo not yet stored in memory, it can be assigned a category among the pictures already stored in memory, so that no further classification is needed later.
The image to be classified refers to an image whose category is to be detected. An image generally comprises a subject and a background: the subject is the object the image mainly represents, and the background is the scene that sets off the subject. The category of the image is determined by its subject. For example, if the subject is a building, the image belongs to the building category; if the subject is a green plant, the image belongs to the green-plant category.
There may be one or more images to be classified.
In one implementation, the image to be detected may be produced by various applications executed on the terminal device, for example a drawing application, a surveying application, a word processing application or a photo management application. Different applications or application scenes tend to have different subjects and backgrounds: for a surveying application the subjects may be entities representing geographic features, such as buildings, roads, trees and rivers, while for a word processing application the subjects are mainly words. The terminal device can therefore roughly pre-classify the image according to the application program and/or application scene it belongs to, i.e., determine the application program and/or application scene from the source of the image, which simplifies subsequent processing.
Optionally, before dividing the image to be classified, the method further includes: and acquiring an original image, and preprocessing the original image to obtain an image to be classified.
Before the original image is processed, it may undergo compression, enlargement, reduction, restoration and similar operations. For example, original images may be cropped to a uniform format, such as a uniform 512 x 512 size, and normalized, yielding the image to be classified.
Further, when the resolution or size of the original image exceeds what the terminal hardware can handle, the terminal device may first compress the original image, further reducing the computation needed to process the image to be classified.
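A minimal preprocessing sketch is shown below. The 512 x 512 target follows the example in the text; the centre-crop/zero-pad policy and the [0, 1] normalisation are assumptions, since the description does not fix them.

```python
import numpy as np

def preprocess(original: np.ndarray, size: int = 512) -> np.ndarray:
    """Cut (or pad) the original image to size x size and normalise
    8-bit pixel values to [0, 1]."""
    h, w = original.shape[:2]
    # Zero-pad if the image is smaller than the target size (assumption).
    ph, pw = max(size - h, 0), max(size - w, 0)
    if ph or pw:
        original = np.pad(original, ((0, ph), (0, pw), (0, 0)))
    h, w = original.shape[:2]
    # Centre-crop to the uniform target size.
    top, left = (h - size) // 2, (w - size) // 2
    cropped = original[top:top + size, left:left + size]
    # Normalisation step mentioned in the text.
    return cropped.astype(np.float32) / 255.0
```

The compression step for over-large originals would simply run before this function.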
And S120, dividing the image to be classified to obtain M local image blocks, wherein M is a positive integer greater than 1.
Specifically, the terminal device may divide the image to be classified into M identical local image blocks according to a preset image size. For example, as shown in fig. 2, the image may be divided into 9 rectangular local image blocks in a nine-box (three-by-three) grid. The terminal device may also divide the image into M local image blocks of the same or different sizes according to a preset size list or pattern, or divide the image into M local image blocks at random; this embodiment does not limit the division. Using the M local image blocks as input data increases the diversity of the data and can improve the robustness of the image classification model.
In one implementation, whether the local image blocks have the same size may be set according to the scene the image belongs to. Scenes can be divided by whether their scene features are fixed: the first kind has unfixed scene features, such as a natural scene; the second kind has fixed scene features, such as a plant scene. An image of a natural scene generally has no clearly fixed scene features, so a size list may be used to extract its local image blocks and capture valid local features. An image of a plant scene generally has fixed scene features, so its local features are extracted with one preset size. If a size list were used instead, the sizes in the list might vary greatly, the extracted local features might not be distinctive, and they would be easily disturbed by other information such as the background. For example, an image with three or four flowers in a large patch of grass belongs to the flower category, not the grass category; extracting its local features with a size list might cause the image to be judged as grass. Scene features are image features that characterize the scene an image belongs to. Fixed scene features are distributed in the image in a concentrated rather than dispersed way; unfixed scene features are distributed dispersedly rather than concentrated.
The scene an image to be detected belongs to is the scene of its subject. For example, if the subject is a flower, the image belongs to a plant scene; if the subject is a beach or a valley, the image belongs to a natural scene.
It should be noted that the local image blocks do not overlap, and each block may be a rectangle or an irregular polygon; this embodiment does not limit the shape of the local image blocks.
In this embodiment, the terminal device may determine M from the resolution of the image to be classified; the resolution may be in direct proportion to, or mapped to, M. That is, the higher the resolution, the larger the number M of local image blocks the terminal device divides the image into; the lower the resolution, the smaller M. Dividing a high-resolution image into multiple local image blocks reduces the complexity of the image classification algorithm, simplifying and accelerating it.
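One possible resolution-to-M mapping is sketched below. The text only states that M grows with resolution; the one-block-per-512-pixel-tile rule and the floor of 4 blocks are assumptions made for this illustration.

```python
import math

def blocks_for_resolution(width: int, height: int, base: int = 512) -> int:
    """Illustrative mapping from image resolution to the number of
    local image blocks M (hypothetical rule, not from the patent)."""
    # One block per base x base tile, rounded up in each dimension.
    tiles = math.ceil(width / base) * math.ceil(height / base)
    # Keep at least a 2 x 2 division so clustering has several blocks.
    return max(tiles, 4)
```

A preset lookup table from resolution ranges to M would serve equally well as the "mapping relationship" the text mentions.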
In one implementation, the terminal device may determine the shape and number of the local image blocks from the application program and/or application scene the image belongs to. For example, in a high-resolution remote sensing image produced by a surveying application, the feature distribution of each category is concentrated, that is, subjects of the same category cluster together across the whole image, so the terminal device may divide the image into several rectangular local image blocks of equal size. For a portrait produced by a camera application, most of the subject lies in the middle of the image, so the terminal device may divide outward from the middle, and the central local image blocks may be larger so that they contain more features.
S130, clustering the M local image blocks to obtain N clustering results.
Optionally, clustering the M local image blocks to obtain N clustering results comprises: clustering the M local image blocks with an unsupervised-learning clustering method to obtain N clustering results, where N is a positive integer greater than 1.
In this embodiment, before the M local image blocks are clustered, a classification model is used to obtain the local features of the local image blocks. The classification model therefore needs to be trained first, and the trained model is then used to extract the local features. During training, the training samples are divided into multiple local training samples, the local training samples are sampled, the sampled samples are fed to the classification model, the model outputs the local features of the training samples, and a loss function is used for training and back-propagation.
In this embodiment, the classification model can extract features from the M local image blocks respectively to obtain M local features. To overcome the limitation of any single feature, the application may perform cluster analysis with a combined local feature formed from multiple features. Common local features of an image include color features, LBP features and texture features. Color features depend little on the size, orientation and viewing angle of the image itself; a common color histogram describes the proportion of different colors in the whole image. Texture is important spatial information in remote sensing images: as resolution improves, the internal structure of ground objects becomes clearer and their texture more and more apparent. Compared with spectral information, texture features reflect the regular spatial variation of pixels within a target ground object. The terminal device can therefore select different features or feature combinations for different application scenes or programs; for example, a remote sensing image may combine color and texture features into a combined local feature. Local features or combinations may also be selected from preset feature options; this embodiment does not limit the choice.
The combined local feature may be one-dimensional, i.e., the component features are concatenated, for example texture features appended after color features. It may also be multidimensional, i.e., the component features form a feature matrix; this embodiment does not limit the form.
Further, the color histogram feature is extracted in the HSL (Hue, Saturation, Lightness) color space, which matches the visual perception of the human eye better than the RGB color space.
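A minimal HSL colour-histogram feature can be built with the standard library's `colorsys` module (which calls the space HLS). Binning only the hue channel, and the choice of 8 bins, are simplifications for this sketch; a full implementation would histogram all three channels.

```python
import colorsys
import numpy as np

def hsl_hue_histogram(block: np.ndarray, bins: int = 8) -> np.ndarray:
    """Local colour feature: normalised histogram of hue values over one
    local image block (8-bit RGB input)."""
    pixels = block.reshape(-1, 3) / 255.0
    # colorsys.rgb_to_hls returns (hue, lightness, saturation) in [0, 1].
    hues = np.array([colorsys.rgb_to_hls(r, g, b)[0] for r, g, b in pixels])
    hist, _ = np.histogram(hues, bins=bins, range=(0.0, 1.0))
    # Proportion of pixels falling into each hue range.
    return hist / hist.sum()
```

The resulting vector can be concatenated with a texture descriptor to form the combined local feature discussed above.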
In this embodiment, local image blocks of the same category among the M blocks are clustered together based on the extracted M local features, producing N clustering results. The value of N depends on the number of categories contained in the M local image blocks. As shown in fig. 3, the number in each local image block indicates its category; the 9 local image blocks are clustered, blocks of the same category are grouped together, and 4 clustering results are obtained: clustering result 1 contains 3 local image blocks, and clustering results 2 to 4 each contain 2.
Unsupervised-learning clustering methods include but are not limited to: the K-means clustering algorithm, the BIRCH clustering algorithm, the DBSCAN clustering algorithm and the K-nearest-neighbor classification algorithm.
Generally, a good clustering partition should reflect the internal structure of the data set as faithfully as possible, making samples within the same class as similar as possible and samples in different classes as different as possible. Taking the K-means algorithm as an example, from the viewpoint of distance, the optimal clustering has extremely small intra-class distances and extremely large inter-class distances.
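The K-means option can be sketched in a few lines over the M local-feature vectors. Initialising the centers from the first N features is a simplification chosen for determinism; a real implementation would use random or k-means++ initialisation.

```python
import numpy as np

def kmeans_blocks(feats, n_clusters: int, iters: int = 20):
    """Minimal Lloyd-style K-means: returns one cluster label per local
    image block, grouping blocks with similar features."""
    feats = np.asarray(feats, dtype=float)
    centers = feats[:n_clusters].copy()
    labels = np.zeros(len(feats), dtype=int)
    for _ in range(iters):
        # Assignment step: each block feature goes to its nearest center
        # (keeping intra-class distances small).
        d = ((feats[:, None] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        # Update step: recompute centers; keep the old center if a
        # cluster becomes empty.
        for k in range(n_clusters):
            if (labels == k).any():
                centers[k] = feats[labels == k].mean(axis=0)
    return labels
```

BIRCH or DBSCAN would replace only this grouping step; the rest of the pipeline is unchanged.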
S140, determining the global characteristics of the image to be classified according to the M local image blocks and the N clustering results.
The global feature is the feature vector of the image to be classified. Both global and local features are image features of the image to be classified: the global feature is a feature vector extracted from the whole image, derived from the entire image to be detected, while a local feature is a feature vector extracted from a local image block, derived from part of the image to be detected.
Further, to overcome the limitation of a single feature, the application may classify images with a combined global feature formed from multiple features. Global features of an image may include color features, LBP features, texture features and so on. The terminal device can select different features or feature combinations for different application scenes or programs; for example, a remote sensing image may combine color and texture features into a combined global feature. The global feature or combination may also be selected from preset feature options; this embodiment does not limit the choice.
Optionally, determining the global feature of the image to be classified from the M local image blocks and the N clustering results comprises: convolving the M local image blocks with each of the N clustering results to obtain a first feature vector; and applying binary-image coding to the first feature vector to obtain the global feature.
For example, as shown in fig. 4, 9 local image blocks are respectively convolved with 4 clustering results obtained by clustering the 9 local image blocks, so as to obtain a first feature vector of the image to be classified, where the first feature vector is used to describe global information of the image to be classified.
The first feature vector may be one-dimensional, formed by concatenating the feature vectors obtained by convolving the local image blocks with the clustering results. It may also be multidimensional; this application does not limit the form.
There may also be one or more first feature vectors.
It should be noted that convolving the M local image blocks with each of the N clustering results may also mean convolving the local features extracted from the local image blocks with the local features extracted from the clustering results to obtain the first feature vector.
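A sketch of this feature-level variant follows. Representing each clustering result by the mean of its member features, and realising the "convolution" as a dot product, are both assumptions: the text does not fix either operator.

```python
import numpy as np

def first_feature_vector(block_feats: np.ndarray, labels: np.ndarray,
                         n_clusters: int) -> np.ndarray:
    """Combine M local features with N clustering results into one
    first feature vector (hypothetical operators)."""
    # Represent each clustering result by its mean local feature.
    cluster_feats = np.stack([
        block_feats[labels == k].mean(axis=0) if (labels == k).any()
        else np.zeros(block_feats.shape[1])
        for k in range(n_clusters)])
    # M x N responses of blocks against clustering results, concatenated
    # into one one-dimensional first feature vector.
    return (block_feats @ cluster_feats.T).ravel()
```

The M x N response matrix could equally be kept unflattened, matching the multidimensional variant described above.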
Optionally, the performing binary image coding on the first feature vector to obtain the global feature includes: and setting a value which is larger than a first value in the first feature vector as a second value and setting a value which is smaller than or equal to the first value in the first feature vector as the first value according to a binary image coding rule to obtain the global feature.
Wherein the first value may be 0 and the second value may be 1; the first value may also be 255 and the second value 0. For example, each pixel of an image may be represented by a value in [0, 255]; binary image coding converts the image into one in which each pixel has only two possible values or gray-scale states, and the global feature may be a feature represented by 8 bits. Of course, the first value and the second value may also be other values, which is not limited in this embodiment of the application.
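The binary coding rule stated above can be sketched directly; the threshold and output values are the (0, 1) pairing from the text, passed in as parameters so other pairings can be substituted.

```python
import numpy as np

def binarize(vec, first_value=0, second_value=1):
    # Binary image coding rule: values greater than the first value
    # become the second value; values less than or equal to the first
    # value become the first value.
    return np.where(vec > first_value, second_value, first_value)

v = np.array([-0.5, 0.0, 0.3, 2.0, -1.2])
g = binarize(v)
print(g)  # [0 0 1 1 0]
```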
In this embodiment of the application, before the convolution operation is performed on the M local image blocks and each of the N clustering results, a classification model may first be used to extract the local features of the local image blocks and of the clustering results.
In an implementation manner of the embodiment of the present application, the terminal device may input the local image blocks and the clustering results into a preset algorithm and process the image with it. The preset algorithm may be the Fast Region-Based Convolutional Neural Network (Fast RCNN) algorithm. For example, a user may preset a convolution window in the Fast RCNN algorithm; after the terminal inputs the local image blocks and the clustering results into the Fast RCNN algorithm, the terminal convolves the image with the convolution window to obtain the first feature vector. The first feature vector is the complete matrix obtained by convolving the image.
S150, determining the classification result of the image to be classified according to the global features.
Optionally, the determining the classification result of the image to be classified according to the global feature includes: performing convolution operation on the global features and convolution vectors to obtain probability vectors of the global features, wherein the convolution vectors are obtained by training image samples of labeled categories; and taking the category corresponding to the maximum value in the probability vector as the classification result of the image to be classified.
The size of the convolution operation is 1 × 1; each value in the probability vector represents the probability that the image to be classified belongs to the corresponding category, and according to the magnitudes of the values in the probability vector, the category corresponding to the maximum value can be taken as the classification result of the image to be classified.
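For a feature vector, a 1 × 1 convolution with C output channels reduces to a (C, D) matrix-vector product. The sketch below follows that reading and adds a softmax so the responses form a probability vector; the random weights stand in for a trained convolution vector, and the class names are purely illustrative.

```python
import numpy as np

def classify(global_feature, conv_weights, class_names):
    # 1x1 convolution over a D-dim global feature with C output
    # channels is a (C, D) matrix-vector product.
    logits = conv_weights @ global_feature
    # Softmax turns the responses into a probability vector.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # The category of the maximum probability is the classification result.
    return class_names[int(np.argmax(probs))], probs

feature = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=float)  # 8-bit global feature
rng = np.random.default_rng(2)
weights = rng.standard_normal((3, 8))  # stands in for trained convolution vectors
label, probs = classify(feature, weights, ["building", "forest", "water"])
print(label, probs.shape)
```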
In the embodiment of the present application, the convolution vector needs to be obtained first. It may be obtained by training on the original training samples of each category in the image library, or by training on collected image samples labeled with image categories, which is not limited in the embodiment of the present application. It should be noted that the convolution vector depends on the image samples of the labeled image categories, and different labeled image samples may yield different convolution vectors, so the terminal device may select appropriate image samples for training according to the application scenario or application program. For example, for a terminal device used for earth observation, in order to classify remote sensing images accurately and provide detailed ground information, remote sensing image samples of labeled categories may be used directly to train the convolution vector, thereby improving the accuracy of image classification.
The probability vector of the image to be classified can be obtained directly by convolving the global features with the convolution vector. Because a 1x1 convolution involves little computation and parallelizes well, the computation and memory occupancy relative to the original data are greatly reduced; the image classification algorithm is thereby optimized while accuracy is preserved, and on the premise of guaranteed classification accuracy, the computational cost of the unsupervised learning method is greatly reduced.
In an implementation manner of the embodiment of the present application, the method further includes: acquiring a first image set, wherein the first image set comprises images of labeled image categories; training a classifier to be trained by using the first image set to obtain a first classifier;
the determining the classification result of the image to be classified according to the global features comprises: and inputting the global features into the first classifier, and outputting a classification result of the image to be classified.
The global features of the image to be classified are input into the classifier, so that the classifier can classify the image to be classified according to the global features of the image to be classified, for example, for a remote sensing image, the classifier can classify the image to be classified according to color features and texture features in the global features.
In the embodiment of the present application, before the classifier is used to obtain the category of the image to be classified, the classifier needs to be trained, and the trained classifier is then used to obtain the category. When training the classifier, the global features of the images in the first image set are obtained, the global features are input into the classifier, the classifier outputs the categories of the images in the first image set, and target supervision is used for back-propagation training of the classifier. The target supervision is the supervised signal of supervised learning in deep learning, such as a loss function.
Further, the classifier may be a model that classifies the image to be classified according to its image features. Optionally, the classifier in the present application may be a nonlinear classifier, such as a Support Vector Machine (SVM). A nonlinear classifier can effectively expand the classification dimensionality and avoid the shortcomings of linear classifiers, such as softmax and fully connected layers, in nonlinear classification.
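A minimal usage sketch of the SVM option, using scikit-learn's `SVC` with an RBF kernel as the nonlinear classifier; the toy "global features" (two Gaussian clusters in 8 dimensions) are invented for illustration.

```python
# Requires scikit-learn.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
# Toy "global features": two classes well separated in feature space.
X = np.vstack([rng.normal(-2, 0.5, (50, 8)), rng.normal(2, 0.5, (50, 8))])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="rbf")  # nonlinear kernel, as suggested for the classifier
clf.fit(X, y)
preds = clf.predict([[-2.0] * 8, [2.0] * 8])  # probe each class center
print(preds)  # expect [0 1]
```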
In the embodiment of the application, a clustering result of a local image block is obtained by an unsupervised learning clustering method, the global feature of an image to be classified is obtained by using a classification model with supervised learning, and the category of the image to be classified is determined according to the global feature by using the classification model with supervised learning. The method and the device improve the performance of the image classification algorithm and reduce the power consumption by combining the unsupervised learning and the supervised learning deep learning algorithm.
The first classifier and the convolution vector may be obtained by training the classifier to be trained on the same image samples, or they may be obtained by training on different image samples respectively.
The image classification method is applied to a terminal device. By obtaining an image to be classified, dividing it to obtain M local image blocks, clustering the M local image blocks to obtain N clustering results, determining the global features of the image to be classified according to the M local image blocks and the N clustering results, and determining the classification result according to the global features, the method reduces the redundant computation of image classification, simplifies and accelerates the algorithm, and provides a feasible way to run computation-intensive image processing on terminal hardware.
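The steps summarized above can be strung together in a compact end-to-end sketch: divide into M blocks, cluster the blocks without supervision, correlate blocks with clustering results, and binarize into the global feature. The grid size, k-means variant, correlation form, and binarization threshold are all illustrative assumptions.

```python
import numpy as np

def divide(img, grid=3):
    # Split the image into grid*grid equal local image blocks (M = 9).
    h, w = img.shape[0] // grid, img.shape[1] // grid
    return [img[i*h:(i+1)*h, j*w:(j+1)*w]
            for i in range(grid) for j in range(grid)]

def kmeans(blocks, n=4, iters=10, seed=0):
    # Unsupervised clustering of flattened blocks; returns N centroids.
    X = np.stack([b.ravel() for b in blocks])
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), n, replace=False)]
    for _ in range(iters):
        d = ((X[:, None] - centroids[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for k in range(n):
            if (labels == k).any():
                centroids[k] = X[labels == k].mean(0)
    return centroids

def global_feature(blocks, centroids):
    # Correlate each block with each clustering result, then binarize.
    X = np.stack([b.ravel() for b in blocks])
    v = (X @ centroids.T).ravel()   # first feature vector, length M*N
    return np.where(v > 0, 1, 0)    # binary image coding

rng = np.random.default_rng(4)
img = rng.standard_normal((24, 24))
blocks = divide(img)                # M = 9
centroids = kmeans(blocks)          # N = 4
g = global_feature(blocks, centroids)
print(g.shape)  # (36,)
```

The resulting binary vector `g` is what the description then feeds to the 1 × 1 convolution vector or to a trained classifier to obtain the classification result.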
The above description has introduced the solution of the embodiment of the present application mainly from the perspective of the method-side implementation process. It is understood that the electronic device comprises corresponding hardware structures and/or software modules for performing the respective functions in order to realize the above-mentioned functions. Those of skill in the art will readily appreciate that the present application is capable of hardware or a combination of hardware and computer software implementing the various illustrative elements and algorithm steps described in connection with the embodiments provided herein. Whether a function is performed as hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, the electronic device may be divided into the functional units according to the method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
Referring to fig. 5, fig. 5 is a block diagram of functional units of an image classification apparatus according to an embodiment of the present application, applied to a terminal device, and as shown in fig. 5, the apparatus includes:
an obtaining unit 510, configured to obtain an image to be classified;
a dividing unit 520, configured to divide the image to be classified to obtain M local image blocks, where M is a positive integer greater than 1;
a clustering unit 530, configured to cluster the M local image blocks to obtain N clustering results, where N is a positive integer greater than 1;
a first determining unit 540, configured to determine a global feature of the image to be classified according to the M local image blocks and the N clustering results, where the global feature is a feature vector of the image to be classified;
a second determining unit 550, configured to determine a classification result of the image to be classified according to the global feature.
In an implementation manner of the embodiment of the present application, the first determining unit 540 is specifically configured to: performing convolution operation on the M local image blocks and each clustering result in the N clustering results respectively to obtain a first feature vector; and carrying out binary image coding on the first feature vector to obtain the global feature.
In an implementation manner of the embodiment of the present application, the first determining unit 540 is further specifically configured to: and setting a value which is larger than a first value in the first feature vector as a second value and setting a value which is smaller than or equal to the first value in the first feature vector as the first value according to a binary image coding rule to obtain the global feature.
In an implementation manner of the embodiment of the present application, the second determining unit 550 is specifically configured to: performing convolution operation on the global features and convolution vectors to obtain probability vectors of the global features, wherein the convolution vectors are obtained by training image samples of labeled categories; and taking the category corresponding to the maximum value in the probability vector as the classification result of the image to be classified.
In one implementation of the embodiment of the present application, the size of the convolution operation is 1 × 1.
In an implementation manner of the embodiment of the present application, the obtaining unit 510 is further configured to: a first image set is obtained, the first image set including images of labeled image classes.
In an implementation manner of the embodiment of the present application, the apparatus further includes a training unit 560, where the training unit 560 is configured to train a classifier to be trained by using the first image set, so as to obtain a first classifier.
In an implementation manner of the embodiment of the present application, the second determining unit 550 is further specifically configured to: and inputting the global features into the first classifier, and outputting a classification result of the image to be classified.
In an implementation manner of the embodiment of the present application, the clustering unit 530 is specifically configured to: and clustering the M local image blocks by adopting an unsupervised learning clustering method to obtain the N clustering results.
In an implementation manner of the embodiment of the present application, before the dividing the image to be classified, the obtaining unit is further configured to: and acquiring an original image, and preprocessing the original image to obtain an image to be classified.
It can be understood that the functions of each program module of the image classification device in the embodiment of the present application can be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process of the method can refer to the related description of the foregoing method embodiment, which is not described herein again.
The image classification device is applied to a terminal device. By obtaining an image to be classified, dividing it to obtain M local image blocks, clustering the M local image blocks to obtain N clustering results, determining the global features of the image to be classified according to the M local image blocks and the N clustering results, and determining the classification result according to the global features, the device reduces the redundant computation of image classification, simplifies and accelerates the algorithm, and provides a feasible way to run computation-intensive image processing on terminal hardware.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present disclosure, as shown in fig. 6, the terminal device includes one or more processors, one or more memories, one or more communication interfaces, and one or more programs;
the one or more programs are stored in the memory and configured to be executed by the one or more processors;
the program includes instructions for performing the steps of:
acquiring an image to be classified;
dividing the image to be classified to obtain M local image blocks, wherein M is a positive integer greater than 1;
clustering the M local image blocks to obtain N clustering results, wherein N is a positive integer greater than 1;
determining the global features of the image to be classified according to the M local image blocks and the N clustering results, wherein the global features are feature vectors of the image to be classified;
and determining the classification result of the image to be classified according to the global features.
In an implementation manner of the embodiment of the present application, the program includes instructions further configured to: performing convolution operation on the M local image blocks and each clustering result in the N clustering results respectively to obtain a first feature vector; and carrying out binary image coding on the first feature vector to obtain the global feature.
In an implementation manner of the embodiment of the present application, the program includes instructions further configured to: and setting a value which is larger than a first value in the first feature vector as a second value and setting a value which is smaller than or equal to the first value in the first feature vector as the first value according to a binary image coding rule to obtain the global feature.
In an implementation manner of the embodiment of the present application, the program includes instructions further configured to: performing convolution operation on the global features and convolution vectors to obtain probability vectors of the global features, wherein the convolution vectors are obtained by training image samples of labeled categories; and taking the category corresponding to the maximum value in the probability vector as the classification result of the image to be classified.
In one implementation of the embodiment of the present application, the size of the convolution operation is 1 × 1.
In an implementation manner of the embodiment of the present application, the program includes instructions further configured to: acquiring a first image set, wherein the first image set comprises images of labeled image categories; and training a classifier to be trained by using the first image set to obtain a first classifier.
In an implementation manner of the embodiment of the present application, the program includes instructions further configured to: and inputting the global features into the first classifier, and outputting a classification result of the image to be classified.
In an implementation manner of the embodiment of the present application, the program includes instructions further configured to: and clustering the M local image blocks by adopting an unsupervised learning clustering method to obtain the N clustering results.
In an implementation manner of the embodiment of the present application, before the dividing the image to be classified, the program includes instructions further configured to: and acquiring an original image, and preprocessing the original image to obtain an image to be classified.
The terminal device obtains an image to be classified, divides it to obtain M local image blocks, clusters the M local image blocks to obtain N clustering results, determines the global features of the image to be classified according to the M local image blocks and the N clustering results, and determines the classification result according to the global features, thereby reducing the redundant computation of image classification, simplifying and accelerating the algorithm, and providing a feasible way to run computation-intensive image processing on terminal hardware.
Embodiments of the present application also provide a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute part or all of the steps of any one of the methods described in the above method embodiments, and the computer includes a terminal device.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the methods as described in the above method embodiments. The computer program product may be a software installation package, the computer comprising the terminal device.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the above-described division of the units is only one type of division of logical functions, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices or units, and may be an electric or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit may be stored in a computer readable memory if it is implemented in the form of a software functional unit and sold or used as a standalone product. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash memory, ROM, RAM, magnetic or optical disks, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A method of image classification, the method comprising:
acquiring an image to be classified;
dividing the image to be classified to obtain M local image blocks, wherein M is a positive integer greater than 1;
clustering the M local image blocks to obtain N clustering results, wherein N is a positive integer greater than 1;
determining the global features of the image to be classified according to the M local image blocks and the N clustering results, wherein the global features are feature vectors of the image to be classified;
and determining the classification result of the image to be classified according to the global features.
2. The method according to claim 1, wherein the determining the global feature of the image to be classified according to the M local image blocks and the N clustering results comprises:
performing convolution operation on the M local image blocks and each clustering result in the N clustering results respectively to obtain a first feature vector;
and carrying out binary image coding on the first feature vector to obtain the global feature.
3. The method of claim 2, wherein the binary image coding the first feature vector to obtain the global feature comprises:
and setting a value which is larger than a first value in the first feature vector as a second value and setting a value which is smaller than or equal to the first value in the first feature vector as the first value according to a binary image coding rule to obtain the global feature.
4. The method according to any one of claims 1-3, wherein the determining the classification result of the image to be classified according to the global features comprises:
performing convolution operation on the global features and convolution vectors to obtain probability vectors of the global features, wherein the convolution vectors are obtained by training image samples of labeled categories;
and taking the category corresponding to the maximum value in the probability vector as the classification result of the image to be classified.
5. The method of claim 4, wherein the convolution operation has a size of 1x 1.
6. The method according to any one of claims 1-3, further comprising:
acquiring a first image set, wherein the first image set comprises images of labeled image categories;
training a classifier to be trained by using the first image set to obtain a first classifier;
the determining the classification result of the image to be classified according to the global features comprises: and inputting the global features into the first classifier, and outputting a classification result of the image to be classified.
7. The method according to any one of claims 1 to 6, wherein the clustering the M local image blocks to obtain N clustering results comprises:
and clustering the M local image blocks by adopting an unsupervised learning clustering method to obtain the N clustering results.
8. An image classification apparatus, characterized in that the apparatus comprises:
an acquisition unit for acquiring an image to be classified;
the dividing unit is used for dividing the image to be classified to obtain M local image blocks, wherein M is a positive integer greater than 1;
the clustering unit is used for clustering the M local image blocks to obtain N clustering results, wherein N is a positive integer greater than 1;
a first determining unit, configured to determine a global feature of the image to be classified according to the M local image blocks and the N clustering results, where the global feature is a feature vector of the image to be classified;
and the second determining unit is used for determining the classification result of the image to be classified according to the global features.
9. A terminal device, characterized in that the terminal device comprises a processor, a memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for carrying out the steps in the method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a computer program stored for electronic data exchange, the computer program causing a computer to perform the method according to any one of claims 1-7.
CN202010101515.6A 2020-02-18 2020-02-18 Image classification method and device Active CN111325271B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010101515.6A CN111325271B (en) 2020-02-18 2020-02-18 Image classification method and device
PCT/CN2021/075045 WO2021164550A1 (en) 2020-02-18 2021-02-03 Image classification method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010101515.6A CN111325271B (en) 2020-02-18 2020-02-18 Image classification method and device

Publications (2)

Publication Number Publication Date
CN111325271A true CN111325271A (en) 2020-06-23
CN111325271B CN111325271B (en) 2023-09-12

Family

ID=71168841

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010101515.6A Active CN111325271B (en) 2020-02-18 2020-02-18 Image classification method and device

Country Status (2)

Country Link
CN (1) CN111325271B (en)
WO (1) WO2021164550A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111652329A (en) * 2020-08-05 2020-09-11 腾讯科技(深圳)有限公司 Image classification method and device, storage medium and electronic equipment
CN111881849A (en) * 2020-07-30 2020-11-03 Oppo广东移动通信有限公司 Image scene detection method and device, electronic equipment and storage medium
CN113112518A (en) * 2021-04-19 2021-07-13 深圳思谋信息科技有限公司 Feature extractor generation method and device based on spliced image and computer equipment
WO2021164550A1 (en) * 2020-02-18 2021-08-26 Oppo广东移动通信有限公司 Image classification method and apparatus
CN114418030A (en) * 2022-01-27 2022-04-29 腾讯科技(深圳)有限公司 Image classification method, and training method and device of image classification model
CN114743247A (en) * 2022-04-26 2022-07-12 支付宝(杭州)信息技术有限公司 Training method, device and equipment of face recognition model

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116206208B (en) * 2023-05-05 2023-07-07 河东区志远苗木种植专业合作社 Forestry plant diseases and insect pests rapid analysis system based on artificial intelligence
CN116612389B (en) * 2023-07-20 2023-09-19 青建国际集团有限公司 Building construction progress management method and system

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080279460A1 (en) * 2007-05-11 2008-11-13 Seiko Epson Corporation Scene Classification Apparatus and Scene Classification Method
CN103942564A (en) * 2014-04-08 2014-07-23 武汉大学 High-resolution remote sensing image scene classifying method based on unsupervised feature learning
CN104036293A (en) * 2014-06-13 2014-09-10 武汉大学 Rapid binary encoding based high resolution remote sensing image scene classification method
CN104517113A (en) * 2013-09-29 2015-04-15 浙江大华技术股份有限公司 Image feature extraction method and device and image sorting method and device
CN108304847A (en) * 2017-11-30 2018-07-20 腾讯科技(深圳)有限公司 Image classification method and device, personalized recommendation method and device
CN108647602A (en) * 2018-04-28 2018-10-12 北京航空航天大学 A kind of aerial remote sensing images scene classification method based on image complexity judgement
WO2019001254A1 (en) * 2017-06-29 2019-01-03 Oppo广东移动通信有限公司 Method for iris liveness detection and related product
CN109800781A (en) * 2018-12-07 2019-05-24 北京奇艺世纪科技有限公司 Image processing method, device and computer-readable storage medium
CN110309865A (en) * 2019-06-19 2019-10-08 上海交通大学 Image recognition method for pin defects in unmanned aerial vehicle power transmission line inspection
CN110751218A (en) * 2019-10-22 2020-02-04 Oppo广东移动通信有限公司 Image classification method, image classification device and terminal equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7840071B2 (en) * 2006-12-12 2010-11-23 Seiko Epson Corporation Method and apparatus for identifying regions of different content in an image
CN104077597B (en) * 2014-06-25 2017-09-05 小米科技有限责任公司 Image classification method and device
CN109410196A (en) * 2018-10-24 2019-03-01 东北大学 Cervical cancer tissue pathological image diagnosis method based on Poisson annular conditional random field
CN111325271B (en) * 2020-02-18 2023-09-12 Oppo广东移动通信有限公司 Image classification method and device

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080279460A1 (en) * 2007-05-11 2008-11-13 Seiko Epson Corporation Scene Classification Apparatus and Scene Classification Method
CN104517113A (en) * 2013-09-29 2015-04-15 浙江大华技术股份有限公司 Image feature extraction method and device and image sorting method and device
CN103942564A (en) * 2014-04-08 2014-07-23 武汉大学 High-resolution remote sensing image scene classification method based on unsupervised feature learning
CN104036293A (en) * 2014-06-13 2014-09-10 武汉大学 Rapid binary encoding based high resolution remote sensing image scene classification method
WO2019001254A1 (en) * 2017-06-29 2019-01-03 Oppo广东移动通信有限公司 Method for iris liveness detection and related product
CN108304847A (en) * 2017-11-30 2018-07-20 腾讯科技(深圳)有限公司 Image classification method and device, personalized recommendation method and device
CN108647602A (en) * 2018-04-28 2018-10-12 北京航空航天大学 Aerial remote sensing image scene classification method based on image complexity judgment
CN109800781A (en) * 2018-12-07 2019-05-24 北京奇艺世纪科技有限公司 Image processing method, device and computer-readable storage medium
CN110309865A (en) * 2019-06-19 2019-10-08 上海交通大学 Image recognition method for pin defects in unmanned aerial vehicle power transmission line inspection
CN110751218A (en) * 2019-10-22 2020-02-04 Oppo广东移动通信有限公司 Image classification method, image classification device and terminal equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Feng Shu: "Robust face recognition with unsupervised local feature learning", no. 02, pages 512 - 513 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021164550A1 (en) * 2020-02-18 2021-08-26 Oppo广东移动通信有限公司 Image classification method and apparatus
CN111881849A (en) * 2020-07-30 2020-11-03 Oppo广东移动通信有限公司 Image scene detection method and device, electronic equipment and storage medium
CN111652329A (en) * 2020-08-05 2020-09-11 腾讯科技(深圳)有限公司 Image classification method and device, storage medium and electronic equipment
CN113112518A (en) * 2021-04-19 2021-07-13 深圳思谋信息科技有限公司 Feature extractor generation method and device based on spliced image and computer equipment
CN113112518B (en) * 2021-04-19 2024-03-26 深圳思谋信息科技有限公司 Feature extractor generation method and device based on spliced image and computer equipment
CN114418030A (en) * 2022-01-27 2022-04-29 腾讯科技(深圳)有限公司 Image classification method, and training method and device of image classification model
CN114418030B (en) * 2022-01-27 2024-04-23 腾讯科技(深圳)有限公司 Image classification method, training method and device for image classification model
CN114743247A (en) * 2022-04-26 2022-07-12 支付宝(杭州)信息技术有限公司 Training method, device and equipment of face recognition model

Also Published As

Publication number Publication date
CN111325271B (en) 2023-09-12
WO2021164550A1 (en) 2021-08-26

Similar Documents

Publication Publication Date Title
CN111325271B (en) Image classification method and device
CN109151501B (en) Video key frame extraction method and device, terminal equipment and storage medium
US10740647B2 (en) Detecting objects using a weakly supervised model
US10609286B2 (en) Extrapolating lighting conditions from a single digital image
CN112232425B (en) Image processing method, device, storage medium and electronic equipment
US9619734B2 (en) Classification of land based on analysis of remotely-sensed earth images
US10621755B1 (en) Image file compression using dummy data for non-salient portions of images
EP2551792B1 (en) System and method for computing the visual profile of a place
CN111738357B (en) Junk picture identification method, device and equipment
CN112651438A (en) Multi-class image classification method and device, terminal equipment and storage medium
US20120269425A1 (en) Predicting the aesthetic value of an image
CN105354248A (en) Grayscale-based distributed image low-level feature identification method and system
WO2021129466A1 (en) Watermark detection method, device, terminal and storage medium
CN103353881B (en) Method and device for searching application
US11804043B2 (en) Detecting objects in a video using attention models
CN110163095B (en) Loop detection method, loop detection device and terminal equipment
Han Texture Image Compression Algorithm Based on Self-Organizing Neural Network
CN113870196A (en) Image processing method, device, equipment and medium based on anchor point cutting graph
CN115294162B (en) Target identification method, device, equipment and storage medium
CN110705653A (en) Image classification method, image classification device and terminal equipment
CN108804988B (en) Remote sensing image scene classification method and device
CN111818364B (en) Video fusion method, system, device and medium
CN114170589A (en) Rock lithology identification method based on NAS, terminal equipment and storage medium
CN113706636A (en) Method and device for identifying tampered image
CN112883956A (en) Text character recognition method and device and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant