CN111325271B - Image classification method and device - Google Patents

Info

Publication number
CN111325271B
CN111325271B (granted publication of application CN202010101515.6A)
Authority
CN
China
Prior art keywords: image, classified, local, clustering, features
Prior art date
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number
CN202010101515.6A
Other languages
Chinese (zh)
Other versions
CN111325271A (en)
Inventor
孙哲 (Sun Zhe)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010101515.6A priority Critical patent/CN111325271B/en
Publication of CN111325271A publication Critical patent/CN111325271A/en
Priority to PCT/CN2021/075045 priority patent/WO2021164550A1/en
Application granted granted Critical
Publication of CN111325271B publication Critical patent/CN111325271B/en

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/23 — Clustering techniques
    • G06F18/24 — Classification techniques

Abstract

The embodiments of the present application disclose an image classification method and apparatus applied to a terminal device. The method comprises: obtaining an image to be classified; dividing the image to be classified into M local image blocks; clustering the M local image blocks to obtain N clustering results; determining a global feature of the image to be classified according to the M local image blocks and the N clustering results; and determining a classification result of the image to be classified according to the global feature. This reduces the redundant computation of image classification, simplifies and accelerates the algorithm, and provides a feasible way to run computation-heavy images on terminal hardware.

Description

Image classification method and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image classification method and apparatus.
Background
Image classification has attracted great research interest in recent years and has been successfully deployed in many products, such as mobile phones, personal computers, and other terminal devices, to solve practical image processing problems intelligently. With the rapid development of deep learning, it has become the leading technology for image classification. However, existing deep learning models generally compute their result in an end-to-end manner: for images with a small computation load or low input resolution, existing terminal hardware can meet the performance requirements, but for images with a large computation load or high resolution, the model may not be able to run on the terminal hardware at all.
Content of the application
The embodiments of the present application provide an image classification method and apparatus that can reduce the computation load of image classification and provide a feasible way to run computation-heavy images on terminal hardware.
In a first aspect, an embodiment of the present application provides an image classification method, including:
acquiring an image to be classified;
dividing the image to be classified to obtain M local image blocks, wherein M is a positive integer greater than 1;
clustering the M local image blocks to obtain N clustering results, wherein N is a positive integer greater than 1;
determining a global feature of the image to be classified according to the M local image blocks and the N clustering results, wherein the global feature is a feature vector of the image to be classified;
and determining a classification result of the image to be classified according to the global features.
In a second aspect, an embodiment of the present application provides an image classification apparatus, including:
the acquisition unit is used for acquiring the images to be classified;
the dividing unit is used for dividing the image to be classified to obtain M local image blocks, wherein M is a positive integer greater than 1;
the clustering unit is used for clustering the M local image blocks to obtain N clustering results, wherein N is a positive integer greater than 1;
the first determining unit is used for determining a global feature of the image to be classified according to the M local image blocks and the N clustering results, wherein the global feature is a feature vector of the image to be classified;
and the second determining unit is used for determining the classification result of the image to be classified according to the global feature.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, the programs including instructions for performing the steps in the method of the first aspect of the embodiment of the present application.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to perform part or all of the steps described in the first aspect of the embodiments of the present application.
In a fifth aspect, embodiments of the present application provide a computer program product, wherein the computer program product comprises a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
The embodiment of the application has the following beneficial effects:
It can be seen that, in the present application, an image to be classified is obtained and divided into M local image blocks; the M local image blocks are clustered to obtain N clustering results; a global feature of the image to be classified is determined according to the M local image blocks and the N clustering results; and a classification result of the image is determined according to the global feature. By combining a traditional algorithm with a deep learning algorithm to optimize the end-to-end deep learning model, redundant computation in image classification is reduced, the algorithm is simplified and accelerated, and a feasible way is provided for running computation-heavy images on terminal hardware.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present application; a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of an image classification method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of image division according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of image clustering according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of obtaining an image feature vector according to an embodiment of the present application;
FIG. 5 is a functional unit composition diagram of another image classification apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort fall within the scope of the present application.
The terms "first", "second", and the like in the description, the claims, and the above-described figures are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order. Furthermore, the terms "comprise" and "have", as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, system, article, or apparatus that comprises a list of steps or elements is not limited to those listed, but may optionally include additional steps or elements not listed or inherent to such a process, system, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In particular implementations, the terminal devices described in the embodiments of the present application include, but are not limited to, mobile phones, laptop computers, tablet computers, and other portable devices having a touch-sensitive surface (e.g., a touch screen display and/or a touch pad). It should also be appreciated that in some embodiments the device is not a portable communication device but a desktop computer having a touch-sensitive surface (e.g., a touch screen display and/or a touch pad).
In an embodiment of the application, a terminal device comprising a display and a touch sensitive surface is described. However, it should be understood that the terminal device may include one or more other physical user interface devices such as a physical keyboard, mouse, and/or joystick.
The terminal device supports various applications, such as one or more of the following: drawing applications, presentation applications, word processing applications, website creation applications, disk burning applications, spreadsheet applications, gaming applications, telephony applications, video conferencing applications, email applications, instant messaging applications, workout support applications, photo management applications, digital camera applications, digital video camera applications, web browsing applications, digital music player applications, and/or digital video player applications.
Various applications that may be executed on the terminal device may use at least one common physical user interface device such as a touch sensitive surface. One or more functions of the touch-sensitive surface and corresponding information displayed on the terminal may be adjusted and/or changed between applications and/or within the corresponding applications. In this way, the common physical architecture (e.g., touch-sensitive surface) of the terminal may support various applications with user interfaces that are intuitive and transparent to the user.
At present, deep learning models mostly compute their results in an end-to-end manner. When such a model is used to classify images, a simple task can meet the performance requirements on typical terminal hardware. However, when images at their original resolution must be processed (for example, in the field of image enhancement, users care about image detail, so the input image cannot be scaled down), the computation load is very large and the algorithm often cannot run on the terminal device. In other words, existing image classification algorithms suffice for simple tasks or low-resolution inputs, but when the computation load is large or the original image is used as input, they may not run on terminal hardware such as a mobile phone.
Therefore, the present application provides an image classification method: obtain an image to be classified; divide it into M local image blocks; cluster the M local image blocks to obtain N clustering results; determine a global feature of the image according to the M local image blocks and the N clustering results; and determine a classification result of the image according to the global feature. This reduces the computation load of image classification and allows computation-heavy or high-resolution images to be classified on a terminal device.
In order to illustrate the technical scheme of the application, the following detailed description is given by specific examples.
Referring to fig. 1, fig. 1 is a flowchart of an image classification method according to an embodiment of the present application, where the image classification method is applied to a terminal device, and as shown in the figure, the image classification method may include the following steps:
s110, acquiring an image to be classified.
In the embodiment of the present application, the image to be classified may be obtained locally from the terminal device, or received from another device, which is not limited here. Obtaining the image locally may mean reading it from the memory of the terminal device, or using a photo just captured by the terminal device that has not yet been stored in memory. When the image to be classified is such an unsaved photo, the photo can be classified as it is stored in memory, so that no separate classification pass is needed later.
The image to be classified refers to an image whose category is to be detected. An image generally comprises a subject and a background: the subject is the object the image mainly represents, and the background is the scene the subject is in. The category of the image is determined by its subject; for example, if the subject is a building, the category of the image is the building category, and if the subject is green plants, the category is the green-plant category.
There may be one or more images to be classified.
In an implementation manner of the embodiment of the present application, the image to be detected may be produced by any of the applications executed on the terminal device, for example a drawing application, a surveying application, a word processing application, or a photo management application. Different applications or application scenarios yield different subjects and backgrounds: for a surveying application the subject may include entities representing geographic features such as buildings, roads, trees, and rivers, while for a word processing application the subject mainly consists of text. The terminal device can therefore first perform a simple pre-classification according to the application and/or scenario to which the image belongs, that is, determine the application and/or scenario from the source of the image, which simplifies subsequent processing of the image to be classified.
Optionally, before dividing the image to be classified, the method further includes: acquiring an original image and preprocessing it to obtain the image to be classified.
Specifically, a large number of images may be collected or captured in advance, or publicly available images may be used as the original image; in other embodiments the original image may also be selected randomly from an image library, which is not limited by the embodiment of the present application. Before the original image is specifically processed, it may be compressed, up- or down-sampled, or otherwise preprocessed. For example, the images may be cropped to a uniform size such as 512×512 and then normalized to obtain the image to be classified.
Further, when the resolution of the original image is too high or its size too large for the hardware of the terminal device, the terminal device can first compress the original image, further reducing the computation load of processing the image to be classified.
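The preprocessing described above can be sketched as follows. The 512×512 target size, center-cropping, and nearest-neighbour resampling are illustrative assumptions; the text only requires cropping to a uniform size and normalizing.

```python
import numpy as np

def preprocess(original: np.ndarray, size: int = 512) -> np.ndarray:
    """Center-crop to a square, resize to size x size by nearest-neighbour
    index sampling, and normalise pixel values to [0, 1].
    The 512x512 target and nearest-neighbour resampling are illustrative
    assumptions, not mandated by the patent text."""
    h, w = original.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    cropped = original[top:top + side, left:left + side]
    # Nearest-neighbour resize via integer index sampling (no extra deps).
    idx = np.arange(size) * side // size
    resized = cropped[idx][:, idx]
    return resized.astype(np.float32) / 255.0
```

A 600×800 input, for instance, is cropped to 600×600 and resampled to 512×512 with values in [0, 1].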
S120, dividing the image to be classified to obtain M local image blocks, wherein M is a positive integer greater than 1.
Specifically, the terminal device may divide the image to be classified into M equally sized local image blocks according to a preset image size. For example, as shown in fig. 2, an image may be divided into 9 rectangular local image blocks in a nine-grid manner. Alternatively, the terminal device may divide the image into M local image blocks of the same or different sizes according to a preset size list or pattern, or divide it randomly; the embodiment of the present application does not limit this. Since the M local image blocks serve as the input data for image classification, this increases the diversity of the data and can improve the robustness of the image classification model.
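The nine-grid division can be sketched as below; the helper name and the divisibility assumption are illustrative, and the 3×3 default reproduces the division of fig. 2.

```python
import numpy as np

def divide_into_blocks(image: np.ndarray, grid: int = 3) -> list:
    """Split an image into grid x grid equally sized, non-overlapping
    local blocks; grid=3 gives the nine-grid division of fig. 2.
    Assumes, for simplicity, that the image dimensions are divisible
    by `grid` (a real implementation would pad or crop first)."""
    h, w = image.shape[:2]
    bh, bw = h // grid, w // grid
    return [image[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            for r in range(grid) for c in range(grid)]
```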
In an implementation manner of the embodiment of the present application, whether the local image blocks have the same size may be set according to the scene to which the image belongs. Scenes can be divided by whether their scene features are fixed: the first kind has unfixed scene features (for example, a natural scene), while the second kind has fixed scene features (for example, a plant scene).

Images belonging to natural scenes typically have no clearly fixed scene features, so a size list may be used to extract their local image blocks and thereby capture effective local features. Images belonging to a plant scene, in contrast, generally have fixed scene features, so their local features are extracted with one preset size. If a size list were used for a plant scene, the extracted local features might not be salient and could easily be disturbed by other information (such as background information), because the sizes in the list can vary greatly. For example, consider an image of three flowers in a large patch of grass, which belongs to the flower category rather than the grass category: if a size list were used to extract its local features and determine its category, the image might be classified as grass.

A scene feature here means an image feature that can characterize the scene to which the image belongs. Fixed scene features are distributed in the image in a concentrated rather than dispersed manner; unfixed scene features are distributed dispersedly rather than concentratedly.
The scene to which the image belongs may refer to the scene of the subject in the image: for example, if the subject is a flower, the scene is a plant scene; if the subject is a beach or a valley, the scene is a natural scene.
It should be noted that the local image blocks do not overlap one another, and each block may be rectangular or an irregular polygon; the embodiment of the present application does not limit the shape of the local image blocks.
In the embodiment of the present application, the terminal device can determine M according to the resolution of the image to be classified, where the resolution and M may be related by a proportional or mapping relation: the higher the resolution, the larger the number M of local image blocks into which the image is divided; the lower the resolution, the smaller M. Dividing a high-resolution image into many local image blocks reduces the complexity of the image classification algorithm and simplifies and accelerates it.
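One possible proportional mapping from resolution to M is sketched below. The per-block pixel budget of 2^18 is an illustrative assumption; the patent only requires that M grow with resolution and be greater than 1.

```python
def blocks_for_resolution(width: int, height: int,
                          pixels_per_block: int = 1 << 18) -> int:
    """Pick the number of local blocks M in proportion to the pixel count
    (higher resolution -> larger M, as the text suggests). The budget of
    2**18 pixels per block is an illustrative assumption; M is clamped to
    at least 2 because the patent requires M > 1."""
    m = (width * height + pixels_per_block - 1) // pixels_per_block  # ceil
    return max(2, m)
```

For example, a 512×512 image maps to the minimum M = 2, while a 4000×3000 photo maps to a much larger M.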
In an implementation manner of the embodiment of the present application, the terminal device may determine the shape and number of local image blocks according to the application and/or scenario to which the image to be classified belongs. For example, in a remote sensing image produced by a surveying application, the features of each category in the high-resolution image are distributed in a concentrated way, that is, subjects of the same category cluster together in the image, so the terminal device may divide the image into several rectangular blocks of the same size. For a portrait produced by a camera application, the subject is mostly in the middle of the image, so the terminal device may divide the image outward from the middle, and the central blocks may be larger so that they contain more features.
S130, clustering the M local image blocks to obtain N clustering results.
Optionally, the clustering the M local image blocks to obtain N clustering results includes: and clustering the M local image blocks by adopting an unsupervised learning clustering method to obtain N clustering results, wherein N is a positive integer greater than 1.
In the embodiment of the present application, before the M local image blocks are clustered, a classification model is used to obtain the local features of the local image blocks. The classification model therefore needs to be trained first, and the trained model is then used to extract the local features. During training, the training samples are divided to obtain multiple local training samples; these are sampled and fed into the classification model, which outputs the local features of the training samples, and a loss function is used for training and back-propagation.
In the embodiment of the present application, the classification model can extract features from the M local image blocks respectively to obtain M local features. To overcome the limitations of any single feature, the present application may use a combined local feature, formed by joining multiple features, for the cluster analysis. Common local image features include color features, LBP features, and texture features. Color features depend little on the size, orientation, or viewing angle of the image, and the common color-histogram feature describes the proportion of different colors in the whole image. Texture is important spatial information in remote sensing images: as resolution increases, the internal structure of ground objects becomes clearer and their texture more visible in the image, and compared with spectral information, texture features reflect the regular spatial variation of pixels within a target ground object. The terminal device can therefore select different features or feature combinations for different applications or scenarios; for example, for remote sensing images, color and texture features may be combined into a joint local feature. Local features or combinations may also be selected according to preset feature options, which the embodiment of the present application does not limit.
The combined local feature may be one-dimensional, i.e., the component features are concatenated (for example, the texture feature may be appended after the color feature); it may also be multidimensional, i.e., the component features form a feature matrix. The embodiment of the present application does not limit this.
Further, the color feature may be a color histogram extracted in the HSL (hue, saturation, lightness) color space, which matches the visual perception of the human eye better than the RGB color space does.
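A minimal sketch of such a one-dimensional combined local feature follows: an HSL hue histogram concatenated with a crude gradient-based texture histogram. Both concrete choices (hue-only histogram, gradient magnitudes as the texture proxy, 8 bins each) are assumptions; the patent only states that color and texture features may be concatenated.

```python
import colorsys
import numpy as np

def combined_local_feature(block: np.ndarray, bins: int = 8) -> np.ndarray:
    """One-dimensional combined local feature for an RGB block:
    an HSL hue histogram concatenated with a gradient-magnitude
    histogram standing in for texture. Illustrative assumption only."""
    pixels = block.reshape(-1, 3) / 255.0
    # colorsys.rgb_to_hls returns (hue, lightness, saturation); hue in [0, 1).
    hues = np.array([colorsys.rgb_to_hls(r, g, b)[0] for r, g, b in pixels])
    color_hist, _ = np.histogram(hues, bins=bins, range=(0.0, 1.0))
    gray = block.mean(axis=2)
    grad = np.abs(np.diff(gray, axis=1)).ravel()  # horizontal gradients
    texture_hist, _ = np.histogram(grad, bins=bins, range=(0.0, 255.0))
    feat = np.concatenate([color_hist, texture_hist]).astype(np.float64)
    return feat / max(feat.sum(), 1.0)  # normalise to unit mass
```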
In the embodiment of the present application, based on the M extracted local features, local image blocks of the same category among the M blocks are clustered together to obtain N clustering results. The value of N depends on the number of categories contained in the M local image blocks. As shown in fig. 3, the numbers in the local image blocks denote their categories: 9 local image blocks are clustered, blocks of the same category are grouped into the same cluster, and 4 clustering results are obtained, where clustering result 1 contains 3 local image blocks and clustering results 2 to 4 each contain 2 local image blocks.
The unsupervised learning clustering methods include, but are not limited to: the K-means clustering algorithm, the BIRCH clustering algorithm, the DBSCAN clustering algorithm, and the K-nearest-neighbor classification algorithm.
Generally, a good cluster partition should reflect the internal structure of the data set as much as possible, making samples within a cluster as similar as possible and samples in different clusters as different as possible. Taking the K-means clustering algorithm as an example, from the distance point of view, the optimal clustering minimizes the distance within each cluster and maximizes the distance between clusters. In the embodiment of the present application, similar local features are grouped into the same cluster as far as possible, and dissimilar local features into different clusters.
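Taking K-means as the example named above, step S130 can be sketched with a minimal implementation over the per-block feature vectors; the fixed iteration count and seeded initialisation are simplifying assumptions.

```python
import numpy as np

def kmeans(features: np.ndarray, n_clusters: int, iters: int = 50,
           seed: int = 0) -> np.ndarray:
    """Minimal K-means over per-block feature vectors: alternately assign
    each block to its nearest centre and recompute centres, minimising
    within-cluster distance as described above. Returns one cluster label
    per block; labels then define the N clustering results."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), n_clusters,
                                  replace=False)].astype(float)
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(features[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):  # keep empty clusters' centres fixed
                centers[k] = features[labels == k].mean(axis=0)
    return labels
```

On two well-separated groups of block features, the two groups end up in different clusters regardless of initialisation.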
And S140, determining the global features of the images to be classified according to the M local image blocks and the N clustering results.
The global feature is the feature vector of the image to be classified. Both the global feature and the local features are image features of the image to be classified: the global feature is a feature vector extracted from, and describing, the entire image, while a local feature is a feature vector extracted from a local image block of the image and describes only that block.
Further, to overcome the limitations of a single feature, the present application may use a combined global feature, formed from multiple features, for image classification. Global image features may include color features, LBP features, texture features, and the like, and the terminal device may select different features or feature combinations according to the application or scenario; for example, for remote sensing images, color and texture features may be combined into a joint global feature. The global feature or combination may also be selected according to preset feature options, which the embodiment of the present application does not limit.
Optionally, determining the global feature of the image to be classified according to the M local image blocks and the N clustering results includes: performing a convolution operation between the M local image blocks and each of the N clustering results to obtain a first feature vector; and performing binary-image coding on the first feature vector to obtain the global feature.
For example, as shown in fig. 4, the 9 local image blocks are respectively convolved with 4 clustering results obtained by clustering the 9 local image blocks, so as to obtain a first feature vector of the image to be classified, where the first feature vector is used to describe global information of the image to be classified.
The first feature vector may be one-dimensional, in which case the feature vectors obtained by convolving the local image blocks with the clustering results are concatenated to obtain it; it may also be multidimensional, which the embodiment of the present application does not limit.
There may also be one or more first feature vectors.
It should be noted that the convolution operation between the M local image blocks and each of the N clustering results may be carried out by convolving the local features extracted from the local image blocks with the local features extracted from the clustering results, thereby obtaining the first feature vector.
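One reading of this step is sketched below: every block feature is paired with every cluster's aggregate feature, their 1-D convolutions are taken, and the results are concatenated. Using the cluster mean as the cluster feature and `np.convolve` in 'valid' mode are assumptions; the patent only specifies convolving each block with each clustering result.

```python
import numpy as np

def first_feature_vector(block_feats: np.ndarray,
                         labels: np.ndarray) -> np.ndarray:
    """Sketch of the convolution step of S140: convolve every local-block
    feature with every cluster's aggregate feature (here: the cluster
    mean, an assumption) and concatenate the results into one first
    feature vector describing the whole image."""
    n_clusters = labels.max() + 1
    cluster_feats = np.stack([block_feats[labels == k].mean(axis=0)
                              for k in range(n_clusters)])
    parts = [np.convolve(bf, cf, mode='valid')  # equal lengths -> 1 value
             for bf in block_feats for cf in cluster_feats]
    return np.concatenate(parts)
```

With M blocks and N clusters, this yields an M·N-dimensional first feature vector.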
Optionally, the performing binary image coding on the first feature vector to obtain the global feature includes: and setting a value larger than a first value in the first feature vector as a second value according to a binary image coding rule, and setting a value smaller than or equal to the first value in the first feature vector as the first value to obtain the global feature.
The first value may be 0 and the second value 1; alternatively, the first value may be 255 and the second value 0. For example, each pixel of an image may be represented by a value in the range [0, 255]; binary image coding converts the image into a two-level image in which each pixel has only two possible values or grayscale states, and the global feature may be a feature represented by 8 bits. Of course, the first value and the second value may be other values, which is not limited in the embodiments of the present application.
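The coding rule above can be sketched directly; the sample values are illustrative:

```python
import numpy as np

def binary_code(vec, first_value=0, second_value=1):
    """Binary image coding rule from the text: entries greater than the
    first value become the second value; entries less than or equal to
    the first value become the first value."""
    return np.where(vec > first_value, second_value, first_value)

v = np.array([-0.5, 0.0, 0.3, 2.0, -1.2])
print(binary_code(v))  # [0 0 1 1 0]
# The (255, 0) variant mentioned in the text:
print(binary_code(np.array([10.0, 300.0]), first_value=255, second_value=0))
```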
In the embodiments of the present application, before the convolution operation is performed on the M local image blocks and each of the N clustering results, a classification model is used to acquire the local features of the local image blocks and of the clustering results.
In an implementation manner of the embodiments of the present application, the terminal device may input the local image blocks and the clustering results into a preset algorithm, and process the image using that algorithm. The preset algorithm may be the Fast Region-Based Convolutional Neural Network (Fast R-CNN) algorithm. For example, a user may preset a convolution window in the Fast R-CNN algorithm; after the terminal inputs the local image blocks and the clustering results to the Fast R-CNN algorithm, the terminal convolves the image with the convolution window to obtain the first feature vector. The first feature vector is the complete matrix obtained by convolving the image.
S150, determining a classification result of the image to be classified according to the global features.
Optionally, the determining the classification result of the image to be classified according to the global feature includes: performing convolution operation on the global feature and a convolution vector to obtain a probability vector of the global feature, wherein the convolution vector is obtained by training image samples of a labeling category; and taking the category corresponding to the maximum value in the probability vector as a classification result of the image to be classified.
The size of the convolution operation is 1x1. The values in the probability vector represent the probabilities that the image to be classified belongs to each category, and according to the magnitudes of the values in the probability vector, the category corresponding to the maximum value may be taken as the classification result of the image to be classified.
In the embodiments of the present application, the convolution vector needs to be obtained first. The convolution vector may be obtained by training on the original training samples of various categories in an image library, or by training on collected image samples labeled with image categories, which is not limited in the embodiments of the present application. It should be noted that the convolution vector depends on the image samples labeled with image categories, and different labeled image samples may yield different convolution vectors; the terminal device may therefore select image samples suited to the application scenario or application program for training. For example, an earth-observation terminal device that must classify remote sensing images accurately and provide detailed ground information may directly train on remote sensing image samples labeled with categories to obtain the convolution vector, thereby improving the accuracy of image classification.
The probability vector of the image to be classified can be obtained directly after the convolution operation is performed on the global feature and the convolution vector. Because a 1x1 convolution involves little computation and parallelizes well, the computation and memory occupancy on the original data are greatly reduced; the image classification algorithm is thus optimized while accuracy is ensured, and the computational cost of the unsupervised learning method is greatly reduced on the premise of ensuring image classification accuracy.
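The 1x1 convolution and arg-max selection described above can be sketched as follows. The softmax normalization, the class names, and the random stand-ins for the trained convolution vectors are assumptions for illustration; the text itself only specifies a probability vector and an arg-max:

```python
import numpy as np

def classify(global_feature, conv_vectors, class_names):
    """1x1 convolution of the global feature with per-class convolution
    vectors (one dot product per class), followed by softmax to obtain
    the probability vector; the arg-max category is the result."""
    scores = conv_vectors @ global_feature          # one score per class
    probs = np.exp(scores - scores.max())           # softmax (assumption)
    probs /= probs.sum()
    return class_names[int(np.argmax(probs))], probs

classes = ["building", "farmland", "water"]         # hypothetical categories
rng = np.random.default_rng(2)
conv_vectors = rng.normal(size=(3, 8))  # stand-in for trained convolution vectors
g = rng.normal(size=8)                  # stand-in for a binary-coded global feature
label, probs = classify(g, conv_vectors, classes)
print(label, probs)
```

Because each class contributes only one dot product, the per-image cost is linear in the feature length, which is the source of the low computation claimed in the text.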
In an implementation manner of the embodiment of the present application, the method further includes: acquiring a first image set, wherein the first image set comprises images of marked image categories; training a classifier to be trained by using the first image set to obtain a first classifier;
the determining the classification result of the image to be classified according to the global feature comprises the following steps: and inputting the global features into the first classifier, and outputting a classification result of the image to be classified.
The global features of the image to be classified are input to the classifier, so that the classifier can classify the image to be classified according to the global features of the image to be classified, for example, for a remote sensing image, the classifier can classify the image to be classified according to the color features and the texture features in the global features.
In the embodiments of the present application, before the classifier is used to acquire the category of the image to be classified, the classifier is trained, and the trained classifier is then used to acquire the category of the image to be classified. When training the classifier, the global features of the images in the first image set are acquired and input into the classifier, the classifier outputs the categories of the images in the first image set, and target supervision is used to perform feedback training on the classifier. The target supervision is a form of supervised learning in deep learning, such as a loss function.
Further, the classifier may refer to a model that classifies the image to be classified according to its image features. Optionally, the classifier in the present application may be a nonlinear classifier, such as a nonlinear support vector machine (Support Vector Machine, SVM). A nonlinear classifier can effectively expand the classification dimension and mitigate the shortcomings of linear classifiers, such as softmax and fully connected layers, in nonlinear classification.
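Since a full nonlinear SVM is beyond a short sketch, the following NumPy example uses a one-vs-rest kernel ridge classifier as a stand-in to illustrate why a nonlinear (RBF-kernel) classifier can separate classes a linear one cannot; all data and names are synthetic:

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """RBF (Gaussian) kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def train_kernel_classifier(X, y, n_classes, lam=1e-3):
    """One-vs-rest kernel ridge regression: a simple nonlinear classifier
    standing in for the nonlinear SVM named in the text."""
    K = rbf_kernel(X, X)
    Y = np.eye(n_classes)[y]                        # one-hot targets
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), Y)
    return lambda Xq: rbf_kernel(Xq, X) @ alpha     # per-class scores

# Two concentric rings: not linearly separable, easy for an RBF kernel.
rng = np.random.default_rng(3)
angles = rng.uniform(0, 2 * np.pi, 200)
radii = np.where(np.arange(200) < 100, 1.0, 3.0)
X = np.c_[radii * np.cos(angles), radii * np.sin(angles)]
y = (np.arange(200) >= 100).astype(int)

predict = train_kernel_classifier(X, y, n_classes=2)
acc = float((predict(X).argmax(1) == y).mean())
print(acc)  # training accuracy on the rings; close to 1.0
```

In practice the inputs X would be the binary-coded global features, and a proper SVM library could replace the ridge stand-in.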
In the embodiments of the present application, the clustering results of the local image blocks are obtained by an unsupervised learning clustering method, while the global feature of the image to be classified is obtained, and its category determined from that global feature, by a supervised learning classification model. By combining unsupervised learning with a supervised deep learning algorithm, the application improves the performance of the image classification algorithm and reduces power consumption.
According to the embodiments of the present application, the first classifier and the convolution vector may be obtained by training the classifier to be trained on the same set of image samples, or by training it on different sets of image samples respectively.
It can be seen that the embodiments of the present application provide an image classification method applied to a terminal device: an image to be classified is acquired and divided to obtain M local image blocks; the M local image blocks are clustered to obtain N clustering results; the global feature of the image to be classified is determined according to the M local image blocks and the N clustering results; and the classification result of the image to be classified is determined according to the global feature. This reduces the redundant computation of image classification, simplifies and accelerates the algorithm, and provides a feasible way to run computation-heavy image algorithms on terminal hardware.
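As an end-to-end illustration, the pipeline of dividing, clustering, feature construction, and binary coding can be strung together in a minimal NumPy sketch. The dot-product "convolution", the zero threshold for the binary coding, and the miniature k-means used as the unsupervised clustering are simplifying assumptions, not the application's prescribed implementation:

```python
import numpy as np

def divide(img, grid=3):
    """Split the image into grid x grid local image blocks (M = grid**2)."""
    h, w = img.shape[0] // grid, img.shape[1] // grid
    return [img[i*h:(i+1)*h, j*w:(j+1)*w].ravel()
            for i in range(grid) for j in range(grid)]

def kmeans(feats, N=4, iters=20, seed=0):
    """Minimal unsupervised k-means over block features -> N cluster means."""
    rng = np.random.default_rng(seed)
    F = np.stack(feats)
    centers = F[rng.choice(len(F), N, replace=False)]
    for _ in range(iters):
        lab = ((F[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.stack([F[lab == k].mean(0) if (lab == k).any()
                            else centers[k] for k in range(N)])
    return centers

def global_feature(img):
    blocks = divide(img)                    # M local image blocks
    centers = kmeans(blocks)                # N clustering results
    first = np.stack(blocks) @ centers.T    # dot-product "convolution"
    return (first.ravel() > 0).astype(int)  # binary image coding

img = np.random.default_rng(4).normal(size=(33, 33))  # synthetic image
g = global_feature(img)
print(g.shape)  # (36,): M = 9 blocks x N = 4 clustering results
```

The resulting binary global feature would then be fed to the 1x1 convolution or to the first classifier to produce the classification result.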
The foregoing description of the embodiments of the present application has been presented primarily in terms of a method-side implementation. It will be appreciated that the electronic device, in order to achieve the above-described functions, includes corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the application can divide the functional units of the electronic device according to the method example, for example, each functional unit can be divided corresponding to each function, and two or more functions can be integrated in one processing unit. The integrated units may be implemented in hardware or in software functional units. It should be noted that, in the embodiment of the present application, the division of the units is schematic, which is merely a logic function division, and other division manners may be implemented in actual practice.
Referring to fig. 5, fig. 5 is a functional unit block diagram of an image classification apparatus according to an embodiment of the present application, which is applied to a terminal device, as shown in fig. 5, and the apparatus includes:
an acquiring unit 510, configured to acquire an image to be classified;
the dividing unit 520 is configured to divide the image to be classified to obtain M partial image blocks, where M is a positive integer greater than 1;
a clustering unit 530, configured to cluster the M local image blocks to obtain N clustering results, where N is a positive integer greater than 1;
a first determining unit 540, configured to determine global features of the image to be classified according to the M local image blocks and the N clustering results, where the global features are feature vectors of the image to be classified;
A second determining unit 550, configured to determine a classification result of the image to be classified according to the global feature.
In an implementation manner of the embodiment of the present application, the first determining unit 540 is specifically configured to: carrying out convolution operation on the M local image blocks and each clustering result in the N clustering results respectively to obtain a first feature vector; and carrying out binary image coding on the first feature vector to obtain the global feature.
In an implementation manner of the embodiment of the present application, the first determining unit 540 is further specifically configured to: and setting a value larger than a first value in the first feature vector as a second value according to a binary image coding rule, and setting a value smaller than or equal to the first value in the first feature vector as the first value to obtain the global feature.
In an implementation manner of the embodiment of the present application, the second determining unit 550 is specifically configured to: performing convolution operation on the global feature and a convolution vector to obtain a probability vector of the global feature, wherein the convolution vector is obtained by training image samples of a labeling category; and taking the category corresponding to the maximum value in the probability vector as a classification result of the image to be classified.
In an implementation manner of the embodiment of the present application, the size of the convolution operation is 1x1.
In an implementation manner of the embodiment of the present application, the obtaining unit 510 is further configured to: a first set of images is acquired, the first set of images including images of the annotated image categories.
In an implementation manner of the embodiment of the present application, the apparatus further includes a training unit 560, where the training unit 560 is configured to train the classifier to be trained using the first image set, to obtain a first classifier.
In an implementation manner of the embodiment of the present application, the second determining unit 550 is further specifically configured to: input the global features into the first classifier, and output a classification result of the image to be classified.
In an implementation manner of the embodiment of the present application, the clustering unit 530 is specifically configured to: and clustering the M local image blocks by adopting an unsupervised learning clustering method to obtain N clustering results.
In an implementation manner of the embodiment of the present application, before the dividing the image to be classified, the obtaining unit is further configured to: and acquiring an original image, and preprocessing the original image to obtain an image to be classified.
It may be understood that the functions of each program module of the image classification device according to the embodiment of the present application may be specifically implemented according to the method in the embodiment of the method, and the specific implementation process may refer to the related description of the embodiment of the method, which is not repeated herein.
It can be seen that the embodiments of the present application provide an image classification apparatus applied to a terminal device: an image to be classified is acquired and divided to obtain M local image blocks; the M local image blocks are clustered to obtain N clustering results; the global feature of the image to be classified is determined according to the M local image blocks and the N clustering results; and the classification result of the image to be classified is determined according to the global feature. This reduces the redundant computation of image classification, simplifies and accelerates the algorithm, and provides a feasible way to run computation-heavy image algorithms on terminal hardware.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present application, where, as shown in fig. 6, the terminal device includes one or more processors, one or more memories, one or more communication interfaces, and one or more programs;
The one or more programs are stored in the memory and configured to be executed by the one or more processors;
the program includes instructions for performing the steps of:
acquiring an image to be classified;
dividing the image to be classified to obtain M local image blocks, wherein M is a positive integer greater than 1;
clustering the M partial image blocks to obtain N clustering results, wherein N is a positive integer greater than 1;
according to the M local image blocks and the N clustering results, determining global features of the images to be classified, wherein the global features are feature vectors of the images to be classified;
and determining a classification result of the image to be classified according to the global features.
In an implementation of an embodiment of the present application, the program includes instructions for performing the following steps: carrying out convolution operation on the M local image blocks and each clustering result in the N clustering results respectively to obtain a first feature vector; and carrying out binary image coding on the first feature vector to obtain the global feature.
In an implementation of an embodiment of the present application, the program includes instructions for performing the following steps: and setting a value larger than a first value in the first feature vector as a second value according to a binary image coding rule, and setting a value smaller than or equal to the first value in the first feature vector as the first value to obtain the global feature.
In an implementation of an embodiment of the present application, the program includes instructions for performing the following steps: performing convolution operation on the global feature and a convolution vector to obtain a probability vector of the global feature, wherein the convolution vector is obtained by training image samples of a labeling category; and taking the category corresponding to the maximum value in the probability vector as a classification result of the image to be classified.
In an implementation manner of the embodiment of the present application, the size of the convolution operation is 1x1.
In an implementation of an embodiment of the present application, the program includes instructions for performing the following steps: acquiring a first image set, wherein the first image set comprises images of marked image categories; and training the classifier to be trained by using the first image set to obtain a first classifier.
In an implementation of an embodiment of the present application, the program includes instructions for performing the following steps: and inputting the global features into the first classifier, and outputting a classification result of the image to be classified.
In an implementation of an embodiment of the present application, the program includes instructions for performing the following steps: and clustering the M local image blocks by adopting an unsupervised learning clustering method to obtain N clustering results.
In an implementation manner of the embodiment of the present application, before the dividing the image to be classified, the program includes instructions for further performing the following steps: and acquiring an original image, and preprocessing the original image to obtain an image to be classified.
It can be seen that the embodiments of the present application provide an image classification device applied to a terminal device: an image to be classified is acquired and divided to obtain M local image blocks; the M local image blocks are clustered to obtain N clustering results; the global feature of the image to be classified is determined according to the M local image blocks and the N clustering results; and the classification result of the image to be classified is determined according to the global feature. This reduces the redundant computation of image classification, simplifies and accelerates the algorithm, and provides a feasible way to run computation-heavy image algorithms on terminal hardware.
The embodiment of the application also provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program makes a computer execute part or all of the steps of any one of the methods described in the embodiment of the method, and the computer includes a terminal device.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program operable to cause a computer to perform part or all of the steps of any one of the methods described in the method embodiments above. The computer program product may be a software installation package, said computer comprising a terminal device.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, such as the above-described division of units, merely a division of logic functions, and there may be additional manners of dividing in actual implementation, such as multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, or may be in electrical or other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the various embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those of ordinary skill in the art will appreciate that all or a portion of the steps in the various methods of the above embodiments may be implemented by a program that instructs associated hardware, and the program may be stored in a computer readable memory, which may include: flash memory, ROM, RAM, magnetic or optical disk, etc.
The foregoing has outlined rather broadly the more detailed description of embodiments of the application, wherein the principles and embodiments of the application are explained in detail using specific examples, the above examples being provided solely to facilitate the understanding of the method and core concepts of the application; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in accordance with the ideas of the present application, the present description should not be construed as limiting the present application in view of the above.

Claims (10)

1. A method of classifying images, the method comprising:
acquiring an image to be classified, and determining an application program and/or an application scene to which the image to be classified belongs according to the source of the image to be classified;
dividing the image to be classified to obtain M local image blocks, wherein M is a positive integer greater than 1; determining the shape, size and number of local image blocks according to the application program and/or application scene to which the image to be classified belongs;
Clustering the M partial image blocks to obtain N clustering results, wherein N is a positive integer greater than 1;
according to the M local image blocks and the N clustering results, determining global features of the images to be classified, wherein the global features are feature vectors of the images to be classified;
determining a classification result of the image to be classified according to the global features;
before the clustering of the M partial image blocks, the method further includes: obtaining M local features of M local image blocks through a classification model;
the clustering the M partial image blocks to obtain N clustering results includes: selecting different image local features or image local feature combinations according to different application programs or application scenes to which the images belong, and clustering the M local features according to the image local features or the image local feature combinations to obtain N clustering results.
2. The method of claim 1, wherein determining global features of the image to be classified based on the M local image blocks and the N clustering results comprises:
carrying out convolution operation on the M local image blocks and each clustering result in the N clustering results respectively to obtain a first feature vector;
And carrying out binary image coding on the first feature vector to obtain the global feature.
3. The method of claim 2, wherein the binary image encoding the first feature vector to obtain the global feature comprises:
and setting a value larger than a first value in the first feature vector as a second value according to a binary image coding rule, and setting a value smaller than or equal to the first value in the first feature vector as the first value to obtain the global feature.
4. A method according to any one of claims 1-3, wherein said determining the classification result of the image to be classified from the global features comprises:
performing convolution operation on the global feature and a convolution vector to obtain a probability vector of the global feature, wherein the convolution vector is obtained by training image samples of a labeling category;
and taking the category corresponding to the maximum value in the probability vector as a classification result of the image to be classified.
5. The method of claim 4, wherein the convolution operation has a size of 1x1.
6. A method according to any one of claims 1-3, wherein the method further comprises:
Acquiring a first image set, wherein the first image set comprises images of marked image categories;
training a classifier to be trained by using the first image set to obtain a first classifier;
the determining the classification result of the image to be classified according to the global feature comprises the following steps: and inputting the global features into the first classifier, and outputting a classification result of the image to be classified.
7. The method according to any one of claims 1-6, wherein clustering the M partial image blocks to obtain N clustering results includes:
and clustering the M local image blocks by adopting an unsupervised learning clustering method to obtain N clustering results.
8. An image classification apparatus, the apparatus comprising:
the acquisition unit is used for acquiring the image to be classified and determining an application program and/or an application scene to which the image to be classified belongs according to the source of the image to be classified;
the dividing unit is used for dividing the image to be classified to obtain M partial image blocks, wherein M is a positive integer greater than 1; determining the shape, size and number of local image blocks according to the application program and/or application scene to which the image to be classified belongs;
The clustering unit is used for clustering the M local image blocks to obtain N clustering results, wherein N is a positive integer greater than 1;
the first determining unit is used for determining global features of the images to be classified according to the M local image blocks and the N clustering results, wherein the global features are feature vectors of the images to be classified;
the second determining unit is used for determining a classification result of the image to be classified according to the global features;
the obtaining unit is further configured to obtain M local features of the M local image blocks through a classification model before clustering the M local image blocks;
the clustering unit is specifically configured to select different image local features or image local feature combinations according to different application programs or application scenes to which the image belongs, and cluster the M local features according to the image local features or the image local feature combinations to obtain N clustering results.
9. A terminal device, characterized in that it comprises a processor, a memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps in the method according to any of claims 1-7.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program for electronic data exchange, the computer program causing a computer to perform the method according to any one of claims 1-7.
CN202010101515.6A 2020-02-18 2020-02-18 Image classification method and device Active CN111325271B (en)

Publications (2)

Publication Number Publication Date
CN111325271A CN111325271A (en) 2020-06-23
CN111325271B true CN111325271B (en) 2023-09-12


Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325271B (en) * 2020-02-18 2023-09-12 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image classification method and device
CN111881849A (en) * 2020-07-30 2020-11-03 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image scene detection method and device, electronic equipment and storage medium
CN111652329B (en) * 2020-08-05 2020-11-10 Tencent Technology (Shenzhen) Co., Ltd. Image classification method and device, storage medium and electronic equipment
CN113112518B (en) * 2021-04-19 2024-03-26 Shenzhen SmartMore Information Technology Co., Ltd. Feature extractor generation method and device based on spliced images, and computer equipment
CN116206208B (en) * 2023-05-05 2023-07-07 Hedong District Zhiyuan Seedling Planting Professional Cooperative Rapid forestry pest and disease analysis system based on artificial intelligence
CN116612389B (en) * 2023-07-20 2023-09-19 Qingjian International Group Co., Ltd. Building construction progress management method and system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942564A (en) * 2014-04-08 2014-07-23 Wuhan University High-resolution remote sensing image scene classification method based on unsupervised feature learning
CN104036293A (en) * 2014-06-13 2014-09-10 Wuhan University Rapid-binary-encoding-based high-resolution remote sensing image scene classification method
CN104517113A (en) * 2013-09-29 2015-04-15 Zhejiang Dahua Technology Co., Ltd. Image feature extraction method and device, and image sorting method and device
CN108304847A (en) * 2017-11-30 2018-07-20 Tencent Technology (Shenzhen) Co., Ltd. Image classification method and device, personalized recommendation method and device
CN108647602A (en) * 2018-04-28 2018-10-12 Beihang University Aerial remote sensing image scene classification method based on image complexity judgment
WO2019001254A1 (en) * 2017-06-29 2019-01-03 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method for iris liveness detection and related product
CN109800781A (en) * 2018-12-07 2019-05-24 Beijing QIYI Century Science & Technology Co., Ltd. Image processing method and device, and computer-readable storage medium
CN110309865A (en) * 2019-06-19 2019-10-08 Shanghai Jiao Tong University Image recognition method for pin defects in UAV power transmission line inspection
CN110751218A (en) * 2019-10-22 2020-02-04 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image classification method, image classification device and terminal equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7840071B2 (en) * 2006-12-12 2010-11-23 Seiko Epson Corporation Method and apparatus for identifying regions of different content in an image
JP2008282267A (en) * 2007-05-11 2008-11-20 Seiko Epson Corp Scene discrimination device and scene discrimination method
CN104077597B (en) * 2014-06-25 2017-09-05 Xiaomi Technology Co., Ltd. Image classification method and device
CN109410196A (en) * 2018-10-24 2019-03-01 Northeastern University Cervical cancer tissue pathological image diagnosis method based on Poisson annular conditional random field
CN111325271B (en) * 2020-02-18 2023-09-12 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image classification method and device

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104517113A (en) * 2013-09-29 2015-04-15 Zhejiang Dahua Technology Co., Ltd. Image feature extraction method and device, and image sorting method and device
CN103942564A (en) * 2014-04-08 2014-07-23 Wuhan University High-resolution remote sensing image scene classification method based on unsupervised feature learning
CN104036293A (en) * 2014-06-13 2014-09-10 Wuhan University Rapid-binary-encoding-based high-resolution remote sensing image scene classification method
WO2019001254A1 (en) * 2017-06-29 2019-01-03 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method for iris liveness detection and related product
CN108304847A (en) * 2017-11-30 2018-07-20 Tencent Technology (Shenzhen) Co., Ltd. Image classification method and device, personalized recommendation method and device
CN108647602A (en) * 2018-04-28 2018-10-12 Beihang University Aerial remote sensing image scene classification method based on image complexity judgment
CN109800781A (en) * 2018-12-07 2019-05-24 Beijing QIYI Century Science & Technology Co., Ltd. Image processing method and device, and computer-readable storage medium
CN110309865A (en) * 2019-06-19 2019-10-08 Shanghai Jiao Tong University Image recognition method for pin defects in UAV power transmission line inspection
CN110751218A (en) * 2019-10-22 2020-02-04 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image classification method, image classification device and terminal equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Feng Shu. Robust face recognition with unsupervised local feature learning. Journal of Computer Applications. 2017, (02), pp. 512-513. *

Also Published As

Publication number Publication date
CN111325271A (en) 2020-06-23
WO2021164550A1 (en) 2021-08-26

Similar Documents

Publication Publication Date Title
CN111325271B (en) Image classification method and device
US11798132B2 (en) Image inpainting method and apparatus, computer device, and storage medium
CN111368893B (en) Image recognition method, device, electronic equipment and storage medium
US10609286B2 (en) Extrapolating lighting conditions from a single digital image
US9619734B2 (en) Classification of land based on analysis of remotely-sensed earth images
CN112232425B (en) Image processing method, device, storage medium and electronic equipment
US20230081645A1 (en) Detecting forged facial images using frequency domain information and local correlation
CN112651438A (en) Multi-class image classification method and device, terminal equipment and storage medium
US10163227B1 (en) Image file compression using dummy data for non-salient portions of images
CN111738357B (en) Junk picture identification method, device and equipment
CN107679513B (en) Image processing method and device and server
CN111062871A (en) Image processing method and device, computer equipment and readable storage medium
CN106874937B (en) Text image generation method, text image generation device and terminal
CN111444826B (en) Video detection method, device, storage medium and computer equipment
CN110163076A (en) Image processing method and related apparatus
Sukhia et al. Content-based remote sensing image retrieval using multi-scale local ternary pattern
WO2021129466A1 (en) Watermark detection method, device, terminal and storage medium
CN116580257A (en) Feature fusion model training and sample retrieval method and device and computer equipment
CN112950640A (en) Video portrait segmentation method and device, electronic equipment and storage medium
CN110163095B (en) Loop detection method, loop detection device and terminal equipment
CN116452810A (en) Multi-level semantic segmentation method and device, electronic equipment and storage medium
Han Texture image compression algorithm based on self-organizing neural network
CN111539353A (en) Image scene recognition method and device, computer equipment and storage medium
CN116012841A (en) Open set image scene matching method and device based on deep learning
CN115937546A (en) Image matching method, three-dimensional image reconstruction method, image matching device, three-dimensional image reconstruction device, electronic apparatus, and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant