WO2021164550A1 - Image classification method and apparatus

Info

Publication number: WO2021164550A1
Application number: PCT/CN2021/075045
Authority: WO
Inventors: 孙哲
Original assignee: Oppo广东移动通信有限公司
Priority date: 2020-02-18
Filing date: 2021-02-03
Publication date: 2021-08-26
Also published as: CN111325271A; CN111325271B

Abstract

The embodiments of the present application disclose an image classification method and apparatus, which are applied to a terminal device. Said method comprises: acquiring an image to be classified; dividing said image to obtain M local image blocks; clustering the M local image blocks to obtain N clustering results; determining a global feature of said image according to the M local image blocks and the N clustering results; and determining a classification result of said image according to the global feature. The present invention can reduce the redundant computation of image classification, realizes a simplified and accelerated algorithm, and allows for the possibility of an image with a large amount of computation to run on terminal hardware.

Description

Image classification method and device

Technical field

This application relates to the field of image processing technology, and in particular to an image classification method and device.

Background technique

In recent years, image classification has attracted great research interest and has been successfully deployed in many products, such as mobile phones, personal computers, and other terminal devices, where it intelligently solves many practical image processing problems. With the rapid development of deep learning, deep learning has become an advanced technique for image classification. However, existing deep learning models usually obtain their results through end-to-end computation. For tasks with a small amount of computation or images with a small input resolution, existing terminal hardware can meet the performance requirements, but tasks with a large amount of computation or high-resolution images may not be able to run on terminal hardware.

Summary of the invention

The embodiments of the present application provide an image classification method and device, which can reduce the amount of calculation for image classification, and provide a possible way for images with a large amount of calculation to run on terminal hardware.

In a first aspect, an embodiment of the present application provides an image classification method, and the method includes:

Obtaining an image to be classified;

Dividing the image to be classified to obtain M partial image blocks, where M is a positive integer greater than 1;

Clustering the M partial image blocks to obtain N clustering results, where N is a positive integer greater than 1;

Determining a global feature of the image to be classified according to the M local image blocks and the N clustering results, where the global feature is a feature vector of the image to be classified;

Determining a classification result of the image to be classified according to the global feature.

In a second aspect, an embodiment of the present application provides an image classification device, the device including:

The acquiring unit is used to acquire the image to be classified;

A dividing unit, configured to divide the image to be classified to obtain M partial image blocks, where M is a positive integer greater than 1;

A clustering unit, configured to cluster the M partial image blocks to obtain N clustering results, where N is a positive integer greater than 1;

A first determining unit, configured to determine a global feature of the image to be classified according to the M local image blocks and the N clustering results, where the global feature is a feature vector of the image to be classified;

The second determining unit is configured to determine the classification result of the image to be classified according to the global feature.

In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for executing the steps of the method described in the first aspect of the embodiments of the present application.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps described in the first aspect.

In a fifth aspect, the embodiments of the present application provide a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps described in the first aspect of the embodiments of this application. The computer program product may be a software installation package.

The implementation of the embodiments of this application has the following beneficial effects:

It can be seen that, in this application, the image to be classified is obtained and divided to obtain M partial image blocks; the M partial image blocks are clustered to obtain N clustering results; the global feature of the image to be classified is determined according to the M local image blocks and the N clustering results; and the classification result of the image to be classified is determined according to the global feature. By combining traditional algorithms with deep learning algorithms to optimize the end-to-end deep learning model, the redundant computation of image classification is reduced and the algorithm is simplified and accelerated, which makes it possible for image classification with a large amount of computation to run on terminal hardware.

Description of the drawings

In order to more clearly describe the technical solutions in the embodiments of the present application, the following will briefly introduce the drawings needed in the embodiments. Obviously, the drawings in the following description are some embodiments of the present application. For those of ordinary skill in the art, without creative work, other drawings can be obtained from these drawings.

FIG. 1 is a schematic flowchart of an image classification method provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of an image division process provided by an embodiment of the present application;

FIG. 3 is a schematic flowchart of an image clustering provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a process for obtaining image feature vectors according to an embodiment of the present application;

FIG. 5 is a functional unit composition diagram of another image classification device provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.

Detailed description

In order to enable those skilled in the art to better understand the solutions of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.

The terms "first", "second", etc. in the specification and claims of this application and in the above-mentioned drawings are used to distinguish different objects, rather than to describe a specific sequence. In addition, the terms "including" and "having" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to such a process, product, or device.

Reference to an "embodiment" herein means that a specific feature, structure, or characteristic described in conjunction with the embodiment may be included in at least one embodiment of the present application. The appearance of the phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.

In specific implementations, the terminal devices described in the embodiments of the present application include, but are not limited to, portable devices such as mobile phones, laptop computers, or tablet computers with touch-sensitive surfaces (for example, touch screen displays and/or touch pads). It should also be understood that in some embodiments the device is not a portable communication device but a desktop computer with a touch-sensitive surface (for example, a touch screen display and/or touch pad).

In the embodiments of the present application, a terminal device including a display and a touch-sensitive surface is described. However, it should be understood that the terminal device may include one or more other physical user interface devices such as a physical keyboard, mouse, and/or joystick.

The terminal device supports various applications, such as one or more of the following: a drawing application, a presentation application, a word processing application, a website creation application, a disk burning application, a spreadsheet application, a game application, a phone application, a video conferencing application, an email application, an instant messaging application, an exercise support application, a photo management application, a digital camera application, a web browsing application, a digital music player application, and/or a digital video player application.

Various application programs that can be executed on the terminal device can use at least one common physical user interface device such as a touch-sensitive surface. One or more functions of the touch-sensitive surface and corresponding information displayed on the terminal can be adjusted and/or changed between applications and/or within corresponding applications. In this way, the common physical architecture of the terminal (for example, a touch-sensitive surface) can support various applications with a user interface that is intuitive and transparent to the user.

At present, most deep learning models obtain their results through end-to-end computation. When a deep learning model is used for image classification, simple tasks can meet the performance requirements on ordinary terminal hardware. However, some tasks must process the image at its original resolution. For example, in the field of image enhancement the user pays more attention to image details, so the input image cannot be scaled down. The amount of computation is then very large, which often prevents the algorithm from running on the terminal device. In other words, existing image classification algorithms are adequate for simple tasks or small-resolution inputs, but when the amount of computation is large or the input is an original-resolution image, they may not be able to run on terminal hardware such as mobile phones.

For this reason, this application proposes an image classification method that obtains an image to be classified, divides the image to be classified to obtain M partial image blocks, clusters the M partial image blocks to obtain N clustering results, determines the global feature of the image to be classified according to the M local image blocks and the N clustering results, and determines the classification result of the image to be classified according to the global feature. This reduces the amount of computation for image classification, so that image classification with a large amount of computation, or of high-resolution images, can run on the terminal device.

In order to illustrate the technical solutions described in the present application, specific embodiments are used to describe in detail below.

Please refer to FIG. 1. FIG. 1 is a schematic flowchart of an image classification method provided by an embodiment of the present application. The image classification method is applied to a terminal device. As shown in the figure, the image classification method may include the following steps:

S110. Obtain an image to be classified.

In the embodiments of the present application, the image to be classified can be obtained locally from the terminal device or received from another device, which is not limited here. Obtaining the image locally may mean reading it from the memory of the terminal device, or obtaining a photo that has not yet been stored in memory while the terminal device is taking a picture. When the image to be classified is a photo that has not yet been stored in memory, the photo can be classified as it is stored, so that no separate classification is needed afterwards.

The image to be classified may refer to an image whose category is to be detected. An image usually includes a subject and a background: the subject is the main object of the image, and the background is the scene that sets off the subject. The category of the image is determined according to its subject. For example, if the subject of the image is a building, the category of the image is the building category; if the subject of the image is green plants, the category of the image is the green plant category.

There may be one or more images to be classified.

In an implementation of the embodiments of the present application, the image to be classified may be an image obtained by one of the various applications executed on the terminal device, for example a drawing application, a survey application, a word processing application, or a photo management application. Images from different applications or application scenarios have different subjects and backgrounds. For example, for a survey application the subject may include entities representing geographic features such as buildings, roads, trees, and rivers, while for a word processing application the subject mainly consists of text. Therefore, the terminal device can first perform a simple classification of the image to be classified according to its source, that is, determine the application and/or application scenario to which the image belongs. This simplifies the subsequent processing of the image to be classified.

Optionally, before dividing the image to be classified, the method further includes: acquiring an original image, and preprocessing the original image to obtain the image to be classified.

Specifically, a large number of images can be collected or captured in advance, or publicly available images can be used as original images. In other embodiments, the original image can also be randomly selected from an image library; this is not limited here. Before the original image is processed further, it can be compressed, enlarged, reduced, or restored. For example, original images can be cropped to a uniform size such as 512×512 and then normalized to obtain the image to be classified.
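As a rough illustration of the cropping and normalization described above, the following sketch center-crops a grayscale image stored as a nested list and scales its pixel values to the range 0 to 1. The function names, the choice of a center crop, and the division by 255 are assumptions made for illustration; the application does not specify how cropping or normalization is performed.

```python
def center_crop(image, size):
    """Crop a size x size window from the middle of a 2-D list of pixels."""
    height, width = len(image), len(image[0])
    if height < size or width < size:
        raise ValueError("image is smaller than the requested crop")
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def normalize(image, max_value=255.0):
    """Scale pixel values from [0, max_value] to [0.0, 1.0]."""
    return [[pixel / max_value for pixel in row] for row in image]


# Hypothetical 600 x 800 grayscale image with constant brightness 128.
original = [[128] * 800 for _ in range(600)]
to_classify = normalize(center_crop(original, 512))
```

A real pipeline would more likely operate on array types from an image library; nested lists are used here only to keep the sketch free of dependencies.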

Further, when the resolution or size of the original image is too large and exceeds what the hardware of the terminal device can handle, the terminal device may compress the original image first, thereby further reducing the amount of computation needed to classify the image.

S120. Divide the image to be classified to obtain M partial image blocks, where M is a positive integer greater than 1.

Specifically, the terminal device may divide the image to be classified into M partial image blocks of identical size according to a preset image size. For example, as shown in FIG. 2, the image to be classified can be divided into 9 rectangular partial image blocks in a nine-square grid. The terminal device can also divide the image to be classified into M partial image blocks of the same or different sizes according to a preset list of image sizes or a preset pattern, or it can divide the image randomly into M partial image blocks. The embodiments of the present application do not limit this. Using the M partial image blocks as input data for image classification both increases the diversity of the data and improves the robustness of the image classification model.
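The nine-square-grid division can be sketched as follows. This is a minimal example under stated assumptions: the image is a 2-D nested list, the helper name `divide_into_blocks` is hypothetical, and edge pixels that do not fit the grid are simply dropped, which the application does not address.

```python
def divide_into_blocks(image, rows, cols):
    """Split a 2-D list of pixels into rows * cols non-overlapping blocks.

    Blocks are returned in row-major order. Any remainder pixels on the
    right and bottom edges are dropped so that all blocks share one size.
    """
    block_h = len(image) // rows
    block_w = len(image[0]) // cols
    if block_h == 0 or block_w == 0:
        raise ValueError("image is too small for the requested grid")
    blocks = []
    for r in range(rows):
        for c in range(cols):
            block = [
                line[c * block_w:(c + 1) * block_w]
                for line in image[r * block_h:(r + 1) * block_h]
            ]
            blocks.append(block)
    return blocks


# A 6 x 6 image whose pixel value encodes its position.
image = [[y * 6 + x for x in range(6)] for y in range(6)]
blocks = divide_into_blocks(image, 3, 3)  # nine-square grid, M = 9
```

Because the slices never overlap, the resulting blocks satisfy the non-overlap property that the description requires of partial image blocks.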

In an implementation of the embodiments of the present application, the size of the partial image blocks can be set according to the scene to which the image belongs, and scenes can be distinguished by whether the scene features of the image are fixed. For example, a first scene is an image scene whose scene features are not fixed, such as a natural scene; a second scene is an image scene whose scene features are fixed, such as a plant scene. Images belonging to natural scenes usually have no clear and fixed scene features, so a size list can be used to extract partial image blocks of several sizes and obtain effective local features. Images belonging to plant scenes usually have fixed scene features, so a single preset size is used to extract their local features. If a size list were used for a plant-scene image, the sizes in the list may differ greatly, the extracted local features may not be distinctive, and they are easily disturbed by other information such as the background. For example, consider an image containing three or four flowers in a large patch of grass: the image belongs to the flower category rather than the grass category, but if a size list is used to extract its local features, the image may be judged to be grass. Here, a scene feature may refer to an image feature that characterizes the scene to which the image belongs. Fixed scene features may mean that the scene features are distributed in a relatively concentrated way in the image; unfixed scene features may mean that they are relatively scattered. The scene of the image to be classified may refer to the scene of the subject in the image.
For example, if the subject of the image is a flower, the scene to which the image belongs is a plant scene; if the subject of the image is a beach, a valley, or the like, the scene to which the image belongs is a natural scene.

It should be noted that the partial image blocks do not overlap each other, and that the partial image blocks may be rectangles or irregular polygons. The embodiments of the present application do not limit the shape of the partial image blocks.

In this embodiment of the application, the terminal device may determine M according to the resolution of the image to be classified. The resolution may be related to M by a proportional relationship or a mapping relationship: the higher the resolution of the image to be classified, the larger the number M of partial image blocks into which the terminal device divides it; the lower the resolution, the smaller M. By dividing a high-resolution image into multiple partial image blocks, the complexity of the image classification algorithm can be reduced, and the algorithm can be simplified and accelerated.
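One way to realize a mapping relationship between resolution and M is a small lookup table. The thresholds and grid sizes below are invented for illustration and are not taken from the application; the only property carried over from the text is that M does not decrease as resolution grows.

```python
# Hypothetical thresholds: (maximum pixel count, grid side length).
RESOLUTION_TO_GRID = [
    (512 * 512, 2),      # up to 512 x 512   -> 2 x 2 grid, M = 4
    (1024 * 1024, 3),    # up to 1024 x 1024 -> 3 x 3 grid, M = 9
    (2048 * 2048, 4),    # up to 2048 x 2048 -> 4 x 4 grid, M = 16
]
LARGEST_GRID = 6         # anything larger   -> 6 x 6 grid, M = 36


def block_count_for(width, height):
    """Return M, the number of blocks, for an image of the given size."""
    pixels = width * height
    for max_pixels, side in RESOLUTION_TO_GRID:
        if pixels <= max_pixels:
            return side * side
    return LARGEST_GRID * LARGEST_GRID
```

For instance, `block_count_for(1920, 1080)` falls into the third row of the table, while a 4000×3000 photo exceeds every threshold and receives the largest grid.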

In an implementation of the embodiments of the present application, the terminal device can determine the shape and number of the partial image blocks according to the application and/or application scenario to which the image to be classified belongs. For example, in a high-resolution remote sensing image generated by a survey application, the features of each category are distributed in a relatively concentrated way, that is, subjects of the same category are clustered together in the image, so the terminal device can divide the image to be classified into multiple rectangular partial image blocks of the same size. For an image of a person generated by a photographing application, the subject is mostly located in the middle of the image, so the terminal device can divide the image outward from the middle and make the partial image blocks in the middle larger, so that those blocks contain more features.

S130. Cluster the M partial image blocks to obtain N clustering results.

Optionally, clustering the M partial image blocks to obtain N clustering results includes: clustering the M partial image blocks using an unsupervised learning clustering method to obtain the N clustering results, where N is a positive integer greater than 1.

In the embodiments of the present application, a classification model is used to obtain the local features of the partial image blocks before the M partial image blocks are clustered. The classification model therefore needs to be trained first, and the trained model is then used to obtain the local features. To train the classification model, the training samples are first divided to obtain multiple partial training samples; the partial training samples are then sampled, the sampled training samples are input into the classification model, the classification model outputs the local features of the training samples, and the model is trained by backpropagation using a loss function.

In the embodiments of the present application, the classification model may be used to extract the features of each of the M partial image blocks to obtain M local features. To overcome the limitations of a single feature, the present application may perform cluster analysis on a combined local feature formed from multiple features. Commonly used local image features include color features, LBP (local binary pattern) features, texture features, and so on. Color features depend little on the size, orientation, and viewing angle of the image; the commonly used color histogram feature describes the proportion of each color in the whole image. Texture is important spatial information in remote sensing images: as resolution increases, the internal structure of ground objects becomes clearer and their texture becomes more apparent in the remote sensing image. Compared with spectral information, texture features can reflect the regular spatial variation of pixels within the target object. Therefore, the terminal device can select different features or feature combinations according to the application scenario or application. For example, color features and texture features can be combined to form the combined local feature for remote sensing images. Local features or combinations of local features can also be selected according to preset feature options, which is not limited in the embodiments of the present application.

The combined local feature can be one-dimensional, that is, the individual features can be spliced together; for example, the texture feature can be appended after the color feature. The combined local feature can also be multi-dimensional, that is, the individual features can form a feature matrix. This is not limited in the embodiments of the present application.
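The two ways of combining features can be illustrated with a short sketch. The function names and the sample values are hypothetical; the requirement that stacked features have equal length is an assumption added so that the result is a rectangular matrix.

```python
def concatenate_features(color_feature, texture_feature):
    """One-dimensional combination: texture values follow color values."""
    return list(color_feature) + list(texture_feature)


def stack_features(color_feature, texture_feature):
    """Multi-dimensional combination: one row per feature type.

    Rows must have equal length to form a rectangular matrix.
    """
    if len(color_feature) != len(texture_feature):
        raise ValueError("features must have the same length to be stacked")
    return [list(color_feature), list(texture_feature)]


color = [0.5, 0.3, 0.2]      # hypothetical color histogram
texture = [0.1, 0.6, 0.3]    # hypothetical texture descriptor
combined_1d = concatenate_features(color, texture)
combined_2d = stack_features(color, texture)
```

The spliced form works with any clustering method that accepts flat vectors, whereas the matrix form keeps the feature types separable.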

Further, the color feature is a color histogram feature extracted in the HSL (Hue, Saturation, Lightness) color space. Compared with the RGB color space, the HSL color space is more in line with the visual perception characteristics of the human eye.
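A color histogram in HSL space might be computed along the following lines, using Python's standard `colorsys` module. This sketch bins only the hue channel, which is a simplification chosen here for brevity; the application does not state which channels are binned or how many bins are used, and the function name `hue_histogram` is hypothetical.

```python
import colorsys


def hue_histogram(pixels, bins=8):
    """Return a normalized hue histogram for an iterable of RGB pixels.

    Each pixel is an (r, g, b) tuple with components in 0..255.
    """
    counts = [0] * bins
    total = 0
    for r, g, b in pixels:
        # colorsys uses the order hue, lightness, saturation (HLS).
        hue, _lightness, _saturation = colorsys.rgb_to_hls(
            r / 255.0, g / 255.0, b / 255.0
        )
        # hue is in [0, 1]; clamp so that hue == 1.0 falls in the last bin.
        index = min(int(hue * bins), bins - 1)
        counts[index] += 1
        total += 1
    if total == 0:
        raise ValueError("no pixels given")
    return [count / total for count in counts]


# Hypothetical block: three pure-red pixels and one pure-blue pixel.
block_pixels = [(255, 0, 0), (255, 0, 0), (255, 0, 0), (0, 0, 255)]
histogram = hue_histogram(block_pixels, bins=4)
```

Normalizing by the pixel count makes the histogram describe the proportion of each hue range, so blocks of different sizes remain comparable.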

In the embodiments of the present application, the partial image blocks that belong to the same category among the M partial image blocks are clustered together based on the M extracted local features, and N clustering results are obtained. The relationship between M and N depends on the number of categories contained in the M partial image blocks. As shown in FIG. 3, the number in each partial image block represents the category of that block. The 9 partial image blocks are clustered so that blocks of the same category fall into the same class, and 4 clustering results are obtained: clustering result 1 contains 3 partial image blocks, and clustering results 2 to 4 each contain 2 partial image blocks.
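The grouping of 9 blocks into 4 clustering results can be mimicked as follows. The label sequence is hypothetical and only matches the cluster sizes stated above (3, 2, 2, 2); the actual positions of the categories in FIG. 3 are not reproduced here.

```python
from collections import defaultdict


def group_blocks_by_label(labels):
    """Map each cluster label to the indices of the blocks assigned to it."""
    clusters = defaultdict(list)
    for block_index, label in enumerate(labels):
        clusters[label].append(block_index)
    return dict(clusters)


# Hypothetical category labels for 9 blocks in row-major order.
labels = [1, 2, 1, 3, 4, 2, 1, 3, 4]
clusters = group_blocks_by_label(labels)
cluster_sizes = {label: len(members) for label, members in clusters.items()}
```

Here M is 9 and N is 4, because the nine labels contain four distinct categories.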

Wherein, the clustering methods of unsupervised learning include but are not limited to: K-means clustering algorithm, Birch clustering algorithm, DBSCAN clustering algorithm and K nearest neighbor classification algorithm.

Generally speaking, a good clustering should reflect the internal structure of the data set as much as possible, so that samples within the same class are as similar as possible and samples in different classes are as different as possible. Taking the K-means clustering algorithm as an example, from the perspective of distance, the optimal clustering has very small intra-class distances and large inter-class distances. In the embodiments of this application, similar local features are assigned to the same class as far as possible, and dissimilar local features are assigned to different classes as far as possible.
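A basic K-means loop over local feature vectors could look like the sketch below. It is not the application's implementation: the initialization from the first k vectors, the fixed iteration count, and the sample features are all assumptions chosen to keep the example short and deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(features, k, iterations=20):
    """Cluster feature vectors with a basic K-means loop.

    Centroids start at the first k feature vectors, a simplification chosen
    to keep the sketch deterministic. Returns (labels, centroids).
    """
    if not 1 <= k <= len(features):
        raise ValueError("k must be between 1 and the number of features")
    centroids = [list(f) for f in features[:k]]
    labels = [0] * len(features)
    for _ in range(iterations):
        # Assignment step: attach each feature to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(f, centroids[j]))
            for f in features
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [f for f, label in zip(features, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids


# Hypothetical 2-D local features of six blocks forming two groups.
features = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9], [0.1, 0.2], [4.9, 5.0]]
labels, centroids = k_means(features, k=2)
```

The assignment step shrinks intra-class distances and the update step repositions the class centers, which is the distance-based view of a good clustering described above.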

S140. Determine a global feature of the image to be classified according to the M local image blocks and the N clustering results.

The global feature is the feature vector of the image to be classified. Both global features and local features are image features of the image to be classified. A global feature refers to a feature vector extracted from the entire image to be classified and is a feature of the image as a whole; a local feature refers to a feature vector extracted from a partial image block of the image to be classified and is a feature of that partial image block.

Further, to overcome the limitations of a single feature, the present application may perform image classification using a combined global feature formed from multiple features. Global image features can include color features, LBP features, texture features, and so on. The terminal device can select different features or feature combinations according to the application scenario or application. For example, color features and texture features can be combined to form the combined global feature for remote sensing images. Global features or combinations of global features can also be selected according to preset feature options, which is not limited in the embodiments of the present application.

Optionally, determining the global feature of the image to be classified according to the M local image blocks and the N clustering results includes: performing a convolution operation on each of the M local image blocks with each of the N clustering results to obtain a first feature vector; and performing binary image coding on the first feature vector to obtain the global feature.

For example, as shown in Figure 4, 9 partial image blocks are respectively convolved with 4 clustering results obtained by clustering the 9 partial image blocks to obtain the first feature vector of the image to be classified. The first feature vector is used to describe the global information of the image to be classified.

The first feature vector may be one-dimensional, in which case the feature vectors obtained by convolving the partial image blocks with the clustering results are spliced to obtain the first feature vector; the first feature vector may also be multi-dimensional. The embodiments of the present application do not limit this.

There may be one or more first feature vectors.

It should be noted that performing a convolution operation on each of the M partial image blocks with each of the N clustering results may mean convolving the local features extracted from the partial image blocks with the local features extracted from the clustering results to obtain the first feature vector.
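The application does not spell out how a block feature is convolved with a clustering result, so the following is only one possible reading. It assumes that each clustering result is represented by the mean of its members' local features, and that "convolution" is meant in the convolutional-network sense of a sliding dot product, which for two vectors of equal length yields a single response. All names are hypothetical.

```python
def cluster_mean(features, members):
    """Average the local features of the blocks belonging to one cluster."""
    selected = [features[i] for i in members]
    return [sum(values) / len(selected) for values in zip(*selected)]


def first_feature_vector(features, clusters):
    """Splice the responses of every block against every cluster.

    `features` holds one local feature vector per block (M entries) and
    `clusters` holds one list of block indices per cluster (N entries).
    The result has M * N entries, ordered block by block.
    """
    kernels = [cluster_mean(features, members) for members in clusters]
    vector = []
    for feature in features:
        for kernel in kernels:
            # Equal-length sliding dot product: a single response value.
            vector.append(sum(x * w for x, w in zip(feature, kernel)))
    return vector


# Hypothetical local features for M = 3 blocks grouped into N = 2 clusters.
features = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
clusters = [[0, 2], [1]]
vector = first_feature_vector(features, clusters)
```

Under this reading, a block responds strongly to the clustering result it resembles, and the spliced responses form a one-dimensional first feature vector.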

Optionally, performing binary image coding on the first feature vector to obtain the global feature includes: according to a binary image coding rule, setting each value in the first feature vector that is greater than a first value to a second value, and setting each value in the first feature vector that is less than or equal to the first value to the first value, to obtain the global feature.

The first value can be 0 and the second value can be 1; alternatively, the first value can be 255 and the second value can be 0. For example, each pixel of an image can be represented by a value between 0 and 255, and the binary image coding converts the image into a grayscale image in which each pixel has only two possible values or gray-level states. The global feature can be a feature represented by 8 bits. Of course, the first value and the second value may also be other values, which is not limited in the embodiments of the present application.
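The thresholding rule can be written directly from its description. In the sketch below, `binary_encode` follows the stated rule with a first value of 0 and a second value of 1 as defaults, while `pack_bits` is an added assumption showing one way an eight-element binary vector could be stored in 8 bits; the sample vector is invented.

```python
def binary_encode(vector, first_value=0, second_value=1):
    """Apply the thresholding rule to every element of a feature vector.

    Elements greater than first_value become second_value; all other
    elements become first_value.
    """
    return [
        second_value if element > first_value else first_value
        for element in vector
    ]


def pack_bits(bits):
    """Pack a sequence of 0/1 values into an integer, first bit highest."""
    packed = 0
    for bit in bits:
        packed = (packed << 1) | bit
    return packed


# Hypothetical first feature vector with eight responses.
first_vector = [2.0, 0.0, -1.5, 4.0, 6.0, 0.0, 0.3, -0.2]
global_feature = binary_encode(first_vector)
as_byte = pack_bits(global_feature)
```

Reducing each response to one of two states discards magnitude but makes the global feature compact and cheap to compare.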

In the embodiments of the present application, before the convolution operation is performed on each of the M partial image blocks with each of the N clustering results, the local features of the partial image blocks and of the clustering results need to be obtained. These local features can be obtained using the classification model described above.

In an implementation of the embodiments of the present application, the terminal device may input the partial image blocks and the clustering results into a preset algorithm and use the preset algorithm to process the image. The preset algorithm may be the Fast RCNN (Fast Region-based Convolutional Neural Network) algorithm. For example, the user can preset the convolution window in the Fast RCNN algorithm. After the terminal inputs the partial image blocks and clustering results into the Fast RCNN algorithm, the terminal uses the convolution window to convolve the image and obtain the first feature vector. Here, the first feature vector refers to the complete matrix obtained after convolving the image.

S150. Determine a classification result of the image to be classified according to the global feature.

Optionally, determining the classification result of the image to be classified according to the global feature includes: performing a convolution operation on the global feature and a convolution vector to obtain a probability vector of the global feature, where the convolution vector is obtained by training on image samples of labeled categories; and using the category corresponding to the maximum value in the probability vector as the classification result of the image to be classified.

Here, the size of the above convolution operation is 1×1, and each value in the above probability vector represents the probability that the image to be classified belongs to the corresponding category. According to the values of the probability vector, the category corresponding to the maximum value can be used as the classification result of the image to be classified.
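
The decision step described above can be sketched as follows, under stated assumptions. A 1×1 convolution applied to a global feature with D entries at a single position reduces to one weighted sum per category. The softmax normalization, the weight values and the category names are assumptions added for illustration; the application only states that the convolution yields a probability vector.

```python
import math
from typing import List, Sequence, Tuple


def conv1x1(global_feature: Sequence[float],
            weights: Sequence[Sequence[float]]) -> List[float]:
    """1x1 convolution at a single position: one weighted sum per category."""
    return [sum(w * x for w, x in zip(row, global_feature)) for row in weights]


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalize scores into probabilities (an assumption, see lead-in)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(global_feature: Sequence[float],
             weights: Sequence[Sequence[float]],
             labels: Sequence[str]) -> Tuple[str, List[float]]:
    """Return the category with the maximum probability and the full vector."""
    probabilities = softmax(conv1x1(global_feature, weights))
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return labels[best], probabilities


# Hypothetical binary global feature and trained convolution weights.
feature = [1.0, 0.0, 1.0, 1.0]
weights = [[2.0, -1.0, 0.5, 0.0],   # "cat"
           [0.0, 1.0, 0.0, 1.0],    # "dog"
           [-1.0, 0.0, 0.0, 0.5]]   # "car"
label, probabilities = classify(feature, weights, ["cat", "dog", "car"])
print(label)  # cat
```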

In the embodiment of this application, the convolution vector needs to be obtained first. The convolution vector can be obtained by training on the original training samples of each category in an image library, or by training on a collected set of image samples marked with image categories, which is not limited in the embodiment of this application. It should be noted that the convolution vector may depend on the image samples with annotated image categories: for different annotated image samples, the convolution vector may be different. Therefore, the terminal device can select image samples suited to the application scenario or application, and obtain the convolution vector by training with those samples. For example, for an observation terminal device, in order to accurately classify remote sensing images and provide detailed ground information, remote sensing image samples with marked categories can be used directly for training to obtain the convolution vector, thereby improving the accuracy of image classification.

Here, the probability vector of the image to be classified can be obtained directly by performing the convolution operation on the global feature and the convolution vector. Because the amount of calculation of a 1×1 convolution is small and it benefits markedly from parallel acceleration, the amount of calculation on the original data and the memory usage are greatly reduced. The image classification algorithm is therefore optimized while accuracy is maintained, and the calculation cost of the unsupervised learning method is greatly reduced on the premise of ensuring the accuracy of image classification.

In an implementation manner of the embodiment of the present application, the method further includes: acquiring a first image set, the first image set including images with labeled image categories; and using the first image set to train a classifier to be trained to obtain a first classifier;

The determining the classification result of the image to be classified according to the global feature includes: inputting the global feature into the first classifier, and outputting the classification result of the image to be classified.

Here, inputting the global feature of the image to be classified into the classifier enables the classifier to classify the image according to its global feature.

In the embodiment of the present application, before the classifier is used to obtain the category of the image to be classified, the classifier needs to be trained first, and the trained classifier is then used to obtain the category of the image to be classified. When training the classifier, the global features of the images in the first image set are obtained and input to the classifier, the classifier outputs the categories of the images in the first image set, and target supervision is used to train the classifier by backpropagation. Here, target supervision refers to the supervision used in supervised deep learning, such as a loss function.
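
The training loop described above (global features in, predicted categories out, a loss function as the supervision, and gradients propagated back to the classifier parameters) can be sketched as follows. A small softmax classifier trained with cross-entropy loss and stochastic gradient descent is used here purely as a stand-in; it is not the classifier of this application, and the feature values, labels, learning rate and epoch count are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scores_for(x: Sequence[float], weights: List[List[float]],
               bias: List[float]) -> List[float]:
    """Linear score per category for one global feature."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def train_classifier(features: Sequence[Sequence[float]], labels: Sequence[int],
                     num_classes: int, epochs: int = 200,
                     lr: float = 0.5) -> Tuple[List[List[float]], List[float]]:
    """Train with cross-entropy loss; the gradient w.r.t. each score is
    (predicted probability - one-hot target), propagated back to the weights."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    bias = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = softmax(scores_for(x, weights, bias))
            for k in range(num_classes):
                grad = probs[k] - (1.0 if k == y else 0.0)
                bias[k] -= lr * grad
                for d in range(dim):
                    weights[k][d] -= lr * grad * x[d]
    return weights, bias


def predict(x: Sequence[float], weights: List[List[float]],
            bias: List[float]) -> int:
    s = scores_for(x, weights, bias)
    return max(range(len(s)), key=s.__getitem__)


# Hypothetical global features of a labeled first image set.
train_features = [[1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 0.0, 1.0]]
train_labels = [0, 0, 1, 1]
weights, bias = train_classifier(train_features, train_labels, num_classes=2)
predictions = [predict(x, weights, bias) for x in train_features]
```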

Further, the classifier may refer to a model that classifies the image to be classified according to the image characteristics of the image to be classified. Optionally, the classifier in this application may be a non-linear classifier, such as a non-linear support vector machine (Support Vector Machine, SVM). Non-linear classifiers can effectively expand the classification dimension and reduce the defects of linear classifiers such as softmax and fully connected layers in non-linear classification.

In the embodiment of the present application, the clustering results of the local image blocks are obtained by an unsupervised learning clustering method, a supervised learning classification model is used to obtain the global feature of the image to be classified, and a supervised learning classification model is used to determine the category of the image to be classified according to the global feature. By combining unsupervised learning with supervised deep learning algorithms, this application improves the performance of the image classification algorithm and reduces power consumption.

In the embodiment of the application, the first classifier and the convolution vector may be obtained by training with the same image samples, or may be obtained by training with different image samples. The embodiment of the application does not limit this.

It can be seen that the embodiment of the application proposes an image classification method applied to a terminal device. The method acquires an image to be classified, divides the image to be classified to obtain M partial image blocks, clusters the M partial image blocks to obtain N clustering results, determines the global feature of the image to be classified according to the M local image blocks and the N clustering results, and determines the classification result of the image to be classified according to the global feature. This reduces the amount of redundant calculation in image classification, simplifies and accelerates the algorithm, and makes it possible for image classification with a large amount of calculation to run on terminal hardware.
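
The dividing step summarized above, in which the image to be classified is split into M partial image blocks of the same size, can be sketched as follows. The function name `divide_into_blocks`, the requirement that the image size be an exact multiple of the block size, and the 4×4 sample image are assumptions made for illustration.

```python
from typing import List

Image = List[List[int]]


def divide_into_blocks(image: Image, block_h: int, block_w: int) -> List[Image]:
    """Split an image (a list of pixel rows) into equally sized blocks,
    scanning left to right and top to bottom."""
    height, width = len(image), len(image[0])
    if height % block_h or width % block_w:
        raise ValueError("image size must be a multiple of the block size")
    blocks: List[Image] = []
    for top in range(0, height, block_h):
        for left in range(0, width, block_w):
            blocks.append([row[left:left + block_w]
                           for row in image[top:top + block_h]])
    return blocks


# Hypothetical 4x4 image whose pixel values are 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
blocks = divide_into_blocks(image, 2, 2)
print(len(blocks))  # 4, i.e. M = 4
print(blocks[0])    # [[0, 1], [4, 5]]
```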

The foregoing mainly introduces the solution of the embodiment of the present application from the perspective of the execution process on the method side. It can be understood that, in order to implement the above-mentioned functions, an electronic device includes hardware structures and/or software modules corresponding to each function. Those skilled in the art should easily realize that in combination with the units and algorithm steps of the examples described in the embodiments provided herein, this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or computer software-driven hardware depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.

The embodiment of the present application may divide the electronic device into functional units according to the foregoing method examples. For example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The above-mentioned integrated unit can be implemented in the form of hardware or software functional unit. It should be noted that the division of units in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.

Please refer to FIG. 5. FIG. 5 is a block diagram of the functional unit composition of an image classification device provided by an embodiment of the present application, which is applied to a terminal device. As shown in FIG. 5, the device includes:

The obtaining unit 510 is configured to obtain the image to be classified;

The dividing unit 520 is configured to divide the image to be classified to obtain M partial image blocks, where M is a positive integer greater than 1;

The clustering unit 530 is configured to cluster the M partial image blocks to obtain N clustering results, where N is a positive integer greater than 1;

The first determining unit 540 is configured to determine a global feature of the image to be classified according to the M local image blocks and the N clustering results, where the global feature is a feature vector of the image to be classified;

The second determining unit 550 is configured to determine the classification result of the image to be classified according to the global feature.
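
As a rough illustration of how the functional units listed above could be composed in software, the following sketch wires five pluggable callables in the order of the method steps. The class name `ImageClassificationDevice`, the field names and the toy stand-in functions are all hypothetical; the actual division into units is only a logical one and may differ in a real implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ImageClassificationDevice:
    obtain: Callable[[], Any]                       # obtaining unit 510
    divide: Callable[[Any], List[Any]]              # dividing unit 520
    cluster: Callable[[List[Any]], List[Any]]       # clustering unit 530
    global_feature: Callable[[List[Any], List[Any]], List[float]]  # first determining unit 540
    decide: Callable[[List[float]], str]            # second determining unit 550

    def run(self) -> str:
        image = self.obtain()
        blocks = self.divide(image)
        clusters = self.cluster(blocks)
        feature = self.global_feature(blocks, clusters)
        return self.decide(feature)


# Toy stand-ins for each unit, chosen only to make the sketch runnable.
device = ImageClassificationDevice(
    obtain=lambda: [1, 2, 3, 4, 5, 6, 7, 8],
    divide=lambda img: [img[i:i + 2] for i in range(0, len(img), 2)],
    cluster=lambda blocks: [min(blocks, key=sum), max(blocks, key=sum)],
    global_feature=lambda blocks, clusters: [
        float(sum(b) > sum(clusters[0])) for b in blocks],
    decide=lambda f: "bright" if sum(f) > len(f) / 2 else "dark",
)
print(device.run())  # bright
```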

In an implementation manner of the embodiment of the present application, the first determining unit 540 is specifically configured to: perform a convolution operation on the M partial image blocks and each of the N clustering results to obtain a first feature vector; and perform binary image coding on the first feature vector to obtain the global feature.

In an implementation manner of the embodiment of the present application, the first determining unit 540 is further specifically configured to: according to a binary image coding rule, set each value in the first feature vector that is greater than the first value to a second value, and set each value in the first feature vector that is less than or equal to the first value to the first value, to obtain the global feature.

In an implementation manner of the embodiment of the present application, the second determining unit 550 is specifically configured to: perform a convolution operation on the global feature and the convolution vector to obtain the probability vector of the global feature, where the convolution vector is obtained by training on image samples of labeled categories; and use the category corresponding to the maximum value in the probability vector as the classification result of the image to be classified.

In an implementation manner of the embodiment of the present application, the size of the convolution operation is 1×1.

In an implementation manner of the embodiment of the present application, the obtaining unit 510 is further configured to: obtain a first image set, where the first image set includes images with annotated image categories.

In an implementation of the embodiment of the present application, the device further includes a training unit 560 configured to train the classifier to be trained using the first image set to obtain the first classifier.

In an implementation manner of the embodiment of the present application, the second determining unit 550 is further specifically configured to: input the global feature into the first classifier, and output the classification result of the image to be classified.

In an implementation manner of the embodiment of the present application, the clustering unit 530 is specifically configured to use an unsupervised learning clustering method to cluster the M partial image blocks to obtain the N clustering results.
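
As one possible illustration of unsupervised clustering of the partial image blocks, the following is a minimal k-means sketch over flattened blocks. The passage above does not name a specific clustering method, so the choice of k-means, the deterministic initialization from the first N blocks, the iteration count and the sample data are all assumptions.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(blocks: List[Vector], n_clusters: int,
           iterations: int = 10) -> Tuple[List[List[float]], List[int]]:
    """Cluster flattened image blocks; returns (centroids, assignment)."""
    # Simple deterministic initialization: the first n_clusters blocks.
    centroids = [list(b) for b in blocks[:n_clusters]]
    assignment = [0] * len(blocks)
    for _ in range(iterations):
        # Assign each block to its nearest centroid.
        assignment = [
            min(range(n_clusters),
                key=lambda k: squared_distance(b, centroids[k]))
            for b in blocks
        ]
        # Move each centroid to the mean of its member blocks.
        for k in range(n_clusters):
            members = [b for b, a in zip(blocks, assignment) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


# Four hypothetical flattened 2x2 blocks: two dark and two bright.
flat_blocks = [[0.0, 0.0, 1.0, 1.0], [9.0, 9.0, 8.0, 8.0],
               [1.0, 0.0, 0.0, 1.0], [8.0, 9.0, 9.0, 8.0]]
centroids, assignment = kmeans(flat_blocks, n_clusters=2)
print(assignment)  # [0, 1, 0, 1]
```

Here the N = 2 centroids play the role of the clustering results against which the blocks are later compared.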

In an implementation manner of the embodiment of the present application, before the image to be classified is divided, the obtaining unit 510 is further configured to: obtain an original image, and preprocess the original image to obtain the image to be classified.

It is understandable that the functions of the program modules of the image classification device in the embodiment of the present application can be specifically implemented according to the method in the above method embodiment. For the specific implementation process, refer to the relevant description of the above method embodiment, which will not be repeated here.

It can be seen that the embodiment of the present application proposes an image classification device applied to a terminal device. The device acquires an image to be classified, divides the image to be classified to obtain M partial image blocks, clusters the M partial image blocks to obtain N clustering results, determines the global feature of the image to be classified according to the M local image blocks and the N clustering results, and determines the classification result of the image to be classified according to the global feature. This reduces the amount of redundant calculation in image classification, simplifies and accelerates the algorithm, and makes it possible for image classification with a large amount of calculation to run on terminal hardware.

Please refer to FIG. 6. FIG. 6 is a schematic structural diagram of a terminal device provided by an embodiment of the present application. As shown in FIG. 6, the terminal device includes one or more processors, one or more memories, one or more communication interfaces, and one or more programs;

The one or more programs are stored in the memory, and are configured to be executed by the one or more processors;

The program includes instructions for performing the following steps:

Obtain the image to be classified;

Divide the image to be classified to obtain M partial image blocks, where M is a positive integer greater than 1;

Cluster the M partial image blocks to obtain N clustering results, where N is a positive integer greater than 1;

Determine a global feature of the image to be classified according to the M local image blocks and the N clustering results, where the global feature is a feature vector of the image to be classified;

Determine the classification result of the image to be classified according to the global feature.

In an implementation manner of the embodiment of the present application, the program includes instructions that are further used to perform the following steps: performing a convolution operation on the M partial image blocks and each of the N clustering results to obtain a first feature vector; and performing binary image coding on the first feature vector to obtain the global feature.

In an implementation manner of the embodiment of the present application, the program includes instructions that are further used to perform the following steps: according to a binary image coding rule, setting each value in the first feature vector that is greater than the first value to a second value, and setting each value in the first feature vector that is less than or equal to the first value to the first value, to obtain the global feature.

In an implementation manner of the embodiment of the present application, the program includes instructions that are further used to perform the following steps: performing a convolution operation on the global feature and the convolution vector to obtain the probability vector of the global feature, where the convolution vector is obtained by training on image samples of labeled categories; and using the category corresponding to the maximum value in the probability vector as the classification result of the image to be classified.

In an implementation manner of the embodiment of the present application, the program includes instructions that are further used to perform the following steps: obtaining a first image set, the first image set including images with labeled image categories; and using the first image set to train the classifier to be trained to obtain the first classifier.

In an implementation manner of the embodiment of the present application, the program includes instructions that are further used to perform the following steps: input the global feature to the first classifier, and output the classification result of the image to be classified.

In an implementation manner of the embodiment of the present application, the program includes instructions that are further used to perform the following steps: clustering the M partial image blocks using an unsupervised learning clustering method to obtain the N clustering results.

In an implementation manner of the embodiment of the present application, the program includes instructions for performing the following steps before dividing the image to be classified: obtaining an original image, and preprocessing the original image to obtain the image to be classified.

An embodiment of the present application also provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute part or all of the steps of any method recorded in the above method embodiments. The above-mentioned computer includes a terminal device.

The embodiments of the present application also provide a computer program product. The computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute part or all of the steps of any method recorded in the above method embodiments. The computer program product may be a software installation package, and the above-mentioned computer includes a terminal device.

It should be noted that, for the sake of simple description, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should know that this application is not limited by the described sequence of actions, because according to this application some steps can be performed in another order or at the same time. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by this application.

In the above-mentioned embodiments, the description of each embodiment has its own emphasis. For parts that are not described in detail in an embodiment, reference may be made to related descriptions of other embodiments.

In the several embodiments provided in this application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are only illustrative: the division of the above-mentioned units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented. In addition, the displayed or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.

The units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.

If the above integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a memory and includes a number of instructions that enable a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the foregoing methods of the various embodiments of the present application. The aforementioned memory includes: a USB flash drive, Read-Only Memory (ROM), Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media that can store program code.

Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above-mentioned embodiments can be completed by a program instructing relevant hardware. The program can be stored in a computer-readable memory, and the memory can include: flash memory, ROM, RAM, magnetic disk or CD, etc.

The embodiments of the application are described in detail above, and specific examples are used herein to illustrate the principles and implementation of the application. The descriptions of the above embodiments are only used to help understand the methods and core ideas of the application. At the same time, for those of ordinary skill in the art, there will be changes in the specific implementation and the scope of application based on the idea of the application. In summary, the content of this specification should not be construed as a limitation on the application.

Claims

An image classification method, characterized in that the method includes:

Obtain the image to be classified;

Dividing the image to be classified to obtain M partial image blocks, where M is a positive integer greater than 1;

Clustering the M partial image blocks to obtain N clustering results, where N is a positive integer greater than 1;

Determining a global feature of the image to be classified according to the M local image blocks and the N clustering results, where the global feature is a feature vector of the image to be classified;

The classification result of the image to be classified is determined according to the global feature.
The method according to claim 1, wherein the determining the global feature of the image to be classified according to the M local image blocks and the N clustering results comprises:

Performing a convolution operation on the M partial image blocks and each of the N clustering results to obtain a first feature vector;

Binary image coding is performed on the first feature vector to obtain the global feature.
The method according to claim 2, wherein the performing binary image encoding on the first feature vector to obtain the global feature comprises:

According to the binary image coding rule, a value in the first feature vector greater than a first value is set to a second value, and a value in the first feature vector that is less than or equal to the first value is set to the first value, to obtain the global feature.
The method according to any one of claims 1 to 3, wherein the determining the classification result of the image to be classified according to the global feature comprises:

Performing a convolution operation on the global feature and a convolution vector to obtain a probability vector of the global feature, where the convolution vector is obtained by training on image samples of labeled categories;

The category corresponding to the maximum value in the probability vector is used as the classification result of the image to be classified.
The method according to claim 4, wherein the size of the convolution operation is 1×1.
The method according to any one of claims 1-3, wherein the method further comprises:

Acquiring a first image set, the first image set including images of an already-labeled image category;

Use the first image set to train the classifier to be trained to obtain the first classifier;

The determining the classification result of the image to be classified according to the global feature includes: inputting the global feature into the first classifier, and outputting the classification result of the image to be classified.
The method according to any one of claims 1 to 6, wherein the clustering the M partial image blocks to obtain N clustering results comprises:

Clustering the M partial image blocks using an unsupervised learning clustering method to obtain the N clustering results.
The method according to any one of claims 1-7, wherein the dividing the image to be classified to obtain M partial image blocks comprises:

The image to be classified is divided into M partial image blocks of the same size according to the preset image size.
The method according to claim 8, wherein the size of the M is determined by the resolution of the image to be classified, and the resolution of the image to be classified and the size of the M are in a proportional relationship or a mapping relationship.
An image classification device, characterized in that the device includes:

The acquiring unit is used to acquire the image to be classified;

A dividing unit, configured to divide the image to be classified to obtain M partial image blocks, where M is a positive integer greater than 1;

A clustering unit, configured to cluster the M partial image blocks to obtain N clustering results, where N is a positive integer greater than 1;

A first determining unit, configured to determine a global feature of the image to be classified according to the M local image blocks and the N clustering results, where the global feature is a feature vector of the image to be classified;

The second determining unit is configured to determine the classification result of the image to be classified according to the global feature.
The device according to claim 10, wherein the first determining unit is specifically configured to:

Perform a convolution operation on the M local image blocks and each of the N clustering results to obtain a first feature vector; and perform binary image coding on the first feature vector to obtain the global feature.
The device according to claim 11, wherein the first determining unit is further specifically configured to:

According to the binary image coding rule, a value in the first feature vector greater than a first value is set to a second value, and a value in the first feature vector that is less than or equal to the first value is set to the first value, to obtain the global feature.
The device according to any one of claims 10-12, wherein the second determining unit is specifically configured to:

Perform a convolution operation on the global feature and the convolution vector to obtain a probability vector of the global feature, where the convolution vector is obtained by training on image samples of labeled categories; and use the category corresponding to the maximum value in the probability vector as the classification result of the image to be classified.
The device according to claim 13, wherein the size of the convolution operation is 1×1.
The device according to any one of claims 10-12, wherein the device further comprises a training unit;

The obtaining unit is further configured to: obtain a first image set, the first image set including images of an already-labeled image category;

The training unit is used to train a classifier to be trained using the first image set to obtain a first classifier;

The second determining unit is further specifically configured to: input the global feature into the first classifier, and output a classification result of the image to be classified.
The device according to any one of claims 10-15, wherein the clustering unit is specifically configured to:

Clustering the M partial image blocks using an unsupervised learning clustering method to obtain the N clustering results.
The device according to any one of claims 10-16, wherein the dividing unit is specifically configured to:

The image to be classified is divided into M partial image blocks of the same size according to the preset image size.
The device according to claim 17, wherein the size of the M is determined by the resolution of the image to be classified, and the resolution of the image to be classified is in a proportional relationship or a mapping relationship with the size of the M.
A terminal device, characterized in that the terminal device includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the program includes instructions for executing the steps in the method according to any one of claims 1-9.
A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute the method according to any one of claims 1-9.