CN110751218B - Image classification method, image classification device and terminal equipment

Info

Publication number
CN110751218B
Authority
CN
China
Prior art keywords
image
detected
scene
features
random
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911005691.3A
Other languages
Chinese (zh)
Other versions
CN110751218A (en)
Inventor
贾玉虎 (Jia Yuhu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201911005691.3A
Publication of CN110751218A
Application granted
Publication of CN110751218B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application is applicable to the technical field of image processing, and provides an image classification method, an image classification device and a terminal device, which include: acquiring an image to be detected; acquiring global features of the image to be detected, where the global features are feature vectors extracted from the whole image to be detected; acquiring local features of the image to be detected through a random frame, where the local features are feature vectors extracted from a local area of the image to be detected; and inputting the global features and the local features into a classifier, which outputs the category of the image to be detected. By the method and the device, the accuracy of image classification can be improved.

Description

Image classification method, image classification device and terminal equipment
Technical Field
The present application belongs to the technical field of image processing, and in particular, to an image classification method, an image classification device, and a terminal device.
Background
In recent years, image classification has attracted great research interest and has been successfully deployed in many products, such as mobile phones, personal computers and other terminal devices, intelligently solving many practical image processing problems.
With the rapid development of deep learning, it has become the leading technology for image classification. However, conventional image classification methods generally use deep learning to extract only the global features of the whole image, and therefore cannot accurately classify some similar images.
Disclosure of Invention
The application provides an image classification method, an image classification device and a terminal device, which are used for improving the accuracy of image classification.
A first aspect of the present application provides an image classification method, including:
acquiring an image to be detected;
acquiring global features of the image to be detected, wherein the global features refer to feature vectors extracted from the whole image to be detected;
acquiring local features of the image to be detected through a random frame, wherein the local features refer to feature vectors extracted from a local area of the image to be detected;
and inputting the global features and the local features into a classifier, and outputting the category of the image to be detected by the classifier.
A second aspect of the present application provides an image classification apparatus comprising:
the image acquisition module is used for acquiring an image to be detected;
the global feature acquisition module is used for acquiring the global features of the image to be detected, wherein the global features refer to feature vectors extracted from the whole image to be detected;
the local feature acquisition module is used for acquiring local features of the image to be detected through a random frame, wherein the local features refer to feature vectors extracted from a local area of the image to be detected;
and the category output module is used for inputting the global features and the local features into a classifier, and the classifier outputs the category of the image to be detected.
A third aspect of the present application provides a terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the image classification method according to the first aspect when executing the computer program.
A fourth aspect of the present application provides a computer-readable storage medium, in which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the image classification method according to the first aspect as described above.
A fifth aspect of the present application provides a computer program product, which, when run on a terminal device, causes the terminal device to perform the steps of the image classification method as described in the first aspect above.
Therefore, the image to be detected is processed in two branches: one branch acquires the global features of the image to be detected, and the other branch acquires the local features of the image to be detected through a random frame. The global features and the local features are combined and input into the classifier together, and the classifier outputs the category of the image to be detected. According to the image classification method and device, the local features of the image to be detected can be effectively extracted through the random frame, and classifying the image by the combination of its global and local features increases the number of features available for classification, improves the accuracy of image classification, and alleviates the problem that images of similar categories are difficult to distinguish by global features alone.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required to be used in the embodiments or the prior art description will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings may be obtained according to these drawings without inventive labor.
Fig. 1 is a schematic flow chart of an implementation of an image classification method according to an embodiment of the present application;
fig. 2 is a schematic flow chart illustrating an implementation of an image classification method according to a second embodiment of the present application;
FIG. 3a is an exemplary diagram of generating random frames of the same size on an image to be detected; FIG. 3b is an exemplary diagram of generating random frames of different sizes on an image to be detected;
fig. 4 is a schematic flow chart illustrating an implementation of an image classification method provided in the third embodiment of the present application;
FIG. 5a is a diagram illustrating an example of a training process for a classification model; FIG. 5b is a diagram illustrating an example of a training process for a classifier; FIG. 5c is a diagram of another example training process for a classification model; FIG. 5d is a diagram of another example training process for a classifier;
FIG. 6 is a schematic diagram of an image classification apparatus according to a fourth embodiment of the present application;
fig. 7 is a schematic diagram of a terminal device according to a fifth embodiment of the present application;
fig. 8 is a schematic diagram of a terminal device according to a sixth embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In particular implementations, the terminal devices described in the embodiments of the present application include, but are not limited to, mobile phones, laptop computers, tablet computers and other portable devices having touch-sensitive surfaces (e.g., touch-screen displays and/or touch pads). It should also be understood that in some embodiments the device is not a portable communication device but a desktop computer having a touch-sensitive surface (e.g., a touch-screen display and/or a touchpad).
In the discussion that follows, a terminal device that includes a display and a touch-sensitive surface is described. However, it should be understood that the terminal device may include one or more other physical user interface devices such as a physical keyboard, mouse, and/or joystick.
The terminal device supports various applications, such as one or more of the following: a drawing application, a presentation application, a word processing application, a website creation application, a disc burning application, a spreadsheet application, a gaming application, a telephone application, a video conferencing application, an email application, an instant messaging application, an exercise support application, a photo management application, a digital camera application, a digital video camera application, a web browsing application, a digital music player application, and/or a digital video player application.
Various applications that may be executed on the terminal device may use at least one common physical user interface device, such as a touch-sensitive surface. One or more functions of the touch-sensitive surface and corresponding information displayed on the terminal can be adjusted and/or changed between applications and/or within respective applications. In this way, a common physical architecture (e.g., touch-sensitive surface) of the terminal can support various applications with user interfaces that are intuitive and transparent to the user.
It should be understood that the sequence numbers of the steps in the embodiments do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
Referring to fig. 1, which is a schematic view of an implementation flow of an image classification method provided in an embodiment of the present application, where the image classification method is applied to a terminal device, as shown in the figure, the image classification method may include the following steps:
and S101, acquiring an image to be detected.
In this embodiment of the application, the image to be detected may be obtained locally from the terminal device, and the image to be detected sent by other devices may also be received, which is not limited herein. The local acquisition of the image to be detected from the terminal equipment may refer to the acquisition of the image to be detected from a memory of the terminal equipment, or may refer to the acquisition of a picture which is not stored in the memory when the terminal equipment takes a picture. When the image to be detected is a picture which is shot by the terminal equipment and not stored in the memory, the picture can be classified, and the picture does not need to be classified subsequently.
In this embodiment of the application, the image to be detected may be divided into two branches to be processed respectively, specifically, the image to be detected is copied to obtain two identical images to be detected, the two identical images to be detected may be referred to as a first image to be detected and a second image to be detected respectively, one branch processes the first image to be detected to obtain a global feature of the image to be detected, and the other branch processes the second image to be detected to obtain a local feature of the image to be detected.
The image to be detected may refer to an image of a category to be detected. The image generally comprises a main body and a background, wherein the main body is an object mainly represented by the image, the background is a scene which is used for setting off the main body in the image, the category of the image is determined according to the main body in the image, for example, the main body in the image is a bouquet, and then the category of the image is a bouquet class; if the subject in the image is a green plant, then the class of the image is the green plant class.
Step S102: acquire the global features of the image to be detected.
Step S103: acquire the local features of the image to be detected through a random frame.
The global features and the local features are image features of the image to be detected and are one-dimensional feature vectors. The global features are feature vectors extracted from the whole image to be detected and thus come from the entire image; the local features are feature vectors extracted from a local area of the image to be detected and thus come from that local area.
In the embodiment of the present application, since many images have no definite, fixed image features and are easily confused, effective local features cannot always be extracted directly; the random frame makes it possible to extract effective local features of such images. A random frame is a randomly generated frame used to select a local image (i.e., a local area) in the image to be detected; the shape of the random frame is not limited herein.
Step S104: input the global features and the local features into a classifier, and the classifier outputs the category of the image to be detected.
In the embodiment of the present application, the global features and the local features of the image to be detected are combined and input into the classifier together, so that the classifier classifies the image to be detected according to both; that is, local features also take part in image classification, which alleviates the problem that images of similar categories cannot be distinguished by global features alone. The classifier may be any model that classifies the image to be detected according to its image features, such as a softmax classifier, a fully connected layer or a Support Vector Machine (SVM) classifier; the specific type of the classifier is not limited herein. A non-linear SVM classifier can effectively expand the classification dimensionality and reduce the shortcomings of softmax in non-linear classification.
According to the image classification method and device, local features of the image to be detected are introduced through the random frame on top of its global features, and the image is classified by the combination of the two. This increases the number of features available for classification, alleviates the problem that images of similar categories are difficult to distinguish by global features alone, improves the accuracy of image classification, and is suitable for deployment on embedded devices.
Referring to fig. 2, it is a schematic diagram of an implementation flow of an image classification method provided in the second embodiment of the present application, where the image classification method is applied to a terminal device, and as shown in the figure, the image classification method may include the following steps:
step S201, acquiring an image to be measured.
The step is the same as step S101, and reference may be made to the related description of step S101, which is not repeated herein.
Step S202: generate a random frame on the image to be detected.
In the embodiment of the present application, since many images have no definite, fixed image features and are easily confused, effective local features cannot always be extracted directly; the random frame makes it possible to extract effective local features. A random frame is a randomly generated frame used to select a local image (i.e., a local area) in the image to be detected, as shown in Figs. 3a and 3b: Fig. 3a is an example of generating random frames of the same size on the image to be detected, and Fig. 3b is an example of generating random frames of different sizes, where "different sizes" means sizes that are not all identical.
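For illustration, the following is a minimal sketch of generating random frames as in Figs. 3a and 3b; the helper names, frame counts and size ranges are assumptions, not values fixed by the patent.

```python
import random

def generate_random_frames(img_w, img_h, n, frame_w, frame_h):
    """Random frames of the same size (Fig. 3a)."""
    frames = []
    for _ in range(n):
        x_min = random.randint(0, img_w - frame_w)
        y_min = random.randint(0, img_h - frame_h)
        frames.append((x_min, y_min, x_min + frame_w, y_min + frame_h))
    return frames

def generate_varied_random_frames(img_w, img_h, n, min_side, max_side):
    """Random frames of different (not all identical) sizes (Fig. 3b)."""
    frames = []
    for _ in range(n):
        w = random.randint(min_side, max_side)
        h = random.randint(min_side, max_side)
        x_min = random.randint(0, img_w - w)
        y_min = random.randint(0, img_h - h)
        frames.append((x_min, y_min, x_min + w, y_min + h))
    return frames
```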
Optionally, after acquiring the image to be measured, the embodiment of the present application further includes:
acquiring a scene to which the image to be detected belongs;
correspondingly, the generating a random frame on the image to be tested includes:
if the scene of the image to be detected is a first scene, generating random frames with different sizes on the image to be detected;
and if the scene to which the image to be detected belongs is a second scene, generating random frames with the same size on the image to be detected.
In the embodiment of the present application, whether the random frames have the same size can be set according to the scene to which the image belongs, and scenes can be divided according to whether the scene features of the image are fixed. For example, the first scene is an image scene with unfixed scene features, such as a natural scene, and the second scene is an image scene with fixed scene features, such as a plant scene. Images belonging to a natural scene generally have no definite, fixed scene features, so random frames of different sizes can be used on them to extract effective local features. Images belonging to a plant scene usually have fixed scene features, so their local features can be extracted effectively with random frames of the same size; if random frames of different sizes were used instead, the large variation in frame size could make the extracted local features indistinct and easily disturbed by other information (such as background information). For example, for an image with three or four flowers in a large patch of grass, the flowers belong to the flower-cluster category rather than the grass category, but judging the category from local features extracted with random frames of different sizes might classify the image as grass. Scene features are image features that can characterize the scene to which an image belongs: fixed scene features are distributed in a relatively concentrated way in the image rather than dispersed, while unfixed scene features are distributed in a relatively dispersed way rather than concentrated. The scene to which the image to be detected belongs may be the scene to which the subject of the image belongs; for example, if the subject is a flower, the scene is a plant scene, and if the subject is a beach, a valley or the like, the scene is a natural scene.
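A possible sketch of this scene-dependent choice, reusing the frame-generation helpers from the previous sketch; the scene labels and size parameters are illustrative assumptions.

```python
def frames_for_scene(scene, img_w, img_h, n=5):
    if scene == "first":    # e.g. a natural scene: scene features not fixed
        return generate_varied_random_frames(img_w, img_h, n,
                                             min_side=16, max_side=112)
    if scene == "second":   # e.g. a plant scene: scene features fixed
        return generate_random_frames(img_w, img_h, n, frame_w=56, frame_h=56)
    raise ValueError(f"unknown scene: {scene}")
```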
Step S203: down-sample the image in the random frame.
In the embodiment of the present application, down-sampling the image in the random frame reduces its size and the amount of computation required to process it.
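As a sketch of this step, the image inside each random frame can be cropped out and down-sampled; Pillow is an assumed dependency, and the target size is illustrative.

```python
from PIL import Image

def crop_and_downsample(image, frames, size=(112, 112)):
    """Crop each random frame from a PIL image and down-sample the crop."""
    return [image.crop(frame).resize(size, Image.BILINEAR) for frame in frames]
```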
Optionally, when there are a plurality of random frames, before down-sampling the images in the random frames, the embodiment of the present application further includes:
sequencing the images in the random frames according to preset conditions;
correspondingly, the down-sampling the image in the random frame comprises:
and downsampling the images in the random frame according to the arrangement sequence.
In the embodiment of the present application, when a plurality of random frames are generated on the image to be detected, the images in the random frames can be sorted according to a preset condition, the image in each random frame is down-sampled in that order, and the down-sampled images are input into the first classification model in the same order, so that the feature vector of the image in each random frame is extracted in turn; the feature vector of the image in one random frame is the local feature of the image to be detected corresponding to that frame. The preset condition refers to a preset sorting policy. If the random frames have different sizes, the preset condition includes, but is not limited to, the size of the random frame (i.e., the size of the image in the frame); for example, the images in the random frames are sorted from the smallest frame to the largest. If the random frames have the same size, the preset condition includes, but is not limited to, the position of each random frame in the image to be detected. As shown in Fig. 3a, for an image to be detected of size 224 x 224, 5 x 5 may be selected as the size of the random frame; assuming the top-left corner coordinate of each random frame is (x_min, y_min), the frames are sorted by x_min in ascending order, and frames with equal x_min are sorted by y_min in ascending order. Since the feature vectors of the images in different random frames can be similar, sorting the images in the random frames avoids confusing the similar feature vectors corresponding to different frames.
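A sketch of the sorting rules just described: frames of different sizes are ordered from small to large (using area as the size measure, an assumption), and frames of the same size are ordered by their top-left corners, first on x_min and then on y_min.

```python
def sort_frames(frames):
    """Frames are (x_min, y_min, x_max, y_max) tuples."""
    sizes = {(x2 - x1, y2 - y1) for (x1, y1, x2, y2) in frames}
    if len(sizes) > 1:
        # Different sizes: sort by area, from small to large.
        return sorted(frames, key=lambda f: (f[2] - f[0]) * (f[3] - f[1]))
    # Same size: sort by x_min, breaking ties by y_min.
    return sorted(frames, key=lambda f: (f[0], f[1]))
```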
Step S204: input the down-sampled image in the random frame into a first classification model, and the first classification model outputs the feature vector of the image in the random frame; the feature vector is a local feature of the image to be detected.
The first classification model may be a model that obtains a feature vector (i.e., an image feature) of an image in a random frame, and is used to output the feature vector of the image in the random frame, where a specific type of the first classification model is not limited herein.
Step S205: down-sample the image to be detected.
In the embodiment of the present application, down-sampling the image to be detected reduces its size and the amount of computation required to process it.
Step S206, inputting the downsampled image to be detected into a second classification model, and outputting the global features of the image to be detected by the second classification model.
The second classification model may be a model for obtaining the feature vector of the image to be detected and is used to output that feature vector; the specific type of the second classification model is not limited herein.
It should be noted that the first classification model and the second classification model may be the same classification model or different classification models. When they are the same classification model, the down-sampled image in the random frame and the down-sampled image to be detected are both input into that classification model, which outputs the local features and the global features of the image to be detected; that is, the local and global features are extracted by one classification model, which reduces the memory occupied on the terminal device, and the two branches share one set of weights, improving the efficiency of the classification model. When they are different classification models, the first classification model specializes in extracting local features and the second in extracting global features: inputting the down-sampled image in the random frame into the first classification model extracts the local features of the image to be detected better, and inputting the down-sampled image to be detected into the second classification model extracts the global features of the image to be detected better.
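To illustrate the shared-weights variant, the following sketch queues the down-sampled whole image together with the down-sampled frame crops and runs them through a single backbone in one forward pass; PyTorch/torchvision and the ResNet-18 backbone are assumed stand-ins, since the patent does not fix the model type.

```python
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF

backbone = models.resnet18(weights=None)
backbone.fc = torch.nn.Identity()        # expose the 512-d feature vector
backbone.eval()

@torch.no_grad()
def extract_features(whole_image, frame_crops, size=(224, 224)):
    """whole_image and frame_crops are PIL images; returns (global, locals)."""
    batch = torch.stack([TF.to_tensor(im.resize(size))
                         for im in [whole_image] + list(frame_crops)])
    feats = backbone(batch)              # one forward pass, shared weights
    return feats[0], feats[1:]           # global feature, local features
```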
Step S207: input the global features and the local features into a classifier, and the classifier outputs the category of the image to be detected.
The step is the same as step S104, and reference may be made to the related description of step S104, which is not repeated herein.
The image classification method and device can extract the local features and the global features of the image to be detected with one classification model, or with two different classification models. Using one classification model reduces the memory occupied on the terminal device and simplifies the image classification process while improving its accuracy; using two different classification models allows each to extract its corresponding image features better.
Referring to fig. 4, which is a schematic view of an implementation flow of an image classification method provided in the third embodiment of the present application, where the image classification method is applied to a terminal device, as shown in the figure, the image classification method may include the following steps:
step S401, an image to be measured is obtained.
The step is the same as step S101, and reference may be made to the related description of step S101, which is not repeated herein.
Step S402: generate a random frame on the image to be detected.
The step is the same as step S202, and reference may be made to the related description of step S202, which is not repeated herein.
Step S403: down-sample the image in the random frame.
The step is the same as step S203, and reference may be specifically made to the related description of step S203, which is not repeated herein.
Step S404: input the down-sampled image in the random frame into a first classification model, and the first classification model outputs the feature vector of the image in the random frame; the feature vector is a local feature of the image to be detected.
The step is the same as step S204, and reference may be made to the related description of step S204, which is not repeated herein.
Step S405: down-sample the image to be detected.
The step is the same as step S205, and reference may be made to the related description of step S205, which is not repeated herein.
Step S406, inputting the downsampled image to be detected into a second classification model, and outputting the global features of the image to be detected by the second classification model.
The step is the same as step S206, and reference may be made to the related description of step S206, which is not repeated herein.
Step S407: perform feature splicing on the global features and the local features to form a spliced one-dimensional feature vector.
In the embodiment of the present application, the global features and the local features of the image to be detected are both one-dimensional, and the global feature may be spliced after or before the local features, which is not limited here; splicing the one-dimensional global and local features forms a spliced one-dimensional feature vector. For example, suppose there are n random frames, where n is an integer greater than zero, the feature vectors corresponding to the images in the n frames are v_1, ..., v_n, and the global feature is w_1; after the global feature and the local features are spliced, the resulting one-dimensional feature vector is w = (v_1, ..., v_n, w_1) or w = (w_1, v_1, ..., v_n).
Optionally, when the number of the local features is multiple, the performing feature splicing on the global features and the local features to form a one-dimensional feature vector after splicing includes:
sequencing the local features according to a preset condition;
and splicing the sequenced local features and the global features to form a one-dimensional feature vector.
In the embodiment of the application, the local features extracted from the images in the random frames can be sequenced according to preset conditions, then the global features are spliced before or after the local features, and the fused one-dimensional feature vector is formed. The preset conditions are the same as the preset conditions for sequencing the images in the random frames, and the feature vectors corresponding to the images in different random frames can be prevented from being mixed up.
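In the notation above, the splicing step amounts to concatenating the sorted local features v_1, ..., v_n and the global feature w_1 into one one-dimensional vector; a minimal sketch, assuming NumPy arrays:

```python
import numpy as np

def splice_features(global_feat, local_feats, global_first=True):
    """Concatenate 1-D local features v_1..v_n and the global feature w_1."""
    parts = ([global_feat] + list(local_feats)) if global_first \
            else (list(local_feats) + [global_feat])
    return np.concatenate(parts)   # the spliced 1-D feature vector w
```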
Step S408: input the spliced one-dimensional feature vector into the classifier, and the classifier outputs the category of the image to be detected.
The step is partially the same as step S104, and the same parts may specifically refer to the related description of step S104, and are not described herein again.
In the embodiment of the application, the classification model and the classifier need to be trained respectively, the trained classification model is used for extracting the local features and the global features of the image to be detected, and the trained classifier is used for classifying the image to be detected.
When the first classification model and the second classification model are the same classification model, the training process of the classification model is as shown in Fig. 5a. During training, a training sample is processed in two branches: in the upper branch, random frames are randomly generated on the training sample, the images in the random frames are sorted, and the sorted images are down-sampled; meanwhile, the training sample itself is down-sampled in the lower branch. The down-sampled images in the random frames and the down-sampled training sample are then queued, the images in the queue are input into the classification model in order, and the classification model outputs the ordered local features and the global features for back-propagation training. As shown in Fig. 5b, after the classification model is trained, it is used to extract the local features and the global features of the training sample, which are spliced into a one-dimensional feature vector; the spliced vector is input into the classifier, and the classifier is trained by back-propagation under target supervision.
When the first classification model and the second classification model are different classification models, their training process is as shown in Fig. 5c. During training, the training sample is processed in two branches: in the upper branch, random frames are randomly generated on the training sample, the images in the random frames are sorted and down-sampled, the down-sampled images are input into the first classification model, and the first classification model is trained by back-propagating a loss function; meanwhile, the training sample is down-sampled in the lower branch, the down-sampled sample is input into the second classification model, and the second classification model is trained by back-propagating a loss function. As shown in Fig. 5d, after the first and second classification models are trained, the trained first classification model extracts the local features of the training sample and the trained second classification model extracts its global features; the two are spliced into a one-dimensional feature vector, which is input into the classifier, and the classifier is trained under target supervision. Training the first and second classification models separately makes each more specialized: the first classification model becomes better suited to extracting local features, and the second to extracting global features.
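A minimal sketch of one back-propagation training step as in Figs. 5c and 5d; the cross-entropy loss and the optimizer interface are assumptions, since the patent refers only to training with a loss function and target supervision, and the model is assumed to end in a classification head during training.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, images, labels):
    """One back-propagation step on a batch (images: NCHW float tensor)."""
    optimizer.zero_grad()
    logits = model(images)                   # forward pass
    loss = F.cross_entropy(logits, labels)   # assumed loss function
    loss.backward()                          # back-propagate the loss
    optimizer.step()
    return loss.item()
```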
According to the image classification method and device, the global features and the local features of the image to be detected are spliced into the one-dimensional feature vector, so that the calculated amount can be reduced, and the image classification efficiency is improved.
Fig. 6 is a schematic diagram of an image classification apparatus provided in the fourth embodiment of the present application, and for convenience of description, only the portions related to the fourth embodiment of the present application are shown.
The image classification apparatus includes:
the image acquisition module 61 is used for acquiring an image to be detected;
a global feature obtaining module 62, configured to obtain a global feature of the image to be detected, where the global feature refers to a feature vector extracted from the entire image to be detected;
a local feature acquisition module 63, configured to acquire local features of the image to be detected through a random frame, where the local features refer to feature vectors extracted from a local area of the image to be detected;
and a category output module 64, configured to input the global features and the local features to a classifier, where the classifier outputs a category of the image to be detected.
Optionally, the local feature obtaining module 63 includes:
a random frame generating unit, configured to generate a random frame on the image to be detected;
the random frame down-sampling unit is used for down-sampling the image in the random frame;
and the local feature acquisition unit is used for inputting the down-sampled image in the random frame into a first classification model; the first classification model outputs the feature vector of the image in the random frame, and the feature vector is a local feature of the image to be detected.
Optionally, the local feature acquisition module 63 further includes:
the image sorting unit is used for sorting the images in the random frames according to preset conditions when the number of the random frames is multiple;
the random frame downsampling unit is specifically configured to downsample the images in the random frame according to the arrangement order.
Optionally, the image classification apparatus further includes:
a scene obtaining module 65, configured to obtain a scene to which the image to be detected belongs;
the random frame generating unit is specifically configured to:
if the scene of the image to be detected is a first scene, generating random frames with different sizes on the image to be detected;
and if the scene of the image to be detected is a second scene, generating random frames with the same size on the image to be detected.
Optionally, the global feature obtaining module 62 includes:
the image downsampling unit is used for downsampling the image to be detected;
and the global feature acquisition unit is used for inputting the downsampled image to be detected into a second classification model, and the second classification model outputs the global features of the image to be detected.
Optionally, the category output module 64 includes:
the feature splicing unit is used for performing feature splicing on the global features and the local features to form a spliced one-dimensional feature vector;
and the feature input unit is used for inputting the spliced one-dimensional feature vector to the classifier.
Optionally, the feature splicing unit is specifically configured to:
sequencing the local features according to a preset condition;
and splicing the sequenced local features and the global features to form a one-dimensional feature vector.
The image classification device provided in the embodiment of the present application can be applied to the first, second, and third embodiments of the foregoing methods, and for details, refer to the description of the first, second, and third embodiments of the foregoing methods, which is not described herein again.
Fig. 7 is a schematic diagram of a terminal device according to a fifth embodiment of the present application. The terminal device as shown in the figure may include: one or more processors 701 (only one shown); one or more input devices 702 (only one shown), one or more output devices 703 (only one shown), and memory 704. The processor 701, the input device 702, the output device 703, and the memory 704 are connected by a bus 705. The memory 704 is used to store instructions and the processor 701 is used to execute instructions stored by the memory 704. Wherein:
the processor 701 is configured to obtain an image to be detected; acquiring global features of the image to be detected, wherein the global features refer to feature vectors extracted from the whole image to be detected; acquiring local features of the image to be detected through a random frame, wherein the local features refer to feature vectors extracted from a local area of the whole image to be detected; and inputting the global features and the local features into a classifier, and outputting the category of the image to be detected by the classifier.
Optionally, the processor 701 is specifically configured to:
generating a random frame on the image to be detected;
down-sampling the image in the random frame;
and inputting the down-sampled image in the random frame into a first classification model, and outputting a feature vector of the image in the random frame by the first classification model, wherein the feature vector is the local feature of the image to be detected.
Optionally, the processor 701 is further configured to:
when the number of the random frames is multiple, sorting the images in the multiple random frames according to a preset condition;
and downsampling the images in the random frame according to the arrangement sequence.
Optionally, the processor 701 is specifically configured to:
acquiring a scene to which the image to be detected belongs;
if the scene of the image to be detected is a first scene, generating random frames with different sizes on the image to be detected;
and if the scene to which the image to be detected belongs is a second scene, generating random frames with the same size on the image to be detected.
Optionally, the processor 701 is specifically configured to:
down-sampling the image to be detected;
and inputting the downsampled image to be detected into a second classification model, and outputting the global features of the image to be detected by the second classification model.
Optionally, the processor 701 is specifically configured to:
performing feature splicing on the global features and the local features to form spliced one-dimensional feature vectors;
and inputting the spliced one-dimensional feature vector to the classifier.
Optionally, the processor 701 is specifically configured to:
when the number of the local features is multiple, sequencing the multiple local features according to a preset condition;
and splicing the sequenced local features and the global features to form a one-dimensional feature vector.
It should be understood that, in the embodiment of the present application, the processor 701 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. A general-purpose processor may be a microprocessor, or any conventional processor.
The input device 702 may include a touch pad, a fingerprint sensor (for collecting fingerprint information of a user and direction information of the fingerprint), a microphone, a data receiving interface, and the like. The output devices 703 may include a display (LCD, etc.), speakers, a data transmission interface, and so forth.
The memory 704 may include both read-only memory and random-access memory and provides instructions and data to the processor 701. A portion of the memory 704 may also include non-volatile random access memory. For example, the memory 704 may also store device type information.
In specific implementation, the processor 701, the input device 702, the output device 703, and the memory 704 described in this embodiment may execute the implementation described in the embodiment of the image classification method provided in this embodiment, or may execute the implementation described in the image classification apparatus described in the fourth embodiment, which is not described herein again.
Fig. 8 is a schematic diagram of a terminal device provided in a sixth embodiment of the present application. As shown in fig. 8, the terminal device 8 of this embodiment includes: a processor 80, a memory 81 and a computer program 82 stored in said memory 81 and executable on said processor 80. The processor 80 implements the steps in the various image classification method embodiments described above when executing the computer program 82. Alternatively, the processor 80 implements the functions of the modules/units in the above-described apparatus embodiments when executing the computer program 82.
The terminal device 8 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal device may include, but is not limited to, a processor 80, a memory 81. Those skilled in the art will appreciate that fig. 8 is merely an example of a terminal device 8 and does not constitute a limitation of terminal device 8 and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the terminal device may also include input-output devices, network access devices, buses, etc.
The processor 80 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or any conventional processor.
The memory 81 may be an internal storage unit of the terminal device 8, such as a hard disk or a memory of the terminal device 8. The memory 81 may also be an external storage device of the terminal device 8, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card provided on the terminal device 8. Further, the memory 81 may include both an internal storage unit and an external storage device of the terminal device 8. The memory 81 is used to store the computer program and other programs and data required by the terminal device, and may also be used to temporarily store data that has been or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one type of logical function division, and other division manners may be available in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application provide a computer program product, which when running on a terminal device, enables the terminal device to implement the steps in the above method embodiments when executed.
The integrated module/unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the embodiments described above can be realized by a computer program that instructs related hardware; the computer program can be stored in a computer-readable storage medium and, when executed by a processor, realizes the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, and so on. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (8)

1. An image classification method, characterized in that the image classification method comprises:
acquiring an image to be detected;
acquiring a scene to which the image to be detected belongs;
acquiring global features of the image to be detected, wherein the global features refer to feature vectors extracted from the whole image to be detected;
obtaining local features of the image to be detected through a random frame, wherein the local features refer to feature vectors of the image in the random frame, and the random frame is generated randomly;
inputting the global features and the local features into a classifier, and outputting the category of the image to be detected by the classifier;
the obtaining of the local features of the image to be detected through the random frame includes:
if the scene to which the image to be detected belongs is a first scene, generating random frames with different sizes on the image to be detected, wherein the first scene is an image scene with unfixed scene characteristics; if the scene of the image to be detected is a second scene, generating random frames with the same size on the image to be detected, wherein the second scene is an image scene with fixed scene characteristics;
down-sampling the images in the random frame;
and inputting the images in the random frame after down sampling into a first classification model, and outputting the local features by the first classification model.
2. The image classification method according to claim 1, wherein when there are a plurality of random frames, before down-sampling the images in the random frames, the method further comprises:
sequencing the images in the random frames according to a preset condition;
correspondingly, the down-sampling the image in the random frame comprises:
and downsampling the images in the random frame according to the arrangement sequence.
3. The image classification method according to claim 1, wherein the obtaining of the global features of the image to be detected includes:
down-sampling the image to be detected;
and inputting the downsampled image to be detected into a second classification model, and outputting the global features of the image to be detected by the second classification model.
4. The image classification method of any of claims 1 to 3, wherein inputting the global features and the local features to a classifier comprises:
performing feature splicing on the global features and the local features to form a spliced one-dimensional feature vector;
and inputting the spliced one-dimensional feature vector to the classifier.
5. The image classification method according to claim 4, wherein when the number of the local features is plural, the feature stitching the global features and the local features to form a stitched one-dimensional feature vector comprises:
sequencing the local features according to a preset condition;
and splicing the sequenced local features and the global features to form a one-dimensional feature vector.
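
Claims 4 and 5 together amount to ordering the local feature vectors by a preset condition and concatenating them after the global vector into a single one-dimensional vector. A minimal sketch follows; the ordering key (descending L2 norm) is an assumption, as the claims leave the preset condition open:

    import numpy as np

    def splice_features(global_feat, local_feats):
        """Order the local feature vectors by a preset condition, then
        splice them after the global vector into one 1-D vector."""
        ordered = sorted(local_feats, key=np.linalg.norm, reverse=True)
        return np.concatenate([global_feat] + ordered)

For example, a 128-dimensional global vector spliced with four 64-dimensional local vectors yields a single 384-dimensional feature vector for the classifier.
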
6. An image classification apparatus, characterized by comprising:
the image acquisition module is used for acquiring an image to be detected;
the scene acquisition module is used for acquiring the scene of the image to be detected;
the global feature acquisition module is used for acquiring the global features of the image to be detected, wherein the global features refer to feature vectors extracted from the whole image to be detected;
the local feature acquisition module is used for acquiring local features of the image to be detected through a random frame, wherein the local features refer to feature vectors of the image in the random frame, and the random frame is generated randomly;
the category output module is used for inputting the global features and the local features into a classifier, and the classifier outputs the category of the image to be detected;
the local feature acquisition module includes:
a random frame generating unit, configured to generate random frames with different sizes on the image to be detected if a scene to which the image to be detected belongs is a first scene, where the first scene is an image scene with unfixed scene characteristics; if the scene to which the image to be detected belongs is a second scene, generating random frames with the same size on the image to be detected, wherein the second scene is an image scene with fixed scene characteristics;
the random frame down-sampling unit is used for down-sampling the image in the random frame;
and the local feature acquisition unit is used for inputting the down-sampled images in the random frames into a first classification model, and the first classification model outputs the local features.
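
The apparatus of claim 6 arranges the same steps as cooperating modules. A hedged object-oriented sketch, reusing the hypothetical classify_image helper from the claim-1 sketch (all names illustrative):

    class ImageClassificationDevice:
        """Module-per-step arrangement mirroring claim 6."""

        def __init__(self, first_model, second_model, classifier, scene_detector):
            self.first_model = first_model        # yields local features per crop
            self.second_model = second_model      # yields global features
            self.classifier = classifier          # final category classifier
            self.scene_detector = scene_detector  # scene acquisition: image -> "first"/"second"

        def classify(self, image):
            scene = self.scene_detector(image)    # scene acquisition module
            return classify_image(image, scene, self.first_model,
                                  self.second_model, self.classifier)
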
7. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the image classification method according to any one of claims 1 to 5 when executing the computer program.
8. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the image classification method according to any one of claims 1 to 5.
CN201911005691.3A 2019-10-22 2019-10-22 Image classification method, image classification device and terminal equipment Active CN110751218B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911005691.3A CN110751218B (en) 2019-10-22 2019-10-22 Image classification method, image classification device and terminal equipment

Publications (2)

Publication Number Publication Date
CN110751218A CN110751218A (en) 2020-02-04
CN110751218B true CN110751218B (en) 2023-01-06

Family

ID=69279300

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911005691.3A Active CN110751218B (en) 2019-10-22 2019-10-22 Image classification method, image classification device and terminal equipment

Country Status (1)

Country Link
CN (1) CN110751218B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325271B (en) * 2020-02-18 2023-09-12 Oppo广东移动通信有限公司 Image classification method and device
CN111340124A (en) * 2020-03-03 2020-06-26 Oppo广东移动通信有限公司 Method and device for identifying entity category in image
CN111881849A (en) * 2020-07-30 2020-11-03 Oppo广东移动通信有限公司 Image scene detection method and device, electronic equipment and storage medium
CN112085035A (en) * 2020-09-14 2020-12-15 北京字节跳动网络技术有限公司 Image processing method, image processing device, electronic equipment and computer readable medium
CN115272768A (en) * 2022-08-04 2022-11-01 腾讯科技(深圳)有限公司 Content identification method, device, equipment, storage medium and computer program product
CN116797781A (en) * 2023-07-12 2023-09-22 北京斯年智驾科技有限公司 Target detection method and device, electronic equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229302A (en) * 2017-11-10 2018-06-29 深圳市商汤科技有限公司 Feature extracting method, device, computer program, storage medium and electronic equipment
CN110580482B (en) * 2017-11-30 2022-04-08 腾讯科技(深圳)有限公司 Image classification model training, image classification and personalized recommendation method and device
CN112966646B (en) * 2018-05-10 2024-01-09 北京影谱科技股份有限公司 Video segmentation method, device, equipment and medium based on two-way model fusion
CN108647732B (en) * 2018-05-14 2020-07-31 北京邮电大学 Pathological image classification method and device based on deep neural network
CN109529358B (en) * 2018-11-14 2021-12-07 腾讯科技(深圳)有限公司 Feature integration method and device and electronic device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103971112A (en) * 2013-02-05 2014-08-06 腾讯科技(深圳)有限公司 Image feature extracting method and device
CN109858565A (en) * 2019-02-28 2019-06-07 南京邮电大学 The home interior scene recognition method of amalgamation of global characteristics and local Item Information based on deep learning
CN110334628A (en) * 2019-06-26 2019-10-15 华中科技大学 A kind of outdoor monocular image depth estimation method based on structuring random forest

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Paul Schnitzspan et al.; "Hierarchical Support Vector Random Fields: Joint Training to Combine Local and Global Features"; European Conference on Computer Vision; 2008-12-31; pp. 527-540 *

Also Published As

Publication number Publication date
CN110751218A (en) 2020-02-04

Similar Documents

Publication Publication Date Title
CN110751218B (en) Image classification method, image classification device and terminal equipment
CN108921806B (en) Image processing method, image processing device and terminal equipment
CN110288082B (en) Convolutional neural network model training method and device and computer readable storage medium
CN110555839A (en) Defect detection and identification method and device, computer equipment and storage medium
CN109345553B (en) Palm and key point detection method and device thereof, and terminal equipment
US11538096B2 (en) Method, medium, and system for live preview via machine learning models
WO2021164550A1 (en) Image classification method and apparatus
CN108961267B (en) Picture processing method, picture processing device and terminal equipment
CN111931877B (en) Target detection method, device, equipment and storage medium
CN108898082B (en) Picture processing method, picture processing device and terminal equipment
WO2021129466A1 (en) Watermark detection method, device, terminal and storage medium
CN112102164A (en) Image processing method, device, terminal and storage medium
CN112668577A (en) Method, terminal and device for detecting target object in large-scale image
CN111984803B (en) Multimedia resource processing method and device, computer equipment and storage medium
CN111325220B (en) Image generation method, device, equipment and storage medium
CN110349161A (en) Image partition method, device, electronic equipment and storage medium
CN110909817B (en) Distributed clustering method and system, processor, electronic device and storage medium
CN111199169A (en) Image processing method and device
CN115222717A (en) Soybean seed pod rapid counting method and device and storage medium
CN108629767B (en) Scene detection method and device and mobile terminal
CN109785336B (en) Image segmentation method and device based on multipath convolutional neural network model
CN110705653A (en) Image classification method, image classification device and terminal equipment
CN111754435B (en) Image processing method, device, terminal equipment and computer readable storage medium
CN110222576B (en) Boxing action recognition method and device and electronic equipment
CN114765062A (en) Gene data processing method, gene data processing device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant