CN114373098A - Image classification method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN114373098A
CN114373098A
Authority
CN
China
Prior art keywords
feature vector
feature
image
vector sequence
sample
Prior art date
Legal status
Pending
Application number
CN202111672688.4A
Other languages
Chinese (zh)
Inventor
牛力强
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202111672688.4A
Publication of CN114373098A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/211 Selection of the most significant subset of features
    • G06F 18/2113 Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image classification method, an image classification apparatus, a computer device, and a storage medium, which can be applied to fields such as artificial intelligence and intelligent transportation and are intended to solve the problem of low efficiency and low accuracy in image classification. The method comprises at least the following steps: performing multiple rounds of feature screening on a feature vector sequence using a target image classification model to generate a key feature vector, wherein in at least one round of feature screening the following operation is performed: for the feature vector sequence obtained after the previous round of feature screening, before the current round of feature screening is carried out, filtering the obtained sequence based on the feature similarity between each candidate feature vector it currently contains and the target object in the image to be classified. The filtering both reduces the amount of data to be processed and allows the relevant features to be identified in a more targeted manner, thereby improving the efficiency and accuracy of image classification.

Description

Image classification method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image classification method and apparatus, a computer device, and a storage medium.
Background
With the continuous development of science and technology, more and more devices can classify images based on the features the images contain. For example, an image may be classified by language according to the language of the characters it contains, or by species according to the species of the target object it depicts.
Different image regions of an image contain different features, and usually only the features contained in a target image region help the device reach the correct classification result. For example, when classifying an image by language, only the character features in the target image region that contains text contribute to determining the correct language of the image.
However, in the conventional image classification process, the probabilities that an image belongs to each of the different classes must be computed from the features contained in all image regions of the image, and the classification result is then determined from the magnitudes of the computed probabilities.
When computing these probabilities from all image regions, the device not only consumes a large amount of computing resources on regions other than the target image region, making classification inefficient; it is also easily influenced by the features contained in those other regions, so that the computed probability of the correct class may fall below that of a wrong class and the device produces an incorrect classification result.
Therefore, in the related art, both the efficiency and the accuracy of image classification are low.
Disclosure of Invention
The embodiments of the application provide an image classification method, an image classification apparatus, a computer device, and a storage medium, aiming to solve the problem of low efficiency and low accuracy in image classification.
In a first aspect, an image classification method is provided, including:
for an image to be classified, dividing the image into a plurality of sub-image regions using a trained target image classification model, and extracting a candidate feature vector from each of the sub-image regions to generate a feature vector sequence;
performing multiple rounds of feature screening on the feature vector sequence using the target image classification model to generate a key feature vector, wherein in at least one round of feature screening the following operation is performed: for the feature vector sequence obtained after the previous round of feature screening, before the current round of feature screening is carried out, filtering the obtained sequence based on the feature similarity between each candidate feature vector it currently contains and the target object in the image to be classified;
and classifying the image to be classified based on the key feature vector using the target image classification model, and outputting the target category of the image to be classified.
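The steps of the first aspect can be sketched end to end as follows. This is a minimal numpy illustration, not the patented model: the cosine similarity measure, the fixed top-k keep ratio standing in for the "similarity condition", the mean of the kept vectors as the key feature vector, and all function names are assumptions made for the sketch.

```python
import numpy as np

def split_into_patches(image, patch=4):
    """Divide an image (H, W) into non-overlapping patch x patch sub-image
    regions and flatten each into a candidate feature vector."""
    h, w = image.shape
    patches = [image[i:i + patch, j:j + patch].ravel()
               for i in range(0, h, patch)
               for j in range(0, w, patch)]
    return np.stack(patches)  # feature vector sequence, shape (N, patch*patch)

def screen_features(seq, target, rounds=3, keep_ratio=0.5):
    """Multi-round feature screening: before each round, filter the sequence
    down to the candidates most similar to the target object's features."""
    key = None
    for _ in range(rounds):
        # cosine similarity between each candidate vector and the target
        sims = seq @ target / (np.linalg.norm(seq, axis=1)
                               * np.linalg.norm(target) + 1e-8)
        keep = max(1, int(len(seq) * keep_ratio))
        idx = np.argsort(sims)[::-1][:keep]   # similarity condition: top-k
        seq = seq[idx]                        # filtered sequence for next round
        key = seq.mean(axis=0)                # key feature vector for classification
    return key, seq

rng = np.random.default_rng(0)
img = rng.random((16, 16))
seq = split_into_patches(img)        # 16 candidate vectors of length 16
target = rng.random(16)              # stand-in for the target-object features
key, remaining = screen_features(seq, target)
```

With a 16 x 16 image and 4 x 4 patches this yields 16 candidates, pruned to 8, 4, and finally 2 vectors over three rounds; the classifier of the third step would then act on `key` alone.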
In a second aspect, an image classification method is provided, including:
obtaining an image to be classified;
inputting the image to be classified into a trained target image classification model to obtain the target classification output by the model;
wherein the target image classification model is obtained by multiple rounds of iterative training of an image classification model to be trained based on sample data. Each round of training performs at least multiple rounds of feature screening on the sample feature vector sequence of the sample image contained in the sample data, and in at least one round of feature screening the following operation is performed: for the sample feature vector sequence obtained after the previous round of feature screening, before the current round of feature screening, filtering the obtained sequence based on the sample feature similarity between each sample candidate feature vector it currently contains and the sample object in the sample image.
In a third aspect, an image classification apparatus is provided, including:
a first processing module, configured to: for an image to be classified, divide the image into a plurality of sub-image regions using a trained target image classification model, extract a candidate feature vector from each of the sub-image regions, and generate a feature vector sequence;
the first processing module is further configured to: perform multiple rounds of feature screening on the feature vector sequence using the target image classification model to generate a key feature vector, wherein in at least one round of feature screening the following operation is performed: for the feature vector sequence obtained after the previous round of feature screening, before the current round of feature screening is carried out, filter the obtained sequence based on the feature similarity between each candidate feature vector it currently contains and the target object in the image to be classified;
a second processing module, configured to classify the image to be classified based on the key feature vector using the target image classification model, and output the target category of the image to be classified.
Optionally, the first processing module is specifically configured to:
extract image feature vectors from each of the plurality of sub-image regions;
generate a corresponding region position vector for each sub-image region based on its position in the image to be classified;
fuse each image feature vector with its corresponding region position vector to obtain the candidate feature vectors of the plurality of sub-image regions;
and arrange the candidate feature vectors in order, based on the positional relationship among the sub-image regions represented by their region position vectors, to generate the feature vector sequence.
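As an illustration of these optional steps (per-region feature vectors, region position vectors, fusion, positional ordering), here is a hedged numpy sketch; the additive fusion and the sinusoidal position encoding are common choices from the transformer literature, not details taken from this patent.

```python
import numpy as np

def build_feature_vector_sequence(patch_feats, rows, cols):
    """Fuse each sub-image region's image feature vector with a region
    position vector derived from its location, preserving row-major order
    so the sequence reflects the positional relationship of the regions."""
    n, d = patch_feats.shape
    assert n == rows * cols
    t = np.arange(d)
    # one position vector per region, derived from its flat (row-major) index
    pos = np.stack([np.sin(k / (10000 ** (t / d))) for k in range(n)])
    # fusion by element-wise addition; the input is assumed row-major,
    # so its order already encodes the regions' positional relationship
    return patch_feats + pos

rng = np.random.default_rng(1)
feats = rng.random((6, 8))   # a 2 x 3 grid of regions with 8-dim features
sequence = build_feature_vector_sequence(feats, 2, 3)
```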
Optionally, the first processing module is specifically configured to, in each round of feature screening, perform the following operations:
obtain the feature vector sequence and the key feature vector produced by the previous round of feature screening;
determine the feature similarity between each candidate feature vector currently contained in the obtained sequence and the target object contained in the image to be classified;
and, based on the feature similarities, select from those candidate feature vectors the ones whose feature similarity meets the similarity condition, and update the key feature vector accordingly.
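One such screening round might look like the sketch below, where the feature similarity to the target object is approximated by scaled dot-product scoring against the current key feature vector (an assumption on my part; the patent does not fix the measure here), and the key vector is updated as a weighted sum of the kept candidates.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def screening_round(seq, key, keep=4):
    """One round of feature screening: score each candidate against the key
    feature vector, keep those meeting the similarity condition (top-k here),
    and update the key vector from the kept candidates."""
    d = seq.shape[1]
    scores = softmax(seq @ key / np.sqrt(d))        # similarity per candidate
    idx = np.sort(np.argsort(scores)[::-1][:keep])  # top-k, original order kept
    kept = seq[idx]
    weights = softmax(kept @ key / np.sqrt(d))
    new_key = weights @ kept                        # updated key feature vector
    return kept, new_key

rng = np.random.default_rng(2)
candidates = rng.random((10, 8))
key0 = rng.random(8)
kept, key1 = screening_round(candidates, key0)
```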
Optionally, the first processing module is specifically configured to, in at least one round of feature screening, perform the following operations:
take the feature vector sequence obtained after the previous round of feature screening as an initial feature vector sequence, and filter out the candidate feature vectors in it whose feature similarity does not meet the similarity condition, generating a target feature vector sequence that contains fewer candidate feature vectors than the initial feature vector sequence;
then determine the feature similarity between each candidate feature vector currently contained in the target feature vector sequence and the target object contained in the image to be classified, select the candidate feature vectors whose feature similarity meets the similarity condition, and update the key feature vector.
Optionally, the first processing module is specifically configured to execute at least any one of the following manners:
perform a convolution operation on the candidate feature vectors contained in the initial feature vector sequence, using a specified convolution kernel and a specified convolution stride, and generate the target feature vector sequence from the candidate feature vectors produced by the convolution;
or perform multilayer perceptron processing on the candidate feature vectors contained in the initial feature vector sequence, fusing the candidate feature vectors whose feature similarity does not meet the similarity condition with those whose feature similarity does, targeting a specified number of vectors, and generate the target feature vector sequence from the fused candidate feature vectors, wherein the specified number of vectors is smaller than the number of candidate feature vectors contained in the initial feature vector sequence.
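The two filtering manners can be sketched as follows: manner one reduces the sequence with a strided convolution (a strided average here, standing in for a learned kernel), and manner two fuses the N candidates down to a specified smaller number of vectors with a mixing matrix (random here; a learned multilayer perceptron in practice). Both are illustrative stand-ins, not the patent's learned operations.

```python
import numpy as np

def conv_filter(seq, kernel=2, stride=2):
    """Manner 1: convolution over the candidate sequence with a specified
    kernel size and stride; each output vector summarizes `kernel` inputs."""
    return np.stack([seq[i:i + kernel].mean(axis=0)
                     for i in range(0, len(seq) - kernel + 1, stride)])

def mlp_fuse(seq, n_out, seed=3):
    """Manner 2: fuse the candidates down to a specified number of vectors
    (n_out < len(seq)) via a normalized mixing matrix."""
    rng = np.random.default_rng(seed)
    w = rng.random((n_out, len(seq)))
    w /= w.sum(axis=1, keepdims=True)   # convex mixing, for the sketch
    return w @ seq

initial = np.arange(32, dtype=float).reshape(8, 4)  # 8 candidates, 4-dim
target_a = conv_filter(initial)                      # 8 -> 4 vectors
target_b = mlp_fuse(initial, n_out=3)                # 8 -> 3 vectors
```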
In a fourth aspect, there is provided an image classification apparatus comprising:
an acquisition module, configured to obtain an image to be classified;
a processing module, configured to input the image to be classified into a trained target image classification model and obtain the target classification output by the model;
wherein the target image classification model is obtained by multiple rounds of iterative training of an image classification model to be trained based on sample data, each round of training performing at least the following operations: perform multiple rounds of feature screening on the sample feature vector sequence of the sample image contained in the sample data to generate a sample key feature vector for classifying the sample image; and, in at least one round of feature screening, for the sample feature vector sequence obtained after the previous round of feature screening, before the current round of feature screening, filter the obtained sequence based on the sample feature similarity between each sample candidate feature vector it currently contains and the sample object in the sample image.
Optionally, the processing module is further configured to:
before the image to be classified is input into the trained target image classification model, obtain the sample data, each item of which comprises a sample image and the sample class corresponding to that sample image;
and perform multiple rounds of iterative training on the image classification model to be trained based on the sample data until the training loss meets the training target, then output the trained model as the target image classification model.
Optionally, the processing module is specifically configured to, during each round of training:
for the sample image contained in the sample data, divide the sample image into a plurality of sample sub-image regions using the image classification model to be trained, extract a sample candidate feature vector from each of the sample sub-image regions, and generate a sample feature vector sequence;
perform multiple rounds of feature screening on the sample feature vector sequence using the image classification model to generate a sample key feature vector;
classify the sample image based on the sample key feature vector using the image classification model, and output the training class of the sample image;
and determine the training loss of the image classification model based on the error between the training class and the sample class contained in the sample data.
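The per-round training logic above (classify from the key feature vector, then derive the training loss from the error against the sample class) can be sketched with a linear classifier and cross-entropy loss; the optimizer, the loss choice, and the toy data are assumptions for the sketch, not details from the patent.

```python
import numpy as np

def train(samples, steps=200, lr=0.5, target_loss=0.05):
    """Multi-round iterative training: classify each sample's key feature
    vector, take cross-entropy against the sample class as the training
    loss, and update until the loss meets the training target."""
    d = samples[0][0].shape[0]
    n_cls = 1 + max(label for _, label in samples)
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=(n_cls, d))
    for _ in range(steps):
        total = 0.0
        for key_vec, label in samples:
            logits = w @ key_vec
            e = np.exp(logits - logits.max())
            probs = e / e.sum()
            total += -np.log(probs[label] + 1e-12)  # cross-entropy term
            grad = probs.copy()
            grad[label] -= 1.0                      # softmax-CE gradient
            w -= lr * np.outer(grad, key_vec)
        if total / len(samples) < target_loss:      # training target met
            break
    return w

# toy sample data: (sample key feature vector, sample class)
data = [(np.array([1.0, 0.0, 0.0]), 0), (np.array([0.0, 1.0, 0.0]), 1)]
weights = train(data)
```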
Optionally, the processing module is specifically configured to:
extract sample image feature vectors from each of the plurality of sample sub-image regions;
generate a sample region position vector for each sample sub-image region based on its position in the sample image;
fuse each sample image feature vector with its corresponding sample region position vector to obtain the sample candidate feature vectors of the plurality of sample sub-image regions;
and arrange the sample candidate feature vectors in order, based on the positional relationship among the sample sub-image regions represented by their sample region position vectors, to generate the sample feature vector sequence.
Optionally, the processing module is specifically configured to, in each round of feature screening:
obtain the sample feature vector sequence and the sample key feature vector produced by the previous round of feature screening;
determine the sample feature similarity between each sample candidate feature vector currently contained in the obtained sequence and the sample object contained in the sample image;
and, based on the sample feature similarities, select from those sample candidate feature vectors the ones whose sample feature similarity meets the sample similarity condition, and update the sample key feature vector accordingly.
Optionally, the processing module is specifically configured to, in at least one round of feature screening:
take the sample feature vector sequence obtained after the previous round of feature screening as a sample initial feature vector sequence, and filter out the sample candidate feature vectors in it whose sample feature similarity does not meet the sample similarity condition, generating a sample target feature vector sequence that contains fewer sample candidate feature vectors than the sample initial feature vector sequence;
then determine the sample feature similarity between each sample candidate feature vector currently contained in the sample target feature vector sequence and the sample object contained in the sample image, select the sample candidate feature vectors whose sample feature similarity meets the sample similarity condition, and update the sample key feature vector.
Optionally, the processing module is specifically configured to execute at least any one of the following manners:
perform a convolution operation on the sample candidate feature vectors contained in the sample initial feature vector sequence, using a specified convolution kernel and a specified convolution stride, and generate the sample target feature vector sequence from the sample candidate feature vectors produced by the convolution;
or perform multilayer perceptron processing on the sample candidate feature vectors contained in the sample initial feature vector sequence, fusing the sample candidate feature vectors whose sample feature similarity does not meet the sample similarity condition with those whose sample feature similarity does, targeting a specified number of vectors, and generate the sample target feature vector sequence from the fused sample candidate feature vectors, wherein the specified number of vectors is smaller than the number of sample candidate feature vectors contained in the sample initial feature vector sequence.
In a fifth aspect, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method according to the first or second aspect.
In a sixth aspect, there is provided a computer device comprising:
a memory for storing program instructions;
a processor for calling the program instructions stored in the memory and executing the method according to the first aspect or the second aspect according to the obtained program instructions.
In a seventh aspect, there is provided a computer-readable storage medium having stored thereon computer-executable instructions for causing a computer to perform the method of the first aspect or the second aspect.
In the embodiments of the application, the image to be classified is divided into a plurality of sub-image regions, a candidate feature vector is extracted from each sub-image region, and a feature vector sequence is generated. When the key feature vector is determined from this sequence, the key features related to the target object in every sub-image region can be taken into account, avoiding the situation where key features occupying a small proportion of the image are overlooked when the key feature vector is determined from the image as a whole, and thereby improving, to a certain extent, the accuracy of determining the target category of the image to be classified.
Furthermore, when multiple rounds of feature screening are performed on the feature vector sequence, the obtained sequence is filtered before the current round of screening in at least one round. This reduces the amount of data to be processed in that round of screening, reduces the computing resources required to determine the target category of the image to be classified, and improves the efficiency of determining it.
Furthermore, because the obtained feature vector sequence is filtered based on the feature similarity between each candidate feature vector it currently contains and the target object in the image to be classified, the target image classification model can identify, in a more targeted manner, the features in the sequence that relate to the target object, and so determine the target category of the image to be classified more accurately.
Drawings
Fig. 1a is an application scenario one of the image classification methods provided in the embodiment of the present application;
fig. 1b is a schematic diagram illustrating a principle of an image classification method according to an embodiment of the present application;
fig. 1c is a second application scenario of the image classification method according to the embodiment of the present application;
fig. 2 is a schematic flowchart of an image classification method according to an embodiment of the present application;
fig. 3 is a schematic diagram illustrating a principle of an image classification method according to an embodiment of the present application;
fig. 4 is a schematic diagram illustrating a third principle of the image classification method according to the embodiment of the present application;
fig. 5a is a schematic diagram illustrating a principle of an image classification method according to an embodiment of the present application;
fig. 5b is a schematic diagram illustrating a principle of the image classification method according to the embodiment of the present application;
fig. 6a is a schematic diagram illustrating a principle of an image classification method according to an embodiment of the present application;
fig. 6b is a schematic diagram seven illustrating a principle of the image classification method according to the embodiment of the present application;
fig. 6c is a schematic diagram eight illustrating a principle of the image classification method according to the embodiment of the present application;
fig. 7 is a schematic structural diagram of an image classification apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an image classification apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a third image classification device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings.
Some terms used in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.
(1) Image classification:
Image classification is an image processing method that assigns different images to different categories according to the features each image exhibits; it classifies images through quantitative analysis, replacing human visual interpretation.
(2) Giga floating-point operations per second (GFLOPS):
GFLOPS, one billion floating-point operations per second, can be used as a GPU performance metric; in theory, the higher the value, the better the GPU performance. 1 GFLOPS = 1000 MFLOPS.
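As a quick check of the unit relationships stated above:

```python
def gflops_to_mflops(gflops):
    """1 GFLOPS = 1000 MFLOPS."""
    return gflops * 1000.0

def gflops_to_flops(gflops):
    """1 GFLOPS = 10**9 floating-point operations per second."""
    return gflops * 1e9
```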
With the research and progress of artificial intelligence technology, AI has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, autonomous driving, unmanned aerial vehicles, robots, smart healthcare, smart education, the Internet of Vehicles, and intelligent transportation.
The embodiments of the present application relate to the fields of Artificial Intelligence (AI) and blockchain, are designed based on Machine Learning (ML) and Computer Vision (CV) technology, and can be applied to fields such as cloud computing, intelligent transportation, and assisted driving.
Artificial intelligence is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that studies the design principles and implementations of various machines in an attempt to understand the essence of intelligence, and that seeks to produce new intelligent machines capable of reacting in a manner similar to human intelligence, giving machines the capabilities of perception, reasoning, and decision-making.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, with both hardware-level and software-level technologies. Its basic technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big-data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning. With the development of artificial intelligence, it is being studied and applied in many fields, such as smart homes, smart customer service, virtual assistants, smart speakers, smart marketing, smart wearable devices, autonomous driving, unmanned aerial vehicles, robots, smart healthcare, the Internet of Vehicles, and intelligent transportation. The solution provided by the embodiments of this application involves technologies such as deep learning and augmented reality, as further explained in the following embodiments.
Machine learning is a multidisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and more. It specializes in studying how computers can simulate human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their performance.
Machine learning is the core of artificial intelligence, the fundamental way to make computers intelligent, and is applied in every field of AI; the core of machine learning is deep learning, which is a technology for realizing machine learning. Machine learning generally includes deep learning, reinforcement learning, transfer learning, inductive learning, artificial neural networks, and teaching-based learning, while deep learning includes convolutional neural networks (CNN), deep belief networks, recurrent neural networks, autoencoders, generative adversarial networks, and the like.
Computer vision is the science of studying how to make machines "see": using cameras and computers, in place of human eyes, to identify, track, and measure targets, and to further process images so that they become more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies theories and techniques that attempt to build artificial intelligence systems capable of extracting information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, autonomous driving, and intelligent transportation, as well as common biometric technologies such as face recognition and fingerprint recognition.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptography, where each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, and an application service layer.
The blockchain underlying platform may include processing modules such as user management, basic services, smart contracts, and operation monitoring. The user management module is responsible for the identity management of all blockchain participants, including maintaining public/private key generation (account management), key management, and the correspondence between a user's real identity and blockchain address (authority management); with authorization, it can supervise and audit the transactions of certain real identities and provide risk-control rule configuration (risk-control audit). The basic service module is deployed on all blockchain node devices to verify the validity of service requests and, after consensus is reached on a valid request, record it to storage; for a new service request, the basic service first performs interface adaptation parsing and authentication (interface adaptation), then encrypts the service information via a consensus algorithm (consensus management), transmits it completely and consistently to the shared ledger after encryption (network communication), and records it for storage. The smart contract module is responsible for contract registration and issuance, contract triggering, and contract execution; developers can define contract logic in a programming language, publish it to the blockchain (contract registration), and have the contract logic executed according to the contract clauses when triggered by a key or another event, and the module also provides functions for upgrading and canceling contracts. The operation monitoring module is mainly responsible for deployment, configuration modification, contract setting, and cloud adaptation during product release, as well as visual output of real-time status during product operation, such as alarms, network condition monitoring, and node device health monitoring.
The platform product service layer provides the basic capabilities and implementation framework for typical applications; developers can build the blockchain implementation of their business logic on these basic capabilities combined with the characteristics of the overlaid business. The application service layer provides blockchain-based application services for business participants to use.
It should be noted that in the embodiments of the present application, when the above embodiments are applied to specific products or technologies, related data such as content input by a user requires the user's permission or consent, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
The following briefly introduces an application field of the image classification method provided in the embodiment of the present application.
With the continuous development of science and technology, more and more devices can classify images based on the features they contain. For example, images may be classified by language based on the language of the text they contain; as another example, images may be classified by species based on the species of the target object they contain; as yet another example, images may be classified by brand based on the brand mark of the target object they contain.
Different image regions of an image contain different features, and usually only some features contained in a target image region help the device obtain the correct classification result. For example, referring to fig. 1a, when classifying an image by language, only the features of the target text in the target image region containing text help determine the correct language classification result; in fig. 1a, the features in the target image region marked by a dashed box help determine that the image belongs to the English-language category.
However, in the conventional image classification process, the probabilities that an image belongs to the different categories must be calculated based on the features contained in all image regions of the image, and the classification result is then determined based on the magnitudes of the calculated probabilities.
When calculating these probabilities based on the features contained in all image regions, the device not only consumes a large amount of computing resources on image regions other than the target image region, so that the efficiency of image classification is low, but is also easily influenced by the features contained in those other regions: the calculated probability of the correct category may end up lower than that of a wrong category, causing the device to produce a wrong classification result.
Therefore, in the related art, the efficiency and accuracy of image classification are low.
In order to solve the problem of low efficiency and accuracy in image classification, the present application provides an image classification method; please refer to fig. 1b. A trained target image classification model divides the image to be classified into a plurality of sub-image regions, extracts a candidate feature vector for each sub-image region, and generates a feature vector sequence. The target image classification model then performs multiple rounds of feature screening on the feature vector sequence to generate a key feature vector, where in at least one round of feature screening the following operation is performed: for the feature vector sequence obtained after the previous round of feature screening, before the current round of feature screening, the obtained feature vector sequence is filtered based on the feature similarity between each candidate feature vector currently contained in it and the target object in the image to be classified. Finally, the target image classification model classifies the image to be classified based on the key feature vector and outputs the target category of the image to be classified.
In the embodiment of the application, the image to be classified is divided into a plurality of sub-image regions, a candidate feature vector is extracted for each sub-image region, and a feature vector sequence is generated. Thus, when the key feature vector is determined from the feature vector sequence, the key features related to the target object in every sub-image region can be considered; this avoids ignoring key features that occupy only a small proportion of the image, as can happen when the key feature vector is determined from the whole image, and improves, to a certain extent, the accuracy of determining the target category of the image to be classified.
Furthermore, when multiple rounds of feature screening are performed on the feature vector sequence, in at least one round the obtained feature vector sequence is filtered before the current round of screening. This reduces the amount of data that must be processed in that round, reduces the computing resources needed to determine the target category, and improves the efficiency of determining the target category of the image to be classified.
Furthermore, because the obtained feature vector sequence is filtered based on the feature similarity between each candidate feature vector it currently contains and the target object in the image to be classified, the target image classification model can identify more specifically the features in the feature vector sequence that are related to the target object, and can therefore determine the target category of the image to be classified more accurately.
An application scenario of the image classification method provided in the present application is described below.
Please refer to fig. 1c, which is a schematic view of an application scenario of the image classification method provided in the present application. The application scenario includes a client 101 and a server 102. The client 101 and the server 102 may communicate with each other, either through wired communication technology, for example via a cable network or a serial port, or through wireless communication technology, for example Bluetooth or Wireless Fidelity (Wi-Fi); no specific limitation is imposed.
The client 101 generally refers to a device that can provide the server 102 with the image to be classified, such as a terminal device, a web page accessible by a terminal device, or a third-party program accessible by a terminal device. The terminal device may be an intelligent transportation device, a camera, a mobile phone, a smart appliance, a vehicle-mounted terminal, or the like. The server 102 generally refers to a device that can process the image to be classified, such as a terminal device or a server; servers include but are not limited to cloud servers, local servers, and associated third-party servers. Both the client 101 and the server 102 may use cloud computing to reduce the occupation of local computing resources, and may also use cloud storage to reduce the occupation of local storage resources.
As an embodiment, the client 101 and the server 102 may be the same device; no specific limitation is imposed. In the embodiments of the present application, the client 101 and the server 102 are described as different devices by way of example.
Based on fig. 1c, taking the server 102 as the execution subject and the server 102 being a server as an example, the image classification method provided by the embodiment of the present application is specifically described below.
Please refer to fig. 2, which is a flowchart illustrating an image classification method according to an embodiment of the present disclosure.
S201, aiming at the image to be classified, dividing the image to be classified into a plurality of sub-image areas by adopting a trained target image classification model, respectively extracting respective alternative characteristic vectors of the plurality of sub-image areas, and generating a characteristic vector sequence.
The client may send the image to be classified to the server in response to a classification operation triggered by a target account for that image, and the server receives the image; the server may also, with a preset duration as a cycle, read stored images to be classified in sequence and classify them in turn; or the server may treat a received image as the image to be classified when it receives an image from a designated interface, from a designated device, or from a designated network IP, and so on; no specific limitation is imposed.
After obtaining the image to be classified, the server may use the trained target image classification model to divide it into a plurality of sub-image regions. The server may divide the image based on a specified region size, based on a specified number of regions, or into a plurality of irregular sub-image regions based on super-pixel technology, among others; no specific limitation is imposed.
For example, referring to fig. 3, the size of the image to be classified is 128 × 96 × 3, and the server may divide the image to be classified based on the specified area size 16 × 16 × 3 to obtain 48 sub-image areas.
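The region division of this example (a 128 × 96 × 3 image cut into 48 regions of 16 × 16 × 3) can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the patent's implementation; the function name is hypothetical.

```python
import numpy as np

def split_into_patches(image: np.ndarray, patch_h: int, patch_w: int) -> np.ndarray:
    """Split an H x W x C image into non-overlapping patch_h x patch_w x C regions."""
    H, W, C = image.shape
    assert H % patch_h == 0 and W % patch_w == 0, "image must tile evenly"
    rows, cols = H // patch_h, W // patch_w
    return (image
            .reshape(rows, patch_h, cols, patch_w, C)
            .transpose(0, 2, 1, 3, 4)            # (rows, cols, patch_h, patch_w, C)
            .reshape(rows * cols, patch_h, patch_w, C))

image = np.arange(128 * 96 * 3, dtype=np.float32).reshape(128, 96, 3)
patches = split_into_patches(image, 16, 16)
print(patches.shape)  # (48, 16, 16, 3)
```

The reshape/transpose pair keeps each region contiguous, so `patches[0]` is exactly the top-left 16 × 16 × 3 corner of the image.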
After obtaining the plurality of sub-image regions, the server may extract respective candidate feature vectors of the plurality of sub-image regions, respectively, to generate a feature vector sequence.
The server may use a feature extraction module in the target image classification model to extract an image feature vector for each of the plurality of sub-image regions, and use each extracted image feature vector as the candidate feature vector of the corresponding sub-image region.
When extracting the image feature vectors of the plurality of sub-image regions, the server may also generate a corresponding region position vector based on the position of each sub-image region in the image to be classified. The server then fuses each image feature vector with its corresponding region position vector to obtain the candidate feature vector of each sub-image region. By fusing the region position vector into the image feature vector, the target image classification model can, during the multi-round feature screening of the candidate feature vectors, take into account the position in the image to be classified of the sub-image region corresponding to each candidate feature vector, so that the screening can place different emphasis on candidate feature vectors according to their positions. For example, in typical image composition the target object is usually located in the middle of the image, so sub-image regions at the edges of the image to be classified, such as the regions at its four corners, contain less of the target object; when feature screening is performed, most features of those four regions can therefore be filtered out without affecting the efficiency and accuracy of image classification.
When fusing an image feature vector with its corresponding region position vector, the two vectors may be summed to obtain the candidate feature vector; or multiplied element-wise to obtain the candidate feature vector; or concatenated to obtain a candidate feature vector containing the two sub-vectors, and so on; no specific limitation is imposed.
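The three fusion options above can be sketched as follows. This is illustrative NumPy only; the sequence length of 48 and the dimension D = 64 are assumptions carried over from the earlier example, and in practice the position vectors would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64
patch_feat = rng.standard_normal((48, D))   # image feature vectors, one per sub-region
pos_embed  = rng.standard_normal((48, D))   # corresponding region position vectors

fused_sum    = patch_feat + pos_embed       # option 1: vector summation
fused_mul    = patch_feat * pos_embed       # option 2: element-wise multiplication
fused_concat = np.concatenate([patch_feat, pos_embed], axis=-1)  # option 3: concatenation

print(fused_sum.shape, fused_concat.shape)  # (48, 64) (48, 128)
```

Note that summation and multiplication preserve the dimension D, while concatenation doubles it; a model using concatenation would need its later layers sized accordingly.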
After obtaining the candidate feature vectors of the plurality of sub-image regions, the server may arrange the candidate feature vectors in order, based on the position of each sub-image region in the image to be classified or based on the positional relationship among the sub-image regions characterized by the region position vectors, to generate the feature vector sequence.
As an embodiment, when extracting the image feature vectors of the plurality of sub-image regions, the server may first extract an image feature matrix for each sub-image region and then map each image feature matrix, flattening it into an image feature vector.
As an embodiment, after extracting the image feature vectors of the plurality of sub-image regions, the server may apply a linear transformation to each image feature vector, compressing it to a specified dimension; the linear transformation may be implemented with a fully connected layer. In this way, all vectors computed by the target image classification model during image classification have a uniform dimension, which avoids situations where, because the images to be classified differ in size or in the number of features they contain, the extracted feature vectors have different dimensions, the model cannot operate on them, or some features are simply discarded.
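The flattening and linear compression to a specified dimension described above can be sketched as follows. This is illustrative NumPy; the dimension D = 64 and the random weights are assumptions, and in practice the fully connected layer's weights would be learned during training.

```python
import numpy as np

rng = np.random.default_rng(0)
patches = rng.standard_normal((48, 16, 16, 3))  # 48 sub-image regions (feature matrices)
flat = patches.reshape(48, -1)                  # flatten each matrix into a vector: (48, 768)

D = 64                                          # the specified (uniform) dimension
W = rng.standard_normal((flat.shape[1], D)) * 0.02  # fully connected layer weights
b = np.zeros(D)
tokens = flat @ W + b                           # linear transformation to dimension D
print(tokens.shape)  # (48, 64)
```

Whatever the sub-image region size, the output dimension is always D, which is what keeps all downstream computations dimensionally uniform.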
And S202, performing multi-round feature screening on the feature vector sequence by adopting a target image classification model to generate a key feature vector.
After obtaining the feature vector sequence, the server can continue to use the target image classification model to perform multiple rounds of feature screening on the feature vector sequence and generate a key feature vector. The feature screening may be implemented based on a multi-head attention mechanism, a self-attention mechanism, or the like; the target image classification model may use the encoder module of a trained Transformer model to perform the multi-round feature screening process, without specific limitation. After each round of feature screening, the feature vector sequence and the key feature vector are obtained anew, and the key feature vector obtained in the last round of feature screening is used as the finally generated key feature vector.
In at least one round of feature screening, the following operation is performed: for the feature vector sequence obtained after the previous round of feature screening, before the current round of feature screening, the obtained feature vector sequence is filtered based on the feature similarity between each candidate feature vector currently contained in it and the target object in the image to be classified.
Referring to fig. 4, take as an example a feature vector sequence containing five candidate feature vectors: candidate feature vector A, candidate feature vector B, candidate feature vector C, candidate feature vector D, and candidate feature vector E. After obtaining the feature vector sequence, the server may add a key feature vector at the head of the sequence; each vector element of the key feature vector may initially be 0 or any random number, its dimension is the same as that of each candidate feature vector, and its vector elements are updated through each round of feature screening. The final key feature vector is obtained after the last round of feature screening.
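Prepending the key feature vector, initialized to zeros, at the head of the five-vector sequence of fig. 4 can be sketched as follows (illustrative NumPy; the dimension D = 64 is an assumption). This mirrors the class-token mechanism used by Transformer-style image classifiers.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64
sequence = rng.standard_normal((5, D))   # candidate feature vectors A..E
key_vec  = np.zeros((1, D))              # key feature vector, every element initially 0
tokens   = np.concatenate([key_vec, sequence], axis=0)  # key vector at the head
print(tokens.shape)  # (6, 64)
```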
In the following, a round of feature screening is taken as an example for description, and the processes of other rounds of feature screening are similar and will not be described herein again.
After performing the previous round of feature screening, the server obtains the feature vector sequence and the key feature vector produced by that round. It may then determine the feature similarity between each candidate feature vector currently contained in the obtained feature vector sequence and the target object contained in the image to be classified, select from those candidate feature vectors the ones whose feature similarity satisfies the similarity condition, and update the key feature vector.
The server determines the feature similarities, selects the candidate feature vectors whose feature similarity satisfies the similarity condition, and updates the key feature vector according to the model parameters obtained by training the target image classification model. For example, the feature vector sequence and the key feature vector may be screened by the encoder module of a trained Transformer model and the key feature vector updated, so as to obtain the feature vector sequence and key feature vector of the current round of feature screening.
For example, the server may use the similarity between every two adjacent vectors, among the key feature vector and the candidate feature vectors, as the feature similarity between each candidate feature vector currently contained in the obtained feature vector sequence and the target object contained in the image to be classified. Based on the similarity between every two adjacent vectors, the two vectors are fused, and the fused vector serves as a selected candidate feature vector whose feature similarity satisfies the similarity condition; in this way the key feature vector and the candidate feature vectors are regenerated, yielding the key feature vector and feature vector sequence of the current round of feature screening.
As another example, the server may use the similarity between each candidate feature vector and the key feature vector as the feature similarity between that candidate feature vector and the target object contained in the image to be classified. Based on these similarities, the candidate feature vectors are weighted and summed; based on the weighted-sum result, the candidate feature vectors whose feature similarity satisfies the similarity condition are selected and the key feature vector is regenerated. Similarly, for each candidate feature vector in turn, the server determines the similarities between the other candidate feature vectors and the current candidate feature vector and regenerates the current candidate feature vector based on those similarities. In this way the server obtains the key feature vector and feature vector sequence of the current round of feature screening.
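One round of the similarity-based screening described here can be sketched as a self-attention step in which every vector, the key feature vector at index 0 included, is regenerated as a similarity-weighted sum of all vectors. This is a simplified sketch under stated assumptions: unlike a trained Transformer encoder it uses no learned query/key/value projections, taking the vectors themselves as queries, keys, and values.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def screening_round(tokens: np.ndarray) -> np.ndarray:
    """One screening round: each vector becomes a similarity-weighted sum of all
    vectors, so the key vector (row 0) accumulates the candidates most similar to it."""
    D = tokens.shape[-1]
    sim = tokens @ tokens.T / np.sqrt(D)   # pairwise (scaled dot-product) similarity
    weights = softmax(sim, axis=-1)        # similarities -> normalized weights
    return weights @ tokens                # weighted summation regenerates every vector

rng = np.random.default_rng(0)
tokens = rng.standard_normal((6, 64))      # key vector + 5 candidate vectors
out = screening_round(tokens)
print(out.shape)  # (6, 64)
```

Running the function repeatedly corresponds to the multiple rounds of screening; in the trained model each round would additionally apply its own learned parameters.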
In at least one of the multiple rounds of feature screening, after obtaining the feature vector sequence and key feature vector produced by the previous round, the server may take the feature vector sequence obtained after the previous round as an initial feature vector sequence. The server may then filter out, from the candidate feature vectors currently contained in the initial feature vector sequence, those whose feature similarity does not satisfy the similarity condition, to generate a target feature vector sequence; the number of candidate feature vectors contained in the target feature vector sequence is smaller than the number contained in the initial feature vector sequence.
Through this filtering, the number of candidate feature vectors contained in the target feature vector sequence is reduced, so that subsequent feature screening consumes fewer computing resources; this improves computing efficiency and hence image classification efficiency.
In the at least one round of feature screening, after the number of candidate feature vectors in the target feature vector sequence has been reduced, the server may perform feature screening on the target feature vector sequence: it determines the feature similarity between each candidate feature vector currently contained in the target feature vector sequence and the target object contained in the image to be classified, selects the candidate feature vectors whose feature similarity satisfies the similarity condition, and updates the key feature vector, thereby improving the efficiency of the current round of feature screening and of the subsequent rounds. For the specific feature screening process, see the foregoing description; it is not repeated here.
As an embodiment, the server may filter out the candidate feature vectors whose feature similarity does not satisfy the similarity condition from among the candidate feature vectors currently contained in the initial feature vector sequence in a variety of ways; two alternative methods are described below, and the specific filtering method is not limited.
Method one: filtering based on a convolution operation.
The server may perform a convolution operation on the candidate feature vectors contained in the initial feature vector sequence, based on a specified convolution kernel and a specified convolution stride, and generate the target feature vector sequence from the candidate feature vectors obtained by the convolution operation.
Referring to fig. 5a, take as an example an initial feature vector sequence containing 6 candidate feature vectors: a first candidate feature vector, a second candidate feature vector, a third candidate feature vector, a fourth candidate feature vector, a fifth candidate feature vector, and a sixth candidate feature vector.
If the size of the specified convolution kernel is 2 and the specified convolution stride is 2, the server convolves the first and second candidate feature vectors with the specified convolution kernel to obtain a seventh candidate feature vector; convolves the third and fourth candidate feature vectors with the specified convolution kernel to obtain an eighth candidate feature vector; and convolves the fifth and sixth candidate feature vectors with the specified convolution kernel to obtain a ninth candidate feature vector.
The server generates the target feature vector sequence based on the obtained seventh, eighth, and ninth candidate feature vectors.
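The convolution-based filtering of this example (kernel size 2, stride 2, six candidate vectors reduced to three) can be sketched as a one-dimensional convolution along the sequence axis. This is illustrative NumPy; the averaging kernel is an assumption — in the model the kernel would be learned during training.

```python
import numpy as np

def conv_filter_tokens(tokens: np.ndarray, kernel: np.ndarray, stride: int) -> np.ndarray:
    """1-D convolution along the sequence axis: each output token fuses
    len(kernel) consecutive candidate feature vectors."""
    N, D = tokens.shape
    k = len(kernel)
    out = [sum(kernel[j] * tokens[i + j] for j in range(k))
           for i in range(0, N - k + 1, stride)]
    return np.stack(out)

tokens = np.arange(6 * 4, dtype=float).reshape(6, 4)  # 6 candidate vectors, D = 4
kernel = np.array([0.5, 0.5])                         # specified kernel of size 2 (averaging)
reduced = conv_filter_tokens(tokens, kernel, stride=2)
print(reduced.shape)  # (3, 4) — the 7th, 8th, and 9th candidate feature vectors
```

With kernel size 2 and stride 2, non-overlapping pairs of candidate vectors are fused, halving the sequence length exactly as in fig. 5a.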
For example, please refer to Table 1, which shows an example of filtering based on the convolution operation method.
TABLE 1
Here, B (batch size) denotes the batch size, N (num tokens) denotes the number of candidate feature vectors, and D denotes the hidden-layer dimension.
Method two: filtering based on multi-layer perceptron processing.
The server may perform multi-layer perceptron processing on the candidate feature vectors contained in the initial feature vector sequence: targeting a specified number of vectors, it fuses the candidate feature vectors whose feature similarity does not satisfy the similarity condition with those whose feature similarity does satisfy the condition, and generates the target feature vector sequence from the fused candidate feature vectors, where the specified number of vectors is smaller than the number of candidate feature vectors contained in the initial feature vector sequence.
When fusing the candidate feature vectors whose feature similarity does not satisfy the similarity condition with those whose feature similarity does, the server may implement the fusion through a plurality of activation functions, the weight of each activation function being learned during the training of the target image classification model. Thus, after the trained target image classification model is obtained, the server can filter the initial feature vector sequence based on these activation functions, obtain the fused candidate feature vectors from the weighted sum of the activation functions' outputs, and generate the target feature vector sequence.
Referring to fig. 5b, continue the example in which the initial feature vector sequence contains 6 candidate feature vectors: the first, second, third, fourth, fifth, and sixth candidate feature vectors.
If the multi-layer perceptron processing comprises two hidden layers and the specified number of vectors is 3, the server inputs the first through sixth candidate feature vectors into the first hidden layer and then feeds the output of the first hidden layer into the second hidden layer. The output of the second hidden layer yields the seventh, eighth, and ninth candidate feature vectors.
The server generates the target feature vector sequence based on the obtained seventh, eighth, and ninth candidate feature vectors.
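The two-hidden-layer fusion of this example (six candidate vectors reduced to the specified number of three) can be sketched as a token-mixing MLP operating along the sequence axis. This is illustrative NumPy under stated assumptions: the ReLU activation, hidden width, and random weights are placeholders — in the model, the weights would be learned during training as the text describes.

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp_filter_tokens(tokens: np.ndarray, W1: np.ndarray, W2: np.ndarray) -> np.ndarray:
    """Two hidden layers mixing along the sequence axis: tokens (N, D) -> (M, D),
    fusing all N candidate vectors into M output vectors."""
    h = np.maximum(0.0, W1 @ tokens)  # first hidden layer + ReLU activation
    return W2 @ h                     # second hidden layer outputs M fused vectors

N, M, H, D = 6, 3, 8, 4               # 6 candidates in, 3 out, hidden width 8, dim 4
tokens = rng.standard_normal((N, D))
W1 = rng.standard_normal((H, N)) * 0.1  # weights: learned during model training
W2 = rng.standard_normal((M, H)) * 0.1
reduced = mlp_filter_tokens(tokens, W1, W2)
print(reduced.shape)  # (3, 4)
```

Unlike the convolution method, every output vector here can draw on all six input vectors, since the weight matrices mix the whole sequence.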
For example, please refer to Table 2, which shows an example of filtering based on the multi-layer perceptron method.
TABLE 2
Here, B (batch size) denotes the batch size, N (num tokens) denotes the number of candidate feature vectors, and D denotes the hidden-layer dimension.
And S203, classifying the images to be classified by adopting a target image classification model based on the key characteristic vector, and outputting the target category of the images to be classified.
After obtaining the key feature vector, the server may classify the image to be classified with the target image classification model based on the key feature vector and output the target category of the image to be classified. The classification processing may be implemented by a Softmax module: the probabilities that the key feature vector belongs to the different categories are calculated, and the target category is then determined based on the calculated probabilities. When determining the target category based on the magnitudes of the probabilities, the category with the maximum probability may be taken as the target category, or every category whose probability exceeds a probability threshold may be taken as a target category, and so on; no specific limitation is imposed.
The image classification method provided by the embodiment of the present application is described below by taking as an example a process of classifying an image to be classified according to a species of a target object included in the image to be classified.
Referring to fig. 6a, the image to be classified obtained by the server contains a target object "cat" and has a size of 64 × 48 × 3. After obtaining the image to be classified, the server divides it into 12 sub-image regions by adopting the target image classification model, where the size of each sub-image region is 16 × 16 × 3. After obtaining the 12 sub-image regions, the server extracts the candidate feature vector of each sub-image region, obtaining 12 candidate feature vectors. The candidate feature vectors are arranged according to the left-to-right, top-to-bottom order of their corresponding sub-image regions in the image to be classified, generating a feature vector sequence.
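The patch split in this example can be sketched as follows, assuming non-overlapping 16 × 16 × 3 regions read in row-major (left-to-right, top-to-bottom) order:

```python
import numpy as np

# Splitting a 64 x 48 x 3 image into 12 non-overlapping 16 x 16 x 3 regions,
# scanned left-to-right, top-to-bottom.
def split_patches(img, ph=16, pw=16):
    H, W, _ = img.shape
    patches = [img[i:i + ph, j:j + pw]       # row-major scan order
               for i in range(0, H, ph)
               for j in range(0, W, pw)]
    return np.stack(patches)                 # (num_patches, ph, pw, C)

img = np.zeros((64, 48, 3))                  # stand-in image to be classified
patches = split_patches(img)                 # 4 rows x 3 columns = 12 patches
```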
Referring to fig. 6b, after obtaining a feature vector sequence including 12 candidate feature vectors, the server uses a target image classification model, adds a key feature vector with each vector element being 0 to the head of the feature vector sequence, performs 10 rounds of feature screening on the feature vector sequence, updates the key feature vector after each round of feature screening, and generates a final key feature vector after the 10 th round of feature screening.
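Prepending the all-zero key feature vector to the head of the sequence can be sketched as follows (the dimension D is an assumption for illustration):

```python
import numpy as np

# Prepending an all-zero key feature vector to the head of a 12-vector sequence.
rng = np.random.default_rng(1)
D = 8
seq = rng.normal(size=(12, D))                        # 12 candidate feature vectors
key = np.zeros((1, D))                                # key vector, every element 0
seq_with_key = np.concatenate([key, seq], axis=0)     # key sits at the head
```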
During the 1st round of feature screening, based on the model parameters of the target image classification model, the 12 candidate feature vectors contained in the feature vector sequence are screened for features whose feature similarity with the target object contained in the image to be classified satisfies the similarity condition; the key feature vector is updated based on the screening result, and 12 candidate feature vectors are regenerated, yielding the feature vector sequence and key feature vector of this round. The 2nd round, the 4th to 5th rounds, the 7th to 8th rounds and the 10th round of feature screening are similar, and are not repeated here.
In the 3rd round of feature screening, the feature vector sequence obtained after the previous round of feature screening, that is, the sequence obtained in the 2nd round, is first filtered, reducing the number of candidate feature vectors it contains to half: 12 candidate feature vectors become 6. For example, the server performs a convolution operation with a convolution stride of 2 on the 12 candidate feature vectors based on a specified convolution kernel of size 2, obtaining 6 regenerated candidate feature vectors and thereby the filtered feature vector sequence. The 6th and 9th rounds of feature screening are similar: during the 6th round, filtering reduces the 6 candidate feature vectors to 3, and during the 9th round, filtering reduces those 3 candidate feature vectors to 2. Details are not repeated here.
And performing feature screening on a feature vector sequence which is obtained after the filtering process and contains 6 candidate feature vectors. The feature screening process is similar to the 1 st round of feature screening process and will not be described herein.
Referring to fig. 6c, after obtaining the key feature vector through 10 rounds of feature screening, the server may determine a target category of the image to be classified based on the key feature vector by using a target image classification model, where the target category is "cat".
In the embodiment of the application, filtering processing can be performed before any one or more rounds of feature screening, so that the purpose of reducing the data volume needing to be processed is achieved. In order to balance the image classification efficiency and accuracy, filtering may be performed before at least one round of feature screening, or filtering may be performed before a specified round of feature screening, which is specifically set according to an actual usage scenario, and is not limited herein.
For example, suppose filtering is performed before the 3rd, 6th and 9th rounds of feature screening and compared with the case where no filtering is performed; please refer to table 3.
[Table 3 appears as an image in the original publication (Figure BDA0003453486770000231); its contents are not reproduced in this extraction.]
Although the amount of parameter data increases slightly, the amount of floating-point computation is greatly reduced, to about half of the original, improving image classification efficiency and accuracy; for larger-size images, the classification time can be shortened by as much as 30 ms.
As an example, the target image classification model may include 12 rounds of feature screening, and the server may perform filtering before the 4th, 7th and 10th rounds of feature screening, and so on; this is not specifically limited.
Based on the same inventive concept, the embodiment of the application provides an image classification method in which, after the image to be classified is obtained, it is input into a trained target image classification model to obtain the target category output by the target image classification model. The target image classification model is obtained by performing multiple rounds of iterative training on the image classification model to be trained based on each piece of sample data, and each round of training at least performs the following operations: performing multiple rounds of feature screening on a sample feature vector sequence of a sample image contained in the sample data to generate a sample key feature vector for classifying the sample image; and, in at least one round of feature screening, performing the following operation: for the feature vector sequence obtained after the previous round of feature screening, before the current round of feature screening, filtering the obtained feature vector sequence based on the feature similarity between each candidate feature vector currently contained in it and the target object in the image to be classified.
In the embodiment of the application, in each round of training process, in at least one round of feature screening, the obtained feature vector sequence is filtered, so that the data quantity required to be processed in the current round of feature screening and subsequent feature screening is reduced, the efficiency of training the image classification model is improved, and the efficiency of image classification is improved by using the trained target image classification model.
The following describes the process of training the image classification model to be trained.
The server may obtain various sample data from a network resource or other device, each sample data containing a sample image and a sample category. And the server performs multiple rounds of iterative training on the image classification model to be trained based on each sample data until the training loss meets the training target, and outputs the image classification model to be trained as the trained target image classification model.
Each round of iterative training process is similar, and one of the rounds of iterative training processes is described as an example below.
The server divides the sample image into a plurality of sample sub-image areas by adopting an image classification model to be trained aiming at the sample image contained in the sample data, extracts sample candidate characteristic vectors of the plurality of sample sub-image areas respectively and generates a sample characteristic vector sequence. And performing multi-round feature screening on the sample feature vector sequence by adopting an image classification model to be trained to generate a sample key feature vector. And classifying the sample images by adopting an image classification model to be trained based on the key characteristic vectors of the samples, and outputting the training categories of the sample images. The process of obtaining the training category of the sample image by the server using the image classification model to be trained is similar to the process of obtaining the target category of the image to be classified by the server using the target image classification model described above, and is not described herein again.
After obtaining the training class, the server may determine a training loss of the image classification model based on an error between the training class and a sample class included in the sample data. If the training loss does not meet the training target, the model parameters of the image classification model to be trained can be adjusted, the next round of iterative training is continued until the training loss meets the training target, and the image classification model to be trained is output and serves as the trained target image classification model.
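The train-until-the-loss-meets-the-target loop can be sketched as follows; a logistic-regression head stands in for the full image classification model, and the learning rate, iteration cap and loss target are illustrative assumptions:

```python
import numpy as np

# Iterate: compute the training loss, stop when it meets the target, otherwise
# adjust the model parameters and continue with the next round.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))                 # stand-in sample key feature vectors
y = (X[:, 0] > 0).astype(float)              # stand-in sample categories (binary)
w = np.zeros(4)                              # model parameters to be trained

def loss_and_grad(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))       # predicted training-category probability
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    return loss, X.T @ (p - y) / len(y)

target_loss = 0.2                            # the training target
for _ in range(500):                         # rounds of iterative training
    loss, grad = loss_and_grad(w)
    if loss < target_loss:                   # training loss meets the training target
        break
    w -= 0.5 * grad                          # adjust the model parameters
```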
Based on the same inventive concept, the embodiment of the present application provides an image classification device, which can implement the corresponding functions of the foregoing image classification method. Referring to fig. 7, the apparatus includes a first processing module 701 and a second processing module 702, wherein:
the first processing module 701: the image classification method comprises the steps of adopting a trained target image classification model for an image to be classified, dividing the image to be classified into a plurality of sub-image areas, respectively extracting alternative feature vectors of the sub-image areas, and generating a feature vector sequence;
the first processing module 701 is further configured to: performing multi-round feature screening on the feature vector sequence by adopting a target image classification model to generate a key feature vector, wherein the following operations are executed in at least one round of feature screening: aiming at a feature vector sequence obtained after the previous round of feature screening, before the current round of feature screening, filtering the obtained feature vector sequence based on the feature similarity between each candidate feature vector currently contained in the obtained feature vector sequence and a target object in an image to be classified;
the second processing module 702: and the image classification module is used for classifying the images to be classified by adopting a target image classification model based on the key characteristic vector and outputting the target categories of the images to be classified.
In a possible embodiment, the first processing module 701 is specifically configured to:
respectively extracting image characteristic vectors of a plurality of sub-image areas;
respectively generating corresponding region position vectors based on the positions of the plurality of sub-image regions in the image to be classified;
respectively fusing each image feature vector with the corresponding region position vector to respectively obtain respective alternative feature vectors of a plurality of sub-image regions;
and sequentially arranging corresponding candidate feature vectors based on the position relation among the plurality of sub-image regions represented by the region position vectors to generate a feature vector sequence.
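The fusion of image feature vectors with region position vectors can be sketched as follows; additive fusion is an assumption here, since the embodiment does not fix the fusion operator:

```python
import numpy as np

# Fuse each image feature vector with its region position vector (additive),
# keeping the vectors in the row-major order the positions represent.
num_regions, D = 12, 8
rng = np.random.default_rng(3)
img_feats = rng.normal(size=(num_regions, D))   # image feature vectors
pos_vecs = rng.normal(size=(num_regions, D))    # region position vectors
candidates = img_feats + pos_vecs               # candidate feature vectors
feature_sequence = candidates                   # already ordered by region position
```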
In a possible embodiment, the first processing module 701 is specifically configured to, in each round of feature screening, perform the following operations:
acquiring a feature vector sequence and a key feature vector obtained after the previous round of feature screening;
respectively determining the feature similarity between each candidate feature vector currently contained in the obtained feature vector sequence and a target object contained in the image to be classified;
based on each feature similarity, selecting the candidate feature vector with the feature similarity meeting the similarity condition from the candidate feature vectors currently contained in the obtained feature vector sequence, and updating the key feature vector.
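One round of feature screening can be sketched as a similarity-weighted update of the key feature vector; the scaled dot-product similarity and softmax weighting used here are illustrative assumptions, as the embodiment does not specify the similarity measure:

```python
import numpy as np

# One screening round: score each candidate against the key vector, softmax
# the similarities, and update the key as the weighted sum of candidates.
def screen(key, seq):
    """key: (D,), seq: (N, D); returns the updated key feature vector."""
    sims = seq @ key / np.sqrt(seq.shape[1])    # feature similarities
    weights = np.exp(sims - sims.max())
    weights /= weights.sum()                    # emphasise high-similarity vectors
    return weights @ seq                        # updated key feature vector

rng = np.random.default_rng(2)
seq = rng.normal(size=(12, 8))                  # current candidate feature vectors
key = screen(np.zeros(8), seq)                  # update the initial all-zero key
```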
In a possible embodiment, the first processing module 701 is specifically configured to, in at least one round of feature screening, perform the following operations:
taking a feature vector sequence obtained after the previous round of feature screening as an initial feature vector sequence, and performing filtering processing on candidate feature vectors of which feature similarities do not meet the similarity condition in the candidate feature vectors currently contained in the initial feature vector sequence to generate a target feature vector sequence, wherein the number of the candidate feature vectors contained in the target feature vector sequence is smaller than the number of the candidate feature vectors contained in the initial feature vector sequence;
respectively determining the feature similarity between each candidate feature vector currently contained in the target feature vector sequence and a target object contained in the image to be classified, selecting the candidate feature vectors with the feature similarity meeting the similarity condition from the candidate feature vectors currently contained in the target feature vector sequence, and updating the key feature vectors.
In a possible embodiment, the first processing module 701 is specifically configured to perform at least any one of the following manners:
performing convolution operation on all candidate feature vectors contained in the initial feature vector sequence based on a specified convolution kernel and a specified convolution step length, and generating a target feature vector sequence based on all candidate feature vectors obtained by the convolution operation;
and performing multilayer perception processing on each candidate feature vector contained in the initial feature vector sequence, fusing candidate feature vectors with feature similarity not meeting the similarity condition and candidate feature vectors with feature similarity meeting the similarity condition in the initial feature vector sequence by taking the number of the designated vectors as a target, and generating a target feature vector sequence based on each candidate feature vector obtained by fusion, wherein the number of the designated vectors is less than the number of each candidate feature vector contained in the initial feature vector sequence.
Based on the same inventive concept, the embodiment of the present application provides an image classification device, which can implement the corresponding functions of the foregoing image classification method. Referring to fig. 8, the apparatus includes an obtaining module 801 and a processing module 802, wherein:
the acquisition module 801: used for obtaining the image to be classified;
the processing module 802: the image classification method comprises the steps of inputting an image to be classified into a trained target image classification model to obtain a target classification output by the target image classification model;
the target image classification model is obtained by performing multiple rounds of iterative training on the image classification model to be trained based on each sample data, each round of training at least performs multiple rounds of feature screening on a sample feature vector sequence of a sample image contained in the sample data, and in at least one round of feature screening, the following operations are performed: and aiming at the sample feature vector sequence obtained after the previous round of feature screening, before the current round of feature screening, filtering the obtained sample feature vector sequence based on the sample feature similarity between each sample candidate feature vector currently contained in the obtained sample feature vector sequence and a sample object in the sample image.
In one possible embodiment, the processing module 802 is further configured to:
before inputting an image to be classified into a trained target image classification model and obtaining a target classification output by the target image classification model, obtaining sample data, wherein each sample data comprises a sample image and a sample class corresponding to the sample image;
and performing multiple rounds of iterative training on the image classification model to be trained based on each sample data until the training loss meets the training target, and outputting the image classification model to be trained as a trained target image classification model.
In one possible embodiment, the processing module 802 is specifically configured to, during each training round:
aiming at a sample image contained in sample data, dividing the sample image into a plurality of sample sub-image areas by adopting an image classification model to be trained, respectively extracting sample candidate characteristic vectors of the plurality of sample sub-image areas, and generating a sample characteristic vector sequence;
performing multi-round feature screening on the sample feature vector sequence by adopting an image classification model to generate a sample key feature vector;
classifying the sample images by adopting an image classification model based on the key characteristic vectors of the samples, and outputting training classes of the sample images;
and determining the training loss of the image classification model based on the error between the training class and the sample class contained in the sample data.
In a possible embodiment, the processing module 802 is specifically configured to:
respectively extracting sample image feature vectors of a plurality of sample sub-image areas;
generating respective sample region position vectors based on respective positions of the plurality of sample sub-image regions in the sample image;
respectively fusing each sample image feature vector with a corresponding sample region position vector to respectively obtain sample alternative feature vectors of a plurality of sample sub-image regions;
and sequentially arranging corresponding sample candidate feature vectors based on the position relation among a plurality of sample sub-image regions characterized by the sample region position vectors to generate a sample feature vector sequence.
In one possible embodiment, the processing module 802 is specifically configured to, in each round of feature screening:
obtaining a sample feature vector sequence and a sample key feature vector obtained after the previous round of feature screening;
respectively determining sample feature similarity between each sample candidate feature vector currently contained in the obtained sample feature vector sequence and a sample object contained in the sample image;
based on the feature similarity of each sample, selecting the sample candidate feature vector with the sample feature similarity meeting the sample similarity condition from the sample candidate feature vectors currently contained in the obtained sample feature vector sequence, and updating the key feature vector of the sample.
In one possible embodiment, the processing module 802 is specifically configured to, in at least one round of feature screening:
taking a sample feature vector sequence obtained after the previous round of feature screening as a sample initial feature vector sequence, and performing filtering processing on sample candidate feature vectors of which the sample feature similarity does not meet the sample similarity condition in the sample candidate feature vectors currently contained in the sample initial feature vector sequence to generate a sample target feature vector sequence, wherein the number of the sample candidate feature vectors contained in the sample target feature vector sequence is less than that of the sample candidate feature vectors contained in the sample initial feature vector sequence;
respectively determining sample candidate feature vectors currently contained in the sample target feature vector sequence and sample feature similarity between sample objects contained in the sample image, selecting the sample candidate feature vectors with the sample feature similarity meeting a sample similarity condition from the sample candidate feature vectors currently contained in the sample target feature vector sequence, and updating the sample key feature vectors.
In a possible embodiment, the processing module 802 is specifically configured to perform at least any one of the following:
performing convolution operation on each sample candidate feature vector contained in the sample initial feature vector sequence based on a specified convolution kernel and a specified convolution step length, and generating a sample target feature vector sequence based on each sample candidate feature vector obtained by the convolution operation;
and carrying out multilayer perception processing on each sample candidate feature vector contained in the sample initial feature vector sequence, fusing the sample candidate feature vectors with the sample feature similarity not meeting the sample similarity condition and the sample candidate feature vectors with the sample feature similarity meeting the sample similarity condition in the sample initial feature vector sequence by taking the number of the designated vectors as a target, and generating a sample target feature vector sequence based on each sample candidate feature vector obtained by fusion, wherein the number of the designated vectors is less than the number of each candidate feature vector contained in the initial feature vector sequence.
Referring to fig. 9, the image classification apparatus may be run on a computer device 900, and a current version and a historical version of a data storage program and application software corresponding to the data storage program may be installed on the computer device 900, where the computer device 900 includes a processor 980 and a memory 920. In some embodiments, the computer device 900 may include a display unit 940, the display unit 940 including a display panel 941 for displaying an interface for interaction by a user, and the like.
In one possible embodiment, the Display panel 941 may be configured in the form of a Liquid Crystal Display (LCD) or an Organic Light-Emitting Diode (OLED) or the like.
The processor 980 is configured to read the computer program and then execute a method defined by the computer program, for example, the processor 980 reads a data storage program or a file, etc., so as to run the data storage program on the computer device 900 and display a corresponding interface on the display unit 940. The Processor 980 may include one or more general-purpose processors and may further include one or more DSPs (Digital Signal processors) for performing relevant operations to implement the solutions provided by the embodiments of the present application.
Memory 920 typically includes internal memory and external memory. The internal memory may be random access memory (RAM), read-only memory (ROM), or cache memory (CACHE); the external memory may be a hard disk, an optical disk, a USB disk, a floppy disk, or a tape drive. The memory 920 is used for storing a computer program including an application program corresponding to each client, and other data, which may include data generated after an operating system or an application program is executed, including system data (e.g., configuration parameters of the operating system) and user data. Program instructions in the embodiments of the present application are stored in memory 920 and executed by processor 980 to implement any of the methods discussed in the previous figures.
The display unit 940 is used to receive input digital information, character information, or touch operation/non-touch gesture, and generate signal input related to user setting and function control of the computer apparatus 900, and the like. Specifically, in the embodiment of the present application, the display unit 940 may include a display panel 941. The display panel 941, for example, a touch screen, can collect touch operations by a user (for example, operations of the user on the display panel 941 or on the display panel 941 by using a finger, a stylus pen, or any other suitable object or attachment), and drive a corresponding connection device according to a preset program.
In one possible embodiment, the display panel 941 may include two portions of a touch detection device and a touch controller. The touch detection device detects the touch direction of a player, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 980, and can receive and execute commands sent by the processor 980.
The display panel 941 may be implemented as any of a plurality of types, such as resistive, capacitive, infrared, or surface acoustic wave. In addition to the display unit 940, in some embodiments the computer device 900 may also include an input unit 930, which may include an image input device 931 and other input devices 932; the other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys or switch keys), a trackball, a mouse, a joystick, and the like.
In addition to the above, the computer device 900 may also include a power supply 990 for powering the other modules, an audio circuit 960, a near field communication module 970, and an RF circuit 910. The computer device 900 may also include one or more sensors 950, such as acceleration sensors, light sensors, pressure sensors, and the like. The audio circuit 960 specifically includes a speaker 961 and a microphone 962, and the computer device 900 may collect the user's voice through the microphone 962, perform corresponding operations, and so on.
For one embodiment, the number of the processors 980 may be one or more, and the processors 980 and the memories 920 may be coupled or relatively independent.
As an example, the processor 980 in fig. 9 may be used to implement the functions of the obtaining module 801 and the processing module 802 in fig. 8.
As an example, the processor 980 in fig. 9 may be configured to implement the corresponding functions of the server or the terminal device discussed above.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on this understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, for example, a computer program product stored in a storage medium and including instructions for causing a computer device to perform all or part of the methods according to the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (11)

1. An image classification method, comprising:
for an image to be classified, dividing the image to be classified into a plurality of sub-image regions by adopting a trained target image classification model, and respectively extracting respective alternative feature vectors of the plurality of sub-image regions to generate a feature vector sequence;
performing multiple rounds of feature screening on the feature vector sequence by adopting the target image classification model to generate a key feature vector, wherein the following operations are executed in at least one round of feature screening: aiming at a feature vector sequence obtained after the previous round of feature screening, before the current round of feature screening is carried out, filtering the obtained feature vector sequence based on the feature similarity between each candidate feature vector currently contained in the obtained feature vector sequence and a target object in the image to be classified;
and classifying the images to be classified by adopting the target image classification model based on the key characteristic vector, and outputting the target category of the images to be classified.
2. The method according to claim 1, wherein extracting candidate feature vectors of the plurality of sub-image regions respectively to generate a feature vector sequence comprises:
respectively extracting image characteristic vectors of the plurality of sub-image areas;
respectively generating corresponding region position vectors based on the positions of the sub-image regions in the image to be classified;
respectively fusing each image feature vector with the corresponding region position vector to respectively obtain the respective alternative feature vectors of the plurality of sub-image regions;
and sequentially arranging corresponding candidate feature vectors based on the position relation among the plurality of sub-image regions represented by the region position vectors to generate the feature vector sequence.
3. The method according to claim 1 or 2, wherein performing multiple rounds of feature screening on the feature vector sequence using the target image classification model to generate a key feature vector comprises:
in each round of feature screening, the following operations are performed:
acquiring a feature vector sequence and a key feature vector obtained after the previous round of feature screening;
respectively determining the feature similarity between each candidate feature vector currently contained in the obtained feature vector sequence and a target object contained in the image to be classified;
and based on each feature similarity, selecting a candidate feature vector with the feature similarity meeting the similarity condition from the candidate feature vectors currently contained in the obtained feature vector sequence, and updating the key feature vector.
4. The method according to claim 3, wherein selecting, based on the feature similarities, candidate feature vectors whose feature similarity satisfies the similarity condition from the candidate feature vectors currently contained in the obtained feature vector sequence, and updating the key feature vector, comprises:
performing the following operations in at least one round of feature screening:
taking the feature vector sequence obtained after the previous round of feature screening as an initial feature vector sequence, and filtering out, from the candidate feature vectors currently contained in the initial feature vector sequence, those whose feature similarity does not satisfy the similarity condition, to generate a target feature vector sequence;
determining the feature similarity between each candidate feature vector currently contained in the target feature vector sequence and the target object contained in the image to be classified, selecting the candidate feature vectors whose feature similarity satisfies the similarity condition, and updating the key feature vector.
5. The method according to claim 4, wherein filtering out the candidate feature vectors whose feature similarity does not satisfy the similarity condition from the candidate feature vectors currently contained in the initial feature vector sequence to generate the target feature vector sequence comprises at least one of the following modes:
performing a convolution operation on the candidate feature vectors contained in the initial feature vector sequence using a specified convolution kernel and a specified convolution stride, and generating the target feature vector sequence from the candidate feature vectors obtained by the convolution operation;
and performing multilayer perceptron processing on the candidate feature vectors contained in the initial feature vector sequence, fusing the candidate feature vectors whose feature similarity does not satisfy the similarity condition with those whose feature similarity does satisfy it, targeting a specified number of vectors, and generating the target feature vector sequence from the fused candidate feature vectors, wherein the specified number is less than the number of candidate feature vectors contained in the initial feature vector sequence.
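The two filtering modes of this claim can be sketched in numpy. Both reduce the sequence length; the uniform averaging kernel and the softmax mixing weights are hypothetical simplifications of the learned convolution and multilayer perceptron the claim refers to:

```python
import numpy as np

def conv_reduce(sequence, kernel=2, stride=2):
    """Mode 1: strided 1-D convolution along the sequence axis with a
    uniform kernel, shrinking the number of candidate vectors."""
    n, d = sequence.shape
    out = [sequence[i:i+kernel].mean(axis=0)
           for i in range(0, n - kernel + 1, stride)]
    return np.stack(out)

def mlp_fuse(sequence, target_len, rng):
    """Mode 2: a perceptron-style layer scores the candidates and mixes
    all of them (high- and low-similarity alike) into target_len fused
    vectors, target_len < len(sequence)."""
    n, d = sequence.shape
    w = rng.standard_normal((d, target_len)) * 0.1     # hypothetical weights
    logits = sequence @ w                              # (n, target_len)
    mix = np.exp(logits) / np.exp(logits).sum(axis=0)  # column-wise softmax
    return mix.T @ sequence                            # (target_len, d)

rng = np.random.default_rng(2)
seq = rng.standard_normal((8, 16))
print(conv_reduce(seq).shape)       # (4, 16)
print(mlp_fuse(seq, 4, rng).shape)  # (4, 16)
```

Mode 1 discards information outside each window, while mode 2 fuses every candidate into the smaller set, which matches the claim's distinction between the two modes.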
6. An image classification method, comprising:
obtaining an image to be classified;
inputting the image to be classified into a trained target image classification model to obtain a target category output by the target image classification model;
wherein the target image classification model is obtained by performing multiple rounds of iterative training on an image classification model to be trained based on respective sample data; each round of training performs at least multiple rounds of feature screening on a sample feature vector sequence of a sample image contained in the sample data, and in at least one round of feature screening the following operation is performed: for the sample feature vector sequence obtained after the previous round of feature screening, before performing the current round of feature screening, filtering the obtained sample feature vector sequence based on the sample feature similarity between each sample candidate feature vector currently contained in it and a sample object in the sample image.
7. An image classification apparatus, comprising:
a first processing module, configured to: divide an image to be classified into a plurality of sub-image regions using a trained target image classification model, extract the respective candidate feature vectors of the plurality of sub-image regions, and generate a feature vector sequence;
the first processing module being further configured to: perform multiple rounds of feature screening on the feature vector sequence using the target image classification model to generate a key feature vector, wherein the following operation is performed in at least one round of feature screening: for the feature vector sequence obtained after the previous round of feature screening, before performing the current round of feature screening, filtering the obtained feature vector sequence based on the feature similarity between each candidate feature vector currently contained in it and a target object in the image to be classified;
and a second processing module, configured to: classify the image to be classified based on the key feature vector using the target image classification model, and output the target category of the image to be classified.
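The second processing module's final step can be sketched as a linear classification head over the key feature vector (the 5-way classifier and its random weights are purely illustrative placeholders):

```python
import numpy as np

def classify(key_vector, class_weights):
    """Score the key feature vector against each class and return the
    target category as the argmax over the class logits."""
    logits = class_weights @ key_vector
    return int(np.argmax(logits))

rng = np.random.default_rng(3)
key = rng.standard_normal(16)                # key feature vector
weights = rng.standard_normal((5, 16))       # hypothetical 5-way classifier
print(classify(key, weights))                # an index in 0..4
```

In a trained model the weights would be learned jointly with the feature-screening stages rather than sampled at random.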
8. An image classification apparatus, comprising:
an acquisition module, configured to obtain an image to be classified;
a processing module, configured to input the image to be classified into a trained target image classification model to obtain a target category output by the target image classification model;
wherein the target image classification model is obtained by performing multiple rounds of iterative training on an image classification model to be trained based on respective sample data; each round of training performs at least multiple rounds of feature screening on a sample feature vector sequence of a sample image contained in the sample data, and in at least one round of feature screening the following operation is performed: for the sample feature vector sequence obtained after the previous round of feature screening, before performing the current round of feature screening, filtering the obtained sample feature vector sequence based on the sample feature similarity between each sample candidate feature vector currently contained in it and a sample object in the sample image.
9. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
10. A computer device, comprising:
a memory for storing program instructions;
a processor for calling the program instructions stored in the memory and executing the method according to any one of claims 1 to 6 in accordance with the obtained program instructions.
11. A computer-readable storage medium having stored thereon computer-executable instructions for causing a computer to perform the method of any one of claims 1 to 6.
CN202111672688.4A 2021-12-31 2021-12-31 Image classification method and device, computer equipment and storage medium Pending CN114373098A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111672688.4A CN114373098A (en) 2021-12-31 2021-12-31 Image classification method and device, computer equipment and storage medium


Publications (1)

Publication Number Publication Date
CN114373098A true CN114373098A (en) 2022-04-19

Family

ID=81142276


Country Status (1)

Country Link
CN (1) CN114373098A (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201620652D0 (en) * 2016-12-05 2017-01-18 Gaist Solutions Ltd Method and system for creating images
US20180101934A1 (en) * 2016-10-06 2018-04-12 Macau University Of Science And Technology Device and a method for creating an image
CN108304847A (en) * 2017-11-30 2018-07-20 腾讯科技(深圳)有限公司 Image classification method and device, personalized recommendation method and device
CN109933802A (en) * 2019-03-25 2019-06-25 腾讯科技(深圳)有限公司 Picture and text matching process, device and storage medium
CN111160335A (en) * 2020-01-02 2020-05-15 腾讯科技(深圳)有限公司 Image watermarking processing method and device based on artificial intelligence and electronic equipment
CN112036395A (en) * 2020-09-04 2020-12-04 联想(北京)有限公司 Text classification identification method and device based on target detection
CN112101165A (en) * 2020-09-07 2020-12-18 腾讯科技(深圳)有限公司 Interest point identification method and device, computer equipment and storage medium
CN112434732A (en) * 2020-11-17 2021-03-02 西安交通大学 Deep learning classification method based on feature screening
CN113189581A (en) * 2021-04-09 2021-07-30 同济大学 Millimeter wave radar and visual fusion fog penetration target recognition algorithm processing method
CN113343982A (en) * 2021-06-16 2021-09-03 北京百度网讯科技有限公司 Entity relationship extraction method, device and equipment for multi-modal feature fusion
CN113761261A (en) * 2021-05-26 2021-12-07 腾讯科技(深圳)有限公司 Image retrieval method, image retrieval device, computer-readable medium and electronic equipment
CN113780297A (en) * 2021-09-15 2021-12-10 北京百度网讯科技有限公司 Image processing method, device, equipment and storage medium
CN113822143A (en) * 2021-07-30 2021-12-21 腾讯科技(深圳)有限公司 Text image processing method, device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YAJUN CHEN et al.: "Weed and Corn Seedling Detection in Field Based on Multi Feature Fusion and Support Vector Machine", Sensors, 31 December 2020 (2020-12-31) *
ZHANG Mengjiao: "Research on Crop Pest Image Classification Based on Saliency Detection and Bag-of-Words Model", China Masters' Theses Full-text Database (Agricultural Science and Technology), no. 1, 15 January 2021 (2021-01-15) *
WANG Qiang; FAN Yingle; WU Wei; ZHU Yaping: "Hybrid Recognition of Upright and Inverted Faces", Journal of Image and Graphics, no. 07, 16 July 2018 (2018-07-16) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114663903A (en) * 2022-05-25 2022-06-24 深圳大道云科技有限公司 Text data classification method, device, equipment and storage medium
CN114663903B (en) * 2022-05-25 2022-08-19 深圳大道云科技有限公司 Text data classification method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN113159147B (en) Image recognition method and device based on neural network and electronic equipment
CN111241989A (en) Image recognition method and device and electronic equipment
CN109863488A (en) The device/server of Neural Network Data input system is disposed
CN111522987A (en) Image auditing method and device and computer readable storage medium
CN111898538A (en) Certificate authentication method and device, electronic equipment and storage medium
CN113449700A (en) Training of video classification model, video classification method, device, equipment and medium
CN115131849A (en) Image generation method and related device
CN111191041A (en) Characteristic data acquisition method, data storage method, device, equipment and medium
CN111488732A (en) Deformed keyword detection method, system and related equipment
CN114548300B (en) Method and device for explaining service processing result of service processing model
CN115204886A (en) Account identification method and device, electronic equipment and storage medium
CN113569607A (en) Motion recognition method, motion recognition device, motion recognition equipment and storage medium
CN114219971A (en) Data processing method, data processing equipment and computer readable storage medium
CN116958637A (en) Training method, device, equipment and storage medium of image detection model
CN113362852A (en) User attribute identification method and device
CN117058723B (en) Palmprint recognition method, palmprint recognition device and storage medium
CN114373098A (en) Image classification method and device, computer equipment and storage medium
CN114917590B (en) Virtual reality game system
CN116092094A (en) Image text recognition method and device, computer readable medium and electronic equipment
CN114283460A (en) Feature extraction method and device, computer equipment and storage medium
CN113762324A (en) Virtual object detection method, device, equipment and computer readable storage medium
CN112308093A (en) Air quality perception method based on image recognition, model training method and system
CN110598578B (en) Identity recognition method, training method, device and equipment of identity recognition system
CN116340887B (en) Multi-mode false news detection method and system
CN117237856B (en) Image recognition method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination