CN113704534A - Image processing method and device and computer equipment - Google Patents

Image processing method and device and computer equipment

Info

Publication number
CN113704534A
CN113704534A (application CN202110398503.9A)
Authority
CN
China
Prior art keywords
image
classification
retrieved
target
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110398503.9A
Other languages
Chinese (zh)
Inventor
郭卉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110398503.9A priority Critical patent/CN113704534A/en
Publication of CN113704534A publication Critical patent/CN113704534A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
                    • G06F 16/50 Information retrieval of still image data
                        • G06F 16/55 Clustering; Classification
                        • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
                            • G06F 16/583 Retrieval characterised by using metadata automatically derived from the content
                • G06F 18/00 Pattern recognition
                    • G06F 18/20 Analysing
                        • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                            • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
                        • G06F 18/22 Matching criteria, e.g. proximity measures
                        • G06F 18/23 Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The embodiments of the present application disclose an image processing method, an image processing device and computer equipment, relating to the field of computer technologies. The method comprises: acquiring an image to be retrieved; determining, from a plurality of classifications, a target classification matched with the image to be retrieved, wherein each of the plurality of classifications comprises one or more clusters obtained by aggregating the images included in that classification; determining, from the one or more clusters included in the target classification, a target cluster matched with the image to be retrieved; and determining, from the images included in the target cluster, an image matched with the image to be retrieved. Because the method first determines the target classification, then the target cluster, and only then computes the similarity between the image to be retrieved and the images included in the target cluster, the matched image can be found without computing the similarity against every image in the retrieval image library one by one, which reduces the computation required for image retrieval and improves its efficiency.

Description

Image processing method and device and computer equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method and apparatus, and a computer device.
Background
With the continuous development of computer technology, the number of images stored on computers is also growing dramatically. In daily life, a user may need to find images similar to a given image among a large number of images on a computer. The act of a computer device retrieving, from a large number of images, images similar to a given image (which may be referred to as the image to be retrieved) is called image retrieval.
At present, image retrieval is mainly implemented by comparing the image to be retrieved with the images in a retrieval image library one by one. This process consumes a large amount of computation, making image retrieval inefficient. How to improve the efficiency of image retrieval has therefore become a hot research topic.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device and computer equipment, which can reduce the calculation amount of image retrieval and improve the image retrieval efficiency.
In one aspect, an embodiment of the present application provides an image processing method, including:
acquiring an image to be retrieved;
determining a target classification matched with the image to be retrieved from a plurality of classifications, wherein each classification in the plurality of classifications comprises one or more clusters, and the one or more clusters are obtained by aggregating the images included in each classification;
determining a target cluster matched with the image to be retrieved from one or more clusters included in the target classification;
and determining images matched with the images to be retrieved from the images included in the target cluster.
In one aspect, an embodiment of the present invention provides an image processing apparatus, including:
an acquisition unit, configured to acquire an image to be retrieved;
a processing unit, configured to determine a target classification matching the image to be retrieved from a plurality of classifications, where each of the plurality of classifications includes one or more clusters, and the one or more clusters are obtained by aggregating images included in each classification;
the processing unit is further configured to determine a target cluster matched with the image to be retrieved from one or more clusters included in the target classification;
the processing unit is further configured to determine an image matched with the image to be retrieved from the images included in the target cluster.
In one aspect, an embodiment of the present invention provides a computer device, including: a processor, a memory, and a network interface;
the processor is connected with the memory and the network interface, wherein the network interface is used for providing a network communication function, the memory is used for storing program codes, and the processor is used for calling the program codes to execute the method in the embodiment of the application.
In one aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored, where the computer program includes program instructions, and when the program instructions are executed by a processor, the computer program executes the method in the embodiment of the present application.
In one aspect, embodiments of the present invention provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method in the embodiment of the present application.
In the embodiments of the present application, an image to be retrieved can be obtained; a target classification matched with the image to be retrieved is determined from a plurality of classifications, where each of the plurality of classifications comprises one or more clusters obtained by aggregating the images included in that classification; a target cluster matched with the image to be retrieved is determined from the one or more clusters included in the target classification; and an image matched with the image to be retrieved is then determined from the images included in the target cluster. Because the target classification is determined first, then the target cluster, and the similarity is computed only between the image to be retrieved and the images included in the target cluster, the matched image can be found without computing the similarity against all the images in the retrieval image library one by one, which reduces the computation required for image retrieval and improves retrieval efficiency.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of an image processing system according to an embodiment of the present application;
fig. 2 is a schematic flowchart of an image processing method according to an embodiment of the present application;
fig. 3a is a schematic diagram of a framework of an image processing method according to an embodiment of the present application;
FIG. 3b is a schematic diagram of a framework of another image processing method provided in the embodiment of the present application;
FIG. 4 is a schematic flowchart of another image processing method provided in the embodiments of the present application;
FIG. 5 is a schematic diagram of a sample image provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a classification and clustering process provided in an embodiment of the present application;
fig. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning and decision-making.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision (CV) is a science that studies how to make machines "see"; more specifically, it uses cameras and computers in place of human eyes to perform machine vision tasks such as recognition, tracking and measurement on a target, and further performs image processing, so that the computer produces an image better suited to human observation or to transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also include common biometric technologies such as face recognition and fingerprint recognition.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how a computer can simulate or realize human learning behavior in order to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from instruction.
The solution provided by the embodiments of the present application involves artificial intelligence technologies such as computer vision, machine learning and deep learning, and is explained in detail through the following embodiments:
the embodiment of the application provides an image processing method, which comprises the following steps: the method comprises the steps of obtaining an image to be retrieved, determining a target classification matched with the image to be retrieved from a plurality of classifications, wherein each classification in the plurality of classifications comprises one or more clusters, the one or more clusters are obtained by aggregating the images included in each classification, determining the target cluster matched with the image to be retrieved from the one or more clusters included in the target classification, and determining the image matched with the image to be retrieved from the images included in the target cluster. The image processing method determines the target classification first, then determines the target clustering, and then determines the matched images by calculating the similarity between the images to be retrieved and the images included in the target clustering, without calculating the similarity with all the images in the retrieval image library one by one, so that the calculation amount of image retrieval can be reduced, and the efficiency of image retrieval can be improved.
In actual sample data, the number of images of common image types (which may also be called head categories) is large, while the other image types (which may also be called long-tail categories) are numerous but each contains only a small number of images. For example, the number of images in a long-tail category may be 1% or less of the number of images in a head category. If all the images in the retrieval image library are aggregated directly, images of long-tail categories and images of head categories are easily placed into the same clustering bucket, which degrades the retrieval effect of image retrieval. In the present application, each image in the retrieval image library is first classified based on semantics to obtain a plurality of classifications, and the images included in each classification are then aggregated, so that the images aggregated together share the same semantics. This effectively controls the problem of long-tail images and head images being divided into the same bucket, and improves the accuracy of image retrieval.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an image processing system according to an embodiment of the present disclosure. As shown in fig. 1, the image processing system may include a server 101 and one or more terminal devices 102, each terminal device 102 may be in network connection with the server 101, the network connection may include a wired connection or a wireless connection, so that each terminal device 102 may interact with the server 101 through the network connection, and the server 101 may receive data from each terminal device 102.
Each terminal device 102 may be configured to present service data to a user, where the service data may specifically include multimedia data such as images or videos, and each terminal device 102 may also be configured to store the service data, and may also be configured to transmit the service data to other computer devices, such as the server 101. The server 101 may interact with each terminal device 102 for service data, and may further process the service data, such as executing the image processing method described above.
In the embodiment of the present application, taking the service data as an image as an example, a user may select an image to be retrieved on the terminal device 102; the terminal device 102 sends the selected image to the server 101, and the server 101 retrieves an image matched with the image to be retrieved and presents it to the user through the terminal device 102. Specifically, the server 101 runs an image classification model based on deep learning, which can be used to extract the feature vector of an image and determine the classification of an image. The server 101 obtains the feature vector of the image to be retrieved using the image classification model, determines the target classification matched with the image to be retrieved according to this feature vector, determines the target cluster matched with the image to be retrieved from the clusters included in the target classification, and then determines the image matched with the image to be retrieved from the images included in the target cluster. The server 101 returns that image to the terminal device 102, which may present it to the user; in addition, the user may download the image from the terminal device 102. By determining the target classification first and then the target cluster, the server 101 only needs to compute the similarity between the image to be retrieved and the images included in the target cluster, rather than against all the images in the retrieval image library one by one, which reduces the computation required for image retrieval and improves image retrieval efficiency.
The terminal device 102 in the embodiment of the present application may include, but is not limited to: a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart television, a smart speaker and a smart watch.
It is understood that the method provided by the embodiment of the present application can be executed by a computer device, including but not limited to the server 101 described above. The server 101 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, a big data and artificial intelligence platform, and the like.
Referring to fig. 2, fig. 2 is a schematic flowchart of an image processing method according to an embodiment of the present disclosure. The execution subject in this embodiment may be a computer device or a cluster of computer devices, and the computer device may be a terminal device or a server. The following description takes the execution subject to be a server (such as the server 101 in the image processing system shown in fig. 1). The image processing method comprises the following steps:
s201, obtaining an image to be retrieved.
The image to be retrieved refers to an image to be queried. For the image to be retrieved, an image matching with the image to be retrieved needs to be queried, for example, the image matching with the image to be retrieved may be an image similar to the image to be retrieved. The server can obtain the image to be retrieved from the terminal equipment which stores the image to be retrieved. The image to be retrieved in the terminal device may be uploaded by a user or generated by the terminal device according to an application requirement.
S202, determining a target classification matched with the image to be retrieved from a plurality of classifications, where each of the plurality of classifications comprises one or more clusters, and the one or more clusters are obtained by aggregating the images included in each classification.
The classification refers to a classification of images, for example, if a plurality of classifications are human, animal, plant, or the like, each image may belong to one of the classifications. Each classification includes images that are aggregated into one or more clusters, e.g., an "animal" classification that includes multiple images, such as images including cats, dogs, etc., then images for cats may be aggregated into the same cluster and images for dogs may be aggregated into another cluster.
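The aggregation of one classification's images into clusters can be illustrated with a minimal k-means sketch. The patent does not name a specific clustering algorithm; k-means, the naive first-k initialization, and the 2-D toy feature vectors below are assumptions made purely for illustration.

```python
def kmeans(points, k, iters=20):
    """Minimal k-means sketch: aggregate feature vectors of one classification
    into k clusters (illustrative; not the patent's prescribed algorithm)."""
    centers = [list(p) for p in points[:k]]  # naive init: first k points
    for _ in range(iters):
        # Assign each feature vector to its nearest cluster center
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            buckets[nearest].append(p)
        # Recompute each cluster center as the mean of its members
        centers = [[sum(dim) / len(b) for dim in zip(*b)] if b else centers[i]
                   for i, b in enumerate(buckets)]
    return centers, buckets

# Toy vectors within one "animal" classification: two cat-like, two dog-like images
points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
```

Running `kmeans(points, 2)` separates the two near-origin vectors from the two far ones, mirroring how cat images and dog images end up in different clusters of the same classification.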
Specifically, the classification with the highest similarity among the plurality of classifications may be used as the target classification matched with the image to be retrieved. For example, the plurality of classifications includes classification 1, classification 2, and classification 3, and the similarity between the feature vector of the image to be retrieved and the classification center of classification 1 is 45%, the similarity between the feature vector of the image to be retrieved and the classification center of classification 2 is 95%, and the similarity between the feature vector of the image to be retrieved and the classification center of classification 3 is 86%, then classification 2 may be regarded as the target classification.
Optionally, one or more classifications whose similarity exceeds a first similarity threshold may be used as target classifications matched with the image to be retrieved. The first similarity threshold may be a value set by the server according to the features of the image to be retrieved, or a value customized by the user; the way the first similarity threshold is set is not limited here. For example, the plurality of classifications includes classification 1, classification 2 and classification 3, where the similarity between the feature vector of the image to be retrieved and the classification center of classification 1 is 45%, the similarity with the classification center of classification 2 is 95%, and the similarity with the classification center of classification 3 is 86%; if the first similarity threshold is 80%, both classification 2 and classification 3 may be taken as target classifications.
In an alternative embodiment, the similarity between each classification and the image to be retrieved may be determined as follows: calling an image classification model to obtain the feature vector of the image to be retrieved, obtaining the classification center of each of the plurality of classifications, determining the similarity between the feature vector of the image to be retrieved and each classification center, and taking that similarity as the similarity between the corresponding classification and the image to be retrieved. The image classification model may be used to determine the feature vector of an image. The similarity between the feature vector of the image to be retrieved and each classification center may be the cosine similarity, or alternatively the L2-norm (Euclidean) distance; the way the similarity is calculated is not limited here.
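The two similarity measures mentioned here can be written out directly. A brief sketch, assuming plain Python lists as feature vectors; the helper names `cosine_similarity` and `l2_distance` are illustrative:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between the query feature vector and a center vector;
    a larger value means a better match."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def l2_distance(a, b):
    """The alternative measure mentioned above: L2 (Euclidean) distance,
    where a *smaller* value means a better match."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Note the opposite orientations: cosine similarity is maximized over candidate centers, while L2 distance is minimized.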
Optionally, the image classification model may include a feature extraction module and a classification module, the feature extraction module may be configured to determine a feature vector of the image, and a parameter of the classification module includes a classification center of each of the plurality of classifications, and the classification center may be represented in a vector form. Correspondingly, calling an image classification model to obtain a feature vector of the image to be retrieved, and obtaining a classification center of each of the multiple classifications may be: and calling a feature extraction module of the image classification model to obtain a feature vector of the image to be retrieved, and obtaining a classification center of each of a plurality of classifications from a classification module of the image classification model.
In an alternative embodiment, the image classification model may also be used to determine the classification to which the image belongs, and in particular, the classification module of the image classification model may determine the classification to which the image belongs based on the feature vector of the image obtained by the feature extraction module. Correspondingly, determining the target classification matching the image to be retrieved from the plurality of classifications may be: and calling an image classification model to obtain the classification of the image to be retrieved, and taking the classification as a target classification matched with the image to be retrieved in a plurality of classifications. Specifically, the server obtains the feature vector of the image to be retrieved through the feature extraction module of the image classification model, and determines the classification of the image to be retrieved based on the feature vector of the image to be retrieved through the classification module of the image classification model.
Therefore, determining the target classification matched with the image to be retrieved from the plurality of classifications can greatly narrow the image retrieval range, and ensures that the retrieved images are semantically similar to the image to be retrieved, that is, they belong to the same classification as the image to be retrieved. For example, if the retrieval image library includes 100 million images, and the 5 target classifications matched with the image to be retrieved include 10 thousand images in total, then the number of images among which a match must be sought drops from 100 million to 10 thousand, a reduction of four orders of magnitude, which greatly improves the efficiency of image retrieval.
S203, determining a target cluster matched with the image to be retrieved from one or more clusters included in the target classification.
Specifically, if the target classification includes one cluster, the server may take the cluster as a target cluster. If the target classification includes multiple clusters, the server may determine the similarity between each cluster and the image to be retrieved, and take the cluster with the highest similarity between the multiple clusters and the image to be retrieved as a matched target cluster. For example, the plurality of clusters included in the target classification includes cluster 1, cluster 2, and cluster 3, and the similarity between the feature vector of the image to be retrieved and the cluster center of cluster 1 is 91%, the similarity between the feature vector of the image to be retrieved and the cluster center of cluster 2 is 84%, and the similarity between the feature vector of the image to be retrieved and the cluster center of cluster 3 is 98%, then cluster 3 may be regarded as the target cluster.
Optionally, the server may take one or more clusters of the plurality of clusters with the similarity exceeding the second similarity threshold as the target cluster. The second similarity threshold may be a numerical value set by the server according to the features of the image to be retrieved, or a numerical value customized by the user, where the setting mode of the second similarity threshold is not limited. For example, the plurality of clusters include cluster 1, cluster 2, and cluster 3, where the similarity between the feature vector of the image to be retrieved and the cluster center of cluster 1 is 91%, the similarity between the feature vector of the image to be retrieved and the cluster center of cluster 2 is 84%, the similarity between the feature vector of the image to be retrieved and the cluster center of cluster 3 is 98%, and the second similarity threshold is 90%, then both cluster 1 and cluster 3 may be regarded as target clusters.
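The second-similarity-threshold variant just described can be sketched as a one-line filter. The helper name and the dict layout are illustrative; the numbers mirror the running example (91%, 84%, 98% against a 90% threshold):

```python
def select_above_threshold(center_sims, threshold):
    """Return every candidate whose similarity to the query exceeds the threshold."""
    return [name for name, sim in center_sims.items() if sim > threshold]

# Similarities between the query's feature vector and the three cluster centers
cluster_sims = {"cluster 1": 0.91, "cluster 2": 0.84, "cluster 3": 0.98}
```

With a threshold of 0.90, `select_above_threshold(cluster_sims, 0.90)` returns `['cluster 1', 'cluster 3']`, matching the example: both clusters become target clusters.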
In an optional implementation manner, a specific implementation manner of determining the similarity between each cluster and the image to be retrieved may be: the method comprises the steps of obtaining a cluster center of each cluster in one or more clusters included in target classification, determining the similarity between a feature vector of an image to be retrieved and the cluster center of each cluster, and taking the similarity as the similarity between the corresponding cluster and the image to be retrieved, wherein the feature vector of the image to be retrieved is obtained by calling an image classification model. And the cluster center of each cluster in one or more clusters included in each classification is obtained by aggregating the images included in each classification, and the cluster center of each cluster can be represented in a vector form.
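As an illustrative sketch only (not part of the disclosed implementation), the nearest-cluster selection described above could be realized with cosine similarity between the feature vector of the image to be retrieved and each cluster center; the function names and toy vectors below are assumptions for illustration:

```python
import numpy as np

def cosine_similarity(a, b):
    # Both vectors are assumed non-zero; cosine similarity lies in [-1, 1].
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pick_target_cluster(query_vec, cluster_centers):
    """Return (index of most similar cluster, list of similarities).

    `cluster_centers` holds one vector per cluster in the target
    classification (an illustrative stand-in for the cluster centers
    obtained by aggregating the images of that classification).
    """
    sims = [cosine_similarity(query_vec, c) for c in cluster_centers]
    return int(np.argmax(sims)), sims

# Toy unit vectors chosen to mirror the 91% / 84% / 98% similarities
# used in the example above.
query = np.array([1.0, 0.0])
centers = [
    np.array([0.91, np.sqrt(1 - 0.91**2)]),  # cluster 1: cos = 0.91
    np.array([0.84, np.sqrt(1 - 0.84**2)]),  # cluster 2: cos = 0.84
    np.array([0.98, np.sqrt(1 - 0.98**2)]),  # cluster 3: cos = 0.98
]
best, sims = pick_target_cluster(query, centers)
# cluster 3 (index 2) has the highest similarity and becomes the target cluster
```

The same routine could equally serve step S202 by passing classification centers instead of cluster centers, which is one way the same bucket-search tool could be reused for both steps.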
In addition, the manner of determining the similarity between the image to be retrieved and each classification center among the plurality of classifications in step S202 may be the same as or different from the manner of determining the similarity between the image to be retrieved and the cluster center of each cluster included in the target classification in step S203. When the two are the same, the same bucket-search tool can be reused to determine both the target classification and the target cluster.
S204, determining an image matched with the image to be retrieved from the images included in the target cluster.
Specifically, the similarity between the image to be retrieved and each image included in the target cluster may be determined, and the image with the highest similarity may be used as the image matched with the image to be retrieved. Or, one or more images with the similarity reaching a preset threshold may be regarded as images matched with the image to be retrieved, where the similarity reaching the preset threshold may refer to that the similarity is greater than the preset threshold, or may refer to that the similarity is greater than or equal to the preset threshold. In addition, the preset threshold is uniform for each classification, that is, the preset threshold is the same regardless of which classification the target cluster belongs to. Still alternatively, the top n (n is a positive integer) images with high similarity may be used as the images matching the image to be retrieved, for example, if the target cluster includes image 1, image 2, image 3, and image 4, the similarities between image 1, image 2, image 3, and image 4 and the image to be retrieved are 60%, 75%, 89%, and 95%, respectively, and n is 2, then both image 3 and image 4 are used as the images matching the image to be retrieved.
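The top-n selection in the example above can be sketched as follows; this is an illustrative assumption about the ranking step, not the patent's implementation:

```python
import numpy as np

def top_n_matches(similarities, n):
    """Return the indices of the n most similar images, highest first.

    `similarities` gives, for each image in the target cluster, its
    similarity to the image to be retrieved (illustrative values).
    """
    order = np.argsort(similarities)[::-1]  # sort descending by similarity
    return [int(i) for i in order[:n]]

# Images 1-4 with similarities 60%, 75%, 89%, 95%, and n = 2,
# reproducing the worked example in the text.
sims = [0.60, 0.75, 0.89, 0.95]
matches = top_n_matches(sims, 2)  # image 4 (index 3) and image 3 (index 2)
```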
In an alternative embodiment, different classifications may correspond to different similarity thresholds. The method specifically comprises the following steps: and acquiring a similarity threshold corresponding to the target classification, determining the similarity between the image to be retrieved and each image included in the target cluster, and taking the image with the similarity reaching the similarity threshold as an image matched with the image to be retrieved. For each of the multiple classifications, a corresponding similarity threshold may be set in a customized manner according to business requirements, so that the retrieval result is more complete. And the similarity thresholds corresponding to different classifications may be the same or different. For example, in the case of a fierce fighting scene, the image changes too fast, the feature distribution of the image is relatively dispersed, and the similarity threshold corresponding to the classification to which the image belongs may be set relatively smaller. For the images of the human faces, the feature distribution is relatively concentrated, and the images of the same human face need to be retrieved, so that the similarity threshold corresponding to the classification to which the images belong can be set relatively larger, thereby avoiding retrieving the images of other human faces and being beneficial to improving the accuracy of image retrieval. In addition, the server can also verify the determined image matched with the image to be retrieved according to the similarity threshold value, and judge whether the image is matched with the image to be retrieved.
Referring to fig. 3a, fig. 3a is a schematic diagram of a framework of an image processing method provided in an embodiment of the present application. As can be seen, in this image processing method, the feature vector of the image to be retrieved is first compared with the classification centers of the plurality of classifications to determine a target classification matching the image to be retrieved (e.g., classification i corresponding to classification center i in fig. 3a). The feature vector of the image to be retrieved is then compared with the cluster center of each cluster included in target classification i to determine a target cluster matching the image to be retrieved (e.g., cluster j corresponding to cluster center j in fig. 3a). The images included in the target cluster are then recalled, and an image matching the image to be retrieved is determined from them. Compared with a scheme that directly performs global clustering on the images in the retrieval image library and compares the feature vector of the image to be retrieved with the center of every cluster during retrieval, this reduces the calculation amount of image retrieval and improves the efficiency of image retrieval.
In summary, when the image to be retrieved is retrieved, the target classification matched with the image to be retrieved may be determined from the multiple classifications, then the target cluster matched with the image to be retrieved is determined from one or more clusters included in the target classification, and then the image matched with the image to be retrieved is determined from the images included in the target cluster. In the image processing method, the target classification is determined firstly, then the target clustering is determined, the matched images can be determined by calculating the similarity between the images to be retrieved and the images included in the target clustering, the similarity with all the images in the retrieved image library does not need to be calculated one by one, the calculated amount of image retrieval can be reduced, and the efficiency of image retrieval is improved.
In addition, in the above image processing method, the classification and the recall of the clusters are two-stage, that is, the server determines the target classification from the plurality of classifications and recalls the images included in the target classification, and then determines the target cluster from the clusters included in the target classification and recalls the images included in the target cluster. In addition to the two-stage recall implementation of image retrieval, image retrieval may also be implemented by one-stage recall. Specifically, after determining the clusters included in each category, the category center of each category is spliced with each cluster center included in the category, so as to obtain the category centers of a plurality of spliced categories. In the process of image retrieval, copying the characteristic vector of the image to be retrieved, splicing the characteristic vector of the image to be retrieved with the copied characteristic vector, then determining the similarity between the characteristic vector of the spliced image to be retrieved and the center of each class, determining a target class matched with the image to be retrieved according to the similarity, and then determining the image matched with the image to be retrieved from the images included in the target class. Wherein, the image included in each class refers to the image included in the corresponding cluster of the class. The mode can improve the efficiency of image retrieval.
For example, if the total number of classifications is P and the number of clusters included in the i-th classification is C_i, then a total of

C_1 + C_2 + ... + C_P

class centers can be obtained, with each class corresponding to one class center. Here, the total number of classifications refers to the number of preset classifications, P is a positive integer, and i is a positive integer less than or equal to P.
With reference to fig. 3b, fig. 3b is a schematic diagram of a framework of another image processing method according to an embodiment of the present application. In this image processing method, the feature vector of the image to be retrieved is spliced with a copy of itself, and the classification center of each classification is spliced with the cluster center of each cluster included in that classification. In fig. 3b, the total number of classifications is P and the total number of clusters is M, where the total number of clusters refers to the number of preset clusters. For example, if classification 1 includes cluster 1 and cluster 2, classification center 1 of classification 1 is spliced with cluster center 1 of cluster 1 to obtain class center 1 of class 1, and classification center 1 is spliced with cluster center 2 of cluster 2 to obtain class center 2 of class 2. In the process of image retrieval, the self-spliced feature vector of the image to be retrieved is compared with the class centers, a target class (class j in fig. 3b) matching the image to be retrieved is determined, the images included in the target class are recalled, and an image matching the image to be retrieved is then determined from the images included in the target class.
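The one-stage recall described above can be sketched as follows. The splicing structure (classification center concatenated with each of its cluster centers, query vector concatenated with a copy of itself) follows the text; the vector contents, similarity measure (cosine over the concatenated vectors), and function names are illustrative assumptions:

```python
import numpy as np

def l2norm(v):
    return v / np.linalg.norm(v)

def build_class_centers(classification_centers, clusters_per_class):
    """Concatenate each classification center with each of its cluster
    centers, yielding C_1 + ... + C_P "class centers" for one-stage recall."""
    class_centers = []
    owners = []  # (classification index, cluster index) for each class center
    for i, (cc, clusters) in enumerate(zip(classification_centers,
                                           clusters_per_class)):
        for j, kc in enumerate(clusters):
            class_centers.append(np.concatenate([cc, kc]))
            owners.append((i, j))
    return class_centers, owners

def one_stage_recall(query_vec, class_centers, owners):
    # The query feature vector is spliced with a copy of itself,
    # then compared against every spliced class center in one pass.
    q = np.concatenate([query_vec, query_vec])
    sims = [float(np.dot(l2norm(q), l2norm(c))) for c in class_centers]
    return owners[int(np.argmax(sims))]

# Two classifications with 2 and 1 clusters respectively (toy data).
cls = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
clusters = [[np.array([1.0, 0.1]), np.array([0.9, 0.4])],
            [np.array([0.1, 1.0])]]
centers, owners = build_class_centers(cls, clusters)
hit = one_stage_recall(np.array([1.0, 0.05]), centers, owners)
# the query lands in classification 0, cluster 0
```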
Referring to fig. 4, fig. 4 is a schematic flowchart of another image processing method according to an embodiment of the present disclosure. The image processing method is applicable to the image processing system shown in fig. 1, and is executed by a server. The image processing method comprises the following steps:
S401, a plurality of images included in the search image library are acquired.
Wherein, each image in the retrieval image library corresponds to an image Identity (ID).
S402, calling an image classification model to determine the classification of each image in the plurality of images from the plurality of classifications, and obtaining the images included in each classification.
In particular, the image classification model may be used to determine the classification to which the image belongs. Each image in the search image library can be input into the image classification model, and the obtained output result comprises the classification to which the image belongs, so that the classification to which each image in the search image library belongs can be obtained. Further, all the images and the corresponding classifications are sorted, and the images included in each classification can be obtained.
In an alternative embodiment, the image classification model is obtained by training an initial classification model. The image processing method further includes: obtaining a sample data set, where the sample data set includes a plurality of sample images and a reference class label of each sample image, the reference class label including one or more of the plurality of classifications; performing feature extraction processing on each sample image to determine a feature vector of each sample image; inputting the feature vector of each sample image into the initial classification model for processing to obtain a prediction class label of each sample image; and then training the initial classification model according to the prediction class label and the reference class label of each sample image to obtain the image classification model. The image classification model may be used for image recognition. Image recognition here is recognition at the class level: regardless of the specific instance of an object (image), only the class of the object (e.g., human, dog, cat, bird) is considered, and the class to which the object belongs is given. An example is the recognition task on the large-scale general object recognition dataset ImageNet, which identifies which of 1000 classes an object belongs to. In the embodiment of the application, the classification to which an image belongs can be determined through the image classification model.
Optionally, the reference category label of each sample image in the sample data set may be labeled manually or by a computer device. For example, multiple frames of a video may be used as sample images, the classification to which each frame belongs may be determined from a plurality of preset classifications (e.g., human face, human body, city wall, mountain top, etc.), and the reference category label of each frame may be labeled based on that classification. In addition, the reference category label of each sample image may include one or more classifications. Specifically, if a sample image has multiple attributes, that is, the sample image can belong to multiple classifications, then its reference class label includes multiple classifications. The recognition task in which sample images belong to multiple classifications may be referred to as image multi-label recognition, which refers to recognizing, through a computer device, whether an image carries a specified combination of labels.
Optionally, the sample data set may include correct sample data and noise sample data. In correct sample data, the reference class label of the sample image matches the classification to which the sample image actually belongs. In noise sample data, the reference class label of the sample image does not match the classification to which the sample image actually belongs. Such a mismatch may be caused by an annotator's mistake, or by partial conceptual overlap between different classifications. For example, there is conceptual overlap between the two classifications "person" and "face". For the sample image shown in fig. 5, the sample image has the attributes of both the "person" and "face" classifications; if the reference class label of the sample image only includes "person" or only "face", the sample image and its reference class label can be used as noise sample data. A recognition task using noisy sample data may be referred to as noisy recognition.
Alternatively, the network structure of the initial classification model may be an embedding network based on deep learning. For example, the embedding network can be built on a ResNet-101 feature network, where ResNet-101 refers to a 101-layer deep residual network. The parameters of the feature extraction module structure of ResNet-101 are shown in Table 1 below. The feature extraction module of ResNet-101 includes convolutional layers Conv1-Conv5, whose parameters come from well-performing feature pre-training, such as the parameters of an ImageNet pre-trained model, or parameters obtained by training on retrieval features. The ImageNet pre-trained model is obtained by training a deep learning network model on ImageNet. In fact, since the embedding network is an already trained network, its parameters do not need to be retrained in the embodiment of the present application, and the parameters shown in Table 1 may be adopted directly. It should be noted that the initial classification model may also use other network structures and other pre-trained model weights as its base model, which may be determined according to the feature vector extraction scheme and is not limited herein.
TABLE 1 ResNet-101 feature extraction Module Structure Table
(Table 1 is reproduced as an image in the original publication; it lists the convolutional layers Conv1-Conv5 of the ResNet-101 feature extraction module.)
In addition, a classification module (also referred to as a classification layer) including a max pooling layer (Max Pool Layer), a norm layer (Norm Layer), and a fully connected layer (Full Connection Layer) may be constructed on ResNet-101, as shown in Table 2 below. The server inputs an image into the above embedding network to obtain the feature vector of the image. The feature vector of the image is input into the pooling layer for pooling, which reduces the dimensionality of the image features and reduces overfitting by compressing the amount of data and parameters; the max pooling layer selects the largest element within each window, so that data compression achieves a better effect. Then, the pooled feature vector is input into the norm layer for L2-norm-based normalization, which makes it convenient to subsequently determine the similarity of image feature vectors to classification centers and cluster centers, and thereby determine the classification and cluster of an image.
The fully connected layer may be initialized with a Gaussian distribution with a mean of 0 and a variance of 0.01. In Table 2, N is the number of classes to be learned, and the parameters (weights) of the fully connected layer form an N x 2048 matrix, where each 1 x 2048 vector corresponds to the class vector of one classification. The parameters of the fully connected layer are the parameters to be trained in the embodiment of the present application and can be set to a learnable state. In the parameters of the fully connected layer included in the trained image classification model, each class vector can be used as the classification center of the corresponding classification. In addition to the pooling layer, the norm layer, and the fully connected layer, the classification module may further include a plurality of convolutional layers before the pooling layer, or a plurality of fully connected layers with activation functions, in order to obtain a stronger nonlinear effect in consideration of the expression capability of the embedding network.
TABLE 2 Classification Module Structure Table based on ResNet-101
Layer name | Output size | Layer
Pool_cr    | 1x2048      | Maximum pooling (Max Pool)
Norm       | 1x2048      | Norm (Norm)
Fc_cr      | 1xN         | Full connection (Full Connection)
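A minimal sketch of the forward pass through the classification module of Table 2 (max pooling, then L2 normalization, then the fully connected layer), using NumPy stand-ins for the actual network; the random weights, the 7x7 spatial grid, and N = 5 are illustrative assumptions, not trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def classification_head(feature_map, fc_weight):
    """Forward pass of the module in Table 2: max pool -> L2 norm -> FC.

    feature_map: (H*W, 2048) spatial features from the embedding network.
    fc_weight:   (N, 2048) fully connected weights; each 1x2048 row is a
                 class vector (usable as a classification center after training).
    """
    pooled = feature_map.max(axis=0)           # Pool_cr: 1x2048
    normed = pooled / np.linalg.norm(pooled)   # Norm: L2 normalization
    logits = fc_weight @ normed                # Fc_cr: 1xN
    return normed, logits

N = 5  # number of classes to learn (illustrative)
# Gaussian init with mean 0 and variance 0.01 (i.e., std 0.1), per the text.
W = rng.normal(0.0, 0.1, size=(N, 2048))
feat = rng.random((49, 2048))                  # e.g. a 7x7 spatial grid
normed, logits = classification_head(feat, W)
```

Because the normalized feature has unit L2 norm, comparing it to a (normalized) class vector by dot product directly yields the cosine similarity used for bucket selection.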
Therefore, the server can input each sample image into the feature extraction module for feature extraction processing to obtain the feature vector of each sample image, and then input the feature vector of each sample image into the classification module for forward calculation to obtain a classification prediction result, that is, the prediction class label of each sample image as determined by the classification module. The server then trains the initial classification model according to the prediction class label and the reference class label of each sample image to obtain the image classification model.
Optionally, training the initial classification model according to the prediction class label and the reference class label of each sample image to obtain the image classification model may include: obtaining a classification loss value (classification loss) of the model according to the prediction class label and the reference class label of each sample image, and adjusting the model parameters of the initial classification model according to the classification loss value until the classification loss value meets a convergence condition, thereby obtaining the image classification model and the classification center of each of the plurality of classifications, where the classification center of each classification is determined according to the model parameters of the image classification model. The convergence condition refers to the condition for stopping model training; for example, it may be that the classification loss value is less than or equal to a preset classification loss threshold. Adjusting the model parameters of the initial classification model according to the classification loss value may be: performing gradient backward calculation on the classification loss value to obtain updated values of the model parameters, and updating the parameters shown in Table 2. After the initial classification model is trained on each sample image, the model parameters may be updated. Correspondingly, the model parameters used for the current sample image are those updated after training on the previous sample image, and similarly, the parameters used for the next sample image are those updated after training on the current sample image.
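The text does not fix a particular loss function; softmax cross-entropy is one common choice consistent with a classification layer, and the stopping logic follows the convergence condition described above. A minimal sketch under those assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classification_loss(logits, label):
    # Cross-entropy between the predicted class distribution and the
    # reference class label (an assumed, not disclosed, loss choice).
    return float(-np.log(softmax(logits)[label]))

def should_stop(loss_value, loss_threshold):
    # Convergence condition from the text: loss <= preset threshold.
    return loss_value <= loss_threshold

logits = np.array([2.0, 0.5, -1.0])  # toy prediction for a 3-class model
loss = classification_loss(logits, 0)
stop = should_stop(loss, 0.5)
```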
Therefore, in the embodiment of the application, a classification module is added directly on top of the trained embedding network for training, the classification center of each classification can be determined, and the trained image classification model can be used to determine the classification of an image. By adding classification-based bucketing to the bucket retrieval mode based on global clustering, retrieval with semantic similarity can be realized, retrieval efficiency is improved without affecting overall performance, the complexity of system modification and maintenance in subsequent image retrieval applications can be reduced, and the application cost is reduced.
And S403, performing aggregation processing on the images included in each classification to obtain one or more clusters included in each classification.
Specifically, the images included in each category are aggregated, for example, the category "animal" may include images of cats, dogs, birds, etc., and all the images in the category "animal" are aggregated, so that one cluster to which the image about cats belongs, another cluster to which the image about dogs belongs, and another cluster to which the image about birds belongs can be obtained. All three clusters belong to the category of "animals".
In an optional implementation, performing an aggregation process on the images included in each classification to obtain one or more clusters included in each classification may include: the method comprises the steps of obtaining the number of images, the total clustering amount and the number of images included in each classification in a retrieval image library, determining the number of clusters included in each classification according to the number of images, the total clustering amount and the number of images included in each classification, and then aggregating the images included in each classification according to the number of clusters included in each classification and the feature vector of the images included in each classification to obtain one or more clusters included in each classification and the clustering center of each cluster, wherein the feature vector of the images included in each classification is obtained by calling an image classification model. For the images included in each classification, the cluster corresponding to the nearest cluster center is taken as the cluster to which the image belongs.
The total clustering amount refers to the number of clusters included in the plurality of preset classifications, that is, the number of clusters corresponding to all images included in the search image library. For example, if the number of images included in the acquired search image library is N, the total number of clusters is M, and the number of images included in the i-th classification is S_i, the number of clusters C_i included in the i-th classification can be calculated by formula (1):

C_i = ceil(M x S_i / N)    (1)

where N, M, S_i, and C_i are all positive integers, and i is a positive integer less than or equal to the total number of classifications. The total number of classifications is a preset number of classifications. It can be seen that each classification includes at least one cluster; even a classification containing a small number of images is guaranteed one cluster. It should be noted that, besides formula (1), the number of clusters included in each classification may also be calculated in other ways, which is not limited herein.
Optionally, the aggregation of the images included in each classification may be performed with a k-means clustering algorithm (k-means clustering algorithm), a clustering analysis algorithm based on iterative solution. In the embodiment of the present application, k is the number of clusters included in each classification; for example, when aggregating the images included in the i-th classification with the k-means algorithm, k is the number of clusters C_i included in the i-th classification. After the images included in all the classifications are clustered, the serial numbers of the clusters can be recorded in classification order as cluster 1, cluster 2, and so on. In this embodiment, the images in the retrieval sample library are first decomposed into categories, and the images included in each category are then clustered separately. Compared with directly performing global clustering on the images in the retrieval image library, the number of images participating in each clustering run is greatly reduced, which helps overcome the problem that the k-means clustering algorithm cannot cluster massive data or massive cluster centers under limited resources, and effectively realizes the clustering of large-scale data samples (sample images) under limited computer resources.
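The per-classification cluster-count calculation can be sketched as follows. Note that formula (1) appears only as an image in the original publication; the ceiling (rounding up) used here is one plausible reading, chosen because it matches the stated guarantee that every classification gets at least one cluster:

```python
import math

def clusters_per_classification(image_counts, total_clusters):
    """Formula (1), under the rounding assumption C_i = ceil(M * S_i / N).

    image_counts:   S_i, images per classification.
    total_clusters: M, the preset total clustering amount.
    """
    n_total = sum(image_counts)  # N: images in the search image library
    return [max(1, math.ceil(total_clusters * s / n_total))
            for s in image_counts]

# 1000 images split 700 / 250 / 50 across 3 classifications, M = 20 clusters.
counts = clusters_per_classification([700, 250, 50], 20)
# -> [14, 5, 1]: even the smallest classification keeps one cluster
```

Each classification's images would then be clustered with k-means using k = C_i, e.g. via an off-the-shelf k-means implementation.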
Referring to fig. 6, fig. 6 is a schematic diagram illustrating a classification and clustering process according to an embodiment of the present application. Specifically, an image classification model obtained based on deep learning training is called to perform semantic inference on each image in a database (namely, the search image library), so that a feature vector of each image in the search image library is obtained. And then calling an image classification model to perform deep learning semantic cluster division, namely determining the classification of each image, and then performing intra-cluster clustering, namely performing aggregation processing on the images included in each classification to determine the clusters included in each classification. For example, the images included in the category 2 in fig. 6 are clustered in the cluster 2, and a subclass 1 (cluster 1), a subclass 2 (cluster 2),.. or a subclass K (cluster K) is obtained.
Optionally, after determining the clusters included in each category, all clusters may further undergo cluster merging (e.g., the cluster merging in fig. 6). Among the cluster centers of different clusters, there may be centers with high similarity (e.g., similarity exceeding a preset threshold); the clusters corresponding to such centers can be merged into one cluster, reducing cluster redundancy and further reducing the data calculation amount of image retrieval. The clusters being merged may belong to the same category or to different categories. The cluster center of the merged cluster may be obtained by averaging the cluster centers of the original clusters, or may be the cluster center of any one of them, which is not limited herein. In addition, the images included in each of the original clusters all belong to the merged cluster. For example, suppose category 1 includes cluster 1, cluster 2, and cluster 3, and the similarity between the cluster center of cluster 1 and the cluster center of cluster 3 is high; then cluster 1 and cluster 3 may be merged into one cluster, denoted cluster 4. Accordingly, the images included in cluster 1 and the images included in cluster 3 are both attributed to cluster 4.
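The cluster-merging step can be sketched as below. This is an illustrative assumption: cosine similarity is used as the center-similarity measure, merging is greedy, and the merged center is the mean of the originals (one of the two options mentioned above):

```python
import numpy as np

def merge_similar_clusters(centers, members, threshold):
    """Greedily merge clusters whose centers' cosine similarity exceeds
    `threshold`. The merged center is the average of the two originals,
    and the merged cluster inherits the images of both clusters.
    """
    centers = [np.asarray(c, dtype=float) for c in centers]
    members = [list(m) for m in members]
    merged = True
    while merged:
        merged = False
        for a in range(len(centers)):
            for b in range(a + 1, len(centers)):
                ca, cb = centers[a], centers[b]
                sim = ca @ cb / (np.linalg.norm(ca) * np.linalg.norm(cb))
                if sim > threshold:
                    centers[a] = (ca + cb) / 2     # averaged cluster center
                    members[a].extend(members[b])  # images of both clusters
                    del centers[b], members[b]
                    merged = True
                    break
            if merged:
                break
    return centers, members

# Clusters 1 and 3 have near-identical centers and are merged,
# mirroring the cluster-4 example in the text.
c, m = merge_similar_clusters(
    [[1.0, 0.0], [0.0, 1.0], [0.99, 0.01]],
    [["img_a"], ["img_b"], ["img_c"]],
    threshold=0.95,
)
```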
S404, acquiring the image to be retrieved.
S405, determining a target classification matched with the image to be retrieved from a plurality of classifications, wherein each classification in the plurality of classifications comprises one or more clusters, and the one or more clusters are obtained by aggregating the images included in each classification.
S406, determining a target cluster matched with the image to be retrieved from one or more clusters included in the target classification.
S407, determining an image matched with the image to be retrieved from the images included in the target cluster.
The relevant description of steps S404 to S407 can refer to the relevant contents in the image processing method shown in fig. 2, and will not be described in detail here.
In addition, the image processing method provided by the embodiment of the application can also be used for video retrieval. Specifically, a plurality of images are extracted from each video in the video retrieval library, an associated identifier exists between each video and the images included in the video, an image classification model is called to determine the classification of each image, and then the images included in each classification are aggregated to obtain a cluster corresponding to each classification. In video retrieval, a retrieval video is acquired, the retrieval video is split into a plurality of images to be retrieved, and the image processing method provided by the embodiment of the application is executed for each image to be retrieved, and the method comprises the following steps: and determining a target classification matched with the image to be retrieved, and determining a target cluster from clusters corresponding to the target classification. And recalling videos to which the images included in the target cluster belong respectively according to the associated identifiers, and then determining videos matched with the retrieval videos from the recalled videos. For example, the video matching the retrieved video may be a video having 10% or more identical/similar images with the retrieved video.
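The video-recall step at the end (mapping recalled images back to videos via the associated identifiers, then keeping videos that share enough frames with the retrieval video) can be sketched as follows; the threshold scheme follows the 10% example above, while the data structures and names are illustrative assumptions:

```python
from collections import Counter

def recall_videos(matched_image_ids, image_to_video, min_ratio,
                  frames_per_video):
    """Recall videos via the image->video association identifiers, then
    keep those whose share of matched frames reaches `min_ratio`.
    """
    hits = Counter(image_to_video[i] for i in matched_image_ids)
    return [v for v, n in hits.items()
            if n / frames_per_video[v] >= min_ratio]

# Frames f1, f2 belong to video v1 (10 frames); f3 belongs to v2 (40 frames).
image_to_video = {"f1": "v1", "f2": "v1", "f3": "v2"}
frames_per_video = {"v1": 10, "v2": 40}
videos = recall_videos(["f1", "f2", "f3"], image_to_video, 0.10,
                       frames_per_video)
# v1 matches 2/10 frames (20%); v2 matches 1/40 (2.5%) and is filtered out
```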
In summary, in the image processing method, the pre-classification subset data based on the deep learning semantic features is divided, that is, an image processing model based on deep learning training is invoked to determine the classification to which each image in the search image library belongs, so as to obtain the images included in each classification. Then, the images included in each classification are subjected to aggregation processing, and one or more clusters included in each classification are obtained. And then finding a target cluster corresponding to the image to be retrieved in the plurality of classifications, and realizing the retrieval recall of the barrel with semantic similarity. And determining an image matched with the image to be retrieved from the images included in the recalled target cluster. The matched images can be determined by calculating the similarity between the images to be retrieved and the images included in the target cluster, and the similarity does not need to be calculated with all the images in the retrieval image library one by one, so that the calculation amount of image retrieval can be reduced, and the efficiency of image retrieval is improved.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application. The image processing apparatus may be a computer program (including program code) running on a computer device; for example, the image processing apparatus may be application software. The apparatus may be used to perform the corresponding steps in the methods provided by the embodiments of the present application. The image processing apparatus 70 includes an acquisition unit 701 and a processing unit 702. Wherein:
an obtaining unit 701, configured to obtain an image to be retrieved;
a processing unit 702, configured to determine a target classification matching the image to be retrieved from multiple classifications, where each of the multiple classifications includes one or more clusters, and the one or more clusters are obtained by aggregating images included in each classification;
the processing unit 702 is further configured to determine, from one or more clusters included in the target classification, a target cluster matching the image to be retrieved;
the processing unit 702 is further configured to determine an image matching the image to be retrieved from the images included in the target cluster.
In an alternative embodiment, before performing the step of determining the target classification matching the image to be retrieved from the plurality of classifications, the processing unit 702 further performs the following steps:
acquiring a plurality of images included in a retrieval image library; calling an image classification model to determine, from a plurality of classifications, the classification to which each of the plurality of images belongs, so as to obtain the images included in each classification; and performing aggregation processing on the images included in each classification to obtain the one or more clusters included in each classification.
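The offline bucketing step above can be sketched minimally as follows. `classify` stands in for the trained image classification model; the lambda below is purely illustrative.

```python
import numpy as np

def build_class_buckets(features, classify):
    """Group library image indices by their predicted classification."""
    buckets = {}
    for idx, feat in enumerate(features):
        buckets.setdefault(classify(feat), []).append(idx)
    return buckets

feats = np.array([[0.0, 1.0], [0.1, 0.9], [1.0, 0.0]])
classify = lambda f: "animal" if f[1] > 0.5 else "vehicle"  # toy stand-in model
print(build_class_buckets(feats, classify))  # {'animal': [0, 1], 'vehicle': [2]}
```

Each resulting bucket is then aggregated independently into clusters, as the next steps describe.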
In an optional implementation manner, when the processing unit 702 performs an aggregation process on the images included in each of the classifications to obtain one or more clusters included in each of the classifications, the following steps are performed:
acquiring the number of images included in the retrieval image library, the total number of clusters and the number of images included in each classification; determining the number of clusters included in each classification according to the number of images included in the retrieval image library, the total number of clusters and the number of images included in each classification;
and according to the number of the clusters included in each class and the feature vector of the image included in each class, aggregating the images included in each class to obtain one or more clusters included in each class and the cluster center of each cluster, wherein the feature vector of the image included in each class is obtained by calling the image classification model.
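One possible reading of the two steps above: allocate the total cluster budget to each classification in proportion to its image count (at least one cluster per classification), then run a plain k-means per classification to obtain the cluster centers. The deterministic initialization and fixed iteration count are simplifications for the sketch; the patent does not fix the clustering algorithm.

```python
import numpy as np

def clusters_per_class(total_images, total_clusters, class_sizes):
    """Proportional share of the cluster budget, minimum one per classification."""
    return {c: max(1, round(total_clusters * n / total_images))
            for c, n in class_sizes.items()}

def kmeans(feats, k, iters=10):
    centers = feats[:k].astype(float).copy()   # deterministic init: first k points
    assign = np.zeros(len(feats), dtype=int)
    for _ in range(iters):
        # Squared Euclidean distance from every point to every center.
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = feats[assign == j].mean(0)
    return centers, assign

print(clusters_per_class(1000, 100, {"cat": 600, "dog": 300, "bird": 100}))
feats = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])
centers, assign = kmeans(feats, k=2)
print(assign)  # points 0,1 fall in one cluster, points 2,3 in the other
```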
In an alternative embodiment, the processing unit 702, when determining an image matching the image to be retrieved from the images included in the target cluster, performs the following steps:
obtaining a similarity threshold corresponding to the target classification; determining the similarity between the image to be retrieved and each image included in the target cluster; and taking the image with the similarity reaching the similarity threshold as the image matched with the image to be retrieved.
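The threshold-based matching step can be sketched as below. Cosine similarity and the concrete threshold value are illustrative assumptions; the text only requires some similarity measure together with a per-classification threshold.

```python
import numpy as np

def match_in_cluster(query, cluster_feats, threshold):
    """Indices of cluster images whose similarity to the query reaches the threshold."""
    q = query / np.linalg.norm(query)
    c = cluster_feats / np.linalg.norm(cluster_feats, axis=1, keepdims=True)
    sims = c @ q                      # cosine similarity to each cluster image
    return [i for i, s in enumerate(sims) if s >= threshold]

cluster = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
print(match_in_cluster(np.array([1.0, 0.0]), cluster, threshold=0.9))  # [0, 1]
```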
In an alternative embodiment, the processing unit 702, when determining the target classification matching the image to be retrieved from a plurality of classifications, performs the following steps:
calling an image classification model to obtain a feature vector of the image to be retrieved; obtaining a classification center of each of a plurality of classifications; determining the similarity between the feature vector of the image to be retrieved and the classification center of each classification; and determining a target classification matched with the image to be retrieved from the plurality of classifications according to the similarity.
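The classification-matching step can be sketched with normalized dot products, one common similarity choice; the patent does not fix the metric, so treat this as an assumption.

```python
import numpy as np

def nearest_center(query, centers):
    """Index of the center most similar to the query feature vector."""
    q = query / np.linalg.norm(query)
    c = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    return int(np.argmax(c @ q))

class_centers = np.array([[1.0, 0.0], [0.0, 1.0]])
print(nearest_center(np.array([0.8, 0.2]), class_centers))  # 0
```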
In an alternative embodiment, the processing unit 702, when determining the target cluster matching the image to be retrieved from the one or more clusters included in the target classification, performs the following steps:
acquiring a cluster center of each cluster in the one or more clusters included in the target classification; determining the similarity between the feature vector of the image to be retrieved and the cluster center of each cluster, wherein the feature vector of the image to be retrieved is obtained by calling the image classification model; and determining a target cluster matched with the image to be retrieved from the one or more clusters included in the target classification according to the similarity.
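Taken together, the classification and cluster selection form a coarse-to-fine lookup: nearest classification center first, then nearest cluster center inside that classification. The data layout and names below are illustrative assumptions.

```python
import numpy as np

def two_stage_lookup(query, class_centers, clusters_by_class):
    """Return (classification index, cluster index) for the query feature."""
    def nearest(q, centers):
        c = centers / np.linalg.norm(centers, axis=1, keepdims=True)
        return int(np.argmax(c @ (q / np.linalg.norm(q))))
    cls = nearest(query, class_centers)               # coarse: classification
    cluster = nearest(query, clusters_by_class[cls])  # fine: cluster within it
    return cls, cluster

class_centers = np.array([[1.0, 0.0], [0.0, 1.0]])
clusters_by_class = {0: np.array([[1.0, 0.1], [1.0, -0.1]]),
                     1: np.array([[0.1, 1.0]])}
print(two_stage_lookup(np.array([0.9, 0.1]), class_centers, clusters_by_class))  # (0, 0)
```

Only the images inside the returned cluster then need exhaustive similarity comparison, which is where the computational saving comes from.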
In an optional implementation manner, the obtaining unit 701 is further configured to obtain a sample data set, where the sample data set includes a plurality of sample images and a reference category label of each sample image, and the reference category label includes one or more of the plurality of categories;
the processing unit 702 is further configured to perform the following steps:
performing feature extraction processing on each sample image, and determining a feature vector of each sample image; inputting the feature vector of each sample image into an initial classification model for processing to obtain a prediction category label of each sample image; and training the initial classification model according to the prediction class label and the reference class label of each sample image to obtain an image classification model.
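A toy stand-in for the training step above: a linear softmax classifier trained with cross-entropy on pre-extracted feature vectors. The real "initial classification model" would be a deep network; everything here, including the data and learning rate, is an illustrative assumption.

```python
import numpy as np

feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])           # reference category labels
W = np.zeros((2, 2))                      # (num_classes, feature_dim)

for _ in range(200):
    logits = feats @ W.T
    p = np.exp(logits - logits.max(1, keepdims=True))
    p /= p.sum(1, keepdims=True)               # softmax probabilities
    p[np.arange(len(labels)), labels] -= 1.0   # d(cross-entropy)/d(logits)
    W -= 0.5 * (p.T @ feats) / len(labels)     # gradient step

pred = (feats @ W.T).argmax(1)            # predicted category labels
print(pred)  # [0 0 1 1]
```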
In an alternative embodiment, when the initial classification model is trained according to the prediction class label and the reference class label of each sample image, the processing unit 702 performs the following steps:
obtaining a classification loss value according to the prediction class label and the reference class label of each sample image; adjusting the model parameters of the initial classification model according to the classification loss values until the classification loss values meet a convergence condition, so as to obtain an image classification model and a classification center of each of the plurality of classifications; wherein the classification center of each of the plurality of classifications is determined from model parameters of the image classification model.
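One way to read "the classification center of each classification is determined from model parameters of the image classification model" is to take each class's weight row from the trained final fully-connected layer as that classification's center. This is an interpretation of the text, not the only possible one.

```python
import numpy as np

W_fc = np.array([[2.0, 0.0],   # learned final-layer weights, one row per class
                 [0.0, 1.0]])
# Normalize each weight row to serve as the classification center.
class_centers = W_fc / np.linalg.norm(W_fc, axis=1, keepdims=True)
print(class_centers)
```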
Since the apparatus embodiments are substantially similar to the method embodiments, reference may be made to the description of the method embodiments for relevant details.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a computer device 80 according to an embodiment of the present application. The computer device may include a processor 801, a memory 802, a network interface 803, and at least one communication bus 804. The processor 801 is used for scheduling computer programs and may include a central processing unit, a controller, and a microprocessor; the memory 802 is used to store computer programs and may include high-speed random access memory and non-volatile memory, such as magnetic disk storage devices and flash memory devices; the network interface 803 provides data communication functions, and the communication bus 804 is responsible for connecting the various communication elements.
Among other things, the processor 801 may be configured to invoke a computer program in memory to perform the following operations:
acquiring an image to be retrieved; determining a target classification matched with the image to be retrieved from a plurality of classifications, wherein each classification in the plurality of classifications comprises one or more clusters, and the one or more clusters are obtained by aggregating the images included in each classification; determining a target cluster matched with the image to be retrieved from one or more clusters included in the target classification; and determining an image matched with the image to be retrieved from the images included in the target cluster.
In an alternative embodiment, the processor 801 is further configured to, before determining the target classification matching the image to be retrieved from the plurality of classifications: acquiring a plurality of images included in a retrieval image library; calling an image classification model to determine the classification of each image in the plurality of images from a plurality of classifications to obtain the image included in each classification; and performing aggregation processing on the images included in each classification to obtain one or more clusters included in each classification.
In an optional implementation manner, when the processor 801 performs aggregation processing on the images included in each classification to obtain one or more clusters included in each classification, the processor is specifically configured to perform:
acquiring the number of images included in the retrieval image library, the total number of clusters and the number of images included in each classification; determining the number of clusters included in each classification according to the number of images included in the retrieval image library, the total number of clusters and the number of images included in each classification;
and according to the number of the clusters included in each class and the feature vector of the image included in each class, aggregating the images included in each class to obtain one or more clusters included in each class and the cluster center of each cluster, wherein the feature vector of the image included in each class is obtained by calling the image classification model.
In an alternative embodiment, when determining an image matching the image to be retrieved from the images included in the target cluster, the processor 801 is specifically configured to:
obtaining a similarity threshold corresponding to the target classification; determining the similarity between the image to be retrieved and each image included in the target cluster; and taking the image with the similarity reaching the similarity threshold as the image matched with the image to be retrieved.
In an alternative embodiment, when determining the target classification matching the image to be retrieved from the plurality of classifications, the processor 801 is specifically configured to perform:
calling an image classification model to obtain a feature vector of the image to be retrieved; obtaining a classification center of each of a plurality of classifications; determining the similarity between the feature vector of the image to be retrieved and the classification center of each classification; and determining a target classification matched with the image to be retrieved from the plurality of classifications according to the similarity.
In an alternative embodiment, the processor 801 is specifically configured to, when determining, from one or more clusters included in the target classification, a target cluster matching the image to be retrieved, perform:
acquiring a cluster center of each cluster in the one or more clusters included in the target classification; determining the similarity between the feature vector of the image to be retrieved and the cluster center of each cluster, wherein the feature vector of the image to be retrieved is obtained by calling the image classification model; and determining a target cluster matched with the image to be retrieved from the one or more clusters included in the target classification according to the similarity.
In an alternative embodiment, the processor 801 is further configured to: obtaining a sample data set, wherein the sample data set comprises a plurality of sample images and a reference category label of each sample image, and the reference category label comprises one or more categories of the plurality of categories; performing feature extraction processing on each sample image, and determining a feature vector of each sample image; inputting the feature vector of each sample image into an initial classification model for processing to obtain a prediction category label of each sample image; and training the initial classification model according to the prediction class label and the reference class label of each sample image to obtain an image classification model.
In an alternative embodiment, the processor 801 is specifically configured to, when training the initial classification model according to the prediction class label and the reference class label of each sample image to obtain an image classification model, perform:
obtaining a classification loss value according to the prediction class label and the reference class label of each sample image; adjusting the model parameters of the initial classification model according to the classification loss values until the classification loss values meet a convergence condition, so as to obtain an image classification model and a classification center of each of the plurality of classifications; wherein the classification center of each of the plurality of classifications is determined from model parameters of the image classification model.
It should be understood that the computer device described in the embodiments of the present application may implement the description of the image processing method in the foregoing embodiments, and may also implement the description of the image processing apparatus in the foregoing embodiments, which is not repeated herein. In addition, the description of the beneficial effects of the same method is not repeated herein.
It should be further noted that the embodiments of the present application also provide a computer-readable storage medium, in which a computer program of the foregoing image processing method is stored. The computer program includes program instructions, and when the program instructions are loaded and executed by one or more processors, the description of the image processing method in the foregoing embodiments can be implemented, which is not repeated herein; the beneficial effects of the same method are likewise not repeated. It will be understood that the program instructions may be deployed to be executed on one computer device, or on multiple computer devices capable of communicating with each other.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps performed in the embodiments of the methods described above.
Finally, it should be further noted that terms such as first and second in the description, claims, and drawings of the present application are merely used to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An image processing method, characterized in that the method comprises:
acquiring an image to be retrieved;
determining a target classification matched with the image to be retrieved from a plurality of classifications, wherein each classification in the plurality of classifications comprises one or more clusters, and the one or more clusters are obtained by aggregating the images included in each classification;
determining a target cluster matched with the image to be retrieved from one or more clusters included in the target classification;
and determining an image matched with the image to be retrieved from the images included in the target cluster.
2. The method of claim 1, wherein prior to determining a target classification from the plurality of classifications that matches the image to be retrieved, the method further comprises:
acquiring a plurality of images included in a retrieval image library;
calling an image classification model to determine the classification of each image in the plurality of images from a plurality of classifications to obtain the image included in each classification;
and performing aggregation processing on the images included in each classification to obtain one or more clusters included in each classification.
3. The method according to claim 2, wherein the aggregating the images included in each of the classifications to obtain one or more clusters included in each of the classifications comprises:
acquiring the number of images included in the retrieval image library, the total number of clusters and the number of images included in each classification;
determining the number of clusters included in each classification according to the number of images included in the retrieval image library, the total number of clusters and the number of images included in each classification;
and according to the number of the clusters included in each class and the feature vector of the image included in each class, aggregating the images included in each class to obtain one or more clusters included in each class and the cluster center of each cluster, wherein the feature vector of the image included in each class is obtained by calling the image classification model.
4. The method according to claim 1, wherein the determining an image matching the image to be retrieved from the images included in the target cluster comprises:
obtaining a similarity threshold corresponding to the target classification;
determining the similarity between the image to be retrieved and each image included in the target cluster;
and taking the image with the similarity reaching the similarity threshold as the image matched with the image to be retrieved.
5. The method according to any one of claims 1 to 4, wherein the determining a target classification matching the image to be retrieved from a plurality of classifications comprises:
calling an image classification model to obtain a feature vector of the image to be retrieved;
obtaining a classification center of each of a plurality of classifications;
determining the similarity between the feature vector of the image to be retrieved and the classification center of each classification;
and determining a target classification matched with the image to be retrieved from the plurality of classifications according to the similarity.
6. The method according to any one of claims 1 to 4, wherein the determining a target cluster matching the image to be retrieved from one or more clusters included in the target classification comprises:
acquiring a cluster center of each cluster in one or more clusters included in the target classification;
determining the similarity between the feature vector of the image to be retrieved and the clustering center of each cluster, wherein the feature vector of the image to be retrieved is obtained by calling the image classification model;
and determining a target cluster matched with the image to be retrieved from the one or more clusters included in the target classification according to the similarity.
7. The method according to any one of claims 2 to 4, further comprising:
obtaining a sample data set, wherein the sample data set comprises a plurality of sample images and a reference category label of each sample image, and the reference category label comprises one or more categories of the plurality of categories;
performing feature extraction processing on each sample image, and determining a feature vector of each sample image;
inputting the feature vector of each sample image into an initial classification model for processing to obtain a prediction category label of each sample image;
and training the initial classification model according to the prediction class label and the reference class label of each sample image to obtain the image classification model.
8. The method of claim 7, wherein the training the initial classification model according to the prediction class label and the reference class label of each sample image to obtain the image classification model comprises:
obtaining a classification loss value according to the prediction class label and the reference class label of each sample image;
adjusting model parameters of the initial classification model according to the classification loss values until the classification loss values meet a convergence condition, so as to obtain the image classification model and a classification center of each of the plurality of classifications; wherein the classification center of each of the plurality of classifications is determined from model parameters of the image classification model.
9. An image processing apparatus, characterized in that the apparatus comprises:
the device comprises an acquisition unit, a retrieval unit and a retrieval unit, wherein the acquisition unit is used for acquiring an image to be retrieved;
a processing unit, configured to determine a target classification matching the image to be retrieved from a plurality of classifications, where each of the plurality of classifications includes one or more clusters, and the one or more clusters are obtained by aggregating images included in each classification;
the processing unit is further configured to determine a target cluster matched with the image to be retrieved from one or more clusters included in the target classification;
the processing unit is further configured to determine an image matched with the image to be retrieved from the images included in the target cluster.
10. A computer device, comprising: a processor, a memory, and a network interface;
the processor is connected with the memory and the network interface, wherein the network interface is used for providing a network communication function, the memory is used for storing program codes, and the processor is used for calling the program codes to execute the method of any one of claims 1 to 8.
CN202110398503.9A 2021-04-13 2021-04-13 Image processing method and device and computer equipment Pending CN113704534A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110398503.9A CN113704534A (en) 2021-04-13 2021-04-13 Image processing method and device and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110398503.9A CN113704534A (en) 2021-04-13 2021-04-13 Image processing method and device and computer equipment

Publications (1)

Publication Number Publication Date
CN113704534A true CN113704534A (en) 2021-11-26

Family

ID=78648010

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110398503.9A Pending CN113704534A (en) 2021-04-13 2021-04-13 Image processing method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN113704534A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019127832A1 (en) * 2017-12-29 2019-07-04 国民技术股份有限公司 Intelligent search method and apparatus, terminal, server, and storage medium
CN110297935A (en) * 2019-06-28 2019-10-01 京东数字科技控股有限公司 Image search method, device, medium and electronic equipment
US20200027002A1 (en) * 2018-07-20 2020-01-23 Google Llc Category learning neural networks
CN111177446A (en) * 2019-12-12 2020-05-19 苏州科技大学 Method for searching footprint image
CN111581423A (en) * 2020-05-29 2020-08-25 上海依图网络科技有限公司 Target retrieval method and device
CN112084366A (en) * 2020-09-11 2020-12-15 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for retrieving image


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114741544A (en) * 2022-04-29 2022-07-12 北京百度网讯科技有限公司 Image retrieval method, retrieval library construction method, device, electronic equipment and medium
CN114741544B (en) * 2022-04-29 2023-02-07 北京百度网讯科技有限公司 Image retrieval method, retrieval library construction method, device, electronic equipment and medium
WO2024067593A1 (en) * 2022-09-28 2024-04-04 华为技术有限公司 Vector retrieval method and device

Similar Documents

Publication Publication Date Title
CN112819023B (en) Sample set acquisition method, device, computer equipment and storage medium
CN113298197B (en) Data clustering method, device, equipment and readable storage medium
CN111680217A (en) Content recommendation method, device, equipment and storage medium
CN111582409A (en) Training method of image label classification network, image label classification method and device
CN113836992B (en) Label identification method, label identification model training method, device and equipment
CN111950596A (en) Training method for neural network and related equipment
CN111339343A (en) Image retrieval method, device, storage medium and equipment
CN112507912B (en) Method and device for identifying illegal pictures
CN113590863A (en) Image clustering method and device and computer readable storage medium
CN114329029B (en) Object retrieval method, device, equipment and computer storage medium
CN113761291A (en) Processing method and device for label classification
CN113704534A (en) Image processing method and device and computer equipment
CN114358109A (en) Feature extraction model training method, feature extraction model training device, sample retrieval method, sample retrieval device and computer equipment
CN114282059A (en) Video retrieval method, device, equipment and storage medium
CN110674716A (en) Image recognition method, device and storage medium
CN114329004A (en) Digital fingerprint generation method, digital fingerprint generation device, data push method, data push device and storage medium
CN111091198B (en) Data processing method and device
CN113392867A (en) Image identification method and device, computer equipment and storage medium
CN115359296A (en) Image recognition method and device, electronic equipment and storage medium
Yan et al. Unsupervised deep clustering for fashion images
CN115240647A (en) Sound event detection method and device, electronic equipment and storage medium
CN114936327A (en) Element recognition model obtaining method and device, computer equipment and storage medium
CN113704528A (en) Clustering center determination method, device and equipment and computer storage medium
CN113822291A (en) Image processing method, device, equipment and storage medium
CN112801153A (en) Semi-supervised image classification method and system of image embedded with LBP (local binary pattern) features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination