CN111046209A

CN111046209A - Image clustering retrieval system

Info

Publication number: CN111046209A
Application number: CN201911249152.4A
Authority: CN
Inventors: 张峰; 李淼; 赵婷
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2019-12-09
Filing date: 2019-12-09
Publication date: 2020-04-21
Anticipated expiration: 2039-12-09
Also published as: CN111046209B

Abstract

The invention belongs to the field of image processing and software, and in particular relates to an image clustering retrieval system. It aims to solve two problems caused by the randomness and uncertainty of image shooting: low clustering accuracy, and the difficulty of retrieving images along a time axis. The system comprises an image retrieval module, a text retrieval module, a browsing information retrieval module and a database. The database is configured to acquire and store a plurality of images from a specific activity and, through unsupervised clustering, to obtain a cluster image set for each target person together with the color labels of set body parts and the feature vector of that person. The image retrieval module is configured to match an input person image against the cluster image sets in the database. The text retrieval module is configured to obtain the matching cluster image set in the database based on input color information. The browsing information retrieval module is configured to obtain the matching cluster image set in the database from the images whose attention degree exceeds a set threshold. The invention improves clustering accuracy through unsupervised clustering and reduces retrieval difficulty by offering several retrieval modes.

Description

Image clustering retrieval system

Technical Field

The invention belongs to the field of image processing and software, and particularly relates to an image clustering retrieval system.

Background

With the rapid development of network and multimedia technologies, digital information including sound, graphics, images, video, and animation is expanding rapidly. Images are a medium with rich content and strong visual expressiveness, and they attract wide attention. In real life, large numbers of images are generated all the time, and finding the images that meet a user's needs among them is a problem researchers must solve. Ski enthusiasts, for example, want to keep the highlights of their skiing, so many ski resorts photograph skiers, record highlights on the ski runs, and upload the images to skiing software from which skiers can download them as desired. However, as the number of people at ski resorts rises sharply, so does the number of uploaded images. The software on the market only supports searching by approximate time along a time axis, which makes it much harder for users to find their own images and can even cause them to miss the images they want when browsing too quickly. An image clustering retrieval system is therefore urgently needed, one that can effectively cluster and screen all images and feed the relevant ones back to users.

Ski resort images pose many problems that strongly affect image clustering. For example: the target person occupies only a small part of the field of view because of the shooting angle and distance; changes in illumination over the course of a day in the snow season strongly affect image quality; ski hats and goggles occlude the face, which makes face recognition difficult, so other features must be extracted for clustering; and the number of skiers is uncertain, so the images cannot be clustered into a fixed number of categories. The key to cluster retrieval of ski resort images is to extract effective features of the skiers and to feed suitable images back to users according to their needs or operating habits. The invention therefore provides an image clustering retrieval system.

Disclosure of Invention

In order to solve the problems in the prior art, namely that clustering accuracy is low because of the randomness and uncertainty of image shooting in specific activities, and that retrieval is difficult because images are retrieved along a time axis, the invention provides an image clustering retrieval system comprising one or more clients and a server; the client is connected with the server and comprises an image retrieval module and/or a text retrieval module and/or a browsing information retrieval module; the server comprises a database;

the database is configured to acquire and store a plurality of images of a plurality of target persons in a specific activity, perform image clustering on the target persons by an unsupervised clustering method, acquire a cluster image set of each target person, and acquire a color label of a set part corresponding to each target person in each cluster image set and a feature vector corresponding to each target person;

the image retrieval module is configured to match the input person image with the clustered image set of each target person in the database to obtain a clustered image with the maximum matching degree;

the text retrieval module is configured to acquire, based on the input color information of the set part, the cluster image set of the target person in the database that matches the color information;

the browsing information retrieval module is configured to obtain the attention degree of the image browsed by the user according to a set attention degree calculation rule, and obtain the cluster image set of the target person matched with the attention degree in the database according to the image of which the attention degree is greater than a set threshold value.

In some preferred embodiments, "clustering images of the target person by an unsupervised clustering method" is performed by:

respectively extracting the feature vectors of a plurality of images of a plurality of target persons, and performing dimensionality reduction processing to obtain low-dimensional feature vectors corresponding to the images;

acquiring the clustering centers of the feature vectors of the images of each category in the database, and calculating the distance between each low-dimensional feature vector and each clustering center; if the distance is smaller than a preset distance threshold, judging the image to belong to the same category as that clustering center and updating the value of that clustering center; otherwise, using the value of the low-dimensional feature vector as a new clustering center.

In some preferred embodiments, the method of "extracting feature vectors of a plurality of images of a plurality of target persons, respectively, and performing dimension reduction processing" includes:

extracting the feature vectors of a plurality of images of a plurality of target persons based on a convolutional neural network, and performing dimension reduction processing on the feature vectors of the images through an autoencoder;

and obtaining color histograms of all parts of the target person in the multiple images of the multiple target persons based on the convolutional neural network, and performing dimension reduction processing on the color histograms through a Principal Component Analysis (PCA) algorithm.

In some preferred embodiments, the method for obtaining the color label of the set part corresponding to the target person and the feature vector corresponding to the target person in each cluster image set includes:

acquiring a color histogram of a set part corresponding to a target person in each cluster image set based on a convolutional neural network; obtaining a color label according to the set part and the corresponding color histogram;

and acquiring the clustering center of all the images corresponding to the target person in the database as the feature vector corresponding to the target person.

In some preferred embodiments, the text retrieval module "acquires a cluster image set of target people matching the input color information of the set part in the database based on the input color information of the set part", and the method includes:

acquiring color information of a set part of an image to be retrieved by a user in a text or voice input mode;

and acquiring a cluster image set of the target person matched with the color information in the database according to the color information.

In some preferred embodiments, the browsing information retrieval module "obtains the attention degree of the user browsing images according to a set attention degree calculation rule" includes:

acquiring the attention degree of the images browsed by the user according to the length of time the user spends browsing an image, and/or the frequency with which the user clicks on browsed images, and/or the images the user has downloaded, followed or liked.

The invention has the beneficial effects that:

according to the invention, clustering accuracy is improved through unsupervised clustering, and retrieval difficulty is reduced through different retrieval modes. The feature vectors of the images in the image set are extracted and reduced in dimension in two ways, a convolutional neural network plus an autoencoder and a convolutional neural network plus the principal component analysis algorithm; the categories and color labels of the images and the feature vectors corresponding to the target persons are then obtained through unsupervised clustering. This overcomes the shortcomings of a single clustering mode and improves clustering accuracy. For retrieval, three methods are provided according to the needs and operating habits of the user: image retrieval, text retrieval and browsing information retrieval. This reduces retrieval difficulty and improves user satisfaction.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings.

FIG. 1 is a schematic diagram of a system architecture of an image clustering retrieval system according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of a first unsupervised clustering method according to an embodiment of the present invention;

FIG. 3 is a flow chart of a second unsupervised clustering method according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the retrieval process of the image clustering retrieval system according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

The image clustering retrieval system of the invention, as shown in FIG. 1, comprises one or more clients and a server; the client is connected with the server and comprises an image retrieval module and/or a text retrieval module and/or a browsing information retrieval module; the server comprises a database.

In order to explain the image clustering retrieval system of the present invention more clearly, each part of an embodiment of the system is described in detail below with reference to the drawings.

1. The database is configured to acquire and store a plurality of images of a plurality of target persons in a specific activity, perform image clustering on the target persons by an unsupervised clustering method, acquire a cluster image set of each target person, and acquire color tags of set parts corresponding to the target persons in each cluster image set and feature vectors corresponding to the target persons.

The invention is mainly aimed at clustering images from specific activities. This embodiment preferably takes ski resort images as the object of cluster retrieval and clusters them in an unsupervised manner. Because the skiers appearing in the captured images are random and uncertain, the features of each skier cannot be extracted in advance for image matching, so the images can only be clustered without supervision. Once all the images corresponding to the same person have been found, the feature vector of that target person is obtained. The images of the target person can also be labeled as needed, for example 'red ski jacket, black ski pants, blue ski boots', which gives the color label of the target person, that is, the color label of that category of images.

This embodiment provides two unsupervised clustering methods, each of which can be roughly divided into two steps: first, reducing the dimensionality of the images to be clustered in order to extract lower-dimensional features; second, clustering the extracted features with a clustering algorithm. The two unsupervised clustering methods are described below.

The first unsupervised clustering method: as shown in FIG. 2, a neural network pre-trained on a large classification dataset, such as VGG16, ResNet or Inception, is used. The ski images to be clustered (the images in the database) are fed into the pre-trained model, and the output of the last convolutional layer of the network is extracted as the feature vector of each image. This reduces the dimensionality of the input image, for example from a 224 × 224 × 3 RGB image to 1024 dimensions.

An autoencoder is then trained on the dimension-reduced feature vectors of the images in order to extract feature vectors of even lower dimension. An autoencoder is trained without supervision and consists of an encoding part and a decoding part. The encoding part compresses the input into a lower-dimensional feature vector; the decoding part does the opposite, restoring the compressed feature vector to the original input data through layers of neurons symmetrical to those of the encoding part. The lower-dimensional feature vector produced by the encoding part of the autoencoder can therefore be considered to represent the original 224 × 224 × 3 RGB image well.

Then the distances between the low-dimensional feature vectors obtained from the images are calculated; the distance may be the Euclidean distance, the Manhattan distance, the cosine of the included angle, and so on. The distance between the feature vectors of different images is compared with a preset distance threshold to determine whether a group of images shows the same skier. Once certain images are determined to show the same skier, the cluster center of their feature vectors can be determined. When a new image arrives, it is first fed into the feature extraction network, and the extracted feature is compared with the existing cluster centers. If the difference from some cluster center is smaller than the set threshold, the image is assigned to the category of that cluster center and the value of the cluster center is updated. Conversely, if the difference between the feature of the image and every cluster center is larger than the set threshold, the image is considered to belong to a new category and its feature value is taken as a new cluster center. This continues until all images have a clustering result. In FIG. 2 and FIG. 3, Y indicates that the difference between the extracted feature and a cluster center is smaller than the set threshold, and N indicates that it is larger than the set threshold.
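The threshold-based assignment described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `ThresholdClusterer`, the choice of Euclidean distance, and the running-mean update of the cluster center are assumptions, since the text only states that the center value is updated.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ThresholdClusterer:
    """Incremental clustering with a distance threshold (hypothetical helper)."""

    def __init__(self, threshold, distance=euclidean):
        self.threshold = threshold
        self.distance = distance
        self.centers = []  # one center per cluster
        self.counts = []   # number of vectors merged into each center

    def add(self, vector):
        """Assign one low-dimensional feature vector; return its cluster index."""
        best, best_dist = None, float("inf")
        for i, center in enumerate(self.centers):
            d = self.distance(vector, center)
            if d < best_dist:
                best, best_dist = i, d
        if best is not None and best_dist < self.threshold:
            # "Y" branch: same category. Assumption: update the center as a running mean.
            n = self.counts[best]
            self.centers[best] = [
                (c * n + v) / (n + 1) for c, v in zip(self.centers[best], vector)
            ]
            self.counts[best] = n + 1
            return best
        # "N" branch: the vector itself becomes a new cluster center.
        self.centers.append(list(vector))
        self.counts.append(1)
        return len(self.centers) - 1


clusterer = ThresholdClusterer(threshold=1.0)
features = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [0.1, 0.0]]
labels = [clusterer.add(v) for v in features]
print(labels)  # [0, 0, 1, 1, 0]
```

Because no number of categories is fixed in advance, the number of clusters grows with the data, which matches the stated constraint that the number of skiers is unknown. The result depends on the threshold and on the order in which images arrive.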

The second unsupervised clustering method: as shown in FIG. 3, a trained neural network performs human parsing on the skiers in the images (the images in the database) to obtain parts such as the head, upper body, legs and feet, and the color histogram of each part of the skier in each image is then computed. The color histograms obtained in this way are high-dimensional, so the Principal Component Analysis (PCA) algorithm can be used to reduce their dimensionality and thus the computation required by subsequent classification. The dimension-reduced histogram data can also be regarded as a low-dimensional feature vector of the original image, and the subsequent clustering proceeds exactly as in the first unsupervised clustering method. In addition, all successfully classified images can be labeled according to the colors of the body parts, for example 'red ski jacket, black ski pants, blue ski boots'. In the invention, human parsing is preferably performed by instance-level segmentation of the human body with a deep neural network or by traditional computer vision methods; other human parsing methods may be chosen in other embodiments.

The target persons are clustered by the two unsupervised clustering methods above to obtain a cluster image set for each target person, together with the color labels of the set parts and the feature vector corresponding to the target person in each cluster image set. The feature vector corresponding to a target person is the cluster center of all the images of that target person in the database.
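As a rough sketch of the per-part color histogram and color labeling described above, the following assumes that human parsing has already produced a list of RGB pixels for each body part. The palette `PALETTE`, the part names and the rule "label = most frequent palette color" are illustrative assumptions; the text does not specify the color vocabulary or the labeling rule, and the PCA step is not shown here.

```python
from collections import Counter

# Hypothetical palette; the color vocabulary is not specified in the text.
PALETTE = {
    "red": (220, 30, 30),
    "blue": (30, 60, 200),
    "black": (20, 20, 20),
    "white": (240, 240, 240),
    "yellow": (230, 210, 40),
    "green": (40, 160, 60),
}


def nearest_color(pixel):
    """Name of the palette color closest to an (r, g, b) pixel (squared RGB distance)."""
    return min(
        PALETTE,
        key=lambda name: sum((p - q) ** 2 for p, q in zip(pixel, PALETTE[name])),
    )


def color_histogram(pixels):
    """Normalized histogram over palette names for one body part (non-empty input)."""
    votes = Counter(nearest_color(p) for p in pixels)
    total = sum(votes.values())
    return {name: votes.get(name, 0) / total for name in PALETTE}


def color_labels(parts):
    """Map each body part to its dominant palette color."""
    labels = {}
    for part, pixels in parts.items():
        hist = color_histogram(pixels)
        labels[part] = max(hist, key=hist.get)
    return labels


# Pixels per part, as a human parsing step might deliver them (made-up values).
parts = {
    "upper_body": [(210, 40, 35), (225, 25, 20), (240, 240, 235)],
    "legs": [(15, 18, 22), (30, 25, 28)],
    "feet": [(25, 70, 190), (35, 55, 210)],
}
labels = color_labels(parts)
print(labels)  # {'upper_body': 'red', 'legs': 'black', 'feet': 'blue'}
```

A coarse palette keeps the histogram short and maps directly onto words a user might type or say; a finer-grained histogram, reduced with PCA, would be the feature used for clustering.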

After all the images have been clustered and stored in the database, users can retrieve images as needed, and the system can also recommend images of interest according to the user's operating habits. The system therefore offers three retrieval modes, image retrieval, text retrieval and browsing information retrieval, as shown in FIG. 4. The retrieval modules generally run on the client, which can be a computer terminal, a mobile terminal such as a mobile phone, a portable networked device, and the like.

2. The image retrieval module is configured to match the input person image with the clustered image set of each target person in the database to obtain a clustered image with the maximum matching degree.

In this embodiment, the user may upload an image of a person in skiing equipment. The server extracts a feature vector from the uploaded image using the same feature extraction algorithm as in image clustering, compares it with the cluster centers to find the closest one, and feeds all the images belonging to that cluster center back to the user.
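The nearest-center lookup used for image retrieval can be sketched as below. The function name `retrieve_by_image`, the dictionary layout of the database and the file names are hypothetical, and the query vector is assumed to have been produced by the same feature extractor that was used during clustering.

```python
import math


def retrieve_by_image(query_vector, clusters):
    """Return the id and images of the cluster whose center is closest to the query.

    `clusters` maps a cluster id to {"center": [...], "images": [...]}.
    Euclidean distance is one of the distances named in the text.
    """
    best_id = min(
        clusters,
        key=lambda cid: math.dist(query_vector, clusters[cid]["center"]),
    )
    return best_id, clusters[best_id]["images"]


# Toy database with two target persons (made-up values).
clusters = {
    "skier_a": {"center": [0.1, 0.9], "images": ["a_001.jpg", "a_002.jpg"]},
    "skier_b": {"center": [0.8, 0.2], "images": ["b_001.jpg"]},
}
best_id, images = retrieve_by_image([0.75, 0.25], clusters)
print(best_id, images)  # skier_b ['b_001.jpg']
```

Comparing against one center per person, rather than against every stored image, keeps the lookup cost proportional to the number of clusters.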

The image retrieval module has many real-life application scenarios. For example, at a marathon or other event, an athlete can stand in front of a large kiosk-style screen equipped with a camera; the screen captures an image of the person and displays the images of that person taken during the event. Alternatively, an access control system working with the image retrieval module can be installed at the entrances and exits of the lounges of a specific activity: a camera or other photographing device captures images of people entering and leaving, and the system decides whether to grant access, which improves the security of the activity.

3. The text retrieval module is configured to acquire, based on the input color information of the set part, the cluster image set of the target person in the database that matches the color information.

In this embodiment, the color information of the set part of the image the user wants to retrieve is obtained through text or voice input; other input methods may also be used as long as the final result can be converted into text information.

A cluster image set of the target person matching the color information is then acquired from the database. For example, the user can manually type descriptive text such as 'white ski jacket, black ski pants, black ski boots' to search for target images; the server matches the input text against the color labels in the database and feeds all images with the highest matching degree back to the user.
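One simple way to match input color information against the stored color labels is sketched below. It assumes the text or voice input has already been parsed into a mapping from body part to color, and it defines the matching degree as the number of parts whose color agrees; both the parsing step and this scoring rule are assumptions, as are the names `match_score` and `retrieve_by_text`.

```python
def match_score(query, label):
    """Number of body parts whose queried color equals the stored color label."""
    return sum(1 for part, color in query.items() if label.get(part) == color)


def retrieve_by_text(query, database):
    """Return the ids of all clusters with the highest non-zero matching degree."""
    scores = {cid: match_score(query, entry["label"]) for cid, entry in database.items()}
    best = max(scores.values(), default=0)
    if best == 0:
        return []
    return sorted(cid for cid, score in scores.items() if score == best)


# Toy database of per-person color labels (made-up values).
database = {
    "skier_a": {"label": {"upper_body": "white", "legs": "black", "feet": "black"}},
    "skier_b": {"label": {"upper_body": "red", "legs": "black", "feet": "blue"}},
    "skier_c": {"label": {"upper_body": "white", "legs": "blue", "feet": "black"}},
}
query = {"upper_body": "white", "legs": "black", "feet": "black"}
matches = retrieve_by_text(query, database)
print(matches)  # ['skier_a']
```

Returning every cluster that ties for the best score, rather than a single one, lets the user pick the right set when several skiers wear similar colors.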

The text retrieval module has many real-life application scenarios. For example, at a competition or other event, when the host or a spectator calls out the name or number of a player, or describes what the player is wearing on each body part, the on-site large screen quickly retrieves competition photos of that player and displays them, which livens up the atmosphere and increases interaction among participants.

4. The browsing information retrieval module is configured to obtain the attention degree of the image browsed by the user according to a set attention degree calculation rule, and obtain the cluster image set of the target person matched with the attention degree in the database according to the image of which the attention degree is greater than a set threshold value.

In this embodiment, the feature center of the images the user pays attention to is extracted from the user's browsing and clicking behavior, and all photos of the cluster center closest to that feature center in the database are then fed back to the user. The attention degree calculation rule is set as follows: (1) record how long the user browses each photo, extract features from the photos browsed for a long time, and find the feature center; (2) record the user's click operations, extract features from the photos the user focuses on, and find the feature center; (3) extract features from the photos the user has downloaded, followed, liked and so on, and find the feature center.
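The attention rule above can be illustrated with a small sketch. The weighted sum, the weights `w_time`, `w_click` and `w_action`, the record fields and the use of the mean as the feature center are all assumptions made for illustration; the text only names the signals (browsing time, clicks, and downloads, follows and likes) and the threshold.

```python
def attention(record, w_time=0.1, w_click=1.0, w_action=2.0):
    """Attention degree of one browsed image as a weighted sum (hypothetical weights)."""
    actions = sum(bool(record.get(key)) for key in ("downloaded", "followed", "liked"))
    return (
        w_time * record.get("seconds", 0.0)
        + w_click * record.get("clicks", 0)
        + w_action * actions
    )


def feature_center(records, features, threshold):
    """Mean feature vector of the images whose attention degree exceeds the threshold."""
    selected = [features[r["image_id"]] for r in records if attention(r) > threshold]
    if not selected:
        return None
    dim = len(selected[0])
    return [sum(vec[i] for vec in selected) / len(selected) for i in range(dim)]


# Browsing log and per-image feature vectors (made-up values).
records = [
    {"image_id": "img1", "seconds": 30, "clicks": 2, "liked": True},
    {"image_id": "img2", "seconds": 2, "clicks": 0},
    {"image_id": "img3", "seconds": 12, "clicks": 1, "downloaded": True},
]
features = {"img1": [1.0, 0.0], "img2": [0.5, 0.5], "img3": [0.0, 1.0]}
center = feature_center(records, features, threshold=3.0)
print(center)  # [0.5, 0.5]
```

The resulting feature center can then be compared with the cluster centers in the database, in the same way as a query vector in image retrieval, to find the closest cluster image set.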

It should be noted that, the image clustering and retrieving system provided in the foregoing embodiment is only illustrated by the division of the functional modules, and in practical applications, the above functions may be distributed by different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.

The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims

1. An image clustering retrieval system comprises one or more clients and a server; the client is connected with the server, and is characterized in that in the classified retrieval system, the client comprises an image retrieval module and/or a character retrieval module and/or a browsing information retrieval module; the server comprises a database;

2. The image clustering retrieval system according to claim 1, wherein "clustering images of the target person by an unsupervised clustering method" is performed by:

3. The image clustering retrieval system according to claim 2, wherein the method of extracting the feature vectors of the plurality of images of the plurality of target persons, respectively, and performing the dimension reduction processing comprises:

4. The image clustering retrieval system according to claim 2, wherein the method of extracting the feature vectors of the plurality of images of the plurality of target persons, respectively, and performing the dimension reduction processing comprises:

5. The image clustering retrieval system according to claim 4, wherein the method of obtaining the color label of the set part corresponding to the target person and the feature vector corresponding to the target person in each clustered image set comprises:

6. The image clustering retrieval system of claim 1, wherein the text retrieval module "obtains the clustered image set of the target person matching the input set part in the database based on the color information of the input set part" by:

7. The image clustering retrieval system of claim 1, wherein the browsing information retrieval module "obtains the attention degree of the user browsing images according to the set attention degree calculation rule" by: