CN110609916A - Video image data retrieval method, device, equipment and storage medium

Info

Publication number: CN110609916A
Application number: CN201910912178.6A
Authority: CN
Inventors: 常亮
Original assignee: Sichuan Dongfang Wangli Technology Co Ltd
Current assignee: Sichuan Dongfang Wangli Technology Co Ltd
Priority date: 2019-09-25
Filing date: 2019-09-25
Publication date: 2019-12-24

Abstract

The invention relates to a video image data retrieval method, a device, equipment and a storage medium, wherein the method comprises the following steps: acquiring a picture retrieval database and a training database; performing clustering training on the feature data in the training database to generate a preset number of data buckets, and determining the clustering center of each data bucket; calculating the distance between each piece of feature data in the picture retrieval database and each clustering center, and adding each piece of feature data in the picture retrieval database into a corresponding data bucket according to a first distance rule to determine an inverted index table; calculating the distance between the feature matrix of the picture to be retrieved and each clustering center, and determining a target data bucket according to a second distance rule; and calculating the distance between the characteristic vector matrix of the picture to be retrieved and the clustering center of the target data bucket based on the inverted index table, and determining a similar picture with the picture to be retrieved as a retrieval result according to a retrieval rule. The performance and efficiency of video image data retrieval are improved.

Description

Video image data retrieval method, device, equipment and storage medium

Technical Field

The invention relates to the technical field of image data retrieval, in particular to a video image data retrieval method, a video image data retrieval device, video image data retrieval equipment and a storage medium.

Background

With the rapid development of the internet, large-scale video image data is becoming increasingly common in search engines and social networks, and has drawn considerable attention in solutions such as smart communities, intelligent security and AI (Artificial Intelligence) cities. In addition, as video image resources about people and cities keep growing, the scale of video image data becomes larger and larger.

In the related art, although processing large-scale video image data can yield a good clustering effect, training on the training set during clustering is time-consuming, and verifying model correctness is complex and troublesome. Moreover, some deep network models not only need to be trained but also need GPU (Graphics Processing Unit) support when deployed online, so a relatively complex model requires a large amount of computing performance to build. Thus, the related art is relatively inefficient in data processing before searching and has significant limitations in the search method, such as slow processing speed and poor search performance.

Disclosure of Invention

In view of this, a video image data retrieval method, apparatus, device and storage medium are provided to solve the problems in the related art of high retrieval difficulty, poor retrieval performance and low retrieval efficiency when facing large-scale, high-dimensional and sparsely distributed video image data.

The invention adopts the following technical scheme:

in a first aspect, an embodiment of the present application provides a video image data retrieval method, where the method includes:

acquiring a picture retrieval database and a training database;

performing clustering training on the feature data in the training database to generate a preset number of data buckets, and determining the clustering center of each data bucket;

calculating the distance between each piece of feature data in the picture retrieval database and each clustering center, and adding each piece of feature data in the picture retrieval database into a corresponding data bucket according to a first distance rule to determine an inverted index table;

calculating the distance between the feature matrix of the picture to be retrieved and each clustering center, and determining a target data bucket according to a second distance rule;

and calculating the distance between the characteristic vector matrix of the picture to be retrieved and the clustering center of the target data bucket based on the inverted index table, and determining a similar picture of the picture to be retrieved as a retrieval result according to a retrieval rule.

In a second aspect, an embodiment of the present application provides a video image data retrieval apparatus, including:

the data acquisition module is used for acquiring a picture retrieval database and a training database;

the clustering module is used for carrying out clustering training on the characteristic data in the training database to generate a preset number of data buckets and determining the clustering center of each data bucket;

the index table determining module is used for calculating the distance between each piece of feature data in the picture retrieval database and each clustering center, and adding each piece of feature data in the picture retrieval database into a corresponding data bucket according to a first distance rule to determine an inverted index table;

the target data bucket determining module is used for calculating the distance between the feature matrix of the picture to be retrieved and each clustering center and determining a target data bucket according to a second distance rule;

and the retrieval module is used for calculating the distance between the characteristic vector matrix of the picture to be retrieved and the clustering center of the target data bucket based on the inverted index table, and determining a similar picture with the picture to be retrieved as a retrieval result according to a retrieval rule.

In a third aspect, an embodiment of the present application provides an apparatus, including:

a processor, and a memory coupled to the processor;

the memory is configured to store a computer program, where the computer program is at least configured to execute the video image data retrieval method according to the first aspect of the embodiments of the present application;

the processor is used for calling and executing the computer program in the memory.

In a fourth aspect, the present application provides a storage medium, where a computer program is stored, and when the computer program is executed by a processor, the steps in the video image data retrieval method according to the first aspect are implemented.

By adopting the technical scheme, the invention obtains the picture retrieval database and the training database; performing clustering training on the feature data in the training database to generate a preset number of data buckets, and determining the clustering center of each data bucket, so that data can be partitioned into buckets to hash high-dimensional data points into a plurality of data buckets, and the retrieval efficiency is improved; calculating the distance between each piece of feature data in the picture retrieval database and each clustering center, adding each piece of feature data in the picture retrieval database into a corresponding data bucket according to a first distance rule to determine an inverted index table, and adopting inverted indexes and data bucket division to better reduce the time complexity and the space complexity in high-dimensional data retrieval; calculating the distance between the feature matrix of the picture to be retrieved and each clustering center, and determining a target data bucket according to a second distance rule; and calculating the distance between the characteristic vector matrix of the picture to be retrieved and the clustering center of the target data bucket based on the inverted index table, and determining a similar picture with the picture to be retrieved as a retrieval result according to a retrieval rule. Therefore, when the data with large scale, high dimensionality and sparse distribution is faced, the retrieval process is more efficient, and the retrieval performance is improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a flowchart of a video image retrieval method according to an embodiment of the present invention;

Fig. 2 is a flowchart of another video image retrieval method according to an embodiment of the present invention;

Fig. 3 is a diagram of a data set prior to dimension reduction in a dimension reduction example applicable to embodiments of the present invention;

Fig. 4 is a diagram of a data set during processing in a dimension reduction example applicable to embodiments of the present invention;

Fig. 5 is a schematic diagram of a data set during processing in another dimension reduction example applicable to embodiments of the present invention;

Fig. 6 is a schematic structural diagram of a video image retrieval apparatus according to an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of an apparatus provided in an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without any inventive step, are within the scope of the present invention.

Examples

Fig. 1 is a flowchart of a video image retrieval method according to an embodiment of the present invention, which may be implemented by a video image retrieval apparatus according to an embodiment of the present invention, where the apparatus may be implemented in software and/or hardware. Referring to fig. 1, the method may specifically include the following steps:

S101, acquiring a picture retrieval database and a training database.

Specifically, the picture retrieval database usually includes a large number of pictures, for example, 100 million pictures, which may be randomly captured pictures of each monitored intersection in a certain city, or monitoring pictures of a certain residential community within a certain period of time. The pictures may be captured directly, or may be obtained from video frames extracted from a monitoring video. An application scenario of the embodiment of the present application is finding, among the large number of pictures in the picture retrieval database, pictures similar to a picture to be retrieved; this process may be referred to as a retrieval process. In addition, a training database also needs to be obtained. The training database also includes a large number of pictures, which may be the same as, different from, or partially the same as the pictures in the picture retrieval database; that is, there is no necessary connection between the two. After training on the training database, the data in the picture retrieval database can be organized, and video image data can then be retrieved.

S102, performing clustering training on the feature data in the training database to generate a preset number of data buckets, and determining the clustering center of each data bucket.

A data bucket may be understood as a part of a data set in a database, and one data set may generate a plurality of data buckets, and the manner of generating the data buckets is generally a clustering process. The process of dividing a collection of physical or abstract objects into classes composed of similar objects is called clustering, and a cluster generated by clustering is a collection of data objects that are similar to objects in the same cluster and different from objects in other clusters. In the embodiment of the present application, when the objects in the set are feature data in a video image, each class is referred to as a data bucket.

Specifically, a preset clustering algorithm is applied to cluster the feature data in the training database, and the clustering result is a preset number of data buckets. The preset number is related to the clustering algorithm applied, so different clustering algorithms may generate different numbers of data buckets. Each data bucket stores part of the data in the data set and has a clustering center, which may be a piece of feature data in the training database. In practice, the number of data buckets equals the number of cluster categories; that is, the training data is divided into as many categories as there are data buckets.

S103, calculating the distance between each piece of feature data in the picture retrieval database and each clustering center, and adding each piece of feature data in the picture retrieval database into a corresponding data bucket according to a first distance rule to determine an inverted index table.

Specifically, the distance between each piece of feature data in the picture retrieval database and each clustering center is calculated, where the distance refers to the distance between vectors. In practice, there are various types of distance between vectors, each corresponding to its own calculation mode; any one of these distance calculation or representation modes may be chosen here. The first distance rule may be that the distance is shortest, that the distance is longest, or that the distance satisfies a certain condition. In the embodiment of the application, each piece of feature data in the picture retrieval database is added to a data bucket according to the first distance rule, so that the feature data in the picture retrieval database is merged into the data buckets obtained from the training database. At this point an inverted index table is determined, which stores the numbers of the data buckets and their correspondence with the feature data stored in each data bucket; finally, the inverted index table is stored in a Random Access Memory (RAM).
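As an illustration of S103, the following minimal sketch builds an inverted index table as a mapping from data bucket number to the identifiers of the pictures assigned to that bucket. It assumes the first distance rule is "shortest Euclidean distance"; the function names, the picture identifiers and the toy two-dimensional vectors are hypothetical and are not taken from the embodiment.

```python
from collections import defaultdict


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_inverted_index(features, centers):
    """Map each bucket number to the ids of the pictures nearest to its center.

    Assumes the first distance rule is "shortest Euclidean distance".
    """
    index = defaultdict(list)
    for pic_id, vec in features.items():
        bucket = min(range(len(centers)), key=lambda c: squared_l2(vec, centers[c]))
        index[bucket].append(pic_id)
    return dict(index)


# Hypothetical toy data: two clustering centers and three database pictures.
centers = [(0.0, 0.0), (5.0, 5.0)]
features = {"img_a": (0.2, 0.1), "img_b": (4.8, 5.3), "img_c": (0.4, -0.1)}
index = build_inverted_index(features, centers)
```

Keeping only identifiers in the table, with the feature data stored separately, is one possible layout; the embodiment only requires that the correspondence between bucket numbers and feature data be recorded.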

S104, calculating the distance between the feature matrix of the picture to be retrieved and each clustering center, and determining a target data bucket according to a second distance rule.

Specifically, there may be one or more pictures to be retrieved. When there is more than one, the pictures may be processed one at a time over multiple passes, or processed in batch. Here, a single picture to be retrieved is taken as an example. A feature matrix, which may be a 128-dimensional vector, is extracted from each picture to be retrieved, and the distance between this feature matrix and each clustering center is calculated, so that one distance is obtained for each data bucket and multiple distances are obtained for multiple data buckets. In a specific example, the second distance rule may be a shortest distance rule, so that at least one data bucket satisfying the second distance rule is determined as the target data bucket from all the data buckets according to the distances.

S105, calculating the distance between the feature vector matrix of the picture to be retrieved and the clustering center of the target data bucket based on the inverted index table, and determining a picture similar to the picture to be retrieved as a retrieval result according to a retrieval rule.

Specifically, based on the inverted index table, the distance between the feature vector matrix of the picture to be retrieved and the clustering center of each determined target data bucket is calculated, where the feature vector matrix is a matrix formed by combining a plurality of feature vectors. For example, the retrieval rule may be that the M pictures with the highest similarity among the similar pictures are taken as the retrieval result, where M is a positive integer whose maximum value is the number of similar pictures.
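The retrieval rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: it assumes the target data buckets have already been determined, that candidates are ranked by Euclidean distance between the query vector and the feature data stored in those buckets, and that the M nearest are returned. The names `rank_in_buckets`, `top_m` and the toy data are hypothetical.

```python
import math


def rank_in_buckets(query, target_buckets, index, features, top_m):
    """Return the ids of the top_m pictures closest to the query.

    Only pictures listed under the target buckets in the inverted index
    are compared, so the rest of the database is never scanned.
    """
    candidates = [pid for b in target_buckets for pid in index.get(b, [])]
    ranked = sorted(candidates, key=lambda pid: math.dist(query, features[pid]))
    return ranked[:top_m]


# Hypothetical toy data: bucket 0 holds two pictures, bucket 1 holds one.
features = {"img_a": (0.2, 0.1), "img_b": (4.8, 5.3), "img_c": (0.4, -0.1)}
index = {0: ["img_a", "img_c"], 1: ["img_b"]}

best_two = rank_in_buckets((0.1, 0.1), [0], index, features, top_m=2)
best_one = rank_in_buckets((0.1, 0.1), [0], index, features, top_m=1)
```

If M exceeds the number of candidates, the slice simply returns all of them, which matches the statement that the maximum value of M is the number of similar pictures.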

Fig. 2 is a flowchart of another video image retrieval method according to another embodiment of the present invention, which is implemented on the basis of the above embodiments. Referring to fig. 2, the method may specifically include the following steps:

S201, acquiring an original picture retrieval database and an original training database.

Specifically, the directly obtained retrieval pictures are stored in the original picture retrieval database, and the directly obtained training pictures are stored in the original training database. The pictures may be obtained by shooting with a camera, and the camera may be a surveillance camera; "directly" may mean without any processing after acquisition. In addition, the camera may capture pictures or videos; if a video is captured, pictures may be extracted from the video to obtain a plurality of pictures, which is not limited herein.

S202, extracting retrieval characteristic data of the pictures in the original picture retrieval database and training characteristic data of the pictures in the original training database.

Specifically, retrieval feature data of the pictures in the original picture retrieval database is extracted, for example, as a 128-dimensional feature vector; the training feature data of the pictures in the original training database may also be 128-dimensional.

S203, performing dimension reduction and compression processing on the retrieval feature data and the training feature data to obtain a picture retrieval database and a training database.

The dimension reduction process converts high-dimensional feature data into low-dimensional feature data; the compression process converts a plurality of vectors into a set of vectors according to a certain rule. In a specific example of the embodiment of the present application, PCA (Principal Component Analysis) may be applied for dimension reduction and a PQ (Product Quantization) algorithm may be applied for compression, so that the picture retrieval database and the training database are obtained.

In order to make the technical solution of the embodiment of the present application easier to understand, the PCA algorithm and the PQ algorithm are briefly described below.

The PCA algorithm achieves dimension reduction by projecting data onto a low-dimensional subspace. For example, reducing a two-dimensional data set means projecting its points onto a line, so that each sample is represented by one value instead of two; a three-dimensional data set can be reduced to two dimensions, i.e. its points are mapped onto a plane. The principle of PCA dimension reduction is as follows: the high-dimensional data set is mapped to a low-dimensional space while retaining as much of the variance as possible; PCA rotates the data set to align it with its principal components, and the most variance is retained in the first principal component. Suppose the data set looks like an elongated flat ellipse extending from the origin to the upper right corner. To reduce the dimensionality of the entire data set, its points must be mapped onto a line.

In a specific example, Fig. 3 shows a data set before dimension reduction; Fig. 4 shows a data set during processing in a dimension reduction example; Fig. 5 shows a data set during processing in another dimension reduction example. Referring to Figs. 3, 4 and 5, the solid and dashed lines in Figs. 4 and 5 are both lines onto which the data set can be mapped, and it can be seen that the variance of the samples mapped onto the dashed line is greater than that of the samples mapped onto the solid line. In practice, this dashed line is the first principal component. The second principal component must be orthogonal to the first principal component, i.e. it must be statistically independent of the first and lies in a direction perpendicular to it.
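The two-dimensional case described above can be sketched in a few lines. This is a minimal illustration, assuming a two-dimensional data set and using the closed-form orientation of the principal axis of a 2×2 covariance matrix; a 128-dimensional implementation would normally use an eigendecomposition from a numerical library instead. The function names and sample points are hypothetical.

```python
import math


def first_principal_component(points):
    """Return (mean, unit direction) of the first principal component of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Angle of the direction of largest variance for a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))


def project(points, mean, direction):
    """Reduce each 2-D point to a single coordinate along the given direction."""
    return [
        (p[0] - mean[0]) * direction[0] + (p[1] - mean[1]) * direction[1]
        for p in points
    ]


# Hypothetical samples lying on the diagonal from the origin to the upper right.
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
mean, direction = first_principal_component(points)
projected = project(points, mean, direction)  # one value per sample instead of two
```

For these samples the direction found is the diagonal, so projecting onto it loses no variance, whereas projecting onto any other line would.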

The principle of the PQ algorithm is as follows. Assume that the picture retrieval database has 1 million pictures and that at least one 128-dimensional feature vector is extracted from each picture; here, one 128-dimensional vector per picture is taken as an example. Each 128-dimensional vector is divided into 8 short vectors of 16 dimensions each, so the picture retrieval database contains 1 million × 8 short vectors in total, which may be regarded as 8 piles of short vectors, each pile containing 1 million short vectors. Each pile of short vectors is clustered into 256 classes using K-Means. For each of the 8 short vectors of a picture, the class among the 256 classes of the corresponding pile to which it belongs is then found. Each of the 8 short vectors of a picture thus has 256 choices, so a picture has 256 to the power of 8 choices in total, that is, 2 to the power of 64; in other words, the feature of a picture can be represented as a 64-bit binary number, so the picture retrieval database can hold a very large number of pictures.

When a picture to be retrieved is searched for, the first of its eight short vectors is compared against the picture retrieval database. If nearest neighbor search is used, then after the first short vector has been judged, the 255/256 of the pictures in the database that do not match on the first short vector are discarded, i.e. only 1/256 of the pictures in the database remain to be searched. A further 255/256 of the pictures are discarded when the second short vector is compared, and discarding is carried out 8 times in total. The search workload is thereby greatly reduced, and the picture to be retrieved does not have to be compared with every picture in the picture retrieval database. It should be noted that "picture to be retrieved" is a general concept: it is the picture that needs to be matched against the picture retrieval database in the PQ algorithm. In addition, although a plurality of 128-dimensional vectors may be extracted, in practice one 128-dimensional vector is usually extracted from one picture.
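The encoding step of PQ can be sketched as follows. This is a minimal illustration that assumes the per-pile codebooks (the 256 class centers of each pile) have already been trained by K-Means; to stay readable it uses a toy 4-dimensional vector split into 2 short vectors with 2 classes each, rather than 128 dimensions, 8 short vectors and 256 classes. The function name and codebook values are hypothetical.

```python
import math


def pq_encode(vec, codebooks):
    """Encode a vector as one class number per short vector.

    codebooks[i] lists the class centers for the i-th short vector;
    the vector length must be divisible by the number of codebooks.
    """
    m = len(codebooks)
    sub_dim = len(vec) // m
    code = []
    for i, codebook in enumerate(codebooks):
        sub = vec[i * sub_dim:(i + 1) * sub_dim]
        nearest = min(range(len(codebook)), key=lambda j: math.dist(sub, codebook[j]))
        code.append(nearest)
    return code


# Hypothetical toy codebooks: 2 short vectors, 2 classes per short vector.
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 1.0), (1.0, 0.0)],
]
code = pq_encode((0.9, 1.2, 0.8, 0.1), codebooks)

# Code size in the example from the text: 8 short vectors, 256 classes each.
bits_per_vector = 8 * int(math.log2(256))
```

With 8 short vectors and 256 classes each, every class number fits in one byte, so a picture's code takes 8 bytes, which is the 64-bit representation mentioned above.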

Optionally, the dimension of the feature data in the picture retrieval database is the same as the dimension of the feature data in the training database, and the compression level of the feature data in the picture retrieval database is the same as the compression level of the feature data in the training database.

In order to ensure the accuracy of the retrieval process, the same dimension reduction algorithm is applied to the retrieval feature data and the training feature data during dimension reduction and compression, for example, the PCA algorithm for both; the same compression algorithm is also applied, for example, the PQ algorithm for both. In addition, during dimension reduction and compression it is ensured that the dimension of the feature data in the picture retrieval database is the same as the dimension of the feature data in the training database, and that the compression level of the feature data in the picture retrieval database is the same as the compression level of the feature data in the training database.

S204, acquiring a picture retrieval database and a training database.

S205, clustering training is carried out on the feature data in the training database by applying a K-Means clustering algorithm, and a preset number of data buckets are generated.

K-Means is an unsupervised clustering algorithm, where K represents the number of categories and Means represents the mean value. As the name implies, it is an algorithm that clusters data points by their means. The algorithm partitions similar data points according to a preset K value and an initial centroid for each category, and then obtains an optimized clustering result through iterative optimization of the means. Specifically, the picture retrieval database and the training database have already undergone dimension reduction and compression; a K-Means clustering algorithm is then applied to cluster the feature data in the training database. Optionally, the preset number is K, where K is a positive integer. That is, the value of K is the number of generated data buckets, i.e. the number of cluster categories.

S206, determining the clustering center of each data bucket.

Specifically, the clustering center of each data bucket can be determined according to the K-Means clustering process, wherein the clustering center is characteristic data in the training database, which conforms to the rules of the clustering algorithm.
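Steps S205 and S206 can be sketched with a small K-Means loop. This is a minimal illustration under stated assumptions: it uses Lloyd's iteration with a fixed number of passes, picks the initial centroids at random from the training data, and takes each clustering center to be the mean of its bucket (so the center is not necessarily itself a piece of training data). The function names, the iteration count and the toy training vectors are hypothetical.

```python
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Cluster points into k data buckets and return the k clustering centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # initial centroids
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_l2(p, centers[c]))
            buckets[nearest].append(p)
        for c, members in enumerate(buckets):
            if members:  # an empty bucket keeps its previous center
                dim = len(members[0])
                centers[c] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centers


# Hypothetical training feature data forming two well-separated groups.
training = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers = kmeans(training, k=2)
```

Here K = 2, so two data buckets are generated, one around each group of training vectors.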

S207, calculating the vector inner product or Euclidean distance between each piece of feature data in the picture retrieval database and each clustering center.

Specifically, the euclidean distance may also be referred to as an L2 distance, and a vector inner product or the euclidean distance between each piece of feature data in the picture retrieval database and each cluster center is calculated. In the embodiment of the present application, one of the two methods may be optionally selected for calculation, and other methods for calculating the distance between vectors may also be applied for calculation, which does not form a specific limitation here.

S208, adding each piece of feature data in the image retrieval database into a corresponding data bucket according to a first distance rule that the vector inner product is maximum or the Euclidean distance is minimum.

The first distance rule may be that the vector inner product is maximum or that the Euclidean distance is minimum. If the distance in S207 is calculated as the vector inner product, the first distance rule is that the vector inner product is maximum; if the distance in S207 is calculated as the Euclidean distance, the first distance rule is that the Euclidean distance is minimum. Taking the Euclidean distance and one piece of feature data of one picture in the picture retrieval database as an example, the distance between this piece of feature data and the clustering center of each data bucket is calculated, and the piece of feature data is then added to the data bucket whose clustering center has the minimum calculated Euclidean distance.
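The two forms of the first distance rule can be compared with a small sketch. It is an illustration only; the function name `assign_bucket`, the `metric` argument and the toy centers are hypothetical. Note that the two rules select the same data bucket when all clustering centers have the same norm, but may differ otherwise, as the example shows.

```python
def inner_product(a, b):
    """Vector inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a, b):
    """Squared Euclidean distance; its minimum coincides with that of the L2 distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_bucket(vec, centers, metric="l2"):
    """Pick the bucket number for vec under the chosen first distance rule."""
    if metric == "l2":
        return min(range(len(centers)), key=lambda c: squared_l2(vec, centers[c]))
    if metric == "ip":
        return max(range(len(centers)), key=lambda c: inner_product(vec, centers[c]))
    raise ValueError("metric must be 'l2' or 'ip'")


# Hypothetical centers with different norms, so the two rules disagree.
centers = [(1.0, 0.0), (3.0, 3.0)]
vec = (1.0, 0.2)
bucket_l2 = assign_bucket(vec, centers, metric="l2")  # nearest center is (1.0, 0.0)
bucket_ip = assign_bucket(vec, centers, metric="ip")  # largest inner product is with (3.0, 3.0)
```

Whichever rule is chosen in S207 should therefore be used consistently when the picture to be retrieved is later matched against the clustering centers.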

S209, constructing a locality sensitive hash function, and constructing an inverted index table by applying the locality sensitive hash function based on the training data added with the feature data in the picture retrieval database.

Specifically, the inverted index table stores the feature data, the data bucket numbers, and the correspondence between the feature data and the data bucket numbers, that is, the attribution relationship of the feature data. The attribution relationship is mapped and stored in the inverted indexes of the K nodes according to the locality sensitive hashing algorithm, and the indexes are stored in the RAM.

A specific example of how to apply the locality sensitive hash function to construct the inverted index table in the embodiment of the present application is described below. It should be noted that the data points in the original high-dimensional feature space are training data to which feature data in the image search database is added.

Firstly, through the mapping transformation of a selected hash function, the data point set of the original high-dimensional feature space is divided into a plurality of smaller subsets, each of which contains fewer elements, and those elements are adjacent to one another. By constructing a locality sensitive hash function, a projection transformation is applied to the original feature data, so that each dimension of the image data feature projected into the new space is more locality sensitive than in the original feature space, which overcomes the curse of dimensionality. The new image feature can be viewed as a more compact, low-dimensional representation of the original feature.

In one particular example, assume that x and y are two data points of the original high-dimensional feature space. In the locality sensitive hashing algorithm, the hash function generally satisfies the following condition:
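A form of this condition that is common in the locality sensitive hashing literature is given here. It is an assumption chosen to be consistent with the symbol definitions that follow, not a quotation of the embodiment's formula; the subscript notation $h_j$ for the hash function h(j) is likewise an assumption.

```latex
\Pr_{h_j \in H}\left[\, h_j(x) = h_j(y) \,\right] = \mathrm{sim}(x, y)
```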

in the formula: h is a hash function cluster; the hash function H (j) is randomly selected from H; sim is the similarity function; pr represents the similarity of the high-dimensional data point x and the high-dimensional data point y after transformation by the same hash function h (j); j represents the category of the hash function, and j is different, which indicates that the hash function is different; x and y represent two different high dimensional data points.

In the embodiment of the application, a hash function h: R^d → Z is adopted, with the following equation:
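A form consistent with the symbol definitions that follow, and with the widely used p-stable locality sensitive hashing scheme, is sketched here. It is an assumption rather than a quotation of the embodiment's formula: the subscript $j$, the summation over the $d$ elements of $a$, and reading the rounding sign $[\ ]$ as rounding down are all choices made for illustration, and the parameter v named in the definitions is not used in this assumed form.

```latex
h_j(x) = \left[ \frac{a \cdot x + b}{w} \right],
\qquad a \cdot x = \sum_{i=1}^{d} a_i x_i,
\qquad b \sim \mathrm{Uniform}[0, w]
```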

In the formula: x is a high-dimensional data point; w is the quantization width of the projection; the parameter b obeys a uniform distribution on the interval [0, w]; a represents a proportional coefficient, each element of a obeys a p-stable distribution, and different values of i denote different elements of a; R represents the real numbers, Z represents the integers, and d represents an arbitrary natural number; v represents a parameter in the hash function h(j); the sign [ ] represents a rounding operation. The p-stable distribution has the following property: if two variables obey a p-stable distribution, then a linear combination of these two variables also obeys a p-stable distribution. The hash function maps a high-dimensional vector to an integer. Because a single hash function is not very discriminative, the following second-level hash function is constructed:

g_i(x)＝{h_j.l(x),...,h_j.k(x)}

in the formula: g represents a secondary hash function; l represents the number of hash buckets in this example; k represents the dimension of the hash code after the hash transform.

In the embodiment of the application, through "projection" and "quantization" operations, the high-dimensional data point x is respectively indexed into a certain data bucket of a hash table, where the hash table stores not the feature vector itself but an identifier representing the data point x and its position in the database.

The feature data to be queried is mapped through the locality sensitive hash function to obtain a corresponding hash code; the data in the corresponding data bucket is taken out according to the hash code, the one or more pieces of data closest to the queried data are found among them by linear search, and the result is returned.
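The "projection", "quantization" and lookup operations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: it uses the Gaussian distribution as the p-stable distribution (p = 2), rounds down, uses a single hash table, and stores only picture identifiers in the table. The names `make_hash`, `make_g`, `query`, the parameter values and the toy database are hypothetical.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, w, rng):
    """Build one hash function x -> floor((a . x + b) / w)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # Gaussian is 2-stable (assumed choice)
    b = rng.uniform(0.0, w)
    return lambda x: math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / w)


def make_g(dim, k, w, rng):
    """Build a second-level hash: the tuple of k first-level hash values."""
    hs = [make_hash(dim, w, rng) for _ in range(k)]
    return lambda x: tuple(h(x) for h in hs)


rng = random.Random(42)
g = make_g(dim=3, k=4, w=4.0, rng=rng)

# Hypothetical database: identifiers mapped to feature vectors.
database = {
    "img_a": (1.0, 2.0, 3.0),
    "img_b": (1.1, 2.0, 2.9),
    "img_c": (40.0, -30.0, 25.0),
}

# The hash table stores identifiers, not the feature vectors themselves.
table = defaultdict(list)
for pid, vec in database.items():
    table[g(vec)].append(pid)


def query(x):
    """Take out the bucket for x's hash code and search it linearly."""
    candidates = table.get(g(x), [])
    return sorted(candidates, key=lambda pid: math.dist(x, database[pid]))
```

A query vector identical to a stored vector always lands in that vector's bucket; nearby vectors land there with high probability but not with certainty, which is why practical systems usually keep several tables built from independent second-level hash functions.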

S210, calculating the distance between the feature matrix of the picture to be retrieved and each clustering center, and determining the data buckets corresponding to the preset number of distances as target data buckets according to a second distance rule.

Specifically, the second distance rule may be the N shortest distances, where N is an integer less than or equal to K. The distance between the feature matrix of the picture to be retrieved and each clustering center is calculated, and the data buckets corresponding to the N clustering centers with the shortest distances are then determined as the target data buckets according to the second distance rule. Thus, N data buckets are selected from the K data buckets for subsequent processing.
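As a rough sketch of this step, assuming Euclidean distance and clustering centers held as plain lists (the function name select_target_buckets is hypothetical), the N closest of the K centers can be picked like this:

```python
import math


def select_target_buckets(query, centers, n):
    """Return the bucket numbers of the n cluster centers closest to the query."""
    if not 1 <= n <= len(centers):
        raise ValueError("n must satisfy 1 <= n <= K")
    # Rank all K bucket numbers by distance from the query to their center.
    ranked = sorted(range(len(centers)), key=lambda i: math.dist(query, centers[i]))
    return ranked[:n]


centers = [[0.0, 0.0], [10.0, 10.0], [1.0, 1.0], [5.0, 5.0]]  # K = 4
print(select_target_buckets([0.9, 0.9], centers, n=2))        # prints [2, 0]
```

Only the N returned buckets need to be scanned afterwards, which is where the saving over a full scan comes from.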

Optionally, in this step, an original picture to be retrieved is first obtained, and dimension reduction and compression processing are performed on it to obtain the picture to be retrieved; the dimension of the feature data of the picture to be retrieved is the same as the dimension of the feature data in the picture retrieval database, and the compression level of the feature data of the picture to be retrieved is the same as the compression level of the feature data in the picture retrieval database.

To ensure retrieval accuracy, dimension reduction and compression processing are also performed on the original picture to be retrieved, so that the resulting picture to be retrieved has the same dimension and the same compression level as the feature data in the picture retrieval database. In one specific example, the dimension reduction algorithm may be the PCA algorithm and the compression algorithm may be the PQ algorithm.
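To illustrate the compression side only, the sketch below encodes an already dimension-reduced vector with product quantization (PQ): the vector is split into equal sub-vectors and each sub-vector is replaced by the index of its nearest codeword. The PCA step is omitted, the codebooks shown are made-up values (in practice they would be trained, for example by k-means on sub-vectors), and pq_encode is a hypothetical name.

```python
def pq_encode(vector, codebooks):
    """Encode a vector as one codeword index per sub-vector."""
    m = len(codebooks)                      # number of sub-quantizers
    if len(vector) % m != 0:
        raise ValueError("vector length must be divisible by the number of codebooks")
    sub_dim = len(vector) // m
    codes = []
    for j, codebook in enumerate(codebooks):
        sub = vector[j * sub_dim:(j + 1) * sub_dim]
        # Nearest codeword by squared Euclidean distance.
        nearest = min(
            range(len(codebook)),
            key=lambda c: sum((s - t) ** 2 for s, t in zip(sub, codebook[c])),
        )
        codes.append(nearest)
    return codes


# Hypothetical codebooks: 2 sub-quantizers, 2 codewords each, sub-dimension 2.
codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [5.0, 5.0]]]
print(pq_encode([0.9, 1.2, 4.0, 6.0], codebooks))  # prints [1, 1]
```

Using the same codebooks for the database and for the query is what keeps the two at the same compression level.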

S211, searching the clustering center of the target data bucket in the inverted index table according to the serial number of the data bucket.

Specifically, after the target data bucket is determined, the inverted index table is used to look up the clustering center of the target data bucket according to the number of each data bucket and the number of the target data bucket. Once the clustering center of the target data bucket is found, the feature data in the corresponding data bucket can be found.

S212, calculating the distance between the feature vector matrix of the picture to be retrieved and the clustering center of the target data bucket, and determining the similar features matched with the picture to be retrieved according to the distance.

Specifically, the distance between the feature vector matrix of the picture to be retrieved and the clustering center of the target data bucket is calculated; the distance may be, for example, a Euclidean distance or a cosine distance. Similar features matched with the picture to be retrieved are then determined according to the magnitude of the distance. In a specific example, the first several features with the smallest distances may be taken as the similar feature set. The number of features in the similar feature set depends on the retrieval rule or retrieval requirement and is not limited herein.
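One way to read steps S211 and S212 together is sketched below, under the assumption that the feature data stored in the target data buckets are ranked by their distance to the query. The dictionary layout of the inverted index and the names rank_candidates, euclidean and cosine_distance are illustrative only.

```python
import math


def euclidean(u, v):
    return math.dist(u, v)


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def rank_candidates(query, inverted_index, target_buckets, features, top_k,
                    distance=euclidean):
    """Collect feature ids from the target buckets and keep the top_k closest."""
    candidates = [fid for b in target_buckets for fid in inverted_index.get(b, [])]
    ranked = sorted(candidates, key=lambda fid: distance(query, features[fid]))
    return ranked[:top_k]


features = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 5.0], "d": [0.2, 0.1]}
inverted_index = {0: ["a", "d"], 1: ["b"], 2: ["c"]}   # bucket number -> feature ids
print(rank_candidates([0.1, 0.1], inverted_index, [0, 1], features, top_k=2))
# prints ['d', 'a']
```

Passing distance=cosine_distance switches the metric without changing the rest of the lookup.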

In a specific example, when the matching degree is determined according to the distance, the matching degree of identical data is, for example, 100%, and the matching degree decreases successively as the distance grows; a relation table of distance and matching degree can then be made, and the best-matching similar features can be found according to the relation table and the inverted index table.

S213, determining a retrieval result according to the retrieval rule and the similar features.

Specifically, the retrieval result is determined according to the retrieval rule and the similar features. For example, when the retrieval requirement is strict, only the result of S212 with the smallest distance may be selected as the retrieval result; when the retrieval requirement is relaxed and results must not be missed, several of the results of S212 with the smallest distances may be selected as the retrieval result. In addition, the retrieval result may be a data set of similar features, or may be a number or an identification number of a picture similar to the picture to be retrieved, and is not limited here.
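A retrieval rule of this kind might be expressed as in the sketch below. It assumes that a strict requirement keeps only the single closest result while a relaxed requirement keeps several of the closest ones; the function name and the max_results parameter are hypothetical.

```python
def apply_retrieval_rule(ranked, strict, max_results=5):
    """ranked: list of (picture_id, distance) pairs sorted by ascending distance."""
    if not ranked:
        return []
    limit = 1 if strict else max_results   # strict: best match only; relaxed: several
    return [picture_id for picture_id, _ in ranked[:limit]]


ranked = [("p7", 0.1), ("p2", 0.4), ("p9", 0.9)]
print(apply_retrieval_rule(ranked, strict=True))                  # prints ['p7']
print(apply_retrieval_rule(ranked, strict=False, max_results=2))  # prints ['p7', 'p2']
```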

In the embodiment of the application, the original picture retrieval database, the original training database and the original picture to be retrieved are each subjected to dimension reduction and compression, yielding a picture retrieval database, a training database and a picture to be retrieved that share the same dimension and compression level, which improves retrieval accuracy; clustering with the K-Means clustering algorithm produces a preset number of data buckets and the clustering center of each data bucket; each piece of feature data in the picture retrieval database is added to the corresponding data bucket according to the first preset distance rule, so that the high-dimensional data are hashed into a plurality of data buckets; combining hash bucketing with an inverted index further reduces the time and space complexity of high-dimensional data retrieval, making the retrieval process more efficient; in addition, similar features matched with the picture to be retrieved are determined as the retrieval result according to the distance between the feature vector matrix of the picture to be retrieved and the clustering center of the target data bucket. Therefore, when facing large-scale, high-dimensional and sparsely distributed data, the retrieval process is more efficient and the retrieval performance is improved.

In addition, the related art includes several retrieval methods, for example: an approximate nearest neighbor retrieval method, a high-dimensional data retrieval method based on preset ordering provided by a feature retrieval module, and a method that retrieves by pruning over a tree-shaped storage structure. When these methods face large-scale, high-dimensional and sparsely distributed data, retrieval becomes very difficult and retrieval performance comes under great pressure.

In summary, the technical solution provided by the present application can solve the problems existing in the related art and has the following beneficial effects: all feature data sets in the training database are divided into data buckets using the PCA dimension reduction algorithm and the locality-sensitive hashing algorithm, and each piece of feature data in the picture retrieval database is added to the efficiently clustered data buckets, so that the "curse of dimensionality" is overcome and the amount of computation is reduced; using the inverted index technique, an efficient inverted index table is added over all feature data on the basis of the data bucket division, making the retrieval process more efficient.

In addition, in the clustering process, if the data volume is too large, the feature data can first be divided into a front part and a rear part, and data clustering is performed on each part separately; likewise, the index needs to be maintained as two partial indexes. However, the memory space required by this approach is nearly twice that of the embodiment of the present application, so it is suitable only when memory is sufficient.

Fig. 6 is a schematic structural diagram of a video image retrieval apparatus according to an embodiment of the present invention, which is suitable for executing a video image retrieval method according to an embodiment of the present invention. As shown in fig. 6, the apparatus may specifically include: a data acquisition module 601, a clustering module 602, an index table determination module 603, a target data bucket determination module 604, and a retrieval module 605.

The data acquisition module 601 is configured to acquire a picture retrieval database and a training database; a clustering module 602, configured to perform clustering training on the feature data in the training database, generate a preset number of data buckets, and determine a clustering center of each data bucket; the index table determining module 603 is configured to calculate a distance between each piece of feature data in the picture retrieval database and each clustering center, and add each piece of feature data in the picture retrieval database to a corresponding data bucket according to a first distance rule to determine an inverted index table; a target data bucket determining module 604, configured to calculate distances between the feature matrix of the picture to be retrieved and each cluster center, and determine a target data bucket according to a second distance rule; and the retrieval module 605 is configured to calculate a distance between the feature vector matrix of the picture to be retrieved and the clustering center of the target data bucket based on the inverted index table, and determine a similar picture to the picture to be retrieved as a retrieval result according to a retrieval rule.

Further, the system also comprises a first preprocessing module, which is used for, before the picture retrieval database and the training database are obtained:

acquiring an original picture retrieval database and an original training database;

extracting retrieval characteristic data of pictures in an original picture retrieval database and training characteristic data of the pictures in an original training database;

performing dimension reduction and compression processing on the retrieval characteristic data and the training characteristic data to obtain a picture retrieval database and a training database;

the dimensionality of the feature data in the picture retrieval database is the same as the dimensionality of the feature data in the training database, and the compression level of the feature data in the picture retrieval database is the same as the compression level of the feature data in the training database.

Further, the system also comprises a second preprocessing module, which is used before calculating the distance between the feature matrix of the picture to be retrieved and each cluster center:

obtaining an original picture to be retrieved, and performing dimension reduction and compression processing on the original picture to be retrieved to obtain the picture to be retrieved, wherein the dimension of the feature data of the picture to be retrieved is the same as the dimension of the feature data in the picture retrieval database, and the compression level of the feature data of the picture to be retrieved is the same as the compression level of the feature data in the picture retrieval database.

Further, the clustering module 602 is specifically configured to:

performing clustering training on the characteristic data in the training database by using a K-Means clustering algorithm to generate a preset number of data buckets, wherein the preset number is K, and K is a positive integer;

the cluster center for each data bucket is determined.
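The clustering performed by this module can be sketched with a plain Lloyd-style K-Means loop. This is a simplified illustration with a fixed iteration count and random initial centers drawn from the training data; the function name kmeans and its parameters are assumptions, and a production system would more likely rely on an optimized library.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Return k cluster centers (one per data bucket) for the training points."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # initial centers from the data
    for _ in range(iterations):
        buckets = [[] for _ in range(k)]
        for p in points:
            # Assign each point to the bucket whose center is nearest.
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            buckets[nearest].append(p)
        for c, members in enumerate(buckets):
            if members:  # keep the old center if a bucket ends up empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers
```

The returned centers are what the later steps compare against, both when filling the data buckets and when choosing target data buckets for a query.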

Further, the index table determining module 603 includes:

the distance calculation submodule is used for calculating the vector inner product or Euclidean distance between each piece of feature data in the picture retrieval database and each clustering center;

and the determining submodule is used for adding each piece of feature data in the picture retrieval database into the corresponding data bucket according to a first distance rule that the vector inner product is maximum or the Euclidean distance is minimum so as to determine the inverted index table.
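The two submodules above amount to assigning every feature to one bucket and recording the assignment. A minimal sketch follows, in which the inverted index is assumed to be a mapping from bucket number to a list of feature identifiers, and build_inverted_index is a hypothetical name.

```python
import math


def build_inverted_index(features, centers, use_inner_product=False):
    """Map each bucket number to the ids of the features assigned to it."""
    index = {bucket: [] for bucket in range(len(centers))}
    for feature_id, vec in features.items():
        if use_inner_product:
            # First distance rule, variant 1: largest vector inner product.
            bucket = max(range(len(centers)),
                         key=lambda c: sum(x * y for x, y in zip(vec, centers[c])))
        else:
            # First distance rule, variant 2: smallest Euclidean distance.
            bucket = min(range(len(centers)),
                         key=lambda c: math.dist(vec, centers[c]))
        index[bucket].append(feature_id)
    return index


centers = [[0.0, 0.0], [10.0, 10.0]]
features = {"p1": [1.0, 1.0], "p2": [9.0, 8.0], "p3": [0.0, 2.0]}
print(build_inverted_index(features, centers))  # prints {0: ['p1', 'p3'], 1: ['p2']}
```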

Further, the determination submodule is specifically configured to:

and constructing a locality sensitive hash function, and constructing an inverted index table by applying the locality sensitive hash function based on the training data added with the feature data in the picture retrieval database.

Further, the target data bucket determining module 604 is specifically configured to:

and calculating the distance between the feature matrix of the picture to be retrieved and each clustering center, and determining the data buckets corresponding to the preset number of distances as target data buckets according to a second distance rule.

Further, the retrieving module 605 is specifically configured to:

in the inverted index table, searching a clustering center of a target data bucket according to the number of the data bucket;

calculating the distance between the characteristic vector matrix of the picture to be retrieved and the clustering center of the target data bucket, and determining similar characteristics matched with the picture to be retrieved according to the distance;

and determining a retrieval result according to the retrieval rule and the similar characteristics.

The video image data retrieval device provided by the embodiment of the invention can execute the video image data retrieval method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.

An embodiment of the present invention further provides an apparatus. Referring to fig. 7, which is a schematic structural diagram of the apparatus, the apparatus includes: a processor 710, and a memory 720 coupled to the processor 710; the memory 720 is used for storing a computer program for executing at least the video image data retrieval method in the embodiment of the present invention; the processor 710 is used to invoke and execute the computer program in the memory; the video image data retrieval method at least comprises the following steps: acquiring a picture retrieval database and a training database; performing clustering training on the feature data in the training database to generate a preset number of data buckets, and determining the clustering center of each data bucket; calculating the distance between each piece of feature data in the picture retrieval database and each clustering center, and adding each piece of feature data in the picture retrieval database into a corresponding data bucket according to a first distance rule to determine an inverted index table; calculating the distance between the feature matrix of the picture to be retrieved and each clustering center, and determining a target data bucket according to a second distance rule; and calculating the distance between the feature vector matrix of the picture to be retrieved and the clustering center of the target data bucket based on the inverted index table, and determining a picture similar to the picture to be retrieved as a retrieval result according to a retrieval rule.

An embodiment of the present invention further provides a storage medium, where the storage medium stores a computer program, and when the computer program is executed by a processor, the steps of the video image data retrieval method in the embodiment of the present invention are implemented: acquiring a picture retrieval database and a training database; performing clustering training on the feature data in the training database to generate a preset number of data buckets, and determining the clustering center of each data bucket; calculating the distance between each piece of feature data in the picture retrieval database and each clustering center, and adding each piece of feature data in the picture retrieval database into a corresponding data bucket according to a first distance rule to determine an inverted index table; calculating the distance between the feature matrix of the picture to be retrieved and each clustering center, and determining a target data bucket according to a second distance rule; and calculating the distance between the feature vector matrix of the picture to be retrieved and the clustering center of the target data bucket based on the inverted index table, and determining a picture similar to the picture to be retrieved as a retrieval result according to a retrieval rule.

It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.

It should be noted that the terms "first," "second," and the like in the description of the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present invention, the meaning of "a plurality" means at least two unless otherwise specified.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.

In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.

In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. A method for retrieving video image data, comprising:

acquiring a picture retrieval database and a training database;

2. The method of claim 1, wherein before acquiring the picture retrieval database and the training database, the method further comprises:

extracting retrieval feature data of the pictures in the original picture retrieval database and training feature data of the pictures in the original training database;

3. The method according to claim 2, wherein before calculating the distance between the feature matrix of the picture to be retrieved and each of the cluster centers, the method further comprises:

4. The method of claim 1, wherein performing cluster training on the feature data in the training database to generate a preset number of data buckets, and determining a cluster center of each data bucket comprises:

performing clustering training on the feature data in the training database by using a K-Means clustering algorithm to generate a data bucket with a preset number, wherein the preset number is K, and K is a positive integer;

the cluster center for each data bucket is determined.

5. The method according to claim 1, wherein calculating the distance between each piece of feature data in the picture retrieval database and each cluster center, and adding each piece of feature data in the picture retrieval database to a corresponding data bucket according to a first distance rule to determine an inverted index table comprises:

calculating the vector inner product or Euclidean distance between each piece of feature data in the picture retrieval database and each clustering center;

and adding each piece of feature data in the picture retrieval database into a corresponding data bucket according to a first distance rule that the vector inner product is maximum or the Euclidean distance is minimum to determine an inverted index table.

6. The method of claim 5, wherein determining the inverted index table comprises:

and constructing a locality-sensitive hash function, and constructing an inverted index table by applying the locality-sensitive hash function based on the training data added with the feature data in the picture retrieval database.

7. The method according to claim 1, wherein the calculating the distance between the feature matrix of the picture to be retrieved and each cluster center, and the determining the target data bucket according to the second distance rule comprises:

8. The method according to claim 1, wherein the calculating a distance between a feature vector matrix of the picture to be retrieved and a cluster center of the target data bucket based on the inverted index table, and determining a similar picture to the picture to be retrieved as a retrieval result according to a retrieval rule comprises:

in the inverted index table, searching a clustering center of the target data bucket according to the serial number of the data bucket;

calculating the distance between the feature vector matrix of the picture to be retrieved and the clustering center of the target data bucket, and determining similar features matched with the picture to be retrieved according to the distance;

9. A video image data retrieval apparatus, comprising:

10. An apparatus, comprising:

a processor, and a memory coupled to the processor;

the memory is adapted to store a computer program for performing at least the video image data retrieval method of any of claims 1-8;

11. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the steps of the video image data retrieval method according to any one of claims 1 to 8.