CN110413807B

CN110413807B - Image query method and system based on content semantic metadata

Info

Publication number: CN110413807B
Application number: CN201910546661.7A
Authority: CN
Inventors: 周可; 刘毅斐; 刘渝; 汪洋涛; 杨玉娟
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2019-06-24
Filing date: 2019-06-24
Publication date: 2021-04-20
Anticipated expiration: 2039-06-24
Also published as: CN110413807A

Abstract

The invention discloses an image query method based on content semantic metadata, which comprises the following steps: inputting the uploaded image file into a deep self-learning hash network, and obtaining a semantic hash code corresponding to the image through deep self-learning hash processing; uploading the semantic hash codes and the image files to a storage system in a one-to-one correspondence manner, and taking the hash codes as semantic metadata of the corresponding image files; setting a chain table with two ends connected with a hash table to link image files with the same semantic hash value, wherein the chain table is used for organizing the image files with the same semantic hash value; initiating a semantic query request through a semantic query interface, wherein the semantic query interface acquires a semantic hash code in the semantic query request through a Hamming map; and initiating an image file request corresponding to the semantic hash code to the storage system through the hash tables connected at the two ends, and returning the image file corresponding to the semantic hash code. The image file query based on semantic content is realized.

Description

Image query method and system based on content semantic metadata

Technical Field

The invention belongs to the technical field of data retrieval, and particularly relates to an image query method and system based on content semantic metadata.

Background

In data storage systems, metadata has proven to be a vital part of the storage system. It has been found that although metadata only accounts for no more than 1% of the capacity of the storage system, more than 50% of the operations in the storage system require the use of metadata.

When a user needs to query the file content itself, relying solely on existing metadata structures is inefficient, because simple metadata in the storage system (e.g., file size, creation time) stores no information about the content of the file. For example, in a large-scale image storage system, a user who wants to find images similar to a particular image (e.g., images of cats) cannot do so with a simple metadata search. To address this problem, researchers have proposed queryable semantic storage systems.

A queryable semantic storage system is a storage system that is constructed according to the semantics and relevance of the data in the system and that supports query operations. At present there are three mainstream types of queryable semantic storage systems. The first is the Spyglass storage system proposed by Leung A W et al., which organizes files with the same or similar namespaces through a K-D tree to accelerate metadata queries; the second is the SmartStore system proposed by Hua Y et al., which clusters files with similar metadata through latent semantic analysis and speeds up the query process through R-trees; the third is the FAST system proposed by Hua Y et al., which extracts scale-invariant feature transform (SIFT) features of the images in a storage system and groups these features using locality-sensitive hashing, so that the locality-sensitive hash values of similar features are closer.

However, the above queryable semantic storage systems all have non-negligible drawbacks. First, the former two systems use traditional metadata that is unrelated to semantic content and therefore cannot solve the problem of content-based semantic query. Second, the third system supports only exact search and not similarity search, i.e., complex queries (such as range queries) cannot be realized. Finally, because the third system uses SIFT features, which are hand-crafted features rather than deep semantic features of the image file, it supports only data-based content queries and cannot support semantic-based content queries.

Disclosure of Invention

In view of the above defects or improvement requirements of the prior art, the invention provides an image query method and system based on content semantic metadata. The aim is to fuse the content semantic information of a file into the metadata structure of a storage system so that the storage system supports content-based semantic query, thereby solving the technical problem that existing queryable semantic storage systems cannot query files based on content semantics. The invention also supports similarity search, thereby enabling complex queries.

To achieve the above object, according to one aspect of the present invention, there is provided an image query method based on content semantic metadata, including the steps of:

(1) acquiring an image set from the image data set, and processing the image set by using a learning type hash algorithm to obtain a hash label of the image set;

(2) taking the image set as the input of a neural network, taking the Hash label of the image set obtained in the step (1) as the output of the neural network, and carrying out iterative training on the neural network to obtain a trained neural network;

(3) acquiring a new image set from the image data set, inputting the image set into the neural network trained in step (2) to obtain the hash values of the image set, storing the hash value of each image in the image set in the metadata of the corresponding image, and using a separate chaining hash table to represent the mapping between each obtained hash value and its corresponding images;

(4) constructing a Hamming graph from all hash values in the separate chaining hash table used in step (3);

(5) receiving a semantic query request from a user side, parsing the semantic query request to acquire the image to be queried, and acquiring the corresponding hash value from the metadata of the image to be queried;

(6) searching the Hamming graph established in step (4) for the node corresponding to the hash value obtained in step (5), obtaining, from the found node and the Hamming graph, all other nodes that satisfy the semantic query request together with their hash values, and determining, according to the separate chaining hash table, the images corresponding to those hash values as the final query result.

Preferably, the learning-type hash algorithm is a deep self-learning hash algorithm and the image dataset is CIFAR-10, STL-10, or ImageNet.

Preferably, the neural network used in step (2) is a SimpleNet neural network.

Preferably, the iterative training is performed by first setting initial parameters for the neural network, inputting the image set into the neural network to obtain an output result, comparing the output result with the hash labels obtained in step (1), then updating the parameters of the neural network through the back-propagation algorithm, and repeating this process until the error between the output result and the hash labels reaches a preset threshold.

Preferably, step (4) comprises in particular the following sub-steps:

(4-1) setting a counter i to 1;

(4-2) taking out the i-th hash value in the separate chaining hash table, establishing a node corresponding to the i-th hash value in the Hamming graph, calculating the Hamming distance between the i-th hash value and each hash value taken out before it, selecting the minimum of the calculated Hamming distances, connecting the node of the i-th hash value in the Hamming graph to the nodes of all hash values that attain this minimum, and taking the Hamming distance between each such hash value and the i-th hash value as the weight of the edge between the two nodes;

(4-3) judging whether the i-th hash value is the last hash value in the separate chaining hash table; if so, ending the process; otherwise, setting i = i + 1 and returning to step (4-2).
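Substeps (4-1) to (4-3) can be sketched in Python as follows. This is a minimal illustration rather than the implementation of the invention: hash values are assumed to be distinct non-negative integers, the graph is assumed to be an undirected adjacency dictionary, and the names `hamming_distance` and `build_hamming_graph` are hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded hash codes."""
    return bin(a ^ b).count("1")


def build_hamming_graph(hash_values):
    """Build {node: {neighbor: weight}} by visiting hash values in table order.

    Each new hash value is linked to every earlier hash value that attains
    the minimum Hamming distance to it; the distance is the edge weight.
    """
    graph = {}
    seen = []  # hash values taken out before the current one
    for h in hash_values:
        graph[h] = {}
        if seen:
            distances = [(hamming_distance(h, p), p) for p in seen]
            d_min = min(d for d, _ in distances)
            for d, p in distances:
                if d == d_min:
                    graph[h][p] = d  # undirected edge, stored both ways
                    graph[p][h] = d
        seen.append(h)
    return graph


# Toy 4-bit hash values (illustrative only).
graph = build_hamming_graph([0b0000, 0b0001, 0b0011, 0b1111])
```

In this toy run each node is attached only to its nearest predecessors, so the graph stays sparse even though all pairwise distances to earlier nodes are computed once during insertion.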

Preferably, in step (6), obtaining all other nodes and their hash values from the found node and the Hamming graph, and determining the images corresponding to those hash values as the final query result according to the separate chaining hash table, includes the following substeps:

(6-1) taking the searched node as a current node;

(6-2) determining the number L of all nodes connected to the current node in the hamming graph, and setting a counter j to 1;

(6-3) judging whether the counter j is less than or equal to L, if so, turning to the step (6-4), and otherwise, turning to the step (6-8);

(6-4) judging whether the Hamming distance between the jth node connected with the current node in the Hamming graph and the searched node is smaller than or equal to the semantic similarity threshold, if so, entering the step (6-5), otherwise, ending the process;

(6-5) putting the jth node into the query result set;

(6-6) judging whether the Hamming distance between the jth node and the searched node is smaller than the semantic similarity threshold, if so, putting the jth node into the node set, and then entering the step (6-7), otherwise, ending the process;

(6-7) setting the counter j = j + 1, and returning to step (6-3);

(6-8) judging whether the node set is empty; if so, taking out all nodes in the query result set, querying the separate chaining hash table with the hash values of these nodes in the Hamming graph to obtain the images corresponding to all the nodes as the final query result, and ending the process; otherwise, going to step (6-9);

(6-9) randomly taking a node from the node set as a current node, deleting the current node from the node set, and returning to the step (6-2).
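The traversal in substeps (6-1) to (6-9) can be sketched as below. The sketch makes several assumptions that are not stated in the text: a neighbor whose distance exceeds the threshold is skipped instead of ending the whole process, a `visited` set is added so that the loop terminates on cyclic graphs, and the next node is popped from the end of the node set instead of being chosen at random. The function name `range_query` and the toy data are hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def range_query(graph, table, query_hash, threshold):
    """Return images whose hash nodes are reachable and within the threshold.

    graph: {node: {neighbor: weight}} Hamming graph
    table: {hash_value: [image names]} separate chaining hash table
    """
    results = set()            # the "query result set"
    node_set = [query_hash]    # nodes still to be expanded
    visited = {query_hash}     # assumption: avoid revisiting nodes
    while node_set:
        current = node_set.pop()
        for neighbor in graph.get(current, {}):
            if neighbor in visited:
                continue
            visited.add(neighbor)
            d = hamming_distance(neighbor, query_hash)
            if d <= threshold:
                results.add(neighbor)          # step (6-5)
                if d < threshold:
                    node_set.append(neighbor)  # step (6-6): expand later
    images = []
    for h in sorted(results):
        images.extend(table[h])
    return images


# Toy Hamming graph and hash table (illustrative only).
graph = {0: {1: 1}, 1: {0: 1, 3: 1}, 3: {1: 1, 15: 2}, 15: {3: 2}}
table = {0: ["a.png"], 1: ["b.png", "c.png"], 3: ["d.png"], 15: ["e.png"]}
near = range_query(graph, table, query_hash=0, threshold=1)
wider = range_query(graph, table, query_hash=0, threshold=2)
```

Only nodes strictly inside the threshold are expanded, so the search stops at the boundary of the query range instead of walking the whole graph.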

According to another aspect of the present invention, there is provided an image query system based on content semantic metadata, including:

a first module, configured to acquire an image set from an image data set and to process the image set using a learning-type hash algorithm to obtain the hash labels of the image set;

a second module, configured to take the image set as the input of a neural network, take the hash labels of the image set obtained by the first module as the output of the neural network, and iteratively train the neural network to obtain a trained neural network;

a third module, configured to acquire a new image set from the image data set, input the image set into the neural network trained by the second module to obtain the hash values of the image set, store the hash value of each image in the image set in the metadata of the corresponding image, and use a separate chaining hash table to represent the mapping between each obtained hash value and its corresponding images;

a fourth module, configured to construct a Hamming graph from all hash values in the separate chaining hash table used by the third module;

a fifth module, configured to receive a semantic query request from a user side, parse the semantic query request to acquire the image to be queried, and acquire the corresponding hash value from the metadata of the image to be queried;

a sixth module, configured to search the Hamming graph established by the fourth module for the node corresponding to the hash value obtained by the fifth module, obtain, from the found node and the Hamming graph, all other nodes that satisfy the semantic query request together with their hash values, and determine, according to the separate chaining hash table, the images corresponding to those hash values as the final query result.

In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:

(1) The invention solves the technical problem that existing queryable semantic storage systems cannot support semantic-based content query: because the deep self-learning hash (DSTH) algorithm extracts information with similar semantics, not just information with similar pixels, the invention is suitable for semantic-based content query.

(2) Because the invention constructs a Hamming graph, nodes with similar semantics are gathered in adjacent regions of the graph, so similar files can be found by a simple graph search without traversing the metadata of all files, which significantly reduces the time overhead of a query.

(3) Because the DSTH algorithm is efficient, hash codes are generated quickly, which ensures high performance during metadata uploading and facilitates the metadata distribution and storage process.

(4) By using the separate chaining hash table to merge identical hash codes, the invention not only identifies the most semantically similar images in a semantic query, but also significantly reduces the storage and computation cost of the Hamming graph.

Drawings

FIG. 1 illustrates a difference diagram of semantic similarity and data similarity;

FIG. 2 is a diagram illustrating semantic relationships;

FIG. 3 is a schematic diagram of a Hamming distance calculation method according to the present invention;

FIG. 4 is a flow chart illustrating the uploading of hash codes in the hash table according to the present invention;

FIG. 5 illustrates the overall workflow of the present invention from uploading a file to a storage system to performing a semantic query and returning results;

FIG. 6 is a diagram illustrating the threshold-based selection of edges of the Hamming graph according to the present invention;

FIG. 7 is a diagram illustrating a semantic query process according to the present invention;

FIG. 8 is a schematic diagram of the separate chaining hash table used by the present invention;

FIG. 9 is a schematic diagram of a Hamming graph constructed in accordance with the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

To address the important and urgent problem of content-based query in storage systems, and considering that traditional metadata in a storage system cannot handle content-based queries, the invention proposes fusing the content-based semantic information of a file into the metadata structure of the storage system, so that the storage system supports content-based semantic query (abbreviated as semantic query in the invention). The invention focuses on semantic queries over image data and image files in the storage system, because image data is typical unstructured data: its volume is large and its semantic information is difficult to express, which poses a serious challenge to the storage system. The invention uses the content hash of an image as the semantic metadata of the storage system, thereby recording the content semantics of the image file in the storage system, and organizes the content semantics of images using a more efficient Semantic Hamming Graph structure. In this way, the semantic similarity between an image and other files can be computed quickly, and a query interface can be provided for users. Meanwhile, results can be found and returned quickly through the graph-structured index, providing a more efficient and intelligent interface and service for the storage system.

The goal of the invention is to implement content-based semantic queries in a storage system through a metadata structure. Since metadata is very lightweight in storage systems (small footprint and fast metadata operations), semantic metadata added to a storage system must also incur little overhead; otherwise it burdens the storage system's computation, storage, and I/O and degrades overall storage performance. Taking these factors into account, the invention adopts a fixed-length binary semantic hash code as semantic metadata and integrates it into the storage system, where the semantic hash code is obtained through hash learning (learning to hash). Through dimension reduction and quantization, an image file can be converted into a binary code of fixed length (e.g., 48 bits or 96 bits), which greatly reduces the storage overhead of semantic metadata. In addition, by the nature of hash learning, the degree of semantic similarity between images can be computed from the distance between their hash codes. Because a hash code is a binary feature (containing only 0 and 1), this distance can be obtained with a Hamming distance operation, which greatly improves the efficiency of semantic queries.

The metadata system for content semantic query, which is realized by the invention, can support semantic range query, and is explained as follows:

A semantic range query returns all files whose content semantic similarity to a particular file (referred to as the query point file) falls within a given range.
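To make the definition concrete, the following sketch answers a semantic range query by a plain linear scan over the semantic metadata. It only illustrates what the query returns and is not the graph-based search of the invention; it assumes that the "given range" is expressed as a maximum Hamming distance, and the names `semantic_range_query`, `metadata`, and `radius` are hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def semantic_range_query(metadata, query_file, radius):
    """Return files whose hash code is within `radius` bits of the query file.

    metadata: {file name: integer-encoded semantic hash code}
    """
    query_code = metadata[query_file]
    return sorted(
        name
        for name, code in metadata.items()
        if name != query_file and hamming_distance(code, query_code) <= radius
    )


# Toy 4-bit semantic metadata (illustrative only).
metadata = {"q.jpg": 0b1100, "a.jpg": 0b1101, "b.jpg": 0b0011, "c.jpg": 0b1000}
matches = semantic_range_query(metadata, "q.jpg", radius=1)
```

A scan of this kind touches the metadata of every file, which is the cost that the Hamming graph index is intended to avoid.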

Unlike previous methods based on locality-sensitive hashing, the invention focuses on semantic similarity, i.e., the semantic information described by an image file (what the image depicts), rather than shallow data information (pixels). Taking fig. 1 as an example, the two pairs of images connected by solid lines depict a dog and a cat, respectively; the images in each pair are semantically similar because both depict the same animal (the same content). The two pairs of images connected by dotted lines are similar in data: the pixel distributions of the two images in each pair are similar, but they do not depict the same species (one is a dog and the other a cat), so the two images are not semantically similar.

As shown in fig. 4, the image query method based on content semantic metadata of the present invention includes the following steps:

Specifically, the learning-type hash algorithm used in step (1) is the Deep Self-learning Hash (DSTH) algorithm, and the number of images in the image set may be freely set, preferably 100.

The image data set used in step (1) is CIFAR-10, STL-10, or ImageNet.

Specifically, the neural network used in step (2) is a SimpleNet neural network.

The iterative training process is as follows: set initial parameters (which may be any values) for the neural network, input the image set into the neural network to obtain an output result, compare the output result with the hash labels obtained in step (1), and update the parameters of the neural network through the back-propagation algorithm; this process is repeated until the error between the output result and the hash labels reaches a preset threshold. The preset threshold can be freely set: the smaller the threshold, the higher the accuracy of the result obtained by iterative training, and vice versa.

(3) Acquiring a new image set from the image data set, inputting the image set into the neural network trained in the step (2) to obtain Hash values of the image set, storing the Hash value of each image in the image set in metadata of a corresponding image, and using a Separate Chaining Hash Table (Separate Chaining Hash Table) to represent a mapping relationship between each obtained Hash value and the corresponding image, as shown in fig. 8;

as can be seen in fig. 8, there are cases where one hash value corresponds to one image, and there are cases where one hash value corresponds to a plurality of images.

(4) Constructing a Hamming graph according to all hash values in the separate chaining hash table used in step (3), as shown in fig. 9;

the step (4) specifically comprises the following substeps:

(4-1) setting a counter i to 1;

(4-3) judging whether the i-th hash value is the last hash value in the separate chaining hash table; if so, ending the process; otherwise, setting i = i + 1 and returning to step (4-2);

Specifically, step (6) includes the following substeps for obtaining all other nodes and their hash values from the found node and the Hamming graph, and for determining, according to the separate chaining hash table, the images corresponding to those hash values as the final query result:

(6-1) taking the searched node as a current node;

Specifically, the semantic similarity threshold used in step (6-4) can be freely set; the larger its value, the lower the accuracy of the final query result, and vice versa. The preferred value is 1 or 2.

(6-5) putting the jth node into the query result set;

(6-7) setting the counter j = j + 1, and returning to step (6-3);

Learning-type hashing (learning to hash) is a machine-learning-based hash mapping method that preserves the similarity of the original data during mapping. Hashing is an efficient nearest neighbor search method suited to large-scale settings. Because the invention processes large-scale image files, and the features of images are more complex than those of data such as text, more computation is required to extract content semantics. For example, a bag-of-words model can capture the content semantics of text, but the content semantics of an image are difficult to obtain with simple or shallow methods; as a result, hash learning for image content semantics has gradually shifted from shallow learning to deep learning.

The problem solved by an image hash learning algorithm is how to encode an image into a fixed-length binary code (containing only 0 and 1). A fixed-length hash code not only occupies little storage space but is also well suited to retrieval over large-scale data sets. If the hash codes of two images are similar, the content semantics the two images describe are also similar. Image-oriented hash learning is a popular direction in the multimedia and computer vision fields. Considering the specific application scenario of a storage system, the invention uses Deep Self-learning Hash (DSTH) as the core algorithm for generating content semantic hash codes, for the following reasons:

(1) DSTH makes full use of the feature extraction capability of deep learning: by leveraging existing networks and the results of training on large-scale data sets, it can effectively query and retrieve the content semantics of images. Compared with traditional methods based on locality-sensitive hashing (LSH), DSTH extracts information with similar semantics, not just information with similar pixels, and is therefore suitable for semantic query. The experimental section compares the semantic query accuracy of DSTH with that of other LSH-based methods.

(2) DSTH works in a self-learning manner, i.e., it does not require labels for new image data sets. For a system that stores massive heterogeneous data, obtaining classification labels for the images is very difficult, and DSTH needs no additional information (labels) for the image data stored in the system. Compared with supervised hash algorithms that depend on classification labels, the self-learning manner of DSTH is more practical and more general.

(3) DSTH trains and obtains the hash code by using a fast and efficient network SimpleNet, and the advantage of doing so is that the efficiency of training the hash model can be improved, and the generation process of the hash code can be effectively accelerated. For the scene of uploading files and metadata by a storage system, the uploading process must be low in cost and high in speed, otherwise the performance of the whole system is seriously affected. The high efficiency of DSTH ensures that the generation time of the hash code is short, thereby ensuring the high performance in the metadata uploading process and being beneficial to the completion of the metadata distribution and storage process.

DSTH is mainly divided into two stages: a hash label generation stage and a hash function learning stage.

The main task of the hash label generation stage is to provide hash labels for the data set, so as to guide the learning of the hash function in the next stage. In this stage, deep features are mapped into Hamming space according to a graph structure. The stage first extracts data features by transfer learning, using a network model (e.g., GoogLeNet, AlexNet, VGG16) pre-trained on a data set such as ImageNet. As a pre-training data set, ImageNet is large and diverse, so a model trained on it offers a stronger accuracy guarantee when used for feature extraction.

After the features are obtained, their dimension is reduced in the manner of spectral hashing: first, a graph model is built using the K-nearest-neighbor algorithm; then dimension reduction is performed using the Laplacian Eigenmaps (LE) algorithm; finally, each dimension is binarized with its mean as the threshold, giving a binary hash code of 0s and 1s. Through these steps, the resulting hash code preserves the semantic similarity between the original images while reducing the image features to a short, fixed-length binary form. Spectral hashing measures the distance between data from a global perspective and reflects that distance in the hash code, so the algorithm has strong generalization capability.
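The final binarization step described above can be sketched as follows. The sketch covers only the mean-threshold quantization; the K-nearest-neighbor graph and the Laplacian Eigenmaps reduction are assumed to have already produced the low-dimensional `embeddings`, and the function name `binarize_by_mean` and the toy values are hypothetical.

```python
def binarize_by_mean(embeddings):
    """Turn each row of real-valued features into a binary hash code string.

    For every dimension, the mean over all rows is used as the threshold:
    values above the mean become 1, all others become 0.
    """
    n = len(embeddings)
    dims = len(embeddings[0])
    means = [sum(row[k] for row in embeddings) / n for k in range(dims)]
    return [
        "".join("1" if row[k] > means[k] else "0" for k in range(dims))
        for row in embeddings
    ]


# Three images reduced to three dimensions (illustrative values only).
embeddings = [
    [1.0, -2.0, 4.0],
    [3.0, 2.0, 0.0],
    [2.0, 3.0, 5.0],
]
hash_labels = binarize_by_mean(embeddings)
```

With per-dimension means of 2.0, 1.0, and 3.0, the three rows map to the codes 001, 110, and 011; the number of retained dimensions therefore fixes the code length.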

In the DSTH hash function learning stage, the binary hash code obtained in the previous stage is used as a hash label, learning is carried out in an end-to-end mode, and a simple convolutional neural network is used for accelerating the processes of training and generating the hash code. After the convolutional neural network is trained, the convolutional neural network can be applied to the metadata generation module of the invention. The hash label contains global clustering information, so that the learning of the hash function has strong generalization capability. In the scene of the storage system, the used image types are varied, so that the DSTH can be used for well processing the multi-source data problem. In addition, the whole process does not use new labels, so that the training process can meet the actual application scene.
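The hash function learning stage, with its loop of forward pass, comparison against the hash labels, and back-propagation until the error reaches a preset threshold, can be illustrated with a deliberately tiny stand-in. The sketch below is not SimpleNet and not the DSTH training code: it assumes a single sigmoid layer, a mean squared error, hand-made two-dimensional features, and hypothetical names and hyperparameters (`train`, `error_threshold`, `learning_rate`, `max_epochs`).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, biases, x):
    """Return one sigmoid output per hash bit for the feature vector x."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def train(features, hash_labels, error_threshold=0.02,
          learning_rate=0.5, max_epochs=20000):
    """Gradient descent on squared error until it reaches the threshold."""
    n_in, n_out = len(features[0]), len(hash_labels[0])
    weights = [[0.0] * n_in for _ in range(n_out)]  # initial parameters
    biases = [0.0] * n_out
    error = float("inf")
    for _ in range(max_epochs):
        outputs = [forward(weights, biases, x) for x in features]
        error = sum((o - t) ** 2
                    for out, label in zip(outputs, hash_labels)
                    for o, t in zip(out, label)) / (len(features) * n_out)
        if error <= error_threshold:
            break
        # Back-propagate the squared error through the sigmoid.
        for x, out, label in zip(features, outputs, hash_labels):
            for k in range(n_out):
                delta = (out[k] - label[k]) * out[k] * (1.0 - out[k])
                for i in range(n_in):
                    weights[k][i] -= learning_rate * delta * x[i]
                biases[k] -= learning_rate * delta
    return weights, biases, error


# Toy features and 2-bit hash labels (illustrative only).
features = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
hash_labels = [[0, 0], [0, 1], [1, 0], [1, 1]]
weights, biases, error = train(features, hash_labels)
codes = [[1 if o > 0.5 else 0 for o in forward(weights, biases, x)]
         for x in features]
```

After training, thresholding the outputs at 0.5 yields the hash code of an image, which is the role the trained network plays in the metadata generation module.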

Because the focus of the invention is on using DSTH rather than proposing or optimizing it, the theory and implementation of DSTH can be found in (Liu Y, Song J, Zhou K, et al. Deep self-taught hashing for image retrieval [J]. IEEE Transactions on Cybernetics, 2018(99): 1-13). That reference describes in detail the selection and application of the activation function, encoding mode, and loss function, and also shows the accuracy advantage of DSTH over other image hashing algorithms. Based on the experience reported in the literature, the invention uses a code length of 48 bits for semantic metadata. For a storage system, a 48-bit binary hash code occupies only 6 bytes, and its storage overhead is negligible compared with that of an image.
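The storage figure can be checked with a short sketch that packs a 48-bit code into bytes and restores it. The helper names `pack_hash_code` and `unpack_hash_code` are hypothetical, and big-endian byte order is an arbitrary choice made for the example.

```python
def pack_hash_code(bits: str) -> bytes:
    """Pack a string of 0/1 characters into the smallest whole number of bytes."""
    if len(bits) % 8 != 0:
        raise ValueError("code length must be a multiple of 8 bits")
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


def unpack_hash_code(data: bytes) -> str:
    """Restore the 0/1 string, keeping leading zeros."""
    width = len(data) * 8
    return f"{int.from_bytes(data, 'big'):0{width}b}"


code = "10" * 24               # an arbitrary 48-bit code
packed = pack_hash_code(code)  # stored as semantic metadata
restored = unpack_hash_code(packed)
```

Here `len(packed)` is 6, so the semantic metadata adds six bytes per image regardless of the image size.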

Although many image search engines, such as Google image search and Baidu image search, support similar-image query services, these search engines rely heavily on manually labeling images or on matching text with images. Manual labeling incurs significant overhead, and in many scenarios such queries find images that are similar in data (pixels) rather than in semantics. In contrast, DSTH is a deep learning hash method based on semantic extraction, and it can automatically map image data to binary hash codes without manual labeling.

A Hamming graph is the data structure used in the system of the invention to organize semantic metadata. DSTH generates a hash code for each image file, so the system holds a large number of hash codes. A semantic query (range query) returns the results that satisfy certain conditions based on the relationships between semantic hash codes. Therefore, an efficient organization method is needed to build a query-supporting data structure over this large number of hash codes. Given the characteristics of the binary hash codes generated by DSTH, the invention abandons the traditional tree structure for organizing them: a tree is a hierarchical structure whose connectivity represents a directory hierarchy, whereas the connectivity of semantic similarity represents semantic associations between files. At the same time, the invention retains the directory-file hierarchy of the traditional directory tree, because the directory tree is the basic query and location structure in a storage system and is very important to it. The invention only enhances the content semantic query capability of a traditional storage system by adding a semantic metadata organization.

In essence, a tree is a special kind of graph, and the advantage of a graph over a tree is that it can express the semantic similarity of content between files more flexibly. For image files in a storage system, the semantic relationships between files are mostly non-hierarchical. As shown in fig. 2, file A is semantically related to B and C, file C is semantically related to D, and file D is semantically related to A; this semantic association can be represented by a graph but cannot be represented by a traditional tree structure.

An advantage of binary hash codes is that they support efficient semantic association calculation, namely the exclusive-or (XOR) operation underlying the Hamming distance. The Hamming distance is calculated as shown in fig. 3: the binary hash code 100001010 of length 9 is compared with the other three equal-length binary hash codes 110001010, 001001010, 10101110 by counting the bit positions at which two codes differ; this count is the Hamming distance between the two codes. For example, 100001010 and 110001010 differ in one bit (the second bit from the left), so the Hamming distance between them is 1. The advantage of using the Hamming distance is that computers are good at processing binary (0/1) data and can compute the result quickly and cheaply through XOR operations.
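The XOR-based calculation can be written in a few lines. The sketch below assumes hash codes given as strings of 0 and 1 and uses two of the code pairs from the example; the function name `hamming_distance` is hypothetical.

```python
def hamming_distance(code_a: str, code_b: str) -> int:
    """Count differing bit positions of two equal-length binary strings."""
    if len(code_a) != len(code_b):
        raise ValueError("hash codes must have the same length")
    # XOR leaves a 1 exactly where the two codes differ.
    return bin(int(code_a, 2) ^ int(code_b, 2)).count("1")


d1 = hamming_distance("100001010", "110001010")
d2 = hamming_distance("100001010", "001001010")
```

Here `d1` is 1 (only the second bit from the left differs) and `d2` is 2 (the first and third bits differ).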

Once all file hash codes have been obtained, the invention constructs a Hamming graph according to the Hamming distances between the file hash codes (as shown in fig. 4). In a Hamming graph, the nodes are hash codes, an edge indicates that a semantic relationship exists between two hash codes, and the weight of an edge is the Hamming distance between the two hash codes. The Hamming distance determines the degree of similarity between data: for semantic hashes of images, a smaller weight means that the semantics of the two images are more similar or related, and a larger weight means that they may not be similar or related. The advantage of the Hamming graph is that the computation needed to build the graph changes from floating-point calculation to Hamming distance calculation on binary codes, which greatly improves the construction speed of the graph.
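As a rough sketch of this construction, the following Python code builds the unoptimized, fully connected Hamming graph from a list of hash codes. The function name `build_full_hamming_graph` and the 4-bit example codes are assumptions made for illustration; the invention uses 48-bit codes and a graph database.

```python
from itertools import combinations


def build_full_hamming_graph(codes):
    """Return (nodes, edges) for a fully connected Hamming graph.

    nodes: sorted list of distinct hash codes (one node per code).
    edges: dict mapping an unordered pair (u, v), u < v, to its weight,
           which is the Hamming distance between the two codes.
    """
    nodes = sorted(set(codes))  # identical codes collapse into one node
    edges = {(u, v): bin(u ^ v).count("1") for u, v in combinations(nodes, 2)}
    return nodes, edges


# Hypothetical 4-bit codes; 0b1000 appears twice but yields a single node.
nodes, edges = build_full_hamming_graph([0b1000, 0b1001, 0b0111, 0b1000])
```

Every pair of nodes receives an edge here, so the edge count grows quadratically with the node count.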

Because of the characteristics of the hash codes, different images may be mapped to the same hash code, so one node in the Hamming graph may correspond to multiple files. In addition, as the number of images increases, the number of corresponding nodes also increases and the number of edges grows quadratically, so storing the Hamming graph consumes a large amount of storage and computing resources. To address these two problems, the invention provides two Hamming graph optimization methods that yield an efficient and low-overhead metadata management method: node merging on hash collision, and threshold-based pruning and selection of Hamming graph edges.

The invention is used as an organization method of metadata, and the structure of the metadata is closely combined with a storage system. The general framework of the invention is shown in figure 5. The invention comprises the whole workflow from file uploading to a storage system to semantic query execution and result return.

Since the present invention is directed to image files, it is first necessary to have a large number of image sources (such as images in the internet) to upload files to a storage system. The uploading process is different from the traditional storage method. The method comprises the steps of firstly inputting the uploaded image file into a trained DSTH network so as to obtain a semantic hash code corresponding to the image. Then, the invention uploads the semantic hash codes and the image files to the storage system in a one-to-one correspondence mode, and the hash codes are used as semantic metadata of the images. The upload portion of the storage system image file ends here.

After a file is uploaded, its semantic metadata needs to be managed, and different image files may correspond to the same hash value. The invention sets up a separate-chaining hash table that uses linked lists to link the files with the same semantic hash value, so that such files are organized together. As the communication structure between the Hamming graph and the storage system, the hash table (as shown in fig. 5) facilitates the transfer of data between the two ends when a semantic query is performed.

The hash table can determine whether the semantic hash code of a newly uploaded file already exists in the system. Only if it does not exist is the new hash code uploaded to the Hamming graph, so the Hamming graph never stores semantic metadata repeatedly. The Hamming graph, which is the main part of semantic metadata management in fig. 5, is currently implemented with a graph database, because graph databases offer high stability and extensibility and have query languages dedicated to graph data. The invention uses these characteristics to design an interface for semantic range queries.

In table 1, the present invention is compared to other metadata systems and queryable file system methods. The design method of the present invention is similar to these metadata systems in two ways. On one hand, they all extract the feature used for inquiring and use this feature as metadata; on the other hand, they all use data structure aggregation metadata to speed up the query process. The method used by the present invention is different from other metadata systems in the metadata generation and organization process.

Table 1 Comparison of the present invention with other metadata organization methods

Among these systems, both Spyglass and SmartStore use traditional metadata, whose associations are not based on file content. FAST uses content-based metadata, but its PCA-SIFT features are shallow image features and are not suitable for semantic queries. Furthermore, FAST can only be used for exact queries and cannot be applied to similarity queries. The deep hashing (DSTH) method used by the invention supports content-based semantic similarity queries. In metadata organization, the essential requirement is to organize "semantically" similar content together, which is in effect a clustering process. SmartStore mines the semantics of the metadata through latent semantic analysis and then clusters content with similar semantics, but its semantic relations rely only on simple metadata. Spyglass clusters by hierarchical partitioning, that is, it uses a tree structure to place semantically similar content in the same subtree as far as possible. FAST maps the extracted PCA-SIFT features to LSH hash values through locality-sensitive hashing and then performs lookups through a Bloom filter. The invention aggregates semantic metadata that are close in Hamming distance within a Hamming space, which suits the requirements of semantic queries better and is more flexible and efficient than a tree structure. As for the organization of metadata, FAST resolves hash collisions by cuckoo hashing and combines it with a Bloom filter. SmartStore uses the R-tree from the spatial search field and places semantically similar content in the same region of the R-tree. Spyglass enables fast metadata search through K-D trees. The invention organizes metadata as a Hamming graph, in which files correspond to nodes of the graph, thereby realizing complex semantic queries (range queries) for image files.

The semantic extraction module is the basis of the whole semantic metadata management: metadata management and the subsequent semantic query function are possible only after the semantic hash code (semantic metadata) has been extracted. The design of the semantic extraction module is divided into three steps: (1) training the hash function network; (2) extracting the semantic hash codes; (3) uploading and distributing the semantic metadata.

In the hash label generation stage, DSTH directly uses an existing deep neural network (such as AlexNet, GoogLeNet, or VGG16) to extract the image features of the data set. In general, the deep network extracts high-dimensional floating-point feature vectors. Following a dimension reduction method, the semantic hash codes of the images are obtained by binarizing the features. In the process of training the hash function network, a simple network, SimpleNet, is designed to fit the hash labels obtained by DSTH in the hash label generation stage. In addition, based on the code length analysis of DSTH, the invention uniformly uses 48-bit hash codes.

In the second step of extracting the semantic hash code, the image file passes through a trained neural network before being uploaded to a storage system. According to the batch processing characteristic of deep learning, the invention provides two uploading modes: batch uploading and individual uploading. The batch uploading refers to that some images are sent to the trained SimpleNet in batches, so that the semantic hash codes of a plurality of files are obtained simultaneously. Separate upload means that each image is sequentially uploaded to SimpleNet so that the neural network will output the results sequentially.

Bulk upload has the advantage of low overhead. On one hand, deep learning can more efficiently process data in batches; on the other hand, uploading a file alone may continuously start the network, thereby causing additional time overhead. Thus, bulk uploads may be preferred over individual uploads as a whole. The disadvantage of batch uploading is that the metadata distribution and file uploading work can be executed only after the hash codes of all files are completely generated, which brings higher delay to the storage system. The comparison of the file uploading time and the semantic metadata batch generation time of the storage system is tested later. In design, the present invention primarily uses a batch upload approach. Due to the characteristics of SimpleNet, the speed of generating the hash code is high, the highest performance can be guaranteed through batch uploading, and the uploading time of the storage system cannot be influenced.

In the third step, uploading and distributing the semantic metadata, the upload order of the semantic hash codes and the image files is recorded to ensure that each semantic hash code corresponds to its image file. After obtaining the semantic hash code of a file, the invention first stores it in the extended attributes of the image file, which requires that the file system support extended attributes (most file systems, such as XFS, ext3 and NTFS, do). Then, when the image file is uploaded to the storage system, Swift automatically converts the semantic hash code saved in the extended attributes of the file into a metadata attribute (called semantic metadata). This completes the work of the semantic extraction module. For other back-end storage systems, different metadata allocation methods can be used as long as the semantic metadata and the image files correspond correctly.

The semantic query of the invention is realized by means of the Hamming graph, and because the nodes in the Hamming graph are hash codes and the edges represent Hamming distances, the Hamming graph does not store the information of files and does not know which files correspond to the hash codes. DSTH is a similarity hashing algorithm that maps similar images to the same or similar hash codes. Therefore, one hash code may correspond to a plurality of image files, and a data structure is required to manage such association, thereby enabling communication of metadata.

As shown in fig. 5, a separate-chaining hash table (the hash table connected at both ends) connects semantic metadata management, image features and the storage system in the present invention. The separate-chaining hash table is the basic data structure for metadata communication: each key of the hash table is a 48-bit semantic hash code that exists in the storage system, and the linked list attached to each key records the identifiers of the image files that have this semantic hash code as metadata. Since an absolute path in the storage system is unique, the absolute path is used as the file identifier and stored in the linked list. The relationship between the hash table connected at both ends, the storage system and the Hamming graph is shown in fig. 6.

The hash table connected at both ends is empty while the system holds no image files. When a file is uploaded, the image hash code generated by DSTH and the absolute path of the image file are passed to the hash table together (see the solid arrow in fig. 6). For each hash value, the hash table first checks in O(1) time whether the value already exists among its keys. If it does, a file with this hash value already exists in the storage system, and the hash table appends the file identifier of the new file to the tail of the linked list for that key. If it does not, no file with this hash code has been stored yet and the new file is the first one; in this case a new key holding the semantic hash value is created in the hash table, and a linked list is created for this key whose first item is the identifier of the file.
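The insertion logic described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the class name `SemanticHashTable` and its methods are hypothetical, a Python `dict` stands in for the O(1) key lookup, and a Python list stands in for each linked list.

```python
class SemanticHashTable:
    """Maps each semantic hash code to the chain of files that share it."""

    def __init__(self):
        # key: semantic hash code (int); value: chain of absolute file paths
        self._chains = {}

    def insert(self, code: int, path: str) -> bool:
        """Record a file; return True if the code is new to the system.

        A True result signals that the code must also be added to the
        Hamming graph; False means the graph needs no update.
        """
        chain = self._chains.get(code)  # average O(1) key lookup
        if chain is None:
            self._chains[code] = [path]  # first file with this code
            return True
        chain.append(path)  # append to the tail of the existing chain
        return False

    def files(self, code: int) -> list:
        """Return the identifiers of all files stored under a code."""
        return list(self._chains.get(code, []))


table = SemanticHashTable()
first_is_new = table.insert(0b101, "/images/cat_1.jpg")   # hypothetical paths
second_is_new = table.insert(0b101, "/images/cat_2.jpg")
third_is_new = table.insert(0b110, "/images/dog_1.jpg")
```

The boolean return value is a design choice of this sketch: it lets the caller decide whether the Hamming graph has to be touched at all.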

After the insertion into the linked list is finished, a newly appearing hash code is passed to the Hamming graph, which adds a node for the hash code, computes its relationship to the other nodes, and then adds edges. The invention optimizes the calculation and selection of the edges to other nodes; the optimization is described below. Once the Hamming graph edges have been selected, the upload process of the file is finished. If the hash code of the uploaded file already exists in the hash table connected at both ends, the Hamming graph already stored the corresponding node before this upload, and the Hamming graph is not updated. With large-scale data, identical hash values become more frequent, so this method improves the update efficiency of the Hamming graph. To summarize, a flowchart of the upload process is shown in FIG. 4 below.

In the process of a semantic query, the invention first obtains the semantic hash value of the query point file and sends the hash value to the Hamming graph. The query in the graph returns the hash codes that satisfy the conditions, which are then sent to the hash table connected at both ends (see the dotted arrow in fig. 6). The hash table obtains the names of the matching files from the linked lists corresponding to these hash codes, so that the results of the semantic query can be returned through the storage system. For a 48-bit DSTH hash code there are at most 2⁴⁸ possible hash codes, so hash collisions do not occur frequently. For each key in the hash table the linked list can therefore be kept short, which reduces the time needed to query the linked list.

The invention provides two optimization methods for the Hamming graph, the data structure used for semantic metadata management: merging of hash collisions, and threshold-based selection of Hamming graph edges. Together they greatly reduce the storage overhead of the Hamming graph and, for large-scale data, markedly reduce the computation time as well.

Handling of hash collisions

As mentioned above, the DSTH algorithm may produce hash collisions, that is, non-identical images may generate identical hash codes. In this case, the invention keeps only one node for the hash value in the Hamming graph and links the files that share the hash code through the hash table connected at both ends. For DSTH, if the semantic hash codes generated by different images are completely identical, the semantics of the images are extremely similar (the corresponding Hamming distance is 0). As shown in fig. 8 below, the images in the upper half all generate one identical semantic hash code after passing through the neural network, and the images in the lower half likewise generate one identical hash code. In this case only one hash code node is created for the images of the upper half, and it is connected by an edge to the single hash code node created for the lower half, so that the relationship between the 8 images is represented by only 2 nodes.

With this hash code merging optimization, the most semantically similar images can be identified in a semantic query, and the storage and computation costs of the graph are markedly reduced. The identifiers of files with the same hash code are linked together, and only one node is stored per hash code, so every hash node is unique and is never stored repeatedly. If the Hamming graph instead took files as nodes, a large number of redundant nodes and edges would result, and using the graph to find relationships by Hamming distance would incur a huge overhead. By taking hash codes as nodes, the invention ensures that data are not stored repeatedly, which keeps the Hamming graph efficient.

Threshold-based selection of hamming graph edges

The Hamming graph introduced above has a serious performance bottleneck: if each hash code node is fully connected to all other hash code nodes, a very large number of edges is generated. Storing these edges requires a lot of overhead, and many of them need not be stored. If the Hamming distance between two hash codes is too large, the semantic association between them is low, so they do not need to be connected by an edge. This section proposes threshold-based selection of Hamming graph edges, so that the Hamming graph stores only the edges with the strongest semantic relevance.

If N images exist in the storage system, N corresponding binary hash codes can be generated through the deep hash model DSTH. Because the storage system may contain several identical images, and because DSTH can map similar images to the same hash code, the number of distinct binary hash codes is defined as $N_d$ ($N_d \le N$, $N_d \in \mathbb{Z}^+$). $N_d$ is the number of nodes in the Hamming graph. Since the Hamming distance between two nodes is the weight of the edge between them and the relationship between two nodes is mutual, the edges of the Hamming graph are undirected. A Hamming graph G in a storage system with $N_d$ nodes can therefore have at most the following number of edges:

$$\frac{N_d (N_d - 1)}{2}$$

The maximum number of edges grows with the square of the number of nodes. For example, with only 10,000 distinct hash codes, the number of edges in the fully connected case reaches 49,995,000, and storing so many edges and weights brings a large storage and computation overhead.

To address this, the present invention proposes a threshold-based method for reducing the number of edges: a threshold T (which is the minimum value used in the above step (4-2)) is set for each node of the Hamming graph to restrict the number of edges. Let $N_{d*}$ denote the $*$-th node in graph G and $H(N_{d*})$ the semantic hash value of $N_{d*}$. Then, for any integers i, j satisfying $0 \le i, j < N_d$, the Hamming distance between $N_{di}$ and $N_{dj}$ is denoted $hd(N_{di}, N_{dj})$. With $\oplus$ denoting the XOR operation, the Hamming distance can be calculated as the number of 1 bits in the XOR of the two hash values:

$$hd(H(N_{di}), H(N_{dj})) = \lVert H(N_{di}) \oplus H(N_{dj}) \rVert_1$$

With the threshold T set, an edge connecting $N_{di}$ and $N_{dj}$ must satisfy:

$$hd(H(N_{di}), H(N_{dj})) \le T$$

The choice of the threshold T is a trade-off between the amount of retained information and efficiency. A larger T keeps more edges in the graph, which ensures that enough information is stored and that more query operations can be executed. The cost is that many edges are stored, most of which are either irrelevant or semantically weak. A smaller T retains mainly the edges with the strongest semantic similarity, but information is lost. Because a semantic query mainly looks for the files whose hash codes have the strongest semantic relationship, the invention chooses a T value that is as small as possible. In this Hamming space the smallest usable distance is 1, because hash values at distance 0 are merged. For 48-bit hash codes, a Hamming distance of 1 indicates that the file semantics of the two hash codes are very similar. Experiments show, however, that with T = 1 a large number of isolated nodes appear: many nodes have no other node at Hamming distance 1 to connect to, so the Hamming graph cannot serve semantic queries.

Based on this, the invention lets the threshold T vary from node to node. For a Hamming graph G with $N_d$ nodes, let $T_i$ be the threshold of the i-th node. $T_i$ is taken as the minimum Hamming distance between node $N_{di}$ and all other nodes in G. This guarantees that node $N_{di}$ is connected to at least one other node, so no isolated nodes arise, while ensuring that the semantically most similar nodes are connected and effectively reducing the number of edges. For integers i and j, the value of $T_i$ can thus be defined as:

$$T_i = \min_{j \ne i,\; j \in [0, N_d)} hd(H(N_{di}), H(N_{dj}))$$

For example, as shown in fig. 6, node 6 is a node newly added to the Hamming graph and needs to find edges to connect to. It therefore computes the Hamming distance to all other nodes 1, 2, 3, 4, 5 and finds the minimum value. The minimum Hamming distance in fig. 6 is 2, so T₆ is 2, and node 6 is connected to all nodes at Hamming distance 2 (i.e. node 1 in the figure). When the threshold of each node has been determined and the edge connection process is finished, the whole semantic metadata upload process is finished.
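The per-node threshold rule can be sketched in Python as follows. This is an illustration only: the function names `hd` and `add_node`, the adjacency-dictionary representation, and the 6-bit example codes are assumptions, and the codes are not those of the figure. They are chosen so that the new node has a minimum distance of 2 to exactly one existing node.

```python
def hd(a: int, b: int) -> int:
    """Hamming distance: number of 1 bits in the XOR of two codes."""
    return bin(a ^ b).count("1")


def add_node(graph: dict, code: int):
    """Add a hash code node and connect it using its own threshold T_i.

    graph maps each code to a dict {neighbor_code: edge_weight}.
    Returns T_i, or None if the code already exists or is the first node.
    """
    if code in graph:
        return None  # code already stored; the graph is not updated
    others = list(graph)
    graph[code] = {}
    if not others:
        return None  # first node, nothing to connect to
    distances = {other: hd(code, other) for other in others}
    threshold = min(distances.values())  # T_i: minimum distance to any node
    for other, distance in distances.items():
        if distance == threshold:  # connect every node at distance T_i
            graph[code][other] = distance
            graph[other][code] = distance  # edges are undirected
    return threshold


graph = {}
# Hypothetical codes for existing nodes 1 to 5.
for existing in (0b000011, 0b000111, 0b001111, 0b111000, 0b111110):
    add_node(graph, existing)
t6 = add_node(graph, 0b000000)  # hypothetical code for the new node 6
```

In this sketch each threshold is computed incrementally at insertion time, so an earlier node's edges reflect the nodes present when it was added.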

The main function of the invention is to support queries over semantic similarity relations. A semantic query here means a content-based semantic query, that is, finding the images whose content semantics are similar to those of a specific file. The semantic query of the invention targets files already stored in the system, and the specific file is called the query point file. Each query point file has a corresponding semantic hash code (semantic metadata), and from this hash code a specific node, called the query point of the semantic query, can be found in the Hamming graph, as shown in fig. 7.

After receiving a semantic query request, the invention retrieves the semantic metadata of the requested query point file and sends it to the Hamming graph. The Hamming graph quickly locates the query point, then traverses and searches the graph to obtain the most similar hash code values, and returns the results. As shown in fig. 7, the hash code of the query request is sent to the query point in the Hamming graph, and the invention then queries layer by layer. The most similar files are those with the same hash code as the query point; the method then finds all hash code values that meet the requirements through a layer-by-layer search (first layer, second layer, third layer, in order). The query results are retrieved through the linked lists of the hash codes in the hash table. The nodes in the Hamming graph correspond to the hash keys in the hash table, and neither structure stores a hash code redundantly. The search process in the Hamming graph can be understood as a breadth-first search (BFS) on a graph structure. The concept and method of the semantic range query are described below, together with how the query accuracy can be improved according to the features of the DSTH semantic hash algorithm.

Semantic range query

A semantic range query finds all files that are semantically related to the query point file within a particular range. In the invention, semantics are expressed by hash codes, and the relatedness between semantics is determined by the Hamming distance between hash codes. A smaller Hamming distance indicates stronger semantic relatedness, so a semantic range query can be defined in the present invention as a query for all files whose hash code lies within a Hamming distance of at most a given value γ from the hash code of a particular query point file. This value γ is the query range of the semantic range query.

In the DSTH algorithm, a smaller Hamming distance indicates that the files represented by the hash codes are more similar in content semantics, so the most similar images are those with the same hash code, at Hamming distance 0. A large number of experiments with 48-bit hash codes show that when the Hamming distance is greater than 2, the semantic similarity of the hash codes weakens and more wrong results are returned. In the experiments, a smaller query range (especially for large-scale data sets) therefore returns more accurate query results.

A semantic range query hd ≤ γ, illustrated as the semantic query process in fig. 7, starts at the query point in the Hamming graph. The invention first obtains the hash code of the semantic query point file and then takes all files with exactly the same hash code as the query point from the linked list of the hash table. Because a semantic range query retrieves all files at Hamming distance 0, this step returns the results with hd = 0.

When γ ≥ 1, the semantic range query needs not only the query point but also a traversal and search of the Hamming graph, in order to obtain all hash codes that satisfy the condition; the files are then obtained through the linked lists of the hash table. In this case the hash code of the query point is the starting point of the graph traversal, and BFS is used to find the other results that satisfy the query condition. First, all nodes of the first BFS layer are obtained (as shown in fig. 7) and each is checked against the query condition; for each hash code node that satisfies it, the files are returned through the linked list of the hash table. If γ ≥ 2, more layers need to be traversed. A node of the previous layer is expanded only if its Hamming distance to the query point is smaller than γ; otherwise it is not traversed further. The search terminates when the BFS level equals the range γ, because, according to the construction of the Hamming graph, the hash codes that satisfy the condition lie within that number of levels.
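The layered search described above can be sketched in Python as follows. This is a simplified illustration, not the graph-database implementation: the function `range_query`, the adjacency-dictionary graph, the plain `dict` used as the hash table, and the 4-bit codes and file paths are all hypothetical.

```python
from collections import deque


def hd(a: int, b: int) -> int:
    """Hamming distance: number of 1 bits in the XOR of two codes."""
    return bin(a ^ b).count("1")


def range_query(graph: dict, table: dict, query_code: int, gamma: int) -> list:
    """Return identifiers of files whose code satisfies hd <= gamma.

    graph: code -> {neighbor_code: weight}; table: code -> list of file paths.
    """
    results = list(table.get(query_code, []))  # hd == 0: same hash code
    visited = {query_code}
    frontier = deque([(query_code, 0)])  # (node, BFS level)
    while frontier:
        node, level = frontier.popleft()
        if level == gamma:
            continue  # termination: BFS level has reached the range gamma
        for neighbor in graph.get(node, {}):
            if neighbor in visited:
                continue
            visited.add(neighbor)
            distance = hd(query_code, neighbor)
            if distance <= gamma:  # node satisfies the query condition
                results.extend(table.get(neighbor, []))
            if distance < gamma:  # only such nodes are traversed further
                frontier.append((neighbor, level + 1))
    return results


# Hypothetical example: q - a - b - c form a chain in the graph.
q, a, b, c = 0b0000, 0b0001, 0b0011, 0b1111
graph = {q: {a: 1}, a: {q: 1, b: 1}, b: {a: 1, c: 2}, c: {b: 2}}
table = {q: ["/img/q1.jpg", "/img/q2.jpg"], a: ["/img/a.jpg"],
         b: ["/img/b.jpg"], c: ["/img/c.jpg"]}
within_0 = range_query(graph, table, q, 0)
within_1 = range_query(graph, table, q, 1)
within_2 = range_query(graph, table, q, 2)
```

The distance check always compares against the query point rather than summing edge weights, so a node reached over several short edges is still rejected if its own code is farther than γ from the query code.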

Extensibility analysis

The scalability of the invention is analyzed along two dimensions: growth in data volume and node expansion. With respect to data volume, the experimental results show that when the data volume increases (from the CIFAR-10 dataset to ImageNet), the time overhead of semantic queries remains stable. This is because a query only searches the nodes near the query point; as the data volume grows, the search is still limited to adjacent nodes and is not bound by the data scale. In addition, the separate-chaining hash table is key to scalability as the data volume grows: as data increase, repeated semantic hashes become more frequent, and the hash table saves storage space in the Hamming graph by organizing identical hash codes together while retaining the file identifiers needed for queries.

With respect to node expansion, the Hamming graph of the invention currently runs on a single-node graph database, that is, all semantic queries run on one machine, because the Neo4j graph database is currently only suitable for single-machine mode. Considering availability and fault tolerance, if the graph could be distributed over several machines, user access would be faster and the semantic query service more efficient, and a distributed graph structure could still provide service when a server fails; this is a future improvement direction of the invention. The invention can support distributed deployment: in the experiments, two servers were used to build an OpenStack Swift cluster, but because of the characteristics of the graph database, the semantic metadata organization structure still runs on a single server. In future work, the graph can be partitioned so that it is stored across several nodes, achieving distributed metadata management and removing the single point of failure.

In summary, the present invention has the following advantages:

(1) The invention uses a hash of the image content as semantic metadata in the storage system, thereby recording the content semantics of image files in the storage system, and organizes these content semantics with a more efficient Hamming graph structure. Semantic similarity between an image and other files can be computed quickly, and a query interface is provided for users. Results can also be found and returned quickly through the index of the graph structure, giving the storage system a more efficient and intelligent interface and service;

(2) The invention uses the deep self-learning hash DSTH as the core algorithm for generating content semantic hash codes, in view of the specific application scenario of a storage system. DSTH makes full use of the feature extraction capability of deep learning and, through existing networks and their training results on large-scale data sets, enables effective querying and searching of the content semantics of images. Compared with traditional methods based on locality-sensitive hashing, DSTH extracts information that is semantically similar, not merely similar at the pixel level, and is therefore suitable for semantic queries. DSTH is implemented in a self-learning way, that is, it does not require labels for new image data sets. For a storage system holding massive heterogeneous data, classification labels for the images are very difficult to obtain, and DSTH needs no such additional information (labels) about the image data stored in the system. Compared with supervised hash algorithms that depend on classification labels, the self-learning mode of DSTH is more practical and general. DSTH trains the hash function and generates hash codes with a fast and efficient network, SimpleNet, which improves the efficiency of training the hash model and speeds up hash code generation. When a storage system uploads files and metadata, the upload process must be low in cost and fast, otherwise the performance of the whole system is seriously affected. The efficiency of DSTH keeps the hash code generation time short, which ensures high performance during metadata upload and benefits the metadata distribution and storage process.

(3) The present invention organizes the data structure of semantic metadata using the graph structure of a hamming graph. Conventional directory tree structures manage all files through a tree structure, resulting in semantically similar files being likely under different directories. When semantic query is performed by using a tree structure, all file directories are often required to be traversed to compare semantic similarity, so that a large calculation overhead is brought. For the present invention, the hierarchical relationship of the directory is not affected, but the file metadata represented by the leaf node at the bottom layer is organized in a graph manner again. For the Hamming map, nodes with similar semantics are gathered to adjacent areas in the map, so that similar files can be searched in a simple map searching mode without traversing metadata of all files, and the time overhead required by query is obviously reduced.

(4) Through its hash code merging optimization, the invention can identify the most semantically similar images in a semantic query while significantly reducing the storage and computation cost of the graph. File identifiers that share the same hash code are linked together, and the Hamming graph stores only one node for that hash code, so every hash node is unique and none is stored repeatedly. If the Hamming graph instead used files as nodes, it would contain a large number of redundant nodes and edges, and searching the graph for relationships by Hamming distance would incur a huge overhead. By taking hash codes as nodes, the invention ensures that data is not stored repeatedly, which keeps the Hamming graph efficient.
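
The merging of files under a single hash code can be sketched as follows. This is only an illustration under stated assumptions: the class name `SemanticIndex`, its methods, and the use of bit strings and file names as identifiers are hypothetical, and a Python dictionary of lists stands in for the linked lists of the hash table described in the patent.

```python
from collections import defaultdict
from typing import DefaultDict, List


class SemanticIndex:
    """Hypothetical index: one entry per hash code, chaining every file
    that shares that code."""

    def __init__(self) -> None:
        self._files_by_code: DefaultDict[str, List[str]] = defaultdict(list)

    def add(self, hash_code: str, file_id: str) -> None:
        # Files with the same semantic hash code share one chain.
        self._files_by_code[hash_code].append(file_id)

    def files_for(self, hash_code: str) -> List[str]:
        # .get() does not create an entry for an unknown code.
        return list(self._files_by_code.get(hash_code, []))

    def node_count(self) -> int:
        # Number of graph nodes needed: one per distinct hash code.
        return len(self._files_by_code)


index = SemanticIndex()
index.add("1010", "a.jpg")
index.add("1010", "b.jpg")
index.add("0110", "c.jpg")
print(index.node_count(), index.files_for("1010"))
```

Three files yield only two graph nodes here, because the node count follows the number of distinct hash codes, not the number of files.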

(5) The invention reduces the number of edges with a threshold-based method: a threshold T is set for each node of the Hamming graph to restrict its number of edges. A larger T keeps more edges in the graph, which preserves more stored information and allows more query operations to be served, but at the cost of storing many edges, most of which are irrelevant or semantically weak. A smaller T retains only the edges with the strongest semantic similarity, at the cost of losing some information. Because a semantic query mainly looks for the files whose hash codes have the strongest semantic relationship, the invention chooses a T value that is as small as possible. This greatly reduces the storage and computation cost of keeping edges and their weights.
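
The text does not spell out exactly how T is applied, so the sketch below makes an explicit assumption: T is read as a cap on the number of edges per node, and an edge survives only if it is among the T lowest-weight edges of both of its endpoints. The function name `prune_edges` and the nested-dictionary graph layout are likewise illustrative and not taken from the patent.

```python
from typing import Dict

# Assumed representation: graph[node][neighbour] = Hamming distance (edge weight).
# The graph is undirected, so every edge appears under both endpoints.
Graph = Dict[int, Dict[int, int]]


def prune_edges(graph: Graph, t: int) -> Graph:
    """Return a copy in which each node keeps at most `t` edges,
    preferring the smallest Hamming distances (assumed meaning of T)."""
    # For each node, the t neighbours with the lowest edge weight.
    keep = {
        node: set(sorted(nbrs, key=lambda n: (nbrs[n], n))[:t])
        for node, nbrs in graph.items()
    }
    # Keep an edge only if both endpoints rank it within their top t,
    # so the pruned graph stays undirected and respects the cap.
    return {
        node: {n: w for n, w in nbrs.items() if n in keep[node] and node in keep[n]}
        for node, nbrs in graph.items()
    }


example: Graph = {
    0: {1: 1, 2: 2, 3: 3},
    1: {0: 1},
    2: {0: 2},
    3: {0: 3},
}
print(prune_edges(example, 1))
```

With T = 1 only the strongest edge of node 0 survives, and nodes 2 and 3 become isolated, which illustrates the information loss that the paragraph above weighs against the storage savings.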

It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. An image query method based on content semantic metadata is characterized by comprising the following steps:

(4) constructing a Hamming graph according to all hash values in the separate chaining hash table used in step (3); step (4) specifically comprises the following substeps:

(4-1) setting a counter i to 1;

2. The image query method of claim 1, wherein the learning-based hash algorithm is a deep self-learning hash algorithm and the image dataset is CIFAR-10, STL-10, or ImageNet.

3. The image query method according to claim 1 or 2, wherein the neural network used in step (2) is a SimpleNet neural network.

4. The image query method according to claim 3, wherein the iterative training is performed by first setting initial parameters for the neural network, inputting the image set into the neural network to obtain an output result, comparing the output result with the hash tag obtained in step (1), then updating the initial parameters of the neural network through a back propagation algorithm, and then iteratively repeating the above processes until the error between the output result and the hash tag reaches a preset threshold.

5. The image query method according to claim 1, wherein in step (6), obtaining all other nodes and their corresponding hash values from the found node in combination with the Hamming graph, and determining, according to the separate chaining hash table, the images corresponding to the obtained hash values as the final query result, comprises the following substeps:

(6-1) taking the searched node as a current node;

(6-5) putting the jth node into the query result set;

(6-7) setting the counter j = j + 1, and returning to step (6-3);

6. An image query system based on content semantic metadata, comprising:

the fourth module is used for constructing a Hamming graph according to all hash values in the separate chaining hash table used by the third module; the fourth module specifically includes:

a first submodule for setting a counter i to 1;

the second submodule is used for taking the ith hash value out of the separate chaining hash table, creating a node corresponding to the ith hash value in the Hamming graph, calculating the Hamming distance between the ith hash value and each previously taken hash value, selecting the minimum of the calculated Hamming distances, connecting the node of the ith hash value to the nodes of all hash values at that minimum distance, and using the Hamming distance between each such hash value and the ith hash value as the weight of the edge connecting the two nodes in the Hamming graph;

the third submodule is used for judging whether the ith hash value is the last hash value in the separate chaining hash table; if so, the process ends; otherwise i is set to i + 1 and the process returns to the second submodule;