CN103218404B - Multi-dimensional metadata management method and system based on association characteristics - Google Patents

Info

Publication number
CN103218404B
Authority
CN
China
Prior art keywords
metadata
meta data
data server
collection
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310090042.4A
Other languages
Chinese (zh)
Other versions
CN103218404A (en)
Inventor
华宇
黄大彰
冯丹
刘进军
聂振华
蔡娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201310090042.4A
Publication of CN103218404A
Application granted
Publication of CN103218404B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-dimensional metadata management method based on association characteristics, comprising: in a metadata server cluster, partitioning the metadata on each metadata server according to its association characteristics to generate metadata sets and a set statistics file; grouping the metadata servers of the cluster according to the set statistics file to generate multiple metadata server groups and a group configuration file; building a local index table on each metadata server according to the set statistics file; building a group index table in each metadata server group according to the group configuration file and the set statistics file; building the top-level index table of the metadata server cluster according to the group index tables; and receiving a query request from a user and querying the top-level index table, the group index table, and the local index table in turn according to the query request. The invention makes full use of the association characteristics among the multi-dimensional attributes of metadata, satisfies complex query demands, and scales well.

Description

Multi-dimensional metadata management method and system based on association characteristics
Technical field
The invention belongs to the field of computer data storage and, more specifically, relates to a multi-dimensional metadata management method and system based on association characteristics.
Background art
With the arrival of the cloud computing and cloud storage era, the volume of data in information storage systems is growing geometrically, which makes efficient storage, management, and querying of data increasingly difficult. The continuous growth of massive data also makes data storage and maintenance steadily harder. Research shows that, in practice, the file data of a mass storage system exhibits significant association characteristics. An association characteristic is the clustering that files exhibit in their attribute space; in essence it reflects the correlation between files. Typically, the temporal and spatial correlations between files are used: temporal correlation means that files close in time tend to be accessed within the same period, and spatial correlation means that files located at adjacent positions are very likely to be accessed by subsequent requests. Beyond temporal and spatial correlation, files are also associated through many other attributes, such as file size, access frequency, and creator. Existing research, however, clearly lacks study of file associations over more attributes. Considering associations over more attributes helps to identify the correlation between files more accurately: based on a distance metric in the multi-dimensional attribute space, the correlation between two files can be computed explicitly. When processing massive data, measuring the association between data items by some method and thereby partitioning the data into multiple clustered spaces brings obvious benefits to subsequent processing.
However, existing metadata management methods have the following problems:
(1) They do not make full use of the association characteristics among the multi-dimensional attributes of metadata: existing methods often use only the temporal and spatial attributes of metadata and therefore do not fully exploit the association characteristics between metadata items.
(2) They cannot effectively support complex query requests: for queries involving the multi-dimensional attributes of metadata, such as range queries and Top-K queries, existing methods cannot process them efficiently.
(3) They scale poorly: as the number of metadata items grows with the expansion of the system, the query response time of existing methods increases significantly.
Summary of the invention
In view of the defects of the prior art, the object of the present invention is to provide a multi-dimensional metadata management method based on association characteristics that solves the metadata management problem in mass storage systems. The method makes full use of the association characteristics among the multi-dimensional attributes of metadata, satisfies complex query demands, and scales well.
To achieve the above object, the invention provides a multi-dimensional metadata management method based on association characteristics, comprising the following steps:
(1) in a metadata server cluster, partition the metadata on each metadata server according to its association characteristics to generate metadata sets and a set statistics file;
(2) group the metadata servers of the cluster according to the set statistics file to generate multiple metadata server groups and a group configuration file;
(3) build a local index table on each metadata server according to the set statistics file; the local index table manages the metadata sets on that metadata server, and each entry in it records a metadata set number from the set statistics file together with the disk storage address of the corresponding metadata set;
(4) build a group index table in each metadata server group according to the group configuration file and the set statistics file;
(5) build the top-level index table of the metadata server cluster according to the group index tables;
(6) receive a query request from a user, query the top-level index table, the group index table, and the local index table in turn according to the query request, and return the query result; user query requests include point queries, range queries, and Top-K queries.
Step (1) comprises the following sub-steps:
(1-1) determine the multi-dimensional attributes that represent the association characteristics between metadata items on each metadata server;
(1-2) arrange the multi-dimensional attributes of each metadata item into a fixed-length input vector, which serves as the input to a locality-sensitive hash (LSH) function;
(1-3) apply the same locality-sensitive hash function to every input vector; the resulting hash value serves as the identifier of the metadata item corresponding to that input vector;
(1-4) place metadata items with the same hash value into the same metadata set, and use that hash value as the number of the metadata set;
(1-5) collect statistics on how the metadata is distributed over the metadata sets to generate the set statistics file; the set statistics file contains, for each set, the metadata set number, the number of metadata items, the mean of each attribute dimension, and the range of each attribute dimension, where metadata set numbers range over 1, 2, 3, ..., N and N is the length of the hash table used by the locality-sensitive hash function.
Step (2) is specifically as follows: a bit vector is built for each metadata server, whose length equals the length of the hash table used by the locality-sensitive hash function in step (1). A hierarchical clustering algorithm then clusters the metadata servers according to the pairwise Hamming distances between their bit vectors to obtain metadata server groups. Clustering stops when the number of groups formed reaches the lower limit or the distance between groups reaches the upper limit; the resulting metadata server groups are saved in the group configuration file.
Step (4) is specifically as follows: for each group in the group configuration file, a corresponding group index table is built. Each entry in the group index table records information about a metadata set held on the metadata servers of that group, including the metadata set number, the IP address of the metadata server holding the set, the number of metadata items, the mean of each attribute dimension, and the range of each attribute dimension.
The point query operation in step (6) comprises the following steps:
(6-1-1) receive a point query request, determine the multi-dimensional attributes of the metadata item it refers to, and compute the hash value of these attributes with the locality-sensitive hash function; this hash value is the number of the metadata set to be queried;
(6-1-2) look up the entry for this metadata set number in the top-level index table to obtain the IP address of the metadata server group that contains the metadata item;
(6-1-3) determine the corresponding metadata server from the IP address of the metadata server group, and look up the entry for the metadata set number in the group index table held on that server to obtain the IP address of the metadata server that holds the metadata item;
(6-1-4) on the metadata server thus found, look up the entry for the metadata set number in its local index table to obtain the disk storage address of the metadata set that contains the metadata item;
(6-1-5) read the corresponding metadata set at the disk storage address found, and return the query result.
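The point query path can be illustrated with in-memory dictionaries standing in for the three index tables. This is a hedged sketch: the dictionary layouts, the field names `group_ip` and `server_ip`, the `storage` mapping, and the toy hash are all hypothetical, and the toy hash is only a stand-in for a real locality-sensitive hash so that the example stays short.

```python
def point_query(attrs, lsh, top_index, group_indexes, local_indexes, storage):
    """Walk top-level -> group -> local index tables for one attribute vector."""
    set_id = lsh(attrs)                                # (6-1-1) hash -> set number
    results = []
    for top_entry in top_index.get(set_id, []):        # (6-1-2) find the group
        group_index = group_indexes[top_entry["group_ip"]]
        for group_entry in group_index.get(set_id, []):  # (6-1-3) find the server
            server_ip = group_entry["server_ip"]
            addr = local_indexes[server_ip][set_id]    # (6-1-4) find the disk address
            for record in storage[server_ip][addr]:    # (6-1-5) read the set
                if record["attrs"] == attrs:
                    results.append(record)
    return results

# Toy stand-in for the LSH function (assumption: 4 buckets numbered 1..4).
toy_hash = lambda v: int(v[0]) % 4 + 1

top_index = {2: [{"group_ip": "10.0.0.1"}]}
group_indexes = {"10.0.0.1": {2: [{"server_ip": "10.0.0.2"}]}}
local_indexes = {"10.0.0.2": {2: 4096}}
storage = {"10.0.0.2": {4096: [
    {"name": "a.log", "attrs": (5.0, 1.0)},
    {"name": "b.log", "attrs": (9.0, 1.0)},
]}}

found = point_query((5.0, 1.0), toy_hash, top_index, group_indexes,
                    local_indexes, storage)
missing = point_query((6.0, 0.0), toy_hash, top_index, group_indexes,
                      local_indexes, storage)
```

Each index level maps a set number to a list of entries here because the same set number may exist on several servers; whether the original system stores one entry or several per set number is not stated in the text.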
The range query operation in step (6) comprises the following steps:
(6-2-1) receive a range query request, determine the multi-dimensional attribute ranges to be queried, compute the midpoint of each attribute range, build an input vector from these midpoints, and compute the hash value of the input vector with the locality-sensitive hash function; this hash value is the number of the metadata set to be queried;
(6-2-2) look up the entry for this metadata set number in the top-level index table and compare the attribute ranges in the query request with the attribute ranges stored in the entry; if the two do not intersect, return an empty result directly; if they intersect, obtain the IP address of the metadata server group that contains the metadata to be queried;
(6-2-3) determine the corresponding metadata server from the IP address of the metadata server group, and look up the entry for the metadata set number in the group index table held on that server to obtain the IP address of the metadata server that holds the metadata;
(6-2-4) on the metadata server thus found, look up the entry for the metadata set number in its local index table to obtain the disk storage address of the metadata set that contains the metadata;
(6-2-5) read the corresponding metadata set at the disk storage address found, and return all metadata items that fall within the attribute ranges of the query request.
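A range query differs from a point query mainly in steps (6-2-1) and (6-2-2): the midpoint of the queried ranges is hashed, and the stored ranges are used to prune. The sketch below assumes the same hypothetical dictionary layout as a simple in-memory index, with a toy hash standing in for the locality-sensitive hash; none of these names come from the source.

```python
def ranges_intersect(query, stored):
    """True if every dimension of the query range overlaps the stored range."""
    return all(q_lo <= s_hi and s_lo <= q_hi
               for (q_lo, q_hi), (s_lo, s_hi) in zip(query, stored))

def in_range(attrs, query):
    """True if the attribute vector lies inside the query box."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(attrs, query))

def range_query(query, lsh, top_index, group_indexes, local_indexes, storage):
    midpoint = tuple((lo + hi) / 2 for lo, hi in query)    # (6-2-1)
    set_id = lsh(midpoint)
    results = []
    for top_entry in top_index.get(set_id, []):            # (6-2-2)
        if not ranges_intersect(query, top_entry["range"]):
            continue                                       # disjoint: nothing to fetch
        group_index = group_indexes[top_entry["group_ip"]]
        for group_entry in group_index.get(set_id, []):    # (6-2-3)
            server_ip = group_entry["server_ip"]
            addr = local_indexes[server_ip][set_id]        # (6-2-4)
            results.extend(r for r in storage[server_ip][addr]
                           if in_range(r["attrs"], query)) # (6-2-5)
    return results

toy_hash = lambda v: int(v[0]) % 4 + 1  # stand-in for a real LSH function

top_index = {2: [{"group_ip": "10.0.0.1", "range": [(5.0, 9.0), (1.0, 1.0)]}]}
group_indexes = {"10.0.0.1": {2: [{"server_ip": "10.0.0.2"}]}}
local_indexes = {"10.0.0.2": {2: 4096}}
storage = {"10.0.0.2": {4096: [
    {"name": "a.log", "attrs": (5.0, 1.0)},
    {"name": "b.log", "attrs": (9.0, 1.0)},
]}}

hits = range_query([(4.0, 6.0), (0.0, 2.0)], toy_hash, top_index,
                   group_indexes, local_indexes, storage)
empty = range_query([(20.0, 22.0), (0.0, 2.0)], toy_hash, top_index,
                    group_indexes, local_indexes, storage)
```

As in the text, only the set selected by the midpoint hash is probed; a wide query range could also overlap other sets, which this sketch does not attempt to handle.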
The Top-K query operation in step (6) comprises the following steps:
(6-3-1) receive a Top-K query request, determine the multi-dimensional attributes of the metadata item it refers to and the value of K, and compute the hash value of these attributes with the locality-sensitive hash function; this hash value is the number of the metadata set to be queried; K is the number of metadata items to return that are closest to the multi-dimensional attributes given in the query request;
(6-3-2) look up the entry for this metadata set number in the top-level index table; if the number of metadata items recorded in the entry is less than K, also include the entries to the left and right of this entry in the query scope until the total number of metadata items in the selected entries is greater than or equal to K, finally obtaining the IP addresses of one or more metadata server groups;
(6-3-3) determine the corresponding metadata server from the IP address of each metadata server group, and look up the entry for the metadata set number in the group index table held on that server to obtain the IP address of the metadata server that holds the metadata;
(6-3-4) on the metadata server thus found, look up the entry for the metadata set number in its local index table to obtain the disk storage address of the metadata set that contains the metadata;
(6-3-5) read the corresponding metadata set at the disk storage address found, and return the K metadata items closest to the multi-dimensional attributes given in the query request.
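The distinctive part of the Top-K query is step (6-3-2), where neighbouring entries are added until at least K candidates are covered. The following sketch shows one way to do that; the symmetric one-step-at-a-time expansion, the Euclidean distance, the `fetch` callback that hides the group and local index lookups, and the toy hash are all assumptions for illustration.

```python
import heapq
import math

def expand_sets(set_id, counts, k, table_len):
    """Step (6-3-2): add left/right neighbour sets until at least k items are covered."""
    chosen = [set_id]
    total = counts.get(set_id, 0)
    step = 1
    while total < k and (set_id - step >= 1 or set_id + step <= table_len):
        for neighbour in (set_id - step, set_id + step):
            if 1 <= neighbour <= table_len:
                chosen.append(neighbour)
                total += counts.get(neighbour, 0)
        step += 1
    return chosen

def top_k_query(attrs, k, lsh, counts, table_len, fetch):
    """Collect candidates from the selected sets and keep the k nearest."""
    candidates = []
    for set_id in expand_sets(lsh(attrs), counts, k, table_len):
        candidates.extend(fetch(set_id))  # stands in for steps (6-3-3)..(6-3-5)
    return heapq.nsmallest(k, candidates,
                           key=lambda r: math.dist(r["attrs"], attrs))

toy_hash = lambda v: int(v[0]) % 4 + 1  # stand-in for a real LSH function

records_by_set = {
    1: [{"name": "d", "attrs": (4.5, 1.0)}],
    2: [{"name": "a", "attrs": (5.0, 2.0)}],
    3: [{"name": "b", "attrs": (6.5, 1.0)}, {"name": "c", "attrs": (10.0, 1.0)}],
}
counts = {sid: len(recs) for sid, recs in records_by_set.items()}
fetch = lambda sid: records_by_set.get(sid, [])

nearest = top_k_query((5.0, 1.0), 3, toy_hash, counts, 4, fetch)
```

Here the query hashes to set 2, which holds only one item, so sets 1 and 3 are added before the three nearest items are selected.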
Compared with the prior art, the above technical scheme conceived by the present invention gives the method the following beneficial effects:
1. The association characteristics among the multi-dimensional attributes of metadata are fully exploited. Because steps (1) and (2) are adopted, metadata is partitioned into multiple metadata sets according to the association characteristics among its multi-dimensional attributes, and metadata items with identical or similar multi-dimensional attributes fall into the same metadata set, so that all metadata can be managed effectively in units of metadata sets.
2. Complex query requests, such as range queries and Top-K queries, are supported effectively. Because steps (3), (4), and (5) are adopted, every kind of query request consults the top-level index table, the group index table, and the local index table in turn, and the query is finally located in a metadata set. Since a metadata set holds metadata items with identical or similar multi-dimensional attributes, results can be returned quickly and accurately.
3. The scalability requirement is met. Because metadata is partitioned into multiple metadata sets according to association characteristics and managed in those units, a rapid increase in the number of metadata items in the system causes only slow growth in the number of metadata sets, which preserves the effectiveness and efficiency of metadata management.
Another object of the present invention is to provide a multi-dimensional metadata management system based on association characteristics that solves the metadata management problem in mass storage systems. The system makes full use of the association characteristics among the multi-dimensional attributes of metadata, satisfies complex query demands, and scales well.
To achieve the above object, the invention provides a multi-dimensional metadata management system based on association characteristics, comprising a metadata set generation module, a metadata server grouping module, a local index generation module, a group index generation module, a top-level index generation module, and a query module. The metadata set generation module partitions the metadata on each metadata server according to its association characteristics to generate metadata sets and a set statistics file. The metadata server grouping module groups the metadata servers of the cluster according to the set statistics file to generate multiple metadata server groups and a group configuration file. The local index generation module builds a local index table on each metadata server according to the set statistics file. The group index generation module builds a group index table in each metadata server group according to the group configuration file and the set statistics file. The top-level index generation module builds the top-level index table of the metadata server cluster according to the group index tables. The query module receives a query request from a user, queries the top-level index table, the group index table, and the local index table in turn according to the query request, and returns the query result.
The metadata set generation module comprises a multi-dimensional attribute determination module, an input vector construction module, a hash function computation module, an association partition module, and an output module. The multi-dimensional attribute determination module determines the multi-dimensional attributes that represent the association characteristics between metadata items. The input vector construction module arranges the multi-dimensional attributes of each metadata item into a fixed-length input vector, which serves as the input to the locality-sensitive hash function. The hash function computation module applies the same locality-sensitive hash function to every input vector; the resulting hash value serves as the identifier of the metadata item corresponding to that input vector. The association partition module places metadata items with the same hash value into the same metadata set and uses that hash value as the number of the metadata set. The output module collects statistics on how the metadata is distributed over the metadata sets to generate the set statistics file.
The query module comprises a point query submodule, a range query submodule, and a Top-K query submodule. The point query submodule handles point query requests: given the multi-dimensional attributes of a metadata item, it returns the specific information of that item. The range query submodule handles range query requests: given ranges of multi-dimensional attributes, it returns all metadata in the whole system that falls within those ranges. The Top-K query submodule handles Top-K query requests: given a set of multi-dimensional attributes and a specified value of K, it returns the K metadata items in the whole system that are closest to the given attributes.
Compared with the prior art, the above technical scheme conceived by the present invention gives the system the following beneficial effects:
1. The association characteristics among the multi-dimensional attributes of metadata are fully exploited. Because the metadata set generation module and the metadata server grouping module are adopted, metadata is partitioned into multiple metadata sets according to the association characteristics among its multi-dimensional attributes, and metadata items with identical or similar multi-dimensional attributes fall into the same metadata set, so that all metadata can be managed effectively in units of metadata sets.
2. Complex query requests, such as range queries and Top-K queries, are supported effectively. Because the local index generation module, the group index generation module, and the top-level index generation module are adopted, every kind of query request consults the top-level index table, the group index table, and the local index table in turn, and the query is finally located in a metadata set. Since a metadata set holds metadata items with identical or similar multi-dimensional attributes, results can be returned quickly and accurately.
3. The scalability requirement is met. Because metadata is partitioned into multiple metadata sets according to association characteristics and managed in those units, a rapid increase in the number of metadata items in the system causes only slow growth in the number of metadata sets, which preserves the effectiveness and efficiency of metadata management.
Brief description of the drawings
Fig. 1 is a flowchart of the multi-dimensional metadata management method based on association characteristics according to the invention.
Fig. 2 is a module diagram of the multi-dimensional metadata management system based on association characteristics according to the invention.
Fig. 3 is a detailed flowchart of step (1) of the method of the invention.
Fig. 4 is a schematic block diagram of the metadata set generation module of the invention.
Fig. 5 is a schematic diagram of the query process of the invention.
Fig. 6 is a schematic block diagram of the query module in the system of the invention.
Fig. 7 is a flowchart of the point query process of the invention.
Fig. 8 is a flowchart of the range query process of the invention.
Fig. 9 is a flowchart of the Top-K query process of the invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
The invention is a multi-dimensional metadata management method and system based on association characteristics. The method uses the association characteristics among multi-dimensional metadata to partition the metadata, placing metadata items with identical or similar multi-dimensional attributes into the same metadata set, and can thereby satisfy a variety of complex query operations on metadata.
As shown in Fig. 1, the multi-dimensional metadata management method based on association characteristics according to the invention comprises the following steps:
(1) in a metadata server cluster, partition the metadata on each metadata server according to its association characteristics to generate metadata sets and a set statistics file;
As shown in Fig. 3, this step comprises the following sub-steps:
(1-1) determine the multi-dimensional attributes that represent the association characteristics between metadata items on each metadata server; in this embodiment, the multi-dimensional attributes include file size, file creation time, file modification time, file access count, and so on;
(1-2) arrange the multi-dimensional attributes of each metadata item into a fixed-length input vector, which serves as the input to a locality-sensitive hash (LSH) function; specifically, the fixed length equals the number of attribute dimensions of the metadata, and the input vector has the form (file size, file creation time, file modification time, file access count, ...);
(1-3) apply the same locality-sensitive hash function to every input vector; the resulting hash value serves as the identifier of the metadata item corresponding to that input vector;
(1-4) place metadata items with the same hash value into the same metadata set, and use that hash value as the number of the metadata set;
(1-5) collect statistics on how the metadata is distributed over the metadata sets to generate the set statistics file; the set statistics file contains, for each set, the metadata set number, the number of metadata items, the mean of each attribute dimension, and the range of each attribute dimension, where metadata set numbers range over 1, 2, 3, ..., N and N is the length of the hash table used by the locality-sensitive hash function.
There are several ways to group metadata by association characteristics; the present invention chooses locality-sensitive hashing. Locality-sensitive hashing maps multi-dimensional data into a one-dimensional space while preserving the spatial relationships between data points, that is, data points that were adjacent originally remain adjacent after hashing. For each metadata item, a hash value is obtained by locality-sensitive hashing, and metadata items with the same hash value are gathered into the same metadata set, which achieves the grouping of metadata.
The advantage of this step is that the association characteristics among the multi-dimensional attributes of metadata are used to partition all metadata into multiple metadata sets, and the metadata items in one set have identical or similar multi-dimensional attributes. Query operations on metadata can therefore be confined to the corresponding metadata set, which significantly improves query efficiency.
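The locality-preserving behaviour described above can be checked empirically. The sketch below draws many independent random-projection hash functions and counts how often two vectors land in the same bucket; the hash family, the bucket width of 4.0, the trial count, and the sample vectors are assumptions chosen for illustration, not parameters taken from the source.

```python
import math
import random

def make_hash(dim, bucket_width, rng):
    """One random-projection hash: floor((a.v + b) / w) with Gaussian a, uniform b."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, bucket_width)
    return lambda v: math.floor((sum(x * w for x, w in zip(v, a)) + b) / bucket_width)

def collision_rate(u, v, trials=200, bucket_width=4.0, seed=1):
    """Fraction of independently drawn hash functions that map u and v to one bucket."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = make_hash(len(u), bucket_width, rng)
        hits += h(u) == h(v)
    return hits / trials

# Hypothetical attribute vectors: "near" differs slightly from "base", "far" differs a lot.
base = (10.0, 20.0, 30.0, 40.0)
near = (10.1, 20.0, 30.0, 40.0)
far = (60.0, 20.0, 30.0, 40.0)

near_rate = collision_rate(base, near)
far_rate = collision_rate(base, far)
```

Vectors that are close in attribute space collide far more often than distant ones, which is the property that lets a shared hash value act as a metadata set number.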
(2) group the metadata servers of the cluster according to the set statistics file to generate multiple metadata server groups and a group configuration file;
Specifically, a bit vector is built for each metadata server, whose length equals the length of the hash table used by the locality-sensitive hash function in step (1). The first bit of the bit vector indicates whether the metadata set numbered 1 exists on the server: it is 1 if the set exists and 0 otherwise; ...; the N-th bit indicates whether the metadata set numbered N exists: it is 1 if the set exists and 0 otherwise. A hierarchical clustering algorithm then clusters the metadata servers according to the pairwise Hamming distances between their bit vectors to obtain metadata server groups. Clustering stops when the number of groups formed reaches the lower limit or the distance between groups reaches the upper limit; the resulting metadata server groups are saved in the group configuration file. In this embodiment, the upper limit equals half the bit vector length and the lower limit is 1.
For example, suppose there are five metadata servers A, B, C, D, and E, each with its own bit vector. The pairwise Hamming distances between the five bit vectors are computed first, and the two metadata servers whose bit vectors have the smallest Hamming distance (for example A and B) are merged into cluster F; each cluster generated in this way is a metadata server group. The same operation is then repeated among F, C, D, and E. Once the number of groups formed reaches the lower limit, or the distance between groups reaches the upper limit, clustering stops.
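The grouping procedure can be sketched as a small agglomerative clustering over bit vectors. The text does not specify how the distance between two multi-server groups is measured, so single linkage (distance between the closest members) is assumed here; the function names and the sample set numbers are likewise hypothetical.

```python
from itertools import combinations

def bit_vector(set_ids, table_len):
    """Bit i (1-based) is 1 if metadata set number i exists on the server."""
    present = set(set_ids)
    return [1 if i in present else 0 for i in range(1, table_len + 1)]

def hamming(u, v):
    """Number of positions at which two bit vectors differ."""
    return sum(x != y for x, y in zip(u, v))

def group_distance(g1, g2, vectors):
    # Assumption: single linkage; the source does not name a linkage rule.
    return min(hamming(vectors[a], vectors[b]) for a in g1 for b in g2)

def cluster_servers(vectors, min_groups=1, max_distance=None):
    """Merge the closest groups until a stop condition from the text is met."""
    table_len = len(next(iter(vectors.values())))
    if max_distance is None:
        max_distance = table_len // 2   # embodiment: half the bit vector length
    groups = [[name] for name in sorted(vectors)]
    while len(groups) > min_groups:     # stop: group count reached the lower limit
        (i, j), d = min(
            (((i, j), group_distance(groups[i], groups[j], vectors))
             for i, j in combinations(range(len(groups)), 2)),
            key=lambda item: item[1],
        )
        if d >= max_distance:           # stop: group distance reached the upper limit
            break
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups

# Five hypothetical servers and the metadata set numbers present on each (N = 8).
vectors = {
    "A": bit_vector([1, 2, 3], 8),
    "B": bit_vector([1, 2, 4], 8),
    "C": bit_vector([6, 7, 8], 8),
    "D": bit_vector([5, 7, 8], 8),
    "E": bit_vector([1, 2, 3, 4], 8),
}
groups = cluster_servers(vectors)
```

With these inputs, A, B, and E end up in one group and C and D in another, because the distance between the two groups (6) exceeds the upper limit of 4.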
The advantage of this step is that dividing the metadata servers into multiple metadata server groups spreads query requests effectively across the groups, which avoids a system bottleneck.
(3) build a local index table on each metadata server according to the set statistics file; the local index table manages the metadata sets on that metadata server, and each entry in it records a metadata set number from the set statistics file together with the disk storage address of the corresponding metadata set.
(4) build a group index table in each metadata server group according to the group configuration file and the set statistics file; specifically, for each group in the group configuration file, a corresponding group index table is built, and each entry in the group index table records information about a metadata set held on the metadata servers of that group, including the metadata set number, the IP address of the metadata server holding the set, the number of metadata items, the mean of each attribute dimension, and the range of each attribute dimension.
In a concrete implementation, the group index table can be stored on any metadata server in the metadata server group; this server is called the group-leader metadata server of the group. Where data redundancy is a concern, the group index table can also be stored on several metadata servers in the group.
(5) build the top-level index table of the metadata server cluster according to the group index tables; specifically, the top-level index table manages information about the metadata server groups, including the metadata set number, the IP address of the group holding the metadata set, the number of metadata items, the mean of each attribute dimension, and the range of each attribute dimension.
The top-level index table can be stored on any selected metadata server; to avoid a single point of failure, it can also be stored on several metadata servers.
Steps (3) to (5) build the three-level index tables of the whole system. Through these tables, a query request for metadata can quickly be located in a particular metadata server group, then in a specific metadata server, and finally in the metadata set that contains the metadata. The three-level index tables thus locate metadata sets quickly and improve the time efficiency of queries.
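The construction of the three index levels can be sketched with plain dictionaries. The layout of the per-server statistics (`addr`, `count`, `mean`, `range`), the count-weighted merging of means at the top level, and the use of the group-leader IP as the group address are assumptions for this example; the source lists which fields each table holds but not how they are aggregated.

```python
def build_local_index(server_sets):
    """Step (3): metadata set number -> disk address, for one server."""
    return {set_id: info["addr"] for set_id, info in server_sets.items()}

def build_group_index(group, cluster):
    """Step (4): metadata set number -> per-server entries within one group."""
    index = {}
    for ip in group:
        for set_id, info in cluster[ip].items():
            index.setdefault(set_id, []).append({
                "server_ip": ip, "count": info["count"],
                "mean": info["mean"], "range": info["range"],
            })
    return index

def merge_entries(entries):
    """Combine per-server statistics of one set into a single summary."""
    total = sum(e["count"] for e in entries)
    dims = range(len(entries[0]["mean"]))
    mean = [sum(e["mean"][d] * e["count"] for e in entries) / total for d in dims]
    rng = [(min(e["range"][d][0] for e in entries),
            max(e["range"][d][1] for e in entries)) for d in dims]
    return total, mean, rng

def build_top_index(group_indexes):
    """Step (5): metadata set number -> per-group entries, keyed by group-leader IP."""
    top = {}
    for leader_ip, group_index in group_indexes.items():
        for set_id, entries in group_index.items():
            count, mean, rng = merge_entries(entries)
            top.setdefault(set_id, []).append({
                "group_ip": leader_ip, "count": count, "mean": mean, "range": rng,
            })
    return top

# Hypothetical one-dimensional statistics for two servers forming one group.
cluster = {
    "10.0.0.1": {3: {"addr": 0, "count": 2, "mean": [1.0], "range": [(0.0, 2.0)]}},
    "10.0.0.2": {3: {"addr": 4096, "count": 2, "mean": [3.0], "range": [(2.0, 4.0)]},
                 5: {"addr": 0, "count": 1, "mean": [9.0], "range": [(9.0, 9.0)]}},
}
local = {ip: build_local_index(sets) for ip, sets in cluster.items()}
group_index = build_group_index(["10.0.0.1", "10.0.0.2"], cluster)
top = build_top_index({"10.0.0.1": group_index})
```

Keeping the count, mean, and range at every level is what allows range and Top-K queries to prune or widen their search at the top level without contacting individual servers.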
(6) receive a query request from a user, query the top-level index table, the group index table, and the local index table in turn according to the query request, and return the query result; user query requests specifically include point queries, range queries, and Top-K queries.
As shown in Fig. 7, the point query operation in step (6) comprises the following steps:
(6-1-1) acceptance point inquiry request, determines the multidimensional property of the metadata that this inquiry request is corresponding, and utilizes position sensitive hash function to calculate the cryptographic hash of multidimensional property, and this cryptographic hash is the numbering of the collection of metadata needing inquiry;
(6-1-2) list item that query metadata set number is corresponding in top layer concordance list, to obtain the IP address of this metadata place meta data server grouping;
(6-1-3) corresponding meta data server is determined according to the IP address of meta data server grouping, and the list item that query metadata set number is corresponding in the group index table of this meta data server, to obtain the IP address of this metadata place meta data server;
(6-1-4) according to the IP address of meta data server found, the list item that query metadata set number is corresponding in the group index table of this meta data server, is combined in memory address in disk to obtain this metadata place metadata set;
(6-1-5) be combined in the memory address in disk according to the metadata set found, the collection of metadata that inquiry is corresponding, and return Query Result;
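The point query steps above can be sketched as follows. This is an illustration under stated assumptions: the toy hash `lsh`, its parameters `A`, `B`, `W`, the table length `N`, and the in-memory `disks` dictionary standing in for on-disk metadata sets are all hypothetical and are not taken from the patent.

```python
import math

N = 16                                   # assumed hash-table length; set numbers run 1..N
A, B, W = (0.5, 0.25, 0.125), 1.0, 4.0   # illustrative LSH parameters

def lsh(vector):
    """Toy locality-sensitive hash: project, shift, bucket, map to 1..N."""
    projection = sum(a * x for a, x in zip(A, vector))
    return math.floor((projection + B) / W) % N + 1

def point_query(attrs, top_index, group_indexes, local_indexes, disks):
    set_no = lsh(attrs)                            # (6-1-1) hash -> set number
    group_ip = top_index.get(set_no)               # (6-1-2) top-level index
    if group_ip is None:
        return None
    server_ip = group_indexes[group_ip][set_no]    # (6-1-3) group index
    address = local_indexes[server_ip][set_no]     # (6-1-4) local index
    for record in disks[server_ip][address]:       # (6-1-5) scan the set
        if record["attrs"] == attrs:
            return record
    return None

# Example: one file described by (size, age in days, access count).
attrs = (8.0, 4.0, 8.0)
set_no = lsh(attrs)
top_index = {set_no: "10.0.0.2"}
group_indexes = {"10.0.0.2": {set_no: "10.0.0.3"}}
local_indexes = {"10.0.0.3": {set_no: "/meta/sets/a"}}
disks = {"10.0.0.3": {"/meta/sets/a": [{"name": "report.txt", "attrs": attrs}]}}
result = point_query(attrs, top_index, group_indexes, local_indexes, disks)
```

Only one metadata set is scanned, regardless of how many sets the cluster holds; a query whose hash has no entry in the top-level index returns nothing without touching any server.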
As shown in Figure 7, the range query operation in step (6) comprises the following steps:
(6-2-1) Receive a range query request, determine the multidimensional attribute ranges to be queried, compute the midpoint of each attribute range, construct an input vector from these midpoints, and use the locality-sensitive hash function to compute the hash value of the input vector; this hash value is the number of the metadata set to be queried;
(6-2-2) Look up the entry corresponding to the metadata set number in the top-level index table, and compare the multidimensional attribute ranges in the query request with those stored in the entry; if the two ranges do not intersect, return an empty result directly; if they intersect, obtain the IP address of the metadata server group that contains the metadata to be queried;
(6-2-3) Determine the corresponding metadata server from the IP address of the metadata server group, and look up the entry corresponding to the metadata set number in the group index table on that metadata server to obtain the IP address of the metadata server where the metadata resides;
(6-2-4) Using the IP address of the metadata server found, look up the entry corresponding to the metadata set number in the local index table of that metadata server to obtain the on-disk storage address of the metadata set that contains the metadata;
(6-2-5) Using the on-disk storage address found, query the corresponding metadata set and return all metadata that fall within the multidimensional attribute ranges in the query request;
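The distinctive parts of the range query, namely hashing the midpoint vector and pruning on range intersection, can be sketched as follows. The toy hash `lsh` and its parameters, the entry layout of `top_index`, and the `fetch_set` callback (which stands in for steps (6-2-3) to (6-2-5)) are assumptions made for illustration.

```python
import math

N = 16                                   # assumed hash-table length
A, B, W = (0.5, 0.25, 0.125), 1.0, 4.0   # illustrative LSH parameters

def lsh(vector):
    """Toy locality-sensitive hash mapping a vector to a set number in 1..N."""
    projection = sum(a * x for a, x in zip(A, vector))
    return math.floor((projection + B) / W) % N + 1

def midpoints(ranges):
    """Midpoint of each (low, high) attribute range."""
    return tuple((low + high) / 2 for low, high in ranges)

def ranges_intersect(query, stored):
    """True only if the two boxes overlap in every dimension."""
    return all(q_lo <= s_hi and s_lo <= q_hi
               for (q_lo, q_hi), (s_lo, s_hi) in zip(query, stored))

def range_query(query, top_index, fetch_set):
    set_no = lsh(midpoints(query))                         # (6-2-1)
    entry = top_index.get(set_no)                          # (6-2-2)
    if entry is None or not ranges_intersect(query, entry["ranges"]):
        return []                                          # prune early
    records = fetch_set(entry["group_ip"], set_no)         # (6-2-3)..(6-2-5)
    return [r for r in records
            if all(lo <= x <= hi for x, (lo, hi) in zip(r["attrs"], query))]

records = [{"name": "a", "attrs": (8.0, 4.0, 8.0)},
           {"name": "b", "attrs": (9.0, 4.0, 8.0)},
           {"name": "c", "attrs": (4.0, 4.0, 8.0)}]
query = [(7.0, 10.0), (3.0, 5.0), (7.0, 9.0)]
top_index = {lsh(midpoints(query)): {
    "group_ip": "10.0.0.2",
    "ranges": [(4.0, 9.0), (4.0, 4.0), (8.0, 8.0)]}}
matches = range_query(query, top_index, lambda group_ip, set_no: records)
names = [r["name"] for r in matches]
```

In this example all three records fall into the same metadata set, but only `a` and `b` satisfy the first attribute range, so `c` is filtered out in the final scan.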
As shown in Figure 8, the TopK query operation in step (6) comprises the following steps:
(6-3-1) Receive a TopK query request, determine the multidimensional attributes of the metadata corresponding to the request and the value of K, and use the locality-sensitive hash function to compute the hash value of the multidimensional attributes; this hash value is the number of the metadata set to be queried. Here K is the number of metadata items closest to the multidimensional attributes given in the query request;
(6-3-2) Look up the entry corresponding to the metadata set number in the top-level index table; if the metadata count recorded in the entry is less than K, also include the entries to the left and right of this entry in the query scope, until the total metadata count across the included entries is greater than or equal to K; this finally yields the IP addresses of multiple metadata server groups;
(6-3-3) Determine the corresponding metadata server from the IP address of the metadata server group, and look up the entry corresponding to the metadata set number in the group index table on that metadata server to obtain the IP address of the metadata server where the metadata resides;
(6-3-4) Using the IP address of the metadata server found, look up the entry corresponding to the metadata set number in the local index table of that metadata server to obtain the on-disk storage address of the metadata set that contains the metadata;
(6-3-5) Using the on-disk storage address found, query the corresponding metadata set and return the K metadata items closest to the multidimensional attributes of the metadata in the query request;
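The neighbor expansion in step (6-3-2) and the final ranking in step (6-3-5) can be sketched as follows. This is a simplified illustration: the metadata sets are held in one in-memory dictionary rather than spread across servers, the starting set number is assumed to have been computed already by the hash function, and Euclidean distance is an assumed similarity measure that the text does not specify.

```python
import math

def select_sets(set_no, k, counts, n):
    """Start at set_no and add the left and right neighbouring entries,
    one step at a time, until the chosen sets hold at least k records."""
    chosen = [set_no]
    total = counts.get(set_no, 0)
    step = 1
    while total < k and (set_no - step >= 1 or set_no + step <= n):
        for neighbour in (set_no - step, set_no + step):
            if 1 <= neighbour <= n:
                chosen.append(neighbour)
                total += counts.get(neighbour, 0)
        step += 1
    return chosen

def top_k(query, k, set_no, sets, n):
    """Return the k records nearest to query among the selected sets."""
    counts = {no: len(records) for no, records in sets.items()}
    candidates = [r for no in select_sets(set_no, k, counts, n)
                  for r in sets.get(no, [])]
    return sorted(candidates, key=lambda r: math.dist(r, query))[:k]

# Hypothetical metadata sets keyed by set number; records are attribute tuples.
sets = {2: [(0.0, 0.0), (5.0, 5.0)], 3: [(1.0, 1.0)], 4: [(2.0, 2.0)]}
chosen = select_sets(3, 3, {2: 2, 3: 1, 4: 1}, 8)
nearest = top_k((1.2, 1.0), 3, 3, sets, 8)
```

Set 3 alone holds only one record, so sets 2 and 4 are added; the four candidates are then ranked by distance and the three nearest are returned. Sets further away in the index are never read.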
Metadata consists of multidimensional attributes, including file size, file creation time, file modification time, file access count, and so on. In a file system, metadata items are often correlated, for example by having similar sizes and access times. Traditional metadata query methods exploit only the spatial and temporal locality of metadata and ignore the correlation among its other attributes. Faced with complex query requests such as TopK queries and range queries, traditional metadata management has to traverse the entire body of metadata to obtain a result. The present invention makes full use of the correlation among metadata: metadata with identical or similar attributes are gathered into metadata sets. For each metadata query request, the invention first localizes the query to one or several metadata sets, which greatly reduces the number of metadata items examined and returns query results faster.
As shown in Figure 2, the correlation-based multidimensional metadata management system of the present invention comprises a metadata set generation module 1, a metadata server grouping module 2, a local index generation module 3, a group index generation module 4, a top-level index generation module 5, and a query module 6.
The metadata set generation module 1 partitions the metadata on each metadata server according to correlation characteristics to generate metadata sets and a set statistics file. As shown in Figure 4, the metadata set generation module 1 comprises a multidimensional attribute determination module 11, an input vector construction module 12, a hash function computation module 13, a correlation partitioning module 14, and an output module 15.
The multidimensional attribute determination module 11 determines the multidimensional attributes that represent the correlation among metadata;
The input vector construction module 12 assembles the multidimensional attributes of a piece of metadata into a fixed-length input vector, which serves as the input to the locality-sensitive hash function;
The hash function computation module 13 applies the same locality-sensitive hash function to every input vector; the resulting hash value serves as the unique identifier of the metadata corresponding to that input vector;
The correlation partitioning module 14 places metadata with the same hash value into the same metadata set and uses that hash value as the number of the metadata set;
The output module 15 collects statistics on how metadata is distributed across the metadata sets to generate the set statistics file;
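The work of sub-modules 12 to 15 can be sketched as follows. This is a minimal sketch under stated assumptions: the toy hash `lsh` with parameters `A`, `B`, `W` and table length `N` is a stand-in for whatever locality-sensitive hash function is actually used, and the three-attribute vectors are invented for the example. The statistics fields (set number, metadata count, per-dimension mean, per-dimension range) follow the set statistics file as described in claim 1.

```python
import math
from collections import defaultdict

N = 16                                   # assumed hash-table length
A, B, W = (0.5, 0.25, 0.125), 1.0, 4.0   # illustrative LSH parameters

def lsh(vector):
    """Toy locality-sensitive hash mapping a vector to a set number in 1..N."""
    projection = sum(a * x for a, x in zip(A, vector))
    return math.floor((projection + B) / W) % N + 1

def build_sets(vectors):
    """Group fixed-length attribute vectors by hash value (modules 13, 14)."""
    sets = defaultdict(list)
    for vector in vectors:
        sets[lsh(vector)].append(vector)
    return dict(sets)

def set_statistics(sets):
    """One statistics row per metadata set (module 15)."""
    rows = []
    for set_no, members in sorted(sets.items()):
        dims = list(zip(*members))       # transpose: one tuple per dimension
        rows.append({
            "set_no": set_no,
            "count": len(members),
            "mean": [sum(d) / len(d) for d in dims],
            "range": [(min(d), max(d)) for d in dims],
        })
    return rows

# Hypothetical input vectors: (size, age in days, access count).
vectors = [(8.0, 4.0, 8.0), (9.0, 4.0, 8.0), (80.0, 4.0, 8.0)]
sets = build_sets(vectors)
stats = set_statistics(sets)
```

The first two vectors are close in every dimension and land in the same metadata set, while the third hashes elsewhere; the statistics rows later let the index tables prune queries without reading the sets themselves.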
The metadata server grouping module 2 groups the metadata server cluster according to the set statistics file to generate multiple metadata server groups and a group configuration file;
The local index generation module 3 builds a local index table on each metadata server according to the set statistics file;
The group index generation module 4 builds a group index table in each metadata server group according to the group configuration file and the set statistics file;
The top-level index generation module 5 builds the top-level index table of the metadata server cluster according to the group index tables;
The query module 6 receives a query request from a user, queries the top-level index table, the group index table, and the local index table in turn according to the query request, and returns the query result. As shown in Figure 6, the query module comprises a point query submodule 61, a range query submodule 62, and a TopK query submodule 63.
Figure 5 gives an overview of the query module 6. A query request is first sent to the metadata server that holds the top-level index table, where the top-level index generation module 5 routes the request to a metadata server group. The request is then forwarded to the corresponding group index generation module 4, which looks up the group index table and routes the request to one or several metadata servers. Finally, the request is forwarded to the corresponding local index generation module 3, which determines the metadata sets to query and returns the results that satisfy the query conditions.
Specifically: the point query submodule 61 handles a user's point query request; given the multidimensional attributes of a piece of metadata, the query returns the detailed information of that metadata. The range query submodule 62 handles a user's range query request; given ranges for the multidimensional attributes, the query returns all metadata in the whole system that fall within those ranges. The TopK query submodule 63 handles a user's TopK query request; given a set of multidimensional attributes and a specified K value, the query returns the K metadata items in the whole system that are closest to the given attributes.
To verify the feasibility and effectiveness of the system, it was deployed in a real environment and the relevant query operations were run to evaluate its performance.
The hardware and software environment used for testing is shown in Table 1:
Table 1
The system was configured as follows. First, the trace files used for testing were distributed to each node. Then the metadata set generation module 1 and the metadata server grouping module 2 were run on each node; in this test, 5 nodes were divided into three groups containing 1, 2, and 3 nodes respectively. The local index generation module 3 was run in each group, and one node in each group was selected to store the group index table generated by the group index generation module 4; the top-level index table generated by the top-level index generation module 5 was also stored on that node.
User query requests are handled by the query module 6. The top-level index table is queried first to determine the metadata server groups that hold the requested metadata; the group index table within each such group is then queried to determine the metadata servers involved; finally, the local index table determines the metadata sets to query. Through this process a request is confined to one or several metadata sets, which effectively improves the time efficiency of queries. Table 2 compares the average query time of the system with that of a relational database system.
Table 2
Those skilled in the art will readily understand that the foregoing is merely a preferred embodiment of the present invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (8)

1. A multidimensional metadata management method based on correlation characteristics, characterized in that it comprises the following steps:
(1) In a metadata server cluster, partition the metadata on each metadata server according to correlation characteristics to generate metadata sets and a set statistics file; this step comprises the following sub-steps:
(1-1) Determine the multidimensional attributes that represent the correlation among the metadata on each metadata server;
(1-2) Assemble the multidimensional attributes of a piece of metadata into a fixed-length input vector, which serves as the input to a locality-sensitive hash function;
(1-3) Apply the same locality-sensitive hash function to every input vector; the resulting hash value serves as the unique identifier of the metadata corresponding to that input vector;
(1-4) Place metadata with the same hash value into the same metadata set, and use that hash value as the number of the metadata set;
(1-5) Collect statistics on how metadata is distributed across the metadata sets to generate the set statistics file; the set statistics file contains the metadata set number, the metadata count, the mean of each attribute dimension, and the range of each attribute dimension, wherein the metadata set numbers range over 1, 2, 3, ..., N, and N is the length of the hash table in the locality-sensitive hash function;
(2) Group the metadata server cluster according to the set statistics file to generate multiple metadata server groups and a group configuration file;
(3) Build a local index table on each metadata server according to the set statistics file; the local index table manages the metadata sets on that metadata server, and each entry in the index table records a metadata set number from the set statistics file and the on-disk storage address of the metadata set with that number;
(4) Build a group index table in each metadata server group according to the group configuration file and the set statistics file;
(5) Build the top-level index table of the metadata server cluster according to the group index tables;
(6) Receive a query request from a user, query the top-level index table, the group index table, and the local index table in turn according to the query request, and return the query result; wherein user query requests include point queries, range queries, and TopK queries.
2. The multidimensional metadata management method according to claim 1, characterized in that step (2) specifically comprises: constructing a bit vector on each metadata server, the length of which equals the length of the hash table used by the locality-sensitive hash function in step (1-3); then applying a hierarchical clustering algorithm to the metadata servers based on the pairwise Hamming distances between the bit vectors of all metadata servers to obtain the metadata server groups; stopping the clustering when the number of groups formed reaches a lower limit or the distance between groups reaches an upper limit, thereby obtaining multiple metadata server groups; and saving the result in the group configuration file.
3. The multidimensional metadata management method according to claim 1, characterized in that step (4) specifically comprises: for each group in the group configuration file, building a corresponding group index table, in which each entry records information about a metadata set on the metadata servers in that group, including the metadata set number, the IP address of the metadata server where the metadata set resides, the metadata count, the mean of each attribute dimension, and the range of each attribute dimension.
4. The multidimensional metadata management method according to claim 1, characterized in that the point query operation in step (6) comprises the following steps:
(6-1-1) Receive a point query request, determine the multidimensional attributes of the metadata corresponding to the request, and use the locality-sensitive hash function to compute the hash value of the multidimensional attributes; this hash value is the number of the metadata set to be queried;
(6-1-2) Look up the entry corresponding to the metadata set number in the top-level index table to obtain the IP address of the metadata server group where the metadata resides;
(6-1-3) Determine the corresponding metadata server from the IP address of the metadata server group, and look up the entry corresponding to the metadata set number in the group index table on that metadata server to obtain the IP address of the metadata server where the metadata resides;
(6-1-4) Using the IP address of the metadata server found, look up the entry corresponding to the metadata set number in the local index table of that metadata server to obtain the on-disk storage address of the metadata set that contains the metadata;
(6-1-5) Using the on-disk storage address found, query the corresponding metadata set and return the query result.
5. The multidimensional metadata management method according to claim 1, characterized in that the range query operation in step (6) comprises the following steps:
(6-2-1) Receive a range query request, determine the multidimensional attribute ranges to be queried, compute the midpoint of each attribute range, construct an input vector from these midpoints, and use the locality-sensitive hash function to compute the hash value of the input vector; this hash value is the number of the metadata set to be queried;
(6-2-2) Look up the entry corresponding to the metadata set number in the top-level index table, and compare the multidimensional attribute ranges in the query request with those stored in the entry; if the two ranges do not intersect, return an empty result directly; if they intersect, obtain the IP address of the metadata server group that contains the metadata to be queried;
(6-2-3) Determine the corresponding metadata server from the IP address of the metadata server group, and look up the entry corresponding to the metadata set number in the group index table on that metadata server to obtain the IP address of the metadata server where the metadata resides;
(6-2-4) Using the IP address of the metadata server found, look up the entry corresponding to the metadata set number in the local index table of that metadata server to obtain the on-disk storage address of the metadata set that contains the metadata;
(6-2-5) Using the on-disk storage address found, query the corresponding metadata set and return all metadata that fall within the multidimensional attribute ranges in the query request.
6. The multidimensional metadata management method according to claim 1, characterized in that the TopK query operation in step (6) comprises the following steps:
(6-3-1) Receive a TopK query request, determine the multidimensional attributes of the metadata corresponding to the request and the value of K, and use the locality-sensitive hash function to compute the hash value of the multidimensional attributes; this hash value is the number of the metadata set to be queried; wherein K is the number of metadata items closest to the multidimensional attributes given in the query request;
(6-3-2) Look up the entry corresponding to the metadata set number in the top-level index table; if the metadata count recorded in the entry is less than K, also include the entries to the left and right of this entry in the query scope, until the total metadata count across the included entries is greater than or equal to K; this finally yields the IP addresses of multiple metadata server groups;
(6-3-3) Determine the corresponding metadata server from the IP address of the metadata server group, and look up the entry corresponding to the metadata set number in the group index table on that metadata server to obtain the IP address of the metadata server where the metadata resides;
(6-3-4) Using the IP address of the metadata server found, look up the entry corresponding to the metadata set number in the local index table of that metadata server to obtain the on-disk storage address of the metadata set that contains the metadata;
(6-3-5) Using the on-disk storage address found, query the corresponding metadata set and return the K metadata items closest to the multidimensional attributes of the metadata in the query request.
7. A multidimensional metadata management system based on correlation characteristics, comprising a metadata set generation module, a metadata server grouping module, a local index generation module, a group index generation module, a top-level index generation module, and a query module, characterized in that:
The metadata set generation module partitions the metadata on each metadata server according to correlation characteristics to generate metadata sets and a set statistics file; the metadata set generation module comprises a multidimensional attribute determination module, an input vector construction module, a hash function computation module, a correlation partitioning module, and an output module;
The multidimensional attribute determination module determines the multidimensional attributes that represent the correlation among metadata;
The input vector construction module assembles the multidimensional attributes of a piece of metadata into a fixed-length input vector, which serves as the input to a locality-sensitive hash function;
The hash function computation module applies the same locality-sensitive hash function to every input vector; the resulting hash value serves as the unique identifier of the metadata corresponding to that input vector;
The correlation partitioning module places metadata with the same hash value into the same metadata set and uses that hash value as the number of the metadata set;
The output module collects statistics on how metadata is distributed across the metadata sets to generate the set statistics file;
The metadata server grouping module groups the metadata server cluster according to the set statistics file to generate multiple metadata server groups and a group configuration file;
The local index generation module builds a local index table on each metadata server according to the set statistics file;
The group index generation module builds a group index table in each metadata server group according to the group configuration file and the set statistics file;
The top-level index generation module builds the top-level index table of the metadata server cluster according to the group index tables;
The query module receives a query request from a user, queries the top-level index table, the group index table, and the local index table in turn according to the query request, and returns the query result.
8. The multidimensional metadata management system according to claim 7, characterized in that:
The query module comprises a point query submodule, a range query submodule, and a TopK query submodule;
The point query submodule handles a user's point query request; given the multidimensional attributes of a piece of metadata, the query returns the detailed information of that metadata;
The range query submodule handles a user's range query request; given ranges for the multidimensional attributes, the query returns all metadata in the whole system that fall within those ranges;
The TopK query submodule handles a user's TopK query request; given a set of multidimensional attributes and a specified K value, the query returns the K metadata items in the whole system that are closest to the given attributes.
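The server grouping described in claim 2 can be sketched as follows. Several details here are assumptions, since the claim does not fix them: bit i of a server's vector is taken to be 1 when the server stores metadata set i, single linkage is used as the distance between groups, and the server names, `min_groups`, and `max_distance` are hypothetical parameters chosen for the example.

```python
from itertools import combinations

def bit_vector(set_numbers, n):
    """Assumed encoding: bit i is 1 if the server stores metadata set i (1..n)."""
    return [1 if i in set_numbers else 0 for i in range(1, n + 1)]

def hamming(u, v):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def cluster_servers(vectors, min_groups, max_distance):
    """Agglomerative clustering: repeatedly merge the two closest groups,
    stopping at min_groups or when the closest pair exceeds max_distance."""
    groups = [[server] for server in vectors]

    def distance(g1, g2):                # single linkage (an assumption)
        return min(hamming(vectors[a], vectors[b]) for a in g1 for b in g2)

    while len(groups) > max(min_groups, 1):
        i, j = min(combinations(range(len(groups)), 2),
                   key=lambda p: distance(groups[p[0]], groups[p[1]]))
        if distance(groups[i], groups[j]) > max_distance:
            break
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups

n = 8                                    # assumed hash-table length
vectors = {
    "A": bit_vector({1, 2, 3}, n),
    "B": bit_vector({1, 2, 4}, n),
    "C": bit_vector({6, 7, 8}, n),
}
groups = cluster_servers(vectors, min_groups=2, max_distance=3)
```

Servers A and B hold mostly the same metadata sets (Hamming distance 2) and are merged, while C stays in its own group; servers with similar content end up behind the same group index table.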
CN201310090042.4A 2013-03-20 2013-03-20 A kind of multi-dimensional metadata management method based on associate feature and system Active CN103218404B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310090042.4A CN103218404B (en) 2013-03-20 2013-03-20 A kind of multi-dimensional metadata management method based on associate feature and system

Publications (2)

Publication Number Publication Date
CN103218404A CN103218404A (en) 2013-07-24
CN103218404B true CN103218404B (en) 2015-11-18

Family

ID=48816191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310090042.4A Active CN103218404B (en) 2013-03-20 2013-03-20 A kind of multi-dimensional metadata management method based on associate feature and system

Country Status (1)

Country Link
CN (1) CN103218404B (en)

Also Published As

Publication number Publication date
CN103218404A (en) 2013-07-24

Similar Documents

Publication Publication Date Title
CN103218404B (en) A kind of multi-dimensional metadata management method based on associate feature and system
Mouratidis et al. A threshold-based algorithm for continuous monitoring of k nearest neighbors
US8219564B1 (en) Two-dimensional indexes for quick multiple attribute search in a catalog system
CN107423422B (en) Spatial data distributed storage and search method and system based on grid
CN103106249B (en) A kind of parallel data processing system based on Cassandra
CN105786808B (en) A kind of method and apparatus for distributed execution relationship type computations
US10140351B2 (en) Method and apparatus for processing database data in distributed database system
CN105404634B (en) Data managing method and system based on Key-Value data block
CN102725755B (en) Method and system of file access
CN101916261A (en) Data partitioning method for distributed parallel database system
CN106933511B (en) Space data storage organization method and system considering load balance and disk efficiency
CN103207919A (en) Method and device for quickly inquiring and calculating MangoDB cluster
CN103823846A (en) Method for storing and querying big data on basis of graph theories
CN103970871A (en) Method and system for inquiring file metadata in storage system based on provenance information
CN110321325A (en) File inode lookup method, terminal, server, system and storage medium
CN105550332A (en) Dual-layer index structure based origin graph query method
Li et al. Trajmesa: A distributed nosql-based trajectory data management system
CN104714986A (en) Three-dimensional picture searching method and three-dimensional picture searching system
US20140019454A1 (en) Systems and Methods for Caching Data Object Identifiers
CN104572862A (en) Mass data storage access method and system
CN105357247A (en) Multi-dimensional cloud resource interval finding method based on hierarchical cloud peer-to-peer network
CN102724301B (en) Cloud database system and method and equipment for reading and writing cloud data
US20150293971A1 (en) Distributed queries over geometric objects
CN112699187B (en) Associated data processing method, device, equipment, medium and product
CN110134698A (en) Data managing method and Related product

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant