CN105117402B - Log data sharding method and device - Google Patents


Publication number
CN105117402B
Authority
CN
China
Prior art keywords
log
log data
storage unit
segment
hash value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510420017.7A
Other languages
Chinese (zh)
Other versions
CN105117402A (en)
Inventor
覃雄派
陈跃国
杜小勇
金国栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Renmin University of China
Original Assignee
Renmin University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Renmin University of China
Priority to CN201510420017.7A
Publication of CN105117402A
Application granted
Publication of CN105117402B
Legal status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 - Indexing; Data structures therefor; Storage structures
    • G06F16/2228 - Indexing structures
    • G06F16/2255 - Hash tables
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 - File systems; File servers
    • G06F16/18 - File system types
    • G06F16/1805 - Append-only file systems, e.g. using logs or journals to store data

Abstract

The present invention provides a log data sharding method and device. The log data sharding method, based on piecewise order-preserving hashing, includes: dividing the value range of each of multiple attribute fields of the log data into N segments, where N is an integer greater than 1; establishing, according to the order of the N segments, a mapping between the N segments of each attribute field and hash values, where the hash values are consecutive integers whose order is consistent with the order of the N segments; and assigning the log data corresponding to each hash value to one storage unit. Through the monotonicity of the hash function, the invention ensures that adjacent log data are assigned to adjacent storage units, so that a range query can locate the relevant data quickly.

Description

Log data sharding method and device
Technical field
The present invention relates to the technical field of data processing, and in particular to a log data sharding method and device based on piecewise order-preserving hashing.
Background
Log data records the history of interactions between entities of various kinds, for example users' purchases of goods, the interaction history between users and their friends, and user trajectory information. Data mining and machine learning can be carried out on large volumes of log data to discover the patterns and features they contain. Both require extracting data from the logs through data queries. The queries in question are mainly range queries, for example retrieving the interactions between certain entities within a period of time, or the interactions between entities within a geographic area. To make such queries accurate and efficient, how a large volume of log data is sharded for storage is therefore particularly important.
Existing hash-based log data sharding methods map log data into target address spaces according to a hash function, thereby storing the log data in shards. After sharding, adjacent log data are not necessarily mapped to adjacent target address spaces, so the sharded log data support only point queries: a query can extract a single log entry at a time, and range queries, such as extracting all log data within a certain period, are not supported. Log data sharded with existing methods are therefore inefficient to query.
Summary of the invention
The present invention provides a log data sharding method and device based on piecewise order-preserving hashing, which address the low query efficiency of sharded log data in the prior art.
In a first aspect, the present invention provides a log data sharding method based on piecewise order-preserving hashing, including:
dividing the value range of each of multiple attribute fields of the log data into N segments, where N is an integer greater than 1;
establishing, according to the order of the N segments, a mapping between the N segments of each attribute field and hash values, where the hash values are consecutive integers whose order is consistent with the order of the N segments;
assigning the log data corresponding to each hash value to one storage unit.
Optionally, dividing the value range of each of multiple attribute fields of the log data into N segments includes:
obtaining sampled log data, and building an equi-depth histogram over the value range of each of the multiple attribute fields from the sampled log data;
dividing the value range into N segments according to the equi-depth histogram.
Optionally, assigning the log data corresponding to each hash value to one storage unit includes:
selecting, for each attribute field, the hash value corresponding to one of its segments to form a vector, and using the vector as a unit number;
assigning the log data corresponding to the unit number to one storage unit, where storage units and unit numbers are in one-to-one correspondence.
Optionally, after assigning the log data corresponding to the unit number to one storage unit, the method further includes:
when the storage space of the storage unit is full, recording the meta information of the storage unit and writing the log data in the storage unit to a data file, where the meta information includes: the hash value of each attribute field, the number of times the storage unit has been filled, the maximum and minimum values of each attribute field within the storage unit, and location information.
Optionally, the method further includes:
writing the log data of multiple storage units to the same data file, where the file header of the data file includes the correspondence between each storage unit's unit number and the offset of that storage unit's log data within the data file.
Optionally, the method further includes:
recording the mapping of each attribute field, the start time and end time of the mapping, the meta information of the storage units, and the set of data files.
In a second aspect, the present invention provides a log data sharding device based on piecewise order-preserving hashing, including:
a division module, configured to divide the value range of each of multiple attribute fields of the log data into N segments, where N is an integer greater than 1;
a mapping module, configured to establish, according to the order of the N segments, a mapping between the N segments of each attribute field and hash values, where the hash values are consecutive integers whose order is consistent with the order of the N segments;
the division module being further configured to assign the log data corresponding to each hash value to one storage unit.
Optionally, the division module is specifically configured to:
obtain sampled log data, and build an equi-depth histogram over the value range of each of the multiple attribute fields from the sampled log data;
divide the value range into N segments according to the equi-depth histogram.
Optionally, the division module is further specifically configured to:
select, for each attribute field, the hash value corresponding to one of its segments to form a vector, and use the vector as a unit number;
assign the log data corresponding to the unit number to one storage unit, where storage units and unit numbers are in one-to-one correspondence.
Optionally, the device further includes:
a processing module, configured to, when the storage space of the storage unit is full, record the meta information of the storage unit and write the log data in the storage unit to a data file, where the meta information includes: the hash value of each attribute field, the number of times the storage unit has been filled, the maximum and minimum values of each attribute field within the storage unit, and location information.
In the log data sharding method and device based on piecewise order-preserving hashing provided by the present invention, the value range of each of multiple attribute fields of the log data is divided into N segments; a mapping between the N segments of each attribute field and hash values is established according to the order of the N segments, where the hash values are consecutive integers whose order is consistent with the order of the N segments; and the log data corresponding to each hash value are assigned to one storage unit. Because the hash values are assigned in segment order when the mapping is established, the mapping is a piecewise order-preserving hash function. The monotonicity of this hash function ensures that adjacent log data are assigned to adjacent target storage spaces, so that a range query can locate the relevant data quickly.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an embodiment of the log data sharding method based on piecewise order-preserving hashing according to the present invention;
Fig. 2 is a schematic diagram of the mapping in an embodiment of the method of the present invention;
Fig. 3 is a schematic diagram of the equi-depth histogram in an embodiment of the method of the present invention;
Fig. 4 is a structural diagram of an embodiment of the log data sharding device based on piecewise order-preserving hashing according to the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of an embodiment of the log data sharding method based on piecewise order-preserving hashing according to the present invention. Fig. 2 is a schematic diagram of the mapping in an embodiment of the method. As shown in Fig. 1, the method of this embodiment includes:
Step 101: divide the value range of each of multiple attribute fields of the log data into N segments, where N is an integer greater than 1;
Step 102: establish, according to the order of the N segments, a mapping between the N segments of each attribute field and hash values, where the hash values are consecutive integers whose order is consistent with the order of the N segments;
Step 103: assign the log data corresponding to each hash value to one storage unit.
Specifically, the value range of each attribute field of the log data is divided into N segments, which may or may not be of equal width. The attribute fields are those used in range queries, for example a timestamp field and geographic location fields (an X coordinate field and a Y coordinate field). Each segment corresponds to one hash value: the segments are assigned the hash values 0 to N-1 in order, and the earlier a segment comes in the order, the smaller its hash value. The mapping between the N segments and the hash values serves as the hash function, and because the hash values follow the order of the segments, this hash function is a piecewise order-preserving hash function. As shown in Fig. 2, the value range of the timestamp field is divided into N segments, each one hour wide. For example, all log data of the first segment of each attribute field (that is, log data whose value for that field falls in the first segment) are mapped to the hash bucket of hash value 1, all log data of the second segment are mapped to the hash bucket of hash value 2, and so on. In other words, the log data corresponding to each hash value are assigned to one storage unit.
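The segment-to-hash mapping described above can be sketched as a binary search over the segment boundaries. This is only an illustrative sketch, not the patent's implementation: the function name `make_piecewise_hash`, the seconds-since-midnight timestamp field, and the choice to number buckets from 0 (following the 0 to N-1 range stated in the text) are all assumptions.

```python
from bisect import bisect_right

def make_piecewise_hash(boundaries):
    """Return a hash function that maps a value to the index of its segment.

    `boundaries` holds the N-1 interior cut points, so the value range is
    split into N segments numbered 0 .. N-1 (hypothetical numbering).
    """
    cuts = sorted(boundaries)

    def piecewise_hash(value):
        # The number of cut points <= value is the index of the segment.
        return bisect_right(cuts, value)

    return piecewise_hash

# Hypothetical timestamp field: seconds since midnight, one-hour segments.
HOUR = 3600
hour_hash = make_piecewise_hash([h * HOUR for h in range(1, 24)])  # N = 24

print(hour_hash(0))         # first segment
print(hour_hash(HOUR - 1))  # still the first segment
print(hour_hash(HOUR))      # second segment
```

Because the cut points need not be evenly spaced, the same function covers both equal-width and unequal-width segments.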
An order-preserving hash function is one for which, given original attribute values x and y, x <= y implies Hash(x) <= Hash(y), where Hash() is the hash function. An order-preserving hash function supports range queries: if the query condition is a >= x and a <= y (where a is the value of the attribute field being queried), the hash values can be computed from the concrete values of x and y, and the relevant log data can then be located quickly.
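A minimal sketch of how the order-preserving property turns a range condition into a contiguous run of hash values. The cut points in `CUTS` and the helper names `op_hash` and `buckets_for_range` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right

CUTS = [10, 20, 30, 40]  # hypothetical cut points: 5 segments, hash values 0..4

def op_hash(value):
    """Piecewise order-preserving hash: index of the segment holding value."""
    return bisect_right(CUTS, value)

def buckets_for_range(x, y):
    """Hash values whose buckets can hold an attribute a with x <= a <= y."""
    if x > y:
        return []
    return list(range(op_hash(x), op_hash(y) + 1))

# Order preservation: a <= b implies op_hash(a) <= op_hash(b).
grid = range(0, 50)
assert all(op_hash(a) <= op_hash(b) for a in grid for b in grid if a <= b)

print(buckets_for_range(12, 33))
```

Only the two endpoints are hashed; every bucket between them is relevant by monotonicity, which is what an ordinary hash function cannot guarantee.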
The hash function used to shard the log data maps all log data belonging to one shard to the same storage unit (one storage unit corresponds to one unit number). For each attribute field used in range queries (such as the timestamp field or the geographic location fields), the segment boundaries are used directly as the boundaries of the piecewise hash function, and the maximum segment width is chosen at a scale smaller than that of the range queries. If a typical range query covers a few hundred meters, for example, the segment boundaries of the geographic location fields should be set at a smaller scale, such as 10 meters.
Among the attribute fields of the log data, the entity identifier (ID) field, the timestamp field, and the other fields used as range query conditions (such as the X coordinate and the Y coordinate) are all hashed. The entity ID field is mapped with a general-purpose hash function. The attribute fields used as range query conditions, however, must have their hash values computed with an order-preserving hash function.
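The split between the two kinds of hashing can be sketched as follows. The source does not name a particular general-purpose hash; CRC32 is used here only because it is deterministic across runs, and the bucket count, cut points, and record layout are all assumptions.

```python
import zlib
from bisect import bisect_right

NUM_ID_BUCKETS = 8             # assumed number of buckets for the entity ID
TS_CUTS = [3600, 7200, 10800]  # hypothetical cut points per range field
X_CUTS = [10.0, 20.0, 30.0]
Y_CUTS = [10.0, 20.0, 30.0]

def id_hash(entity_id):
    """General-purpose hash for the entity ID (no ordering guarantee)."""
    return zlib.crc32(entity_id.encode("utf-8")) % NUM_ID_BUCKETS

def range_hash(cuts, value):
    """Order-preserving hash for a field used in range queries."""
    return bisect_right(cuts, value)

def hash_record(record):
    """Hash every field of one log record with the appropriate function."""
    return {
        "id": id_hash(record["id"]),
        "ts": range_hash(TS_CUTS, record["ts"]),
        "x": range_hash(X_CUTS, record["x"]),
        "y": range_hash(Y_CUTS, record["y"]),
    }

hashed = hash_record({"id": "user-42", "ts": 4000, "x": 5.0, "y": 25.0})
print(hashed["ts"], hashed["x"], hashed["y"])
```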
In the log data sharding method based on piecewise order-preserving hashing provided by this embodiment, the value range of each of multiple attribute fields of the log data is divided into N segments; a mapping between the N segments of each attribute field and hash values is established according to the order of the N segments, where the hash values are consecutive integers whose order is consistent with the order of the N segments; and the log data corresponding to each hash value are assigned to one storage unit. Because the hash values are assigned in segment order when the mapping is established, the mapping is a piecewise order-preserving hash function. The monotonicity of this hash function ensures that adjacent log data are assigned to adjacent target storage spaces, so that a range query can locate the relevant data quickly.
On the basis of the embodiment shown in Fig. 1, in practical applications the log data corresponding to each hash value can be assigned to a storage unit in several ways. Optionally, as one possible implementation, the hash value corresponding to one segment is selected from each attribute field to form a vector, and the vector is used as a unit number;
the log data corresponding to the unit number are assigned to one storage unit, where storage units and unit numbers are in one-to-one correspondence.
Specifically, the hash value corresponding to one segment is selected from each attribute field to form a multi-dimensional vector. For example, the hash value 1 of the first segment of the timestamp field, the hash value 1 of the first segment of the X coordinate field, and the hash value 1 of the first segment of the Y coordinate field form the multi-dimensional vector (1, 1, 1), which serves as a unit number. The log data corresponding to this unit number are assigned to one storage unit: all log data whose timestamp falls in that time segment and whose X and Y coordinates fall within the corresponding segments of those fields are stored in the corresponding storage unit. That is, log data whose value for every attribute field falls in the first segment are stored in that storage unit.
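The unit-number construction can be sketched as a tuple of per-field hash values used as a dictionary key. The cut points, field names, and in-memory `dict` standing in for the storage units are assumptions; the sketch numbers segments from 0, so the first-segment unit is (0, 0, 0) rather than (1, 1, 1).

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical cut points for each range-query field.
CUTS = {
    "ts": [3600, 7200],
    "x": [100.0, 200.0],
    "y": [100.0, 200.0],
}
FIELDS = ("ts", "x", "y")

def unit_number(record):
    """Vector of per-field hash values identifying one storage unit."""
    return tuple(bisect_right(CUTS[f], record[f]) for f in FIELDS)

def shard(records):
    """Group records by unit number; one list stands in for one storage unit."""
    units = defaultdict(list)
    for rec in records:
        units[unit_number(rec)].append(rec)
    return units

logs = [
    {"ts": 10, "x": 5.0, "y": 7.0},
    {"ts": 20, "x": 50.0, "y": 99.0},
    {"ts": 4000, "x": 150.0, "y": 250.0},
]
units = shard(logs)
print(sorted(units))
```

The first two records share every segment and therefore land in the same unit, while the third differs in all three fields.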
On the basis of the above embodiment, after the log data corresponding to the unit number are assigned to one storage unit, the method further includes:
when the storage space of the storage unit is full, recording the meta information of the storage unit and writing the log data in the storage unit to a data file, where the meta information includes: the hash value of each attribute field, the number of times the storage unit has been filled, the maximum and minimum values of each attribute field within the storage unit, and location information.
Specifically, when log data are sharded for storage, they can first be cached in buffers. Each buffer serves as a storage unit, and one unit number corresponds to one buffer.
When a buffer is full, its meta information is recorded and the log data in the buffer are written to a data file. A meta information record has the form <entity ID hash, X coordinate hash, Y coordinate hash, timestamp hash, Full Counter, X min, X max, Y min, Y max, TS min, TS max, data location of the buffer>. Its attributes are, respectively, the hash value of the entity ID, the hash value of the X coordinate, the hash value of the Y coordinate, the hash value of the timestamp field, the number of times the buffer has been filled, the minimum and maximum values of each range query field, and the location of the log data written from the buffer (which data file they are stored in, and where in that file). The meta information table can be stored in the file system and is used for data queries.
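A minimal sketch of the buffer-then-flush behavior and of the meta information record described above. The class `UnitBuffer`, the record-count capacity, and the Python lists standing in for the data file and the meta information table are assumptions; the location is recorded here as a hypothetical (offset, record count) pair.

```python
from dataclasses import dataclass, field

RANGE_FIELDS = ("x", "y", "ts")  # fields whose min and max are tracked

@dataclass
class UnitBuffer:
    unit_number: tuple   # (id_hash, x_hash, y_hash, ts_hash)
    capacity: int        # records held before a flush (assumed unit)
    records: list = field(default_factory=list)
    full_counter: int = 0

def append(buf, record, data_file, meta_table):
    """Add a record; when the buffer is full, flush it and log its metadata."""
    buf.records.append(record)
    if len(buf.records) < buf.capacity:
        return
    buf.full_counter += 1
    id_h, x_h, y_h, ts_h = buf.unit_number
    meta = {"id_hash": id_h, "x_hash": x_h, "y_hash": y_h, "ts_hash": ts_h,
            "full_counter": buf.full_counter}
    for name in RANGE_FIELDS:
        values = [r[name] for r in buf.records]
        meta[name + "_min"] = min(values)
        meta[name + "_max"] = max(values)
    # Location: where the flushed records start in the file, and how many.
    meta["location"] = (len(data_file), len(buf.records))
    data_file.extend(buf.records)
    meta_table.append(meta)
    buf.records.clear()

buf = UnitBuffer(unit_number=(3, 0, 0, 0), capacity=2)
data_file, meta_table = [], []
for rec in [{"x": 1.0, "y": 5.0, "ts": 10},
            {"x": 2.0, "y": 4.0, "ts": 20},
            {"x": 3.0, "y": 6.0, "ts": 30}]:
    append(buf, rec, data_file, meta_table)
print(meta_table[0]["location"], len(buf.records))
```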
Fig. 3 is a schematic diagram of the equi-depth histogram in an embodiment of the method of the present invention.
On the basis of the above embodiment, because log data may be unevenly distributed, some segments may contain far more log data than others, so that a few unit buffers fill up quickly and must be flushed frequently. To avoid this, the value ranges of the attribute fields of the log data are divided into N segments according to the distribution of the log data.
Dividing the value range of each of the attribute fields of the log data into N non-uniform segments specifically includes:
obtaining sampled log data, and building an equi-depth histogram over the value range of each of the multiple attribute fields from the sampled log data;
dividing the value range into N segments according to the equi-depth histogram.
Specifically, as shown in Fig. 3, log data first accumulate in in-memory buffers and are written to a data file in the file system only when a buffer is full. It is therefore desirable for the data volume of the buffers to be as uniform as possible, so that the buffers fill up one after another and are written to the file system, rather than a few buffers filling up quickly and being flushed frequently.
When log data are unevenly distributed, the value range of an attribute field is divided into non-uniform segments. To support non-uniform segmentation, a portion of the data can be sampled before the log data are formally sharded and stored, and the sample is used to determine how the segments are divided. An equi-depth histogram with N buckets is built over the sampled log data, such that every bucket of the histogram holds the same number of log entries. The frequency on the vertical axis of Fig. 3 is the number of log entries.
The bucket boundaries of the equi-depth histogram are used to construct the piecewise order-preserving hash function: the log data in the earliest histogram bucket are mapped to the lowest-numbered hash bucket, and the records in each subsequent histogram bucket are mapped in turn to successively higher-numbered hash buckets. Each value interval delimited by the bucket boundaries of the equi-depth histogram corresponds to one segment. As shown in Fig. 2, if the equi-depth histogram has N buckets, all log data of the first histogram bucket (that is, log data whose attribute value falls in that bucket) are mapped to hash value 1 (the first hash bucket), all log data of the second histogram bucket are mapped to hash value 2 (the second hash bucket), and so on. The hash function built from the boundaries of the equi-depth histogram buckets is a piecewise order-preserving hash function: the log data in one histogram bucket are mapped to one hash value, and the hash value of the log data in a lower-numbered histogram bucket is smaller than that of the log data in a higher-numbered histogram bucket.
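The sampling step can be sketched as picking cut points at equally spaced ranks of the sorted sample, which yields an equi-depth histogram. The function name `equi_depth_cuts` and the sample values are hypothetical; heavy ties in the sample can make the buckets uneven or produce repeated cut points, which this sketch does not handle.

```python
from bisect import bisect_right

def equi_depth_cuts(sample, n_segments):
    """Return n_segments - 1 cut points taken at equally spaced ranks.

    Each resulting segment then holds roughly the same number of sampled
    values, regardless of how the values are spread over the value range.
    """
    if n_segments < 2:
        raise ValueError("n_segments must be greater than 1")
    ordered = sorted(sample)
    return [ordered[k * len(ordered) // n_segments]
            for k in range(1, n_segments)]

# A skewed hypothetical sample: most values are small, a few are very large.
sample = [1, 2, 2, 3, 3, 3, 4, 10, 50, 100, 200, 1000]
cuts = equi_depth_cuts(sample, 4)

# Count how many sampled values fall into each resulting segment.
counts = [0] * 4
for v in sample:
    counts[bisect_right(cuts, v)] += 1
print(cuts, counts)
```

The resulting cut points can be fed directly into a binary-search hash such as `bisect_right(cuts, value)`, which keeps the mapping order-preserving while evening out the bucket sizes.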
In the above implementation, the non-uniform division of segments avoids the problem of a few buffers filling up quickly and being flushed frequently because some segments contain more log data than others.
On the basis of the foregoing embodiments, in practical applications the log data in the storage units can be written to data files in several ways. Optionally, as one possible implementation, the log data of multiple storage units can be written to the same data file, where the file header of the data file includes the correspondence between each storage unit's unit number and the offset of that storage unit's log data within the data file.
Specifically, multiple storage units can be written to the same data file or to different data files. When they are written to the same data file, the file header must record the correspondence between each storage unit's unit number and the offset of its log data within the data file, so that the log data of a given storage unit can be located quickly at query time.
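One way to picture such a file header is sketched below. The layout is entirely an assumption made for illustration (a 4-byte big-endian header length, a JSON header listing unit number, offset, and length, then the concatenated JSON-encoded units); the source specifies only that the header maps unit numbers to offsets.

```python
import io
import json
import struct

def write_data_file(units):
    """Serialise {unit_number: [records]} into a single bytes object."""
    body = io.BytesIO()
    index = []
    for unit_number, records in units.items():
        blob = json.dumps(records).encode("utf-8")
        # Offsets are relative to the start of the body, after the header.
        index.append({"unit": list(unit_number),
                      "offset": body.tell(),
                      "length": len(blob)})
        body.write(blob)
    header = json.dumps(index).encode("utf-8")
    return struct.pack(">I", len(header)) + header + body.getvalue()

def read_unit(data, unit_number):
    """Use the header to jump straight to one storage unit's records."""
    (header_len,) = struct.unpack(">I", data[:4])
    index = json.loads(data[4:4 + header_len])
    body_start = 4 + header_len
    for entry in index:
        if tuple(entry["unit"]) == tuple(unit_number):
            start = body_start + entry["offset"]
            return json.loads(data[start:start + entry["length"]])
    return None

units = {(0, 0, 0): [{"ts": 1}], (1, 1, 2): [{"ts": 4000}, {"ts": 4100}]}
data = write_data_file(units)
print(read_unit(data, (1, 1, 2)))
```

Only the header and the requested slice are decoded, so the other units in the file are never parsed.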
When the log data are sharded, piecewise order-preserving hashing is applied to the range query fields, such as the timestamp field and the coordinate fields (the X coordinate and the Y coordinate). Consider a query that involves these fields, for example a range query that involves only the timestamp field, with a condition of the form "[time constant 1] <= time and time <= [time constant 2]". The query is processed as follows:
First, against the boundaries of the hash buckets of the piecewise order-preserving hash (that is, the segment boundaries), find the hash buckets containing [time constant 1] and [time constant 2], denoted Bucket_i and Bucket_j respectively.
Because the hash function is order-preserving, the data in the unit buffers corresponding to hash buckets Bucket_{i+1} through Bucket_{j-1} necessarily fall between [time constant 1] and [time constant 2], so all of these data (100%) are relevant. All meta information records whose timestamp field falls between the lower bound of Bucket_{i+1} and the upper bound of Bucket_{j-1} are extracted; the data files corresponding to these records are fully relevant data files.
The storage units corresponding to the two buckets Bucket_i and Bucket_j contain relevant data only in part. Each of them is handled as follows: its hash value for the timestamp field is taken, the system meta information table is queried with this hash value, and all records whose timestamp hash value equals it are extracted. The corresponding data files contain relevant data only in part. After these data files are extracted, the irrelevant data in them are filtered out using a method such as binary search, which yields the relevant data. At this point, all relevant data have been extracted.
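The query procedure above can be sketched as follows: interior buckets are taken whole, and only the two boundary buckets are filtered with binary search. The cut points, the dictionary standing in for the stored units, and the assumption that each unit keeps its timestamps sorted (which binary search requires) are all illustrative.

```python
from bisect import bisect_left, bisect_right

TS_CUTS = [100, 200, 300, 400]  # hypothetical cut points: buckets 0..4

def ts_hash(t):
    """Piecewise order-preserving hash of a timestamp."""
    return bisect_right(TS_CUTS, t)

def range_query(units, t1, t2):
    """Return all timestamps t with t1 <= t <= t2.

    `units` maps a timestamp hash value to that bucket's timestamps,
    assumed to be sorted in ascending order.
    """
    if t1 > t2:
        return []
    i, j = ts_hash(t1), ts_hash(t2)
    result = []
    for b in range(i, j + 1):
        data = units.get(b, [])
        if i < b < j:
            # Interior bucket: every record is relevant, no filtering needed.
            result.extend(data)
        else:
            # Boundary bucket: binary search trims the irrelevant records.
            lo = bisect_left(data, t1)
            hi = bisect_right(data, t2)
            result.extend(data[lo:hi])
    return result

units = {0: [10, 50, 90], 1: [110, 150, 190], 2: [210, 250],
         3: [310, 390], 4: [450]}
print(range_query(units, 150, 310))
```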
The meta information table is queried as follows. Suppose the timestamp hash field in the meta information table is TS_Hash. Once the above boundaries have been determined, the records in the meta information table whose TS_Hash lies between these boundaries are extracted, which yields the information on the data files they map to. For example, for log data containing only a user ID, a timestamp field, and X and Y coordinate fields, if the values of TS_Hash_min and TS_Hash_max are 3 and 5 respectively, then rows 2-7 of Table 1 below are identified as the hash values that contain query-relevant data.
Table 1
In practical applications, optionally, as one possible implementation, the method further includes: recording the mapping of each attribute field, the start time and end time of the mapping, the meta information of the storage units, and the set of data files.
Specifically, while log data are continuously being sharded, the log data are monitored to determine whether each range query field is uniformly distributed across the hash buckets. The uniformity of the log data may change substantially, so that the data in some hash buckets become too dense while other hash buckets become too sparse.
In that case, new hash functions are designed for the range query fields from a new data sample, using the same steps as described above. Once the new set of hash functions has been designed, it is recorded as the Active Data Epoch. The previously used set of hash functions, together with its start time and end time, its meta information, and its corresponding data files, is called a data epoch.
When a range query spans two or more data epochs, the query must be rewritten according to the epoch boundaries. For example, suppose the lower and upper bounds of the range query field are [time constant 1] and [time constant 2], the query spans two data epochs, and the boundary between the two epochs is TC. The range query condition "[time constant 1] <= time and time <= [time constant 2]" is then rewritten as "[time constant 1] <= time and time <= TC" or "TC <= time and time <= [time constant 2]". The remaining steps are similar to those of the above embodiment.
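The rewrite can be sketched as splitting the closed query range at every epoch boundary that lies strictly inside it. The function name `split_by_epochs` and the numeric timestamps are hypothetical; each resulting sub-range would then be evaluated with the hash function set of its own epoch.

```python
def split_by_epochs(t1, t2, epoch_boundaries):
    """Split the closed range [t1, t2] at each epoch boundary inside it.

    Returns a list of (low, high) pairs; a boundary TC appears as the
    upper end of one sub-range and the lower end of the next, matching
    the rewritten condition in the text.
    """
    if t1 > t2:
        return []
    inner = sorted(tc for tc in epoch_boundaries if t1 < tc < t2)
    points = [t1] + inner + [t2]
    return list(zip(points[:-1], points[1:]))

print(split_by_epochs(10, 90, [50]))
```

A boundary that falls outside the query range leaves the query unchanged, and several boundaries inside the range produce one sub-query per epoch crossed.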
Fig. 4 is that the present invention is based on the structural schematic diagrams of one embodiment of daily record data slicing apparatus of segmentation order-preserving Hash.Such as Shown in Fig. 4, the device of the present embodiment may include:Division module 401 and mapping block 402;
Wherein, division module 401, for the codomain of multiple attribute fields of daily record data to be divided into N number of segmentation; N is the integer more than 1;
Mapping block 402, for establishing the corresponding N of each attribute field according to the sequence of N number of segmentation The mapping relations of a segmentation and cryptographic Hash;The cryptographic Hash be continuously arranged integer, the cryptographic Hash put in order and institute State the sequence consensus of N number of segmentation;
The division module 401 is additionally operable to the corresponding daily record data of each cryptographic Hash being divided into a storage single In member.
Optionally, the division module 401, is specifically used for:
The daily record data for obtaining sampling divides according to the daily record data of the sampling in the codomain of the multiple attribute field Deep histogram Jian Li not waited;
The codomain is divided into N number of segmentation according to the equal deep histogram.
Optionally, the division module 401 is further specifically configured to:
select, from each attribute field, the hash value corresponding to one segment, combine the selected hash values into a vector, and use the vector as a unit number;
place the log data corresponding to the unit number into one storage unit, the storage units and the unit numbers being in one-to-one correspondence.
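A minimal sketch of unit numbering follows. The field names (`time`, `user_id`) and the cut points are hypothetical values chosen for the example; the per-field hash is the segment index, computed with the standard `bisect` module.

```python
import bisect
from collections import defaultdict

# Hypothetical cut points for two attribute fields (N = 3 segments each).
CUTS = {"time": [100, 200], "user_id": [1000, 2000]}
FIELDS = ("time", "user_id")

def unit_number(record):
    """Vector of per-field hash values, used as the storage unit's number."""
    return tuple(bisect.bisect_left(CUTS[f], record[f]) for f in FIELDS)

def shard(records):
    """Group records by unit number; one storage unit per distinct vector."""
    units = defaultdict(list)
    for r in records:
        units[unit_number(r)].append(r)
    return units

logs = [
    {"time": 50, "user_id": 1500},
    {"time": 150, "user_id": 2500},
    {"time": 60, "user_id": 1200},
]
units = shard(logs)
print(sorted(units))  # [(0, 1), (1, 2)]
```

The first and third records fall into the same segment of both fields, so they share storage unit (0, 1).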
Optionally, the device further includes:
a processing module, configured to, once the storage space of the storage unit is full, record the metadata of the storage unit and write the log data in the storage unit to a data file. The metadata includes: the hash value of each attribute field, the number of times the storage unit has been filled, the maximum and minimum values of each attribute field within the storage unit, and location information.
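The flush-on-full behaviour and the recorded metadata can be sketched as below. This is an assumption-laden illustration: a Python list stands in for the data file, capacity is counted in records rather than bytes, and the class name `StorageUnit` and the metadata keys are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class StorageUnit:
    number: tuple            # unit number: vector of per-field hash values
    capacity: int            # records held before the unit counts as full
    fill_count: int = 0      # how many times the unit has been filled
    records: list = field(default_factory=list)

    def add(self, record, data_file, meta_log):
        self.records.append(record)
        if len(self.records) >= self.capacity:
            self.flush(data_file, meta_log)

    def flush(self, data_file, meta_log):
        """Record metadata for the full unit, then move its records to the file."""
        self.fill_count += 1
        fields = self.records[0].keys()
        meta_log.append({
            "hash_values": self.number,
            "fill_count": self.fill_count,
            "min": {f: min(r[f] for r in self.records) for f in fields},
            "max": {f: max(r[f] for r in self.records) for f in fields},
            "location": len(data_file),  # index of the first record written
        })
        data_file.extend(self.records)
        self.records = []

data_file, meta_log = [], []
unit = StorageUnit(number=(0, 1), capacity=2)
unit.add({"time": 50, "user_id": 1500}, data_file, meta_log)
unit.add({"time": 60, "user_id": 1200}, data_file, meta_log)
print(meta_log[0]["min"], meta_log[0]["max"])
```

The per-unit minimum and maximum let a query skip flushed blocks whose actual value range does not intersect the queried range.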
Optionally, the processing module is further configured to:
write the log data in multiple storage units to the same data file, the file header of the data file including the correspondence between the unit number of each storage unit and the offset, within the data file, of the log data of that storage unit.
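A possible file layout with such a header is sketched below. The layout is an assumption for illustration, not the format used in the patent: a 4-byte big-endian header length, a JSON header mapping each unit number to an offset relative to the start of the body, then the JSON-encoded records of each unit. The function names are hypothetical.

```python
import json
import struct

def write_data_file(units):
    """units: dict mapping a unit number (tuple) to its list of records."""
    body = b""
    offsets = {}
    for number, records in units.items():
        # Header key is the unit number rendered as text, e.g. "0,1".
        offsets[",".join(map(str, number))] = len(body)
        body += json.dumps(records).encode("utf-8")
    header = json.dumps(offsets).encode("utf-8")
    return struct.pack(">I", len(header)) + header + body

def read_header(blob):
    """Return (offset table, position where the body starts)."""
    (header_len,) = struct.unpack(">I", blob[:4])
    return json.loads(blob[4:4 + header_len]), 4 + header_len

blob = write_data_file({(0, 1): [{"time": 50}], (1, 2): [{"time": 150}]})
offsets, body_start = read_header(blob)
first = blob[body_start + offsets["0,1"]:body_start + offsets["1,2"]]
print(json.loads(first))  # [{'time': 50}]
```

Reading the header first lets a query seek directly to the storage units whose numbers fall in the hash range it needs.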
Optionally, the processing module is further configured to:
record the mapping of each attribute field, the enabling time and termination time of the mapping, the metadata of the storage units, and the set of data files.
The device of this embodiment can be used to carry out the technical solution of the method embodiment shown in any of Figs. 1-3. Its implementation principle and technical effect are similar and are not repeated here.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, or optical discs.
Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the present application and do not limit them. Although the application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the application.

Claims (10)

1. A log data sharding method, characterized by comprising:
dividing the value range of each of multiple attribute fields of log data into N segments, N being an integer greater than 1;
establishing, according to the order of the N segments, a mapping between the N segments of each attribute field and hash values, the hash values being consecutive integers whose order is consistent with the order of the N segments;
placing the log data corresponding to each hash value into one storage unit.
2. The method according to claim 1, characterized in that dividing the value range of each of multiple attribute fields of log data into N segments comprises:
obtaining sampled log data, and building an equi-depth histogram over the value range of each of the multiple attribute fields according to the sampled log data;
dividing the value range into N segments according to the equi-depth histogram.
3. The method according to claim 1 or 2, characterized in that placing the log data corresponding to each hash value into one storage unit comprises:
selecting, from each attribute field, the hash value corresponding to one segment, combining the selected hash values into a vector, and using the vector as a unit number;
placing the log data corresponding to the unit number into one storage unit, the storage units and the unit numbers being in one-to-one correspondence.
4. The method according to claim 3, characterized in that, after placing the log data corresponding to the unit number into one storage unit, the method further comprises:
once the storage space of the storage unit is full, recording the metadata of the storage unit and writing the log data in the storage unit to a data file, wherein the metadata includes: the hash value of each attribute field, the number of times the storage unit has been filled, the maximum and minimum values of each attribute field within the storage unit, and location information.
5. The method according to claim 4, characterized by further comprising:
writing the log data in multiple storage units to the same data file, the file header of the data file including the correspondence between the unit number of each storage unit and the offset, within the data file, of the log data of that storage unit.
6. The method according to claim 5, characterized by further comprising:
recording the mapping of each attribute field, the enabling time and termination time of the mapping, the metadata of the storage units, and the set of data files.
7. A log data sharding device, characterized by comprising:
a division module, configured to divide the value range of each of multiple attribute fields of log data into N segments, N being an integer greater than 1;
a mapping module, configured to establish, according to the order of the N segments, a mapping between the N segments of each attribute field and hash values, the hash values being consecutive integers whose order is consistent with the order of the N segments;
the division module being further configured to place the log data corresponding to each hash value into one storage unit.
8. The device according to claim 7, characterized in that the division module is specifically configured to:
obtain sampled log data, and build an equi-depth histogram over the value range of each of the multiple attribute fields according to the sampled log data;
divide the value range into N segments according to the equi-depth histogram.
9. The device according to claim 7 or 8, characterized in that the division module is further specifically configured to:
select, from each attribute field, the hash value corresponding to one segment, combine the selected hash values into a vector, and use the vector as a unit number;
place the log data corresponding to the unit number into one storage unit, the storage units and the unit numbers being in one-to-one correspondence.
10. The device according to claim 9, characterized by further comprising:
a processing module, configured to, once the storage space of the storage unit is full, record the metadata of the storage unit and write the log data in the storage unit to a data file, wherein the metadata includes: the hash value of each attribute field, the number of times the storage unit has been filled, the maximum and minimum values of each attribute field within the storage unit, and location information.
CN201510420017.7A 2015-07-16 2015-07-16 Daily record data sharding method and device Active CN105117402B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510420017.7A CN105117402B (en) 2015-07-16 2015-07-16 Daily record data sharding method and device

Publications (2)

Publication Number Publication Date
CN105117402A CN105117402A (en) 2015-12-02
CN105117402B true CN105117402B (en) 2018-08-28

Family

ID=54665394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510420017.7A Active CN105117402B (en) 2015-07-16 2015-07-16 Daily record data sharding method and device

Country Status (1)

Country Link
CN (1) CN105117402B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105930256B (en) * 2016-04-14 2018-07-17 北京思特奇信息技术股份有限公司 A kind of log-output method and device using log4j single cent parts
CN107590157B (en) * 2016-07-08 2021-03-23 腾讯科技(深圳)有限公司 Data storage method, data query method and related equipment
CN106354434B (en) * 2016-08-31 2019-07-23 中国人民大学 The storage method and system of daily record data
CN106599127A (en) * 2016-12-01 2017-04-26 深圳市风云实业有限公司 Log storage and query method applied to standalone server
CN107330106B (en) * 2017-07-07 2020-11-20 苏州浪潮智能科技有限公司 Data filtering method and device based on FPGA
CN108415869B (en) * 2018-02-28 2020-06-26 北京零壹空间科技有限公司 Method and device for sending serial data
CN109101830A (en) * 2018-09-03 2018-12-28 安徽太阳石科技有限公司 Real time data safety protecting method and system based on block chain
CN109657182B (en) * 2018-12-18 2020-09-08 深圳店匠科技有限公司 Webpage generation method, system and computer readable storage medium
CN111382463B (en) * 2020-04-02 2022-11-29 中国工商银行股份有限公司 Block chain system and method based on stream data
CN112632018B (en) * 2020-12-21 2022-05-17 深圳市杰成软件有限公司 Business process event log sampling method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104408159A (en) * 2014-12-04 2015-03-11 曙光信息产业(北京)有限公司 Data correlating, loading and querying method and device
CN104536988A (en) * 2014-12-10 2015-04-22 杭州斯凯网络科技有限公司 MonetDB distributed computing storage method
CN104572809A (en) * 2014-11-17 2015-04-29 杭州斯凯网络科技有限公司 Distributive relational database free expansion method
CN104598519A (en) * 2014-12-11 2015-05-06 浙江浙大中控信息技术有限公司 Continuous-memory-based database index system and processing method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8990177B2 (en) * 2011-10-27 2015-03-24 Yahoo! Inc. Lock-free transactional support for large-scale storage systems
US9754050B2 (en) * 2012-02-28 2017-09-05 Microsoft Technology Licensing, Llc Path-decomposed trie data structures
US9405643B2 (en) * 2013-11-26 2016-08-02 Dropbox, Inc. Multi-level lookup architecture to facilitate failure recovery
GB201400191D0 (en) * 2014-01-07 2014-02-26 Cryptic Software Ltd Data file searching method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant