CN105117402B - Log data sharding method and device - Google Patents
Log data sharding method and device
- Publication number: CN105117402B (application CN201510420017.7A)
- Authority: CN (China)
- Prior art keywords: log, log data, storage unit, segment, hash value
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2255—Hash tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1805—Append-only file systems, e.g. using logs or journals to store data
Abstract
The present invention provides a log data sharding method and device. The log data sharding method based on piecewise order-preserving hashing includes: dividing the value range of each of multiple attribute fields of the log data into N segments, where N is an integer greater than 1; establishing, according to the order of the N segments, a mapping between the N segments of each attribute field and hash values, where the hash values are consecutive integers whose order is consistent with the order of the N segments; and partitioning the log data corresponding to each hash value into one storage unit. Because the hash function is monotonic, adjacent log data is partitioned into adjacent storage units, so range queries are supported and the relevant data can be found quickly.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a log data sharding method and device based on piecewise order-preserving hashing.
Background
Log data records the history of interactions among entities of all kinds, for example users' purchases of goods, users' interactions with friends, and user trajectory information. Data mining and machine learning can be performed on large volumes of log data to discover the patterns and features it contains. Both require extracting data from the log data through data queries. The data queries here are mainly range queries, for example querying the interactions among certain entities within a period of time, or the interactions among entities within a geographic area. To make such queries accurate and efficient, how the large volume of log data is sharded for storage is therefore particularly important.
Existing hash-based log data sharding methods map log data into a target address space according to a hash function in order to shard and store it. After sharding, adjacent log data is not necessarily mapped to adjacent target address space, so the sharded log data supports only point queries: a query can extract only a single log record at a time and cannot perform a range query, such as extracting all log data within a certain period in one pass. Consequently, log data sharded by existing methods is queried inefficiently.
Summary of the invention
The present invention provides a log data sharding method and device based on piecewise order-preserving hashing, which can solve the prior-art problem of inefficient data queries after log data has been sharded.
In a first aspect, the present invention provides a log data sharding method based on piecewise order-preserving hashing, including:
dividing the value range of each of multiple attribute fields of the log data into N segments, where N is an integer greater than 1;
establishing, according to the order of the N segments, a mapping between the N segments of each attribute field and hash values, where the hash values are consecutive integers and the order of the hash values is consistent with the order of the N segments; and
partitioning the log data corresponding to each hash value into one storage unit.
Optionally, dividing the value range of each of the multiple attribute fields of the log data into N segments includes:
obtaining sampled log data, and building an equi-depth histogram over the value range of each of the multiple attribute fields from the sampled log data; and
dividing the value range into N segments according to the equi-depth histogram.
Optionally, partitioning the log data corresponding to each hash value into one storage unit includes:
selecting the hash value of one segment from each attribute field to generate a vector, and using the vector as a unit number; and
partitioning the log data corresponding to the unit number into one storage unit, where the storage units correspond one-to-one with the unit numbers.
Optionally, after the log data corresponding to the unit number is partitioned into one storage unit, the method further includes:
when the storage space of the storage unit becomes full, recording the meta-information of the storage unit and writing the log data in the storage unit to a data file, where the meta-information includes: the hash value of each attribute field, the number of times the storage unit has been filled, the maximum and minimum values of each attribute field within the storage unit, and location information.
Optionally, the method further includes:
writing the log data of multiple storage units to the same data file, where the file header of the data file includes the correspondence between the unit number of each storage unit and the offset of that storage unit's log data within the data file.
Optionally, the method further includes:
recording, as one set, the mapping of each attribute field, the start time and end time of the mapping, the meta-information of the storage units, and the data files.
In a second aspect, the present invention provides a log data sharding device based on piecewise order-preserving hashing, including:
a division module, configured to divide the value range of each of multiple attribute fields of the log data into N segments, where N is an integer greater than 1; and
a mapping module, configured to establish, according to the order of the N segments, a mapping between the N segments of each attribute field and hash values, where the hash values are consecutive integers and the order of the hash values is consistent with the order of the N segments;
where the division module is further configured to partition the log data corresponding to each hash value into one storage unit.
Optionally, the division module is specifically configured to:
obtain sampled log data, and build an equi-depth histogram over the value range of each of the multiple attribute fields from the sampled log data; and
divide the value range into N segments according to the equi-depth histogram.
Optionally, the division module is further specifically configured to:
select the hash value of one segment from each attribute field to generate a vector, and use the vector as a unit number; and
partition the log data corresponding to the unit number into one storage unit, where the storage units correspond one-to-one with the unit numbers.
Optionally, the device further includes:
a processing module, configured to, when the storage space of the storage unit becomes full, record the meta-information of the storage unit and write the log data in the storage unit to a data file, where the meta-information includes: the hash value of each attribute field, the number of times the storage unit has been filled, the maximum and minimum values of each attribute field within the storage unit, and location information.
In the log data sharding method and device based on piecewise order-preserving hashing provided by the present invention, the value range of each of multiple attribute fields of the log data is divided into N segments; a mapping between the N segments of each attribute field and hash values is established according to the order of the N segments, where the hash values are consecutive integers whose order is consistent with the order of the N segments; and the log data corresponding to each hash value is partitioned into one storage unit. Because the hash values are assigned in segment order when the mapping is established, the mapping is a piecewise order-preserving hash function. The monotonicity of this hash function ensures that adjacent log data is partitioned into adjacent target storage space, so range queries are supported and the relevant data can be found quickly.
Description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of an embodiment of the log data sharding method based on piecewise order-preserving hashing of the present invention;
Fig. 2 is a schematic diagram of the mapping in an embodiment of the method of the present invention;
Fig. 3 is a schematic diagram of an equi-depth histogram in an embodiment of the method of the present invention;
Fig. 4 is a structural schematic diagram of an embodiment of the log data sharding device based on piecewise order-preserving hashing of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of those embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flow chart of an embodiment of the log data sharding method based on piecewise order-preserving hashing of the present invention. Fig. 2 is a schematic diagram of the mapping in an embodiment of the method of the present invention. As shown in Fig. 1, the method of this embodiment includes:
Step 101: divide the value range of each of multiple attribute fields of the log data into N segments, where N is an integer greater than 1;
Step 102: establish, according to the order of the N segments, a mapping between the N segments of each attribute field and hash values, where the hash values are consecutive integers and the order of the hash values is consistent with the order of the N segments;
Step 103: partition the log data corresponding to each hash value into one storage unit.
Specifically, the value range of each of the multiple attribute fields of the log data is divided into N segments; the N segments may or may not be of equal width. The multiple attribute fields are attribute fields used in range queries, for example a timestamp field and geographic location fields (an X-coordinate field and a Y-coordinate field). Each segment corresponds to one hash value: the segments correspond in order to the hash values 0 to N-1, so the earlier a segment comes in the order, the smaller the hash value it maps to. The mapping between the N segments and the hash values serves as the hash function, and because the mapped hash values are ordered, this hash function is a piecewise order-preserving hash function. As shown in Fig. 2, the value range of the timestamp field is divided into N segments, each 1 hour wide. For example, all log data belonging to the first segment of every attribute field (that is, log data in which the value of each attribute field falls in that field's first segment) is mapped to the hash bucket corresponding to hash value 1, all log data of the second segment is mapped to the hash bucket corresponding to hash value 2, and so on. In other words, the log data corresponding to each hash value is partitioned into one storage unit.
An order-preserving hash function is one for which, given original attribute-field values x and y, x <= y implies Hash(x) <= Hash(y), where Hash() is the hash function. An order-preserving hash function can support range queries: if the query condition is a >= x and a <= y (where a is the value of the range-query attribute field), the hash values can be computed from the specific values of x and y, and the relevant log data can then be located quickly.
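As a non-limiting illustration, the piecewise order-preserving hash and its use for a range query can be sketched in Python as follows. The function names, the 0-based hash values, and the one-hour cut points (in seconds) are assumptions made for this sketch and are not part of the embodiment.

```python
from bisect import bisect_right

def make_order_preserving_hash(boundaries):
    """Return a hash function that maps a value to the index of its segment.

    boundaries: the interior cut points; N segments need N-1 cut points.
    """
    cuts = sorted(boundaries)

    def op_hash(value):
        # bisect_right counts the cut points <= value, which is the
        # 0-based index of the segment that contains value.
        return bisect_right(cuts, value)

    return op_hash

def buckets_for_range(op_hash, low, high):
    """Hash values of every segment that may hold a value in [low, high]."""
    return list(range(op_hash(low), op_hash(high) + 1))

# Hypothetical timestamp field: four segments, each one hour (3600 s) wide.
hour_hash = make_order_preserving_hash([3600, 7200, 10800])
candidates = buckets_for_range(hour_hash, 3000, 8000)
```

Because `op_hash` never decreases as its argument grows, the buckets that can satisfy a range condition always form one contiguous run of hash values, which is what makes the range query cheap.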
The hash function used to shard the log data maps log data belonging to one shard to the same storage unit (one storage unit corresponds to one unit number). For each range-query attribute field (such as the timestamp field or a geographic location field), the segment boundaries serve directly as the boundaries of the piecewise hash function, and the maximum segment width is chosen at a scale smaller than that of typical range queries. For example, if a typical range query covers a few hundred meters, the segment boundaries of the geographic location field should be set at a smaller scale, such as 10 meters.
Hashing is applied to the entity identifier (ID) field, the timestamp field, and the other fields used as range-query conditions (such as the X-coordinate and Y-coordinate positions) among the attribute fields of the log data. The entity identifier (ID) field is mapped with a general-purpose hash function. For the attribute fields used as range-query conditions, however, the hash value must be computed with an order-preserving hash function.
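The distinction between the two kinds of hashing can be illustrated with a minimal Python sketch. The choice of SHA-256 for the general-purpose hash, the bucket count, and the function names are assumptions for illustration only; the embodiment does not name a particular general-purpose hash function.

```python
import hashlib
from bisect import bisect_right

def id_hash(entity_id, n_buckets):
    """General-purpose hash for the entity ID: spreads IDs over buckets,
    with no relationship between ID order and bucket order."""
    digest = hashlib.sha256(str(entity_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets

def range_hash(value, cuts):
    """Order-preserving hash for a range-query field: the segment index."""
    return bisect_right(cuts, value)

# Hypothetical record fields: an entity ID and an X coordinate.
id_bucket = id_hash("user-42", 16)
x_bucket = range_hash(15.0, [10.0, 20.0])
```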
In the log data sharding method based on piecewise order-preserving hashing provided by this embodiment, the value range of each of multiple attribute fields of the log data is divided into N segments; a mapping between the N segments of each attribute field and hash values is established according to the order of the N segments, where the hash values are consecutive integers whose order is consistent with the order of the N segments; and the log data corresponding to each hash value is partitioned into one storage unit. Because the hash values are assigned in segment order when the mapping is established, the mapping is a piecewise order-preserving hash function. The monotonicity of this hash function ensures that adjacent log data is partitioned into adjacent target storage space, so range queries are supported and the relevant data can be found quickly.
On the basis of the embodiment shown in Fig. 1, in practical applications there are several ways to partition the log data corresponding to each hash value into one storage unit. Optionally, as one feasible implementation:
the hash value of one segment is selected from each attribute field to generate a vector, and the vector is used as a unit number;
the log data corresponding to the unit number is partitioned into one storage unit, where the storage units correspond one-to-one with the unit numbers.
Specifically, the hash value of one segment is selected from each attribute field to generate a multi-dimensional vector. For example, hash value 1 of the first segment of the timestamp field, hash value 1 of the first segment of the X-coordinate field, and hash value 1 of the first segment of the Y-coordinate field form the multi-dimensional vector (1, 1, 1), which serves as one unit number. The log data corresponding to that unit number is partitioned into one storage unit; that is, all log data whose timestamp falls within that time segment and whose X-coordinate and Y-coordinate positions fall within the corresponding segments of those fields is stored in the corresponding storage unit. In this example, the log data in which the value of every attribute field falls in the first segment is stored in the storage unit for (1, 1, 1).
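A minimal Python sketch of the unit-number construction follows. The cut points, field names, and sample records are hypothetical, and the hash values here are 0-based (so the first segments give the unit number (0, 0, 0) rather than (1, 1, 1)).

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical cut points for each range-query field (3 segments per field).
CUTS = {
    "ts": [3600, 7200],
    "x": [100.0, 200.0],
    "y": [100.0, 200.0],
}
FIELDS = ("ts", "x", "y")

def unit_number(record):
    """Build the unit number: one segment hash value per attribute field."""
    return tuple(bisect_right(CUTS[f], record[f]) for f in FIELDS)

def shard(records):
    """Group records so that each unit number owns exactly one storage unit."""
    units = defaultdict(list)
    for rec in records:
        units[unit_number(rec)].append(rec)
    return dict(units)

logs = [
    {"ts": 10, "x": 5.0, "y": 7.0},
    {"ts": 20, "x": 50.0, "y": 99.0},
    {"ts": 4000, "x": 150.0, "y": 250.0},
]
units = shard(logs)
```

Using a tuple as the dictionary key gives the one-to-one correspondence between unit numbers and storage units directly.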
On the basis of the above implementation, after the log data corresponding to the unit number is partitioned into one storage unit, the method further includes:
when the storage space of the storage unit becomes full, recording the meta-information of the storage unit and writing the log data in the storage unit to a data file, where the meta-information includes: the hash value of each attribute field, the number of times the storage unit has been filled, the maximum and minimum values of each attribute field within the storage unit, and location information.
Specifically, when log data is sharded for storage, it can first be cached in buffers. Each buffer acts as a storage unit, and each unit number corresponds to one buffer.
When a buffer becomes full, its meta-information is recorded and the log data in the buffer is written to a data file. The meta-information has the form <entity ID hash, X-coordinate hash, Y-coordinate hash, timestamp hash, Full Counter, X min, X max, Y min, Y max, TS min, TS max, data location information of the buffer>. These attributes are, respectively, the hash value of the entity ID, the hash value of the X coordinate, the hash value of the Y coordinate, the hash value of the timestamp field, the number of times the buffer has been filled, the minimum and maximum values of each range-query field, and the location of the log data stored by the buffer (which data file it is stored in and where within that file). The meta-information record table can be stored in the file system and used for data queries.
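The buffering and flush behavior described above might be sketched as follows. This is an illustration under stated assumptions: a Python list stands in for the data file, the `capacity` parameter and all names are hypothetical, and the location is recorded as an (offset, record count) pair.

```python
from dataclasses import dataclass, field

@dataclass
class UnitBuffer:
    unit_no: tuple            # (ID hash, X hash, Y hash, timestamp hash)
    capacity: int             # hypothetical: records held before a flush
    records: list = field(default_factory=list)
    full_counter: int = 0     # number of times this buffer has filled up

def append(buf, rec, data_file, meta_table):
    """Buffer one record; when the buffer fills, flush it and log meta-information."""
    buf.records.append(rec)
    if len(buf.records) < buf.capacity:
        return
    buf.full_counter += 1
    offset = len(data_file)   # where this batch starts in the stand-in "file"
    data_file.extend(buf.records)
    row = {"unit_no": buf.unit_no, "full_counter": buf.full_counter,
           "location": (offset, len(buf.records))}
    # Record the min and max of every range-query field in this batch.
    for name in ("x", "y", "ts"):
        values = [r[name] for r in buf.records]
        row[name + "_min"], row[name + "_max"] = min(values), max(values)
    meta_table.append(row)
    buf.records.clear()

data_file, meta_table = [], []
buf = UnitBuffer(unit_no=(7, 0, 0, 0), capacity=2)
for ts in (10, 30, 50):
    append(buf, {"x": 1.0 + ts, "y": 2.0, "ts": ts}, data_file, meta_table)
```

After the three appends, one batch of two records has been flushed with one meta-information row, and the third record is still waiting in the buffer.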
Fig. 3 is a schematic diagram of an equi-depth histogram in an embodiment of the method of the present invention.
On the basis of the above implementation, the log data may be unevenly distributed. To prevent a few segments from holding much more log data than the others, which would cause only a handful of unit buffers to fill quickly and be processed frequently, the value range of each of the multiple attribute fields is divided into N segments according to the distribution of the log data.
Dividing the value range of each of the multiple attribute fields of the log data into N non-uniform segments specifically includes:
obtaining sampled log data, and building an equi-depth histogram over the value range of each of the multiple attribute fields from the sampled log data;
dividing the value range into N segments according to the equi-depth histogram.
Specifically, as shown in Fig. 3, log data first accumulates in in-memory buffers and is written to a data file in the file system only when a buffer becomes full. The data volume of the buffers should therefore be as uniform as possible, so that the buffers fill up and are written to the file system one after another, rather than only a few buffers filling quickly and being processed frequently.
When the log data is unevenly distributed, a non-uniform segmentation of the attribute fields' value ranges is used. To support non-uniform segmentation, a portion of the data can be sampled before the log data is formally sharded and stored, and the sample is used to decide how to divide the segments. An equi-depth histogram with N buckets is built over the sampled log data, so that every bucket of the histogram contains the same amount of log data; the frequency on the vertical axis of Fig. 3 is the amount of log data.
The piecewise order-preserving hash function is constructed from the bucket boundaries of the equi-depth histogram: log data in the earliest histogram bucket is mapped to the lowest-numbered hash bucket, and the records in each subsequent histogram bucket are mapped in turn to successively higher-numbered hash buckets. Each value interval delimited by the bucket boundaries of the equi-depth histogram corresponds to one segment. As shown in Fig. 2, with N equi-depth histogram buckets, all log data in the first histogram bucket (that is, log data whose attribute-field value falls in that bucket) is mapped to hash value 1 (the first hash bucket), all log data in the second histogram bucket is mapped to hash value 2 (the second hash bucket), and so on. The hash function built from the equi-depth histogram bucket boundaries is a piecewise order-preserving hash function: the log data within one histogram bucket maps to a single hash value, and the hash value of log data in a lower-numbered histogram bucket is smaller than that of log data in a higher-numbered histogram bucket.
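One way to derive the segment boundaries from a sample is sketched below in Python. The function name, the sample values, and the rule of taking every (len/N)-th sorted value as a cut point are assumptions for illustration; with heavily repeated values the cut points can coincide, which a production implementation would need to handle.

```python
from bisect import bisect_right
from collections import Counter

def equi_depth_cuts(sample, n_segments):
    """Interior cut points such that each segment holds roughly the same
    number of sampled values (an equi-depth histogram with N buckets)."""
    if n_segments < 2:
        raise ValueError("N must be an integer greater than 1")
    ordered = sorted(sample)
    if len(ordered) < n_segments:
        raise ValueError("sample is too small for the requested N")
    # The k-th cut is the first value of the k-th bucket (0-based buckets).
    return [ordered[k * len(ordered) // n_segments] for k in range(1, n_segments)]

# Hypothetical skewed sample: half the values are small, half are large.
sample = [1, 2, 3, 4, 100, 200, 300, 400]
cuts = equi_depth_cuts(sample, 4)

# Using the cuts as a piecewise order-preserving hash balances the buckets.
depth = Counter(bisect_right(cuts, v) for v in sample)
```

With equal-width segments over the same sample, most values would land in the first segment; the equi-depth cuts instead give each hash bucket the same share of the sample.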
In the above implementation, the non-uniform division of segments avoids the problem of a few segments holding much more log data than the others, which would cause only a handful of buffers to fill quickly and be processed frequently.
On the basis of the foregoing implementations, in practical applications there are several ways to write the log data in the storage units to data files. Optionally, as one feasible implementation, the log data of multiple storage units can be written to the same data file, where the file header of the data file includes the correspondence between the unit number of each storage unit and the offset of that storage unit's log data within the data file.
Specifically, multiple storage units can be written to the same data file or to different data files. When they are written to the same data file, the file header must record the correspondence between the unit number of each storage unit and the offset of that storage unit's log data within the data file, so that the log data of a given storage unit can be located quickly at query time.
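A possible file layout is sketched below. The layout itself (a 4-byte header length, a JSON header, then JSON-encoded payloads) and all names are assumptions chosen to keep the example short; the embodiment only requires that the header map each unit number to an offset within the data file.

```python
import json
import struct

def _key(unit_no):
    """Encode a unit number tuple as a header key, e.g. (1, 1, 2) -> "1,1,2"."""
    return ",".join(map(str, unit_no))

def write_data_file(units):
    """Serialize several storage units as [header length][header][payloads]."""
    index, payload = {}, b""
    for unit_no, records in units.items():
        body = json.dumps(records).encode("utf-8")
        # Offsets are measured from the start of the payload area.
        index[_key(unit_no)] = [len(payload), len(body)]
        payload += body
    header = json.dumps(index).encode("utf-8")
    return struct.pack(">I", len(header)) + header + payload

def read_unit(blob, unit_no):
    """Use the header to jump straight to one storage unit's log data."""
    (header_len,) = struct.unpack(">I", blob[:4])
    index = json.loads(blob[4:4 + header_len])
    offset, length = index[_key(unit_no)]
    start = 4 + header_len + offset
    return json.loads(blob[start:start + length])

blob = write_data_file({
    (0, 0, 0): [{"ts": 10}],
    (1, 1, 2): [{"ts": 4000}, {"ts": 4100}],
})
second = read_unit(blob, (1, 1, 2))
```

Only the header and the requested slice are decoded, so locating one storage unit does not require scanning the other units in the file.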
When the log data was sharded, the piecewise order-preserving hash was applied to the range-query fields, such as the timestamp field and the coordinate fields (X coordinate, Y coordinate, and so on). Consider a query that involves these fields, for example a range query that involves only the timestamp field, with a condition of the form "[time constant 1] <= time and time <= [time constant 2]". The query is processed as follows:
First, [time constant 1] and [time constant 2] are compared against the boundaries of the hash buckets of the piecewise order-preserving hash (that is, the segment boundaries) to find the hash buckets that contain them, denoted Bucket_i and Bucket_j respectively.
Because the hash function is order-preserving, the data in the unit buffers corresponding to hash buckets Bucket_{i+1} through Bucket_{j-1} necessarily falls between [time constant 1] and [time constant 2], so all of it (100%) is relevant. All meta-information records whose timestamp field falls between the lower bound of Bucket_{i+1} and the upper bound of Bucket_{j-1} are extracted; the data files these records point to are the fully relevant data files.
The storage units corresponding to the two buckets Bucket_i and Bucket_j contain relevant data only in part. Each of them is handled as follows: its hash value on the timestamp field is extracted and used to query the system meta-information record table, and all records whose timestamp hash value equals that hash value are extracted; the corresponding data files contain relevant data only in part. After these data files are extracted, the non-relevant data in them is filtered out by a method such as binary search, which yields the relevant data. At this point, all relevant data has been extracted.
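The three query steps above can be sketched in Python as follows. The in-memory layout (a dictionary from timestamp hash value to a sorted list of timestamps) is a simplifying assumption that stands in for the meta-information table and data files; hash values are 0-based and all names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def range_query(units, cuts, t1, t2):
    """Return the timestamps in [t1, t2].

    units: timestamp hash value -> timestamps sorted in ascending order.
    cuts:  interior segment boundaries of the order-preserving hash.
    """
    i, j = bisect_right(cuts, t1), bisect_right(cuts, t2)  # Bucket_i, Bucket_j
    result = []
    for h in range(i, j + 1):
        data = units.get(h, [])
        if i < h < j:
            # Interior bucket: every record is relevant, no filtering needed.
            result.extend(data)
        else:
            # Boundary bucket: binary search trims the non-relevant records.
            lo = bisect_left(data, t1)
            hi = bisect_right(data, t2)
            result.extend(data[lo:hi])
    return result

cuts = [10, 20, 30]
units = {0: [1, 5, 9], 1: [10, 15, 19], 2: [20, 25], 3: [30, 35]}
hits = range_query(units, cuts, 5, 32)
```

Only the two boundary buckets are filtered; the buckets strictly between them are copied wholesale, which is where the order-preserving property saves work.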
The meta-information record table is queried as follows. Suppose the timestamp hash field in the meta-information record table is TS_Hash. Once the boundaries above have been determined, the records in the meta-information record table whose TS_Hash lies between those boundaries are extracted, which gives the information about the data files they map to. For example, for log data that contains only a user ID, a timestamp field, and X and Y coordinate fields, if TS_Hash_min and TS_Hash_max are 3 and 5 respectively, then rows 2-7 of Table 1 below are identified as holding hash values relevant to the query.
Table 1
In practical applications, optionally, as one feasible implementation, the method further includes: recording, as one set, the mapping of each attribute field, the start time and end time of the mapping, the meta-information of the storage units, and the data files.
Specifically, while log data is being sharded continuously, the log data is monitored continuously to determine whether each range-query field is distributed uniformly across the hash buckets. When the uniformity of the log data changes substantially, that is, when the data in some hash buckets becomes too dense and the data in others too sparse, a new hash function is designed for the range-query fields from a new data sample, using the same method and steps as above. Once the new set of hash functions has been designed, it is recorded as the active data epoch (Active Data Epoch). The previously used set of hash functions, its start time and end time, its meta-information, and the corresponding data files are together called a data epoch.
When the range of a range-query field spans two or more data epochs, the range query must be rewritten according to the boundaries of the data epochs. For example, suppose the lower and upper bounds of the range-query field are [time constant 1] and [time constant 2], the range spans two data epochs, and the boundary between the two data epochs is TC. The range-query condition "[time constant 1] <= time and time <= [time constant 2]" is then rewritten as "[time constant 1] <= time and time <= TC" or "TC <= time and time <= [time constant 2]". The remaining steps are similar to those of the above embodiment.
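The rewriting rule can be illustrated with a short Python sketch. The function name and the representation of epoch boundaries as a plain list of time values are assumptions; both sub-ranges keep the boundary TC as an inclusive end point, as in the rewritten condition above.

```python
def split_by_epochs(t1, t2, epoch_boundaries):
    """Rewrite the range [t1, t2] into one sub-range per data epoch it touches."""
    pieces, low = [], t1
    for tc in sorted(epoch_boundaries):
        # Only boundaries strictly inside the range force a split.
        if low < tc < t2:
            pieces.append((low, tc))
            low = tc
    pieces.append((low, t2))
    return pieces

# Hypothetical query [100, 500] across two epochs that meet at TC = 300.
sub_ranges = split_by_epochs(100, 500, [300])
```

Each sub-range is then evaluated against the hash function set and meta-information of its own data epoch, and the results are combined.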
Fig. 4 is that the present invention is based on the structural schematic diagrams of one embodiment of daily record data slicing apparatus of segmentation order-preserving Hash.Such as
Shown in Fig. 4, the device of the present embodiment may include:Division module 401 and mapping block 402;
Wherein, division module 401, for the codomain of multiple attribute fields of daily record data to be divided into N number of segmentation;
N is the integer more than 1;
Mapping block 402, for establishing the corresponding N of each attribute field according to the sequence of N number of segmentation
The mapping relations of a segmentation and cryptographic Hash;The cryptographic Hash be continuously arranged integer, the cryptographic Hash put in order and institute
State the sequence consensus of N number of segmentation;
The division module 401 is additionally operable to the corresponding daily record data of each cryptographic Hash being divided into a storage single
In member.
Optionally, the division module 401, is specifically used for:
The daily record data for obtaining sampling divides according to the daily record data of the sampling in the codomain of the multiple attribute field
Deep histogram Jian Li not waited;
The codomain is divided into N number of segmentation according to the equal deep histogram.
Optionally, the division module 401, also particularly useful for:
Select a corresponding cryptographic Hash of the segmentation to generate vector from each attribute field respectively, by it is described to
Amount is used as element number;
The corresponding daily record data of the element number is divided into a storage unit;The storage unit and institute
State element number one-to-one correspondence.
Optionally, further include:
Processing module, if after the memory space for the storage unit is filled with, recording the member letter of the storage unit
Breath, and will be in the daily record data write-in data file in the storage unit;Wherein, the metamessage includes:Each category
The codomain of number, each attribute field that cryptographic Hash, the storage unit of property field are filled with is in the storage unit
Maximum value and minimum value and location information.
Optionally, the processing module is further configured to:
write the log data of multiple storage units into the same data file, where the file header of the data file includes the correspondence between the unit number of each storage unit and the offset, within the data file, of the log data of that storage unit.
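A data file that holds several storage units behind an offset-bearing header might look like the sketch below. The on-disk layout (a 4-byte length prefix, a JSON header, offsets relative to the start of the body) is an assumption chosen for brevity, and `write_data_file` and `read_unit` are hypothetical names.

```python
import json
import struct

def _key(unit_no):
    # JSON object keys must be strings, so the unit number is joined with commas.
    return ",".join(map(str, unit_no))

def write_data_file(units):
    """Serialize several storage units as [header length][header][unit data ...].

    `units` maps a unit number (tuple) to a list of raw log lines (bytes).
    """
    body = bytearray()
    index = {}
    for unit_no, lines in sorted(units.items()):
        chunk = b"".join(lines)
        # Assumption: offsets are relative to the start of the body.
        index[_key(unit_no)] = [len(body), len(chunk)]
        body += chunk
    header = json.dumps(index).encode("utf-8")
    return struct.pack(">I", len(header)) + header + bytes(body)

def read_unit(blob, unit_no):
    """Use the file header to jump straight to one unit's log data."""
    (header_len,) = struct.unpack_from(">I", blob, 0)
    index = json.loads(blob[4:4 + header_len].decode("utf-8"))
    offset, length = index[_key(unit_no)]
    start = 4 + header_len + offset
    return blob[start:start + length]

blob = write_data_file({
    (0, 0): [b"GET /a\n"],
    (1, 0): [b"GET /b\n", b"GET /c\n"],
})
print(read_unit(blob, (1, 0)))
```

Reading one unit then costs a header parse plus a single seek, without scanning the log data of the other units in the file.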
Optionally, the processing module is further configured to:
record the mapping relations of each attribute field, the enabling time and end time of the mapping relations, the metadata of the storage units, and the set of data files.
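The record described above can be thought of as a small catalog with one entry per version of the mapping relations. The sketch below assumes that a new mapping replaces the previous one at a given time and that queries look up the entries whose active period overlaps the queried time range; the names `MappingEpoch` and `Catalog` and the file name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MappingEpoch:
    """One version of the per-field mappings and what was stored under it."""
    cuts: dict                       # field name -> sorted cut points
    enabled_at: int                  # enabling time of the mapping
    ended_at: Optional[int] = None   # end time; None while still active
    unit_metadata: list = field(default_factory=list)
    data_files: set = field(default_factory=set)

class Catalog:
    def __init__(self):
        self.epochs = []

    def activate(self, cuts, now):
        # Close the currently active mapping before enabling the new one.
        if self.epochs and self.epochs[-1].ended_at is None:
            self.epochs[-1].ended_at = now
        self.epochs.append(MappingEpoch(cuts=cuts, enabled_at=now))

    def epochs_overlapping(self, start, end):
        """Return the epochs whose active period intersects [start, end)."""
        return [e for e in self.epochs
                if e.enabled_at < end
                and (e.ended_at is None or e.ended_at > start)]

catalog = Catalog()
catalog.activate({"timestamp": [1000, 2000]}, now=0)
catalog.epochs[-1].data_files.add("data-0001.log")
catalog.activate({"timestamp": [5000, 9000]}, now=3600)
hits = catalog.epochs_overlapping(3000, 4000)
print(len(hits))
```

Keeping the enabling and end times alongside each mapping presumably allows data written under an earlier mapping to be interpreted with the cut points that were in force when it was stored.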
The device of this embodiment can be used to carry out the technical solution of the method embodiment shown in any of Figs. 1-3. Its implementation principle and technical effect are similar and are not repeated here.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by program instructions together with the related hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments only illustrate the technical solution of the present application and do not limit it. Although the application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art will understand that the technical solutions described in those embodiments may still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.
Claims (10)
1. A log data sharding method, characterized by comprising:
dividing the value range of each of multiple attribute fields of log data into N segments, N being an integer greater than 1;
establishing, according to the order of the N segments, a mapping between the N segments of each attribute field and hash values, the hash values being consecutive integers whose order matches the order of the N segments; and
placing the log data corresponding to each hash value into one storage unit.
2. The method according to claim 1, characterized in that dividing the value range of each of the multiple attribute fields of the log data into N segments comprises:
obtaining sampled log data and building, based on the sampled log data, an equi-depth histogram over the value range of each of the multiple attribute fields; and
dividing the value range into N segments according to the equi-depth histogram.
3. The method according to claim 1 or 2, characterized in that placing the log data corresponding to each hash value into one storage unit comprises:
selecting, for each attribute field, the hash value of one segment, combining the selected hash values into a vector, and using the vector as a unit number; and
placing the log data corresponding to the unit number into one storage unit, storage units and unit numbers being in one-to-one correspondence.
4. The method according to claim 3, characterized in that, after placing the log data corresponding to the unit number into one storage unit, the method further comprises:
once the storage space of the storage unit is full, recording the metadata of the storage unit and writing the log data in the storage unit to a data file, wherein the metadata includes: the hash value of each attribute field, the number of times the storage unit has been filled, the maximum and minimum values of each attribute field's value range within the storage unit, and location information.
5. The method according to claim 4, characterized by further comprising:
writing the log data of multiple storage units into the same data file, wherein the file header of the data file includes the correspondence between the unit number of each storage unit and the offset, within the data file, of the log data of that storage unit.
6. The method according to claim 5, characterized by further comprising:
recording the mapping relations of each attribute field, the enabling time and end time of the mapping relations, the metadata of the storage units, and the set of data files.
7. A log data sharding device, characterized by comprising:
a division module configured to divide the value range of each of multiple attribute fields of log data into N segments, N being an integer greater than 1; and
a mapping module configured to establish, according to the order of the N segments, a mapping between the N segments of each attribute field and hash values, the hash values being consecutive integers whose order matches the order of the N segments;
wherein the division module is further configured to place the log data corresponding to each hash value into one storage unit.
8. The device according to claim 7, characterized in that the division module is specifically configured to:
obtain sampled log data and build, based on the sampled log data, an equi-depth histogram over the value range of each of the multiple attribute fields; and
divide the value range into N segments according to the equi-depth histogram.
9. The device according to claim 7 or 8, characterized in that the division module is further specifically configured to:
select, for each attribute field, the hash value of one segment, combine the selected hash values into a vector, and use the vector as a unit number; and
place the log data corresponding to the unit number into one storage unit, storage units and unit numbers being in one-to-one correspondence.
10. The device according to claim 9, characterized by further comprising:
a processing module configured to, once the storage space of the storage unit is full, record the metadata of the storage unit and write the log data in the storage unit to a data file, wherein the metadata includes: the hash value of each attribute field, the number of times the storage unit has been filled, the maximum and minimum values of each attribute field's value range within the storage unit, and location information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510420017.7A CN105117402B (en) | 2015-07-16 | 2015-07-16 | Daily record data sharding method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105117402A CN105117402A (en) | 2015-12-02 |
CN105117402B true CN105117402B (en) | 2018-08-28 |
Family
ID=54665394
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510420017.7A Active CN105117402B (en) | 2015-07-16 | 2015-07-16 | Daily record data sharding method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105117402B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105930256B (en) * | 2016-04-14 | 2018-07-17 | 北京思特奇信息技术股份有限公司 | A kind of log-output method and device using log4j single cent parts |
CN107590157B (en) * | 2016-07-08 | 2021-03-23 | 腾讯科技(深圳)有限公司 | Data storage method, data query method and related equipment |
CN106354434B (en) * | 2016-08-31 | 2019-07-23 | 中国人民大学 | The storage method and system of daily record data |
CN106599127A (en) * | 2016-12-01 | 2017-04-26 | 深圳市风云实业有限公司 | Log storage and query method applied to standalone server |
CN107330106B (en) * | 2017-07-07 | 2020-11-20 | 苏州浪潮智能科技有限公司 | Data filtering method and device based on FPGA |
CN108415869B (en) * | 2018-02-28 | 2020-06-26 | 北京零壹空间科技有限公司 | Method and device for sending serial data |
CN109101830A (en) * | 2018-09-03 | 2018-12-28 | 安徽太阳石科技有限公司 | Real time data safety protecting method and system based on block chain |
CN109657182B (en) * | 2018-12-18 | 2020-09-08 | 深圳店匠科技有限公司 | Webpage generation method, system and computer readable storage medium |
CN111382463B (en) * | 2020-04-02 | 2022-11-29 | 中国工商银行股份有限公司 | Block chain system and method based on stream data |
CN112632018B (en) * | 2020-12-21 | 2022-05-17 | 深圳市杰成软件有限公司 | Business process event log sampling method and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104408159A (en) * | 2014-12-04 | 2015-03-11 | 曙光信息产业(北京)有限公司 | Data correlating, loading and querying method and device |
CN104536988A (en) * | 2014-12-10 | 2015-04-22 | 杭州斯凯网络科技有限公司 | MonetDB distributed computing storage method |
CN104572809A (en) * | 2014-11-17 | 2015-04-29 | 杭州斯凯网络科技有限公司 | Distributive relational database free expansion method |
CN104598519A (en) * | 2014-12-11 | 2015-05-06 | 浙江浙大中控信息技术有限公司 | Continuous-memory-based database index system and processing method |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8990177B2 (en) * | 2011-10-27 | 2015-03-24 | Yahoo! Inc. | Lock-free transactional support for large-scale storage systems |
US9754050B2 (en) * | 2012-02-28 | 2017-09-05 | Microsoft Technology Licensing, Llc | Path-decomposed trie data structures |
US9405643B2 (en) * | 2013-11-26 | 2016-08-02 | Dropbox, Inc. | Multi-level lookup architecture to facilitate failure recovery |
GB201400191D0 (en) * | 2014-01-07 | 2014-02-26 | Cryptic Software Ltd | Data file searching method |
Also Published As
Publication number | Publication date |
---|---|
CN105117402A (en) | 2015-12-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105117402B (en) | Daily record data sharding method and device | |
US10372723B2 (en) | Efficient query processing using histograms in a columnar database | |
CN103577440B (en) | A kind of data processing method and device in non-relational database | |
US9367574B2 (en) | Efficient query processing in columnar databases using bloom filters | |
CN101777017B (en) | Rapid recovery method of continuous data protection system | |
CN101963982A (en) | Method for managing metadata of redundancy deletion and storage system based on location sensitive Hash | |
CN106970930A (en) | Message, which is sent, determines method and device, tables of data creation method and device | |
CN102779138A (en) | Hard disk access method of real time data | |
CN105095247A (en) | Symbolic data analysis method and system | |
CN110399096A (en) | Metadata of distributed type file system caches the method, apparatus and equipment deleted again | |
EP4150481A1 (en) | Execution-time dynamic range partitioning transformations | |
CN107506466A (en) | A kind of small documents storage method and system | |
CN116339643B (en) | Formatting method, formatting device, formatting equipment and formatting medium for disk array | |
CN104408097A (en) | Hybrid indexing method and system based on character field hot update | |
CN101290621B (en) | Safe digital card memory search method | |
CN115858471A (en) | Service data change recording method, device, computer equipment and medium | |
CN105224596A (en) | A kind of method of visit data and device | |
CN111026827A (en) | Data service method and device for soil erosion factors and electronic equipment | |
CN108021562A (en) | Deposit method, apparatus and distributed file system applied to distributed file system | |
CN115576947A (en) | Data management method and device, combined library, electronic equipment and storage medium | |
CN117196602A (en) | Payment data processing method and device, computer equipment and storage medium | |
CN116578571A (en) | Method, device, computer equipment and storage medium for updating guest group data | |
CN117435581A (en) | Index identification method, apparatus, device, storage medium, and program product | |
CN116186075A (en) | Method, device, equipment and medium for realizing scattering points in visual range of map | |
CN117725266A (en) | Load curve data processing method and device and intelligent ammeter |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||