CN106484332A - Data storage method and device - Google Patents
Data storage method and device
- Publication number
- CN106484332A (application number CN201610890096.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- hash table
- layer
- attribute
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0604—Improving or facilitating administration, e.g. storage management
- G06F3/0607—Improving or facilitating administration, e.g. storage management by facilitating the process of upgrading existing storage systems, e.g. for improving compatibility between host and storage device
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
Abstract
The invention discloses a data storage method and device, belonging to the technical field of network data analysis. In the method, captured data are temporarily stored in a cache in order of capture time, the cache positions of data having attributes of the same type are recorded, and the recorded content is used to write the cached data to a hard disk after sorting by attribute type, so that data having attributes of the same type are stored contiguously. The method improves data reading efficiency, facilitates the analysis of network data, and has good practical value.
Description
Technical field
The present invention relates to the technical field of network data analysis, and specifically provides a data storage method and device.
Background art
In network data analysis, data often need to be captured from the network and stored so that they can later be read and analyzed. At present, network data are stored by writing the captured data to a hard disk one after another in the order in which they were captured. This storage mode is unfavorable for reading: data stored this way are read inefficiently, which hinders the analysis of network data.
Summary of the invention
The technical task of the present invention is, in view of the above problem, to provide a data storage method that improves data reading efficiency and facilitates the analysis of network data.
A further technical task of the present invention is to provide a data storage device that implements the above method.
To achieve the above object, the invention provides the following technical solution:
In the data storage method and device, captured data are temporarily stored in a cache in order of capture time, the cache positions of data having attributes of the same type are recorded, and the recorded content is used to store the cached data on a hard disk after sorting by attribute type, so that data having attributes of the same type are stored contiguously. The specific steps of the data storage method are:
S1: Store the captured data items in a first cache in the order in which they were captured, wherein each data item has N attributes of the same types and N is an integer greater than 1;
S2: Generate a first nested hash table according to the positions of the data items in the first cache, wherein the first nested hash table consists of N nested layers of first hash tables, the keys of the successive layers are, in turn, the N attributes of the same types, the values of the first hash tables of layers 1 to N-1 are the first hash tables of the next layer, and the values of the layer-N first hash table are the address values of the data items in the first cache;
S3: Obtain the address values in the layer-N first hash table one by one according to a preset traversal order;
S4: Store the data corresponding to each address value in the first cache to the hard disk, in the order in which the address values were obtained.
In step S1, when network data need to be analyzed, a capture tool captures data from the network and stores them in the cache in order of capture time. Within a preset time period, either each network data item is placed in the cache as soon as the capture tool captures it, or the capture tool captures several network data items within the preset time period and the items are then placed in the cache in capture order. Network data have attributes, and the captured data items have attributes of the same types.
To reorder the data of step S1, the cache positions of data having attributes of the same type must be recorded; the present invention records them with hash tables. A hash table stores key-value data: it holds a number of hash records, each with a key and a value that correspond to each other. The hash table generated in the present invention is a nested multi-layer hash table in which, proceeding from the outermost layer inward, the values of the hash tables of layers 1 to N-1 are the hash tables of the next layer. The number of layers of the nested hash table equals the number of attribute types shared by the data, i.e. the N in step S1 equals the N in step S2. The keys in each layer of hash table are attributes of one type, so the nested hash table uses the keys of each layer to record the data having that type of attribute. Each layer of hash table may contain one hash record or several; within a layer, the keys of all hash records are attributes of the same type but with different attribute values. The values of the innermost, layer-N hash table are the addresses of the data in the cache.
The first nested hash table is generated as follows. The data are stored in the cache, so each data item corresponds to an address in the cache. Since each data item has N types of attributes, the attribute value of each type is used to search, layer by layer, for the hash record whose key equals that attribute value of the data item, until the hash record in the layer-N hash table is found; the cache address of the data item is then stored in the value of that record. Repeating this for every data item generates the first nested hash table.
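The generation process above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `build_nested_table`, the use of a list index as the "cache address", and the use of a list as the layer-N value are all assumptions made for illustration.

```python
def build_nested_table(cache, attribute_names):
    """Build an N-layer nested dict mapping attribute values to cache addresses.

    Hypothetical helper: `cache` is a list of records (dicts), and the list
    index of a record stands in for its address in the first cache.
    """
    root = {}
    last = len(attribute_names) - 1
    for address, record in enumerate(cache):
        table = root
        # Layers 1..N-1: find the record for this attribute value, or add an
        # empty next-layer table if it does not exist yet.
        for name in attribute_names[:last]:
            table = table.setdefault(record[name], {})
        # Layer N: the value holds the cache addresses of matching records.
        table.setdefault(record[attribute_names[last]], []).append(address)
    return root


# Three captured records with N = 2 attribute types, in capture order.
cache = [
    {"A": "a1", "B": "b1", "payload": b"p0"},
    {"A": "a2", "B": "b1", "payload": b"p1"},
    {"A": "a1", "B": "b1", "payload": b"p2"},
]
table = build_nested_table(cache, ["A", "B"])
print(table)  # {'a1': {'b1': [0, 2]}, 'a2': {'b1': [1]}}
```

Records 0 and 2 share both attribute values, so their addresses end up in the same layer-N record even though another record was captured between them.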
The first nested hash table generated in step S2 contains multiple layers of hash tables, and a hash table has a default traversal order, similar to that of an array, in which records are visited in turn according to their keys. Specifically, traversal begins at the first-layer hash table; when that table contains several records, it begins at the first hash record according to the preset order. When the value of a hash record is a next-layer hash table, traversal enters that table and begins examining it from its first record; when the value of that record is in turn a hash table one layer further down, traversal enters it and examines its records, and so on, until the value of a hash record in some layer is an address in the cache. That address is then obtained, which yields the value of the first hash record of layer N. The values of the other layer-N hash records are obtained in the same way. When traversal ends, the values obtained are the cache address values of all the data, and the order in which they were obtained groups the data by attribute type.
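As a rough illustration of this traversal, the following Python sketch walks a nested dict depth-first and yields the layer-N address values. The function name `iter_addresses` is hypothetical, and Python's insertion-ordered `dict` is assumed to play the role of the preset traversal order.

```python
def iter_addresses(table):
    """Yield cache addresses depth-first, visiting records in stored order."""
    for value in table.values():
        if isinstance(value, dict):
            # The value is a next-layer hash table: descend into it.
            yield from iter_addresses(value)
        else:
            # The value is a layer-N record: a list of cache addresses.
            yield from value


# A two-layer table whose layer-N values are lists of cache addresses.
nested = {"a1": {"b1": [0, 3], "b2": [2]}, "a2": {"b1": [1]}}
order = list(iter_addresses(nested))
print(order)  # [0, 3, 2, 1]
```

The addresses come out grouped by attribute value rather than in capture order, which is the ordering that step S4 relies on.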
In step S4, the data at each address value are stored to the hard disk one after another, in the order in which the address values were obtained. Data originally stored in capture order are thereby stored contiguously by attribute type. This storage mode gathers scattered data together by attribute type, so that data having the same attribute type can be found conveniently and quickly when reading, and data reading efficiency is high.
Steps S3 and S4 may be executed alternately: each time step S3 obtains an address value, the data corresponding to that address value are located in the cache, and step S4 is then executed to store those data to the hard disk.
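A minimal sketch of this alternating execution is given below, assuming the cache is a list of byte strings and using an in-memory `io.BytesIO` object as a stand-in for the hard disk. The function name `flush_grouped` is hypothetical.

```python
import io


def flush_grouped(cache, table, disk):
    """Write each record as soon as its address is obtained (S3 and S4 interleaved)."""
    for value in table.values():
        if isinstance(value, dict):
            flush_grouped(cache, value, disk)
        else:
            for address in value:
                # S3 has just produced `address`; S4 stores its data at once.
                disk.write(cache[address])


cache = [b"A1-x;", b"A2-y;", b"A1-z;"]          # capture order
table = {"a1": {"b1": [0, 2]}, "a2": {"b1": [1]}}
disk = io.BytesIO()                              # stand-in for the hard disk
flush_grouped(cache, table, disk)
print(disk.getvalue())  # b'A1-x;A1-z;A2-y;'
```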
Preferably, in step S4, storing the data corresponding to each address value in the first cache to the hard disk in the order in which the address values were obtained comprises: storing the data corresponding to each address value in the first cache to a second cache in the order in which the address values were obtained, and then storing the data in the second cache to the hard disk in sequence.
Preferably, the method further comprises generating and storing a second nested hash table, wherein the second nested hash table consists of N nested layers of second hash tables, the key of each layer of second hash table is a triple, the values of the second hash tables of layers 1 to N-1 are the second hash tables of the next layer, and the values of the layer-N second hash table are the address values of the data items on the hard disk.
Preferably, the first elements of the triples of the successive layers are, in turn, the N attributes of the same types; the second element of each layer's triple is the start address on the hard disk of the data covered by the attribute of that layer's type; and the third element of each layer's triple is the data length on the hard disk of the data covered by the attribute of that layer's type.
After the data have been stored to the hard disk, the stored data are read from the hard disk according to the second nested hash table. This comprises: 1) when a data read instruction is received, extracting the attribute type and attribute value from the data read instruction; 2) determining, in the second nested hash table, the target triple corresponding to the attribute type and attribute value; 3) reading the stored data from the hard disk according to the second element and third element of the target triple.
A data storage device comprises a first cache storage unit, a first hash table generation unit, a cache address value acquisition unit, and a hard disk data storage unit. The first cache storage unit is configured to store the captured data items in the first cache in the order in which they were captured, wherein each data item has N attributes of the same types. The first hash table generation unit is configured to generate the first nested hash table, which consists of N nested layers of first hash tables, according to the positions of the data items in the first cache. The cache address value acquisition unit is configured to obtain the address values in the layer-N first hash table one by one according to a preset traversal order. The hard disk data storage unit is configured to store the data corresponding to each address value in the first cache to the hard disk, in the order in which the address values were obtained.
The keys of the successive layers of first hash tables are, in turn, the N attributes of the same types; the values of the first hash tables of layers 1 to N-1 are the first hash tables of the next layer; and the values of the layer-N first hash table are the address values of the data items in the first cache.
Preferably, the hard disk data storage unit comprises a second cache storage subunit and a hard disk data storage subunit. The second cache storage subunit is configured to store the data corresponding to each address value in the first cache to the second cache, in the order in which the address values were obtained; the hard disk data storage subunit is configured to store the data in the second cache to the hard disk in sequence.
Preferably, the device further comprises a second hash table generation unit configured to generate and store the second nested hash table, wherein the second nested hash table consists of N nested layers of second hash tables.
The key of each layer of second hash table is a triple; the values of the second hash tables of layers 1 to N-1 are the second hash tables of the next layer; and the values of the layer-N second hash table are the address values of the data items on the hard disk.
The first elements of the triples of the successive layers are, in turn, the N attributes of the same types; the second element of each layer's triple is the start address on the hard disk of the data covered by the attribute of that layer's type; and the third element of each layer's triple is the data length on the hard disk of the data covered by the attribute of that layer's type.
After the data have been stored to the hard disk, a hard disk data reading unit reads the stored data from the hard disk according to the second nested hash table. The hard disk data reading unit comprises: 1) a read instruction receiving subunit, configured to extract the attribute type and attribute value from a data read instruction when the instruction is received; 2) a target triple determination subunit, configured to determine, in the second nested hash table, the target triple corresponding to the attribute type and attribute value; 3) a hard disk data reading subunit, configured to read the stored data from the hard disk according to the second element and third element of the target triple.
Compared with the prior art, the data storage method of the present invention has the following notable beneficial effects. The method temporarily stores captured data in a cache in order of capture time, records the cache positions of data having attributes of the same type, and uses the recorded content to store the cached data on the hard disk after sorting by attribute type, so that data having attributes of the same type are stored contiguously. When reading, data can be read contiguously according to attribute conditions, which improves data reading efficiency and overcomes the deficiency of the prior art, in which captured data are written to the hard disk in capture order and are therefore unfavorable for reading.
Brief description of the drawings
Fig. 1 is a flow chart of the data storage method of the present invention;
Fig. 2 is a flow chart of the method of the present invention for reading the stored data;
Fig. 3 is a structural diagram of the data storage device of the present invention;
Fig. 4 is a structural diagram of the device of the present invention for reading the stored data.
Detailed description of the embodiments
The data storage method and device of the present invention are described in further detail below with reference to the drawings and an embodiment.
Embodiment
As shown in Fig. 1, in the data storage method of the present invention, captured data are temporarily stored in a cache in order of capture time, the cache positions of data having attributes of the same type are recorded, and the recorded content is used to store the cached data on a hard disk after sorting by attribute type, so that data having attributes of the same type are stored contiguously. The specific steps of the data storage method are:
S1: Store the captured data items in a first cache in the order in which they were captured, wherein each data item has N attributes of the same types and N is an integer greater than 1.
When network data need to be analyzed, a capture tool captures data from the network and stores them in the cache in order of capture time. Within a preset time period, either each network data item is placed in the cache as soon as the capture tool captures it, or the capture tool captures several network data items within the preset time period and the items are then placed in the cache in capture order. Network data have attributes, and the captured data items have attributes of the same types. The attribute types of the data may be time, physical address, data direction, IP session packet, TCP session packet, and so on. If the read requirement is all packets sent by a certain physical address within a certain time period, the attribute types stored in the first nested hash table are time, physical address, and data direction. If the read requirement is all packets received by a certain physical address within a certain time period, the attribute types stored in the first nested hash table are likewise time, physical address, and data direction. If the read requirement is all packets of a certain IP session within a certain time period, the attribute types concerned are time and IP session packet. If the read requirement is all packets of a certain TCP session within a certain time period, the attribute types concerned are time and TCP session packet. These attribute types and read requirements are only examples; the specific situation may differ depending on the captured data, and the embodiments of the present invention impose no specific limitation.
Taking the capture of packets in a TCP session as an example, each packet includes a source IP address, a source port, a target IP address, and a target port; these four attributes are the attributes of the same types.
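For the TCP example, one possible way to model a captured packet and its four attributes (N = 4) is sketched below in Python. The `Packet` type, its field names, and the `attribute_values` helper are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical model of a captured TCP packet."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload: bytes


# The four attributes shared by every captured packet (N = 4).
FOUR_TUPLE = ("src_ip", "src_port", "dst_ip", "dst_port")


def attribute_values(packet):
    """Return the packet's attribute values, one per hash table layer."""
    return tuple(getattr(packet, name) for name in FOUR_TUPLE)


pkt = Packet("10.0.0.1", 40000, "10.0.0.2", 443, b"hello")
keys = attribute_values(pkt)
print(keys)  # ('10.0.0.1', 40000, '10.0.0.2', 443)
```

Each element of `keys` would serve as the key looked up in one layer of the first nested hash table.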
S2: Generate a first nested hash table according to the positions of the data items in the first cache, wherein the first nested hash table consists of N nested layers of first hash tables, the keys of the successive layers are, in turn, the N attributes of the same types, the values of the first hash tables of layers 1 to N-1 are the first hash tables of the next layer, and the values of the layer-N first hash table are the address values of the data items in the first cache.
To reorder the data of step S1, the cache positions of data having attributes of the same type must be recorded; the present invention records them with hash tables. A hash table stores key-value data: it holds a number of hash records, each with a key and a value that correspond to each other. The hash table generated in the present invention is a nested multi-layer hash table in which, proceeding from the outermost layer inward, the values of the hash tables of layers 1 to N-1 are the hash tables of the next layer. The number of layers of the nested hash table equals the number of attribute types shared by the data, i.e. the N in step S1 equals the N in step S2. The keys in each layer of hash table are attributes of one type, so the nested hash table uses the keys of each layer to record the data having that type of attribute. Each layer of hash table may contain one hash record or several; within a layer, the keys of all hash records are attributes of the same type but with different attribute values. The values of the innermost, layer-N hash table are the addresses of the data in the cache.
The first nested hash table is generated as follows. The data are stored in the cache, so each data item corresponds to an address in the cache. Since each data item has N types of attributes, the attribute value of each type is used to search, layer by layer, for the hash record whose key equals that attribute value of the data item, until the hash record in the layer-N hash table is found; the cache address of the data item is then stored in the value of that record. Repeating this for every data item generates the first nested hash table.
The form of the first nested hash table is shown in Table 1 below. For example, the captured data include three attributes of the same types, namely attribute A, attribute B, and attribute C. The hash table shown in Table 1 has three layers. Proceeding from the outermost layer inward, the keys of the first-layer hash table represent attribute A, the keys of its hash records being a1, a2, …, an, and the value of each hash record is a second-layer hash table; the keys of the second-layer hash table represent attribute B, the keys of its hash records being b1, b2, …, bn, and the value of each hash record is a third-layer hash table; the keys of the third-layer hash table represent attribute C, the keys of its hash records being c1, c2, …, cn, and the value of each hash record is the cache addresses of data.
Table 1:
Viewed from the innermost layer, the attribute values of each data item in data set 1 are a1, b1, and c1; those of each data item in data set 2 are a2, b2, and c2; and those of each data item in data set n are an, bn, and cn.
Viewed from the outermost layer, the data are aggregated by attribute A: all data are classified by the value of attribute A, and data with different values of attribute A belong to different hash records. Viewed from the second layer, on the basis of the first layer's classification, the data are aggregated by attribute B: data with different values of attribute B belong to different hash records. And so on, down to the innermost hash table. Note that the hash records in each layer of hash table have a storage order, which is the order of capture time.
As the first nested hash table in Table 1 shows, the first nested hash table uses the keys of each layer of hash table to record the cache addresses of data by attribute type, and when several data items have the same attribute values of the same types, their cache address values may be stored in the same hash record.
The detailed process of generating the first nested hash table shown in Table 1 is as follows.
Assume that the captured data have three types of attributes, namely attribute A, attribute B, and attribute C. The captured data items are (d1, d2, d3, …, dn); each is temporarily stored in the cache, so the address of each data item in the cache can be obtained.
A hash table HA is created as the outermost hash table. The keys of HA store the values of attribute A of the captured data, and its values store the next-layer hash tables HB. The keys of HB store the values of attribute B of the captured data, and its values store the next-layer hash tables HC. The keys of HC store the values of attribute C of the captured data, and its values store the addresses of the data in the cache.
Step 1: obtain the first captured data item d1, whose attribute A has the value Ad. Search hash table HA for the hash record whose key is Ad. If it is found, proceed to the next step; otherwise, first add to HA a record whose key is Ad and whose value is an empty hash table.
Step 2: let rhb be the record found (or added) in HA in the previous step; the value of rhb is itself a hash table HB. Attribute B of d1 has the value Bd. Search HB for the hash record whose key is Bd. If it is found, proceed to the next step; otherwise, first add to HB a record whose key is Bd and whose value is an empty hash table.
Step 3: let rhc be the record found (or added) in HB in the previous step; the value of rhc is itself a hash table HC. Attribute C of d1 has the value Cd. Search HC for the hash record whose key is Cd. If it is found, proceed to the next step; otherwise, first add to HC a record whose key is Cd and whose value is an empty data set.
Step 4: let rds be the record found (or added) in HC in the previous step; the value of rds is a data set, and this data set contains the cache address of d1. This completes the storage of the cache address of one data item. Next, obtain the next captured data item d2 and store its cache address by the same steps. Once the cache addresses of all captured data have been stored, the first nested hash table has been generated.
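The four steps can be written out for the three-attribute case as the following Python sketch. The names `ha`, `hb`, `hc`, and `data_set` mirror HA, HB, HC, and the data set of the text; the function name `store_address`, the dict-based records, and the use of list indices as cache addresses are assumptions.

```python
def store_address(ha, record, address):
    """Store one data item's cache address in the three-layer table rooted at HA."""
    # Step 1: find the record keyed by the attribute A value in HA,
    # adding an empty hash table as its value if it is missing.
    hb = ha.setdefault(record["A"], {})
    # Step 2: the same lookup in HB, keyed by the attribute B value.
    hc = hb.setdefault(record["B"], {})
    # Step 3: in HC, find or add the record whose value is a data set.
    data_set = hc.setdefault(record["C"], [])
    # Step 4: the data set receives the cache address of the data item.
    data_set.append(address)


ha = {}
captured = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a1", "B": "b2", "C": "c1"},
    {"A": "a1", "B": "b1", "C": "c1"},
]
for address, record in enumerate(captured):
    store_address(ha, record, address)
print(ha)  # {'a1': {'b1': {'c1': [0, 2]}, 'b2': {'c1': [1]}}}
```

`dict.setdefault` combines the "search, and add if not found" of each step into a single call.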
Each layer of hash table is described below with Table 2, Table 3, and Table 4.
Viewed from the outermost layer, the cache addresses of the data are aggregated and sorted by the value of attribute A, as shown in Table 2:
Table 2:
Entering the next layer, the cache addresses of the data are aggregated and sorted by the value of attribute B, as shown in Table 3:
Table 3:
Cache addresses of data whose attribute B value is b1
Cache addresses of data whose attribute B value is b2
…
Cache addresses of data whose attribute B value is bn
Entering the next layer, the cache addresses of the data are aggregated and sorted by the value of attribute C, as shown in Table 4:
Table 4:
Cache addresses of data whose attribute C value is c1
Cache addresses of data whose attribute C value is c2
…
Cache addresses of data whose attribute C value is cn
S3: Obtain the address values in the layer-N first hash table one by one according to a preset traversal order.
S4: Store the data corresponding to each address value in the first cache to the hard disk, in the order in which the address values were obtained. The data in the first cache are first stored in a second cache, and the data in the second cache are then stored to the hard disk in sequence.
Specifically, the data in the first cache are stored to the second cache in the traversal order of the first nested hash table. Because the first nested hash table stores address values classified by attribute type, once the data corresponding to each address value have been stored to the second cache in that traversal order, data having the same type of attribute are stored contiguously in the second cache. The data in the second cache are then stored to the hard disk in sequence, so the data stored on the hard disk are also contiguous by attribute type. In this implementation, using the second cache to store the data of the first cache to the hard disk avoids the pressure on the hard disk I/O interface that repeated writes would cause if the data in the first cache were stored to the hard disk directly.
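The role of the second cache can be sketched as follows, assuming the first cache is a list of byte strings, the second cache is a `bytearray`, and an in-memory `io.BytesIO` object stands in for the hard disk. The function name `write_via_second_cache` is hypothetical.

```python
import io


def write_via_second_cache(first_cache, ordered_addresses, disk):
    """Gather records in traversal order, then issue a single write."""
    second_cache = bytearray()
    for address in ordered_addresses:
        # Records with the same attribute values become adjacent here.
        second_cache += first_cache[address]
    # One write of the whole buffer instead of one write per record.
    disk.write(second_cache)
    return len(second_cache)


first_cache = [b"aa", b"bbb", b"c"]   # capture order
disk = io.BytesIO()                   # stand-in for the hard disk
written = write_via_second_cache(first_cache, [0, 2, 1], disk)
print(disk.getvalue(), written)  # b'aacbbb' 6
```

In a real system the second cache would presumably be flushed in bounded chunks; the single write here is only meant to show why fewer, larger writes reduce I/O pressure.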
To facilitate reading the data, a second nested hash table is generated and stored; it records the start address and data length on the hard disk of the data corresponding to each type of attribute. Specifically, the second nested hash table consists of N nested layers of second hash tables; the key of each layer of second hash table is a triple; the values of the second hash tables of layers 1 to N-1 are the second hash tables of the next layer; and the values of the layer-N second hash table are the address values of the data items on the hard disk. The first elements of the triples of the successive layers are, in turn, the N attributes of the same types; the second element of each layer's triple is the start address on the hard disk of the data covered by the attribute of that layer's type; and the third element is the data length on the hard disk of those data. The second nested hash table thus uses the triples of its keys to record the attribute value, the start address of the stored data, and the data length. The concrete form of the table is shown in Table 5.
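The second nested hash table described above might be derived from the first one as in the Python sketch below. This is a simplification rather than the patented layout: instead of using the triple itself as the key, each attribute value maps to `(start, length, child)`, and the layer-N entries carry `None` in place of per-item disk addresses. The function name `build_index` is hypothetical.

```python
def build_index(table, cache, offset=0):
    """Return (index, end_offset) for data written in traversal order.

    `index` maps each attribute value to (start, length, child), where
    `child` is the next-layer index or None at layer N.
    """
    index = {}
    for key, value in table.items():
        start = offset
        if isinstance(value, dict):
            child, offset = build_index(value, cache, offset)
        else:
            child = None
            # Layer N: advance by the total size of the records it covers.
            offset += sum(len(cache[address]) for address in value)
        index[key] = (start, offset - start, child)
    return index, offset


cache = [b"aa", b"bbb", b"c", b"dddd"]
table = {"a1": {"b1": [0, 2], "b2": [3]}, "a2": {"b1": [1]}}
index, total = build_index(table, cache)
print(index["a1"][:2], index["a1"][2]["b2"][:2], total)  # (0, 7) (3, 4) 10
```

All data with attribute value a1 occupy bytes 0 to 6, and within them the data with b2 occupy 4 bytes starting at offset 3.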
Table 5:
As Table 5 shows, the attribute values of attribute A are (a1, a2, …, an), those of attribute B are (b1, b2, …, bn), and those of attribute C are (c1, c2, …, cn). The contiguous data whose attribute value is a1 have start address addr_a1 and data length len_a1; the contiguous data whose attribute values are a1 and b1 have start address addr_b1 and data length len_b1; and so on for the other data. The second nested hash table can be used to read data, as described below. The captured data are stored to the hard disk by the above embodiments according to their attribute types in order to facilitate reading: when a data read instruction is received, the stored data are read from the hard disk according to the generated second nested hash table. Specifically, as shown in Fig. 2, the reading process comprises:
1) When a data read instruction is received, extract the attribute type and attribute value from the data read instruction.
The data read instruction includes an attribute type and an attribute value. For example, the attribute types are a time period and a four-tuple, i.e. source IP address, source port, target IP address, and target port.
2) In the second nested hash table, determine the target triple corresponding to the attribute type and attribute value.
By taking the second nested Hash table shown in above-mentioned table 5 as an example, it is the attribute type for extracting to search attribute type in the table
And property value is the triple of the property value for extracting, it is assumed that attribute type is A and property value is a1, then the triple for finding is
(a1,addr_a1,len_a1);And for example attribute type is A and B, and property value is respectively a1And b2, then the triple for finding is
(b2,addr_b2,len_b2).
3):According to second element and the third element of target triple, the data of storage are read from hard disk.I.e. with target
The initial address of the second element of triple is starting point, reads the consecutive numbers of the data length that third element represents from hard disk
According to this section of continuous data is the data for needing to read.
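The three reading steps can be illustrated with a minimal Python sketch. It is not the patent's implementation: the two-layer table contents, the function names `find_target_triple` and `read_stored`, and the use of a plain file in place of the hard disk are all assumptions made for illustration. Because each key is a triple, the sketch matches on the first element of the key by scanning the layer.

```python
# Hypothetical two-layer second nested hash table (N = 2).
# Each key is a triple (attribute value, start address, data length).
# Non-final layers map to the next layer; the final layer maps to the
# per-record addresses on disk.
second_index = {
    ("a1", 0, 8): {("b1", 0, 4): [0, 2], ("b2", 4, 4): [4, 6]},
    ("a2", 8, 4): {("b1", 8, 4): [8, 10]},
}


def find_target_triple(index, values):
    """Step 2: follow one attribute value per layer, return the last matching triple."""
    layer, target = index, None
    for value in values:
        for triple, child in layer.items():
            if triple[0] == value:  # match on the first element of the key
                target, layer = triple, child
                break
        else:
            raise KeyError(value)
    return target


def read_stored(path, triple):
    """Step 3: read `length` contiguous bytes starting at `start`."""
    _, start, length = triple
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(length)
```

With attribute values `["a1", "b2"]`, the lookup descends through the `a1` layer and returns the `b2` triple, mirroring the (b2, addr_b2, len_b2) case in step 2; a single seek and a single read then fetch the whole segment.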
For example, suppose the attribute types of the captured data include time and TCP session packets, and the attribute type of a TCP session packet is further the four-tuple; the types of the stored data then include time and four-tuple. After the data are stored to the hard disk in the storage manner provided by the above embodiments, the data have been reordered by attribute type: data with the same time and/or the same four-tuple are stored contiguously, and the second nested Hash table records the start address and data length of the contiguously stored data with the same time and/or the same four-tuple. Thus, to read the data of a certain four-tuple within a certain time period, it is only necessary to find the start address and data length of that data segment in the second nested Hash table and to read contiguous data of that length starting from the start address.
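As a rough illustration of how the start addresses and lengths for the time and four-tuple example might be recorded, the following Python sketch builds a two-layer table from records that are already in their on-disk order. The function name `build_second_index`, the record layout `(time_slot, four_tuple, payload)` and the byte-offset addressing are assumptions, not details given in the source.

```python
def build_second_index(records):
    """Build a two-layer table keyed by (value, start address, data length).

    `records` is a list of (time_slot, four_tuple, payload) in on-disk order,
    assumed to be already grouped by time slot and then by four-tuple.
    """
    runs = {}    # time_slot -> [start, length, {four_tuple: [start, length, addrs]}]
    offset = 0   # byte address of the current record on disk
    for time_slot, four_tuple, payload in records:
        outer = runs.setdefault(time_slot, [offset, 0, {}])
        inner = outer[2].setdefault(four_tuple, [offset, 0, []])
        inner[2].append(offset)       # address of this record
        inner[1] += len(payload)      # length of the four-tuple run
        outer[1] += len(payload)      # length of the time-slot run
        offset += len(payload)
    # Turn the collected runs into triple-keyed nested dictionaries.
    return {
        (t, start, length): {
            (ft, run_start, run_len): addrs
            for ft, (run_start, run_len, addrs) in inner.items()
        }
        for t, (start, length, inner) in runs.items()
    }
```

Because equal keys are contiguous, each run is fully described by one start address and one length, which is what allows a read of "one four-tuple in one time period" to be a single sequential access.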
As shown in Fig. 3, the data storage device of the present invention includes a first cache storage unit, a first Hash table generation unit, a cache address value acquisition unit and a hard disk data storage unit.
The first cache storage unit is configured to store the plurality of captured pieces of data into the first cache in the order in which the data were captured, where each piece of data has N attributes of the same type.
The first Hash table generation unit is configured to generate a first nested Hash table according to the positions of the plurality of pieces of data in the first cache. The first nested Hash table is composed of N nested layers of first Hash tables. The keys of the first Hash tables in successive layers are, in order, the N attributes of the same type; from the first layer to the (N-1)th layer, the value of each first Hash table is the first Hash table of the next layer, and the value of the Nth-layer first Hash table is the address value of each piece of data in the first cache.
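A minimal Python sketch of generating such a first nested Hash table follows. It assumes that the first cache is a list, that the "address value" of a record is its list index, and that each record is a dictionary of attribute values; the name `build_first_index` is hypothetical.

```python
def build_first_index(cache, attr_names):
    """Nest one dictionary layer per attribute; the last layer holds cache addresses."""
    root = {}
    for addr, record in enumerate(cache):
        layer = root
        # Layers 1 .. N-1: the value is the next layer's hash table.
        for name in attr_names[:-1]:
            layer = layer.setdefault(record[name], {})
        # Layer N: the value is the list of addresses in the first cache.
        layer.setdefault(record[attr_names[-1]], []).append(addr)
    return root
```

For example, four records captured in the order (time 1, flow x), (time 2, flow x), (time 1, flow y), (time 1, flow x) give `{1: {"x": [0, 3], "y": [2]}, 2: {"x": [1]}}` when indexed by `["time", "flow"]`.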
The cache address value acquisition unit is configured to obtain, in a preset traversal order, the address values in the Nth-layer first Hash tables one by one.
The hard disk data storage unit is configured to store to the hard disk, in the order in which the address values are obtained, the data in the first cache corresponding to each address value.
The hard disk data storage unit includes a second cache storage subunit and a hard disk data storage subunit. The second cache storage subunit is configured to store into the second cache, in the order in which the address values are obtained, the data in the first cache corresponding to each address value. The hard disk data storage subunit is configured to store the data in the second cache to the hard disk in sequence.
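The traversal and the two-stage write can be sketched in Python as follows. The source does not specify the preset traversal order, so a depth-first walk in dictionary insertion order is assumed here; `collect_addresses`, `flush_to_disk` and the writable binary stream `out` standing in for the hard disk are hypothetical names.

```python
def collect_addresses(layer):
    """Yield cache addresses depth-first, in insertion order (an assumed preset order)."""
    if isinstance(layer, list):      # Nth layer: list of addresses
        yield from layer
    else:                            # layers 1 .. N-1: descend into the next layer
        for child in layer.values():
            yield from collect_addresses(child)


def flush_to_disk(first_cache, first_index, out):
    """Reorder records into a second cache, then write them sequentially to `out`."""
    second_cache = [first_cache[addr] for addr in collect_addresses(first_index)]
    for payload in second_cache:
        out.write(payload)
    return second_cache
```

Staging the reordered records in a second cache means the final write is one sequential pass, so records that share attribute values end up adjacent on disk.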
The data storage device further includes a second Hash table generation unit configured to generate and store the second nested Hash table. The second nested Hash table is composed of N nested layers of second Hash tables, and the key of each layer's second Hash table is a triple. From the first layer to the (N-1)th layer, the value of each second Hash table is the second Hash table of the next layer, and the value of the Nth-layer second Hash table is the address value of each piece of data on the hard disk. The first elements of the triples in successive layers are, in order, the N attributes of the same type; the second element of each layer's triple is the start address on the hard disk of the data covered by the attribute of that layer's type, and the third element is the length on the hard disk of that data.
The data storage device further includes a hard disk data reading unit configured to read the stored data from the hard disk according to the second nested Hash table. The hard disk data reading unit includes a read instruction receiving subunit, a target triple determination subunit and a hard disk data reading subunit. The read instruction receiving subunit is configured to extract the attribute type and attribute value from a data read instruction when the instruction is received; the target triple determination subunit is configured to determine, in the second nested Hash table, the target triple corresponding to the attribute type and attribute value; and the hard disk data reading subunit is configured to read the stored data from the hard disk according to the second and third elements of the target triple.
The embodiments described above are merely preferred specific embodiments of the present invention. Usual variations and substitutions made by those skilled in the art within the scope of the technical solution of the present invention shall all fall within the protection scope of the present invention.
Claims (7)
1. A data storage method, characterized in that: captured data are temporarily held in a cache in chronological order; the positions in the cache of data having attributes of the same type are recorded; using the recorded content, the data in the cache are sorted according to attribute bytes and then stored to the hard disk, so that data having attributes of the same type are stored contiguously; the specific steps of the data storage method are:
S1: in the order in which the data are captured, storing the plurality of captured pieces of data into a first cache, wherein each piece of data has N attributes of the same type, and N is an integer greater than 1;
S2: generating a first nested Hash table according to the positions of the plurality of pieces of data in the first cache, wherein the first nested Hash table is composed of N nested layers of first Hash tables, the keys of the first Hash tables in successive layers are, in order, the N attributes of the same type, the value of each first Hash table from the first layer to the (N-1)th layer is the first Hash table of the next layer, and the value of the Nth-layer first Hash table is the address value of each piece of data in the first cache;
S3: obtaining, in a preset traversal order, the address values in the Nth-layer first Hash table one by one;
S4: in the order in which the address values are obtained, storing to the hard disk the data in the first cache corresponding to each address value.
2. The data storage method according to claim 1, characterized in that: in step S4, storing to the hard disk, in the order in which the address values are obtained, the data in the first cache corresponding to each address value includes: storing into a second cache, in the order in which the address values are obtained, the data in the first cache corresponding to each address value, and storing the data in the second cache to the hard disk in sequence.
3. The data storage method according to claim 2, characterized by further comprising generating and storing a second nested Hash table, wherein the second nested Hash table is composed of N nested layers of second Hash tables, the key of each layer's second Hash table is a triple, the value of each second Hash table from the first layer to the (N-1)th layer is the second Hash table of the next layer, and the value of the Nth-layer second Hash table is the address value of each piece of data on the hard disk.
4. The data storage method according to claim 3, characterized in that: the first elements of the triples in successive layers are, in order, the N attributes of the same type; the second element of each layer's triple is the start address on the hard disk of the data covered by the attribute of that layer's type; and the third element of each layer's triple is the length on the hard disk of the data covered by the attribute of that layer's type.
5. A data storage device, characterized by comprising a first cache storage unit, a first Hash table generation unit, a cache address value acquisition unit and a hard disk data storage unit; the first cache storage unit is configured to store the plurality of captured pieces of data into a first cache in the order in which the data are captured, wherein each piece of data has N attributes of the same type; the first Hash table generation unit is configured to generate a first nested Hash table according to the positions of the plurality of pieces of data in the first cache, the first nested Hash table being composed of N nested layers of first Hash tables; the cache address value acquisition unit is configured to obtain, in a preset traversal order, the address values in the Nth-layer first Hash table one by one; and the hard disk data storage unit is configured to store to the hard disk, in the order in which the address values are obtained, the data in the first cache corresponding to each address value.
6. The data storage device according to claim 5, characterized in that: the hard disk data storage unit includes a second cache storage subunit and a hard disk data storage subunit; the second cache storage subunit is configured to store into a second cache, in the order in which the address values are obtained, the data in the first cache corresponding to each address value; and the hard disk data storage subunit is configured to store the data in the second cache to the hard disk in sequence.
7. The data storage device according to claim 6, characterized by further comprising a second Hash table generation unit configured to generate and store a second nested Hash table, wherein the second nested Hash table is composed of N nested layers of second Hash tables.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610890096.2A CN106484332A (en) | 2016-10-12 | 2016-10-12 | A kind of date storage method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610890096.2A CN106484332A (en) | 2016-10-12 | 2016-10-12 | A kind of date storage method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106484332A true CN106484332A (en) | 2017-03-08 |
Family
ID=58270677
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610890096.2A Pending CN106484332A (en) | 2016-10-12 | 2016-10-12 | A kind of date storage method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106484332A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160110392A1 (en) * | 2013-09-16 | 2016-04-21 | Netapp, Inc. | Dense tree volume metadata organization |
CN104407991A (en) * | 2014-12-10 | 2015-03-11 | 成都科来软件有限公司 | Data storage method and device |
CN105335300A (en) * | 2014-12-10 | 2016-02-17 | 成都科来软件有限公司 | Data storage method and device |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107315801A (en) * | 2017-06-22 | 2017-11-03 | 中国人民解放军国防科学技术大学 | Parallel Discrete Event Simulation system initialization date storage method |
CN107315801B (en) * | 2017-06-22 | 2019-12-13 | 中国人民解放军国防科学技术大学 | parallel discrete event simulation system initialization data storage method |
CN107453948A (en) * | 2017-07-28 | 2017-12-08 | 北京邮电大学 | The storage method and system of a kind of network measurement data |
CN111708720A (en) * | 2020-08-20 | 2020-09-25 | 北京思明启创科技有限公司 | Data caching method, device, equipment and medium |
CN112206532A (en) * | 2020-10-19 | 2021-01-12 | 珠海金山网络游戏科技有限公司 | Object processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20170308 |