CN106484332A

CN106484332A - Data storage method and device

Info

Publication number: CN106484332A
Application number: CN201610890096.2A
Authority: CN
Inventors: 陈思聪
Original assignee: Zhengzhou Yunhai Information Technology Co Ltd
Current assignee: Zhengzhou Yunhai Information Technology Co Ltd
Priority date: 2016-10-12
Filing date: 2016-10-12
Publication date: 2017-03-08

Abstract

The invention discloses a data storage method and device, belonging to the technical field of network data analysis. In the data storage method, captured data are first buffered in a cache in chronological order of capture, and the cache positions of data having the same type of attribute are recorded. Using these records, the cached data are sorted by attribute type and then stored to the hard disk, so that data with the same type of attribute are stored contiguously. The data storage method improves data reading efficiency, facilitates the analysis of network data, and has good application value.

Description

Data storage method and device

Technical field

The present invention relates to the technical field of network data analysis, and specifically provides a data storage method and device.

Background technology

In network data analysis, it is often necessary to capture data from the network and store them so that they can later be read and analyzed. At present, network data are stored by writing the captured data to a hard disk one after another in the order of capture. However, this storage mode is unfavorable for reading: data stored this way are read with low efficiency, which hinders the analysis of network data.

Content of the invention

The technical task of the present invention is to address the above problem by providing a data storage method that can improve data reading efficiency and facilitate the analysis of network data.

A further technical task of the present invention is to provide a data storage device that implements the above method, improving data reading efficiency and facilitating the analysis of network data.

To achieve the above objects, the invention provides the following technical solution:

A data storage method and device, in which captured data are buffered in a cache in chronological order of capture, the cache positions of data having the same type of attribute are recorded, and, using these records, the cached data are sorted by attribute type and stored to the hard disk, so that data with the same type of attribute are stored contiguously. The specific steps of the data storage method are:

S1: Store the captured pieces of data in a first cache in the order in which they were captured, where each piece of data has attributes of the same N types, N being an integer greater than 1;

S2: Generate a first nested hash table according to the positions of the pieces of data in the first cache, where the first nested hash table consists of N nested layers of first hash tables; the keys of the successive layers are, in order, the attributes of the N types; the values of the first through (N-1)-th layers are the first hash tables of the next layer; and the values of the N-th layer are the address values of the pieces of data in the first cache;

S3: Obtain the address values in the N-th layer first hash table one by one according to a preset traversal order;

S4: In the order in which the address values were obtained, store the data corresponding to each address value in the first cache to the hard disk.

In step S1, when network data need to be analyzed, a capture tool captures data from the network and stores them in the cache in chronological order of capture. Within a preset time period, either each piece of network data is put into the cache as soon as the capture tool captures it, or the capture tool captures multiple pieces of network data within the period and then puts them into the cache in capture order. Network data have attributes, and the captured pieces of data share the same attribute types.
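A minimal sketch of step S1, assuming the first cache can be modeled as an in-memory list whose indices serve as cache addresses. The class name `FirstCache` and its methods `store` and `load` are hypothetical and do not come from the patent text.

```python
class FirstCache:
    """Hypothetical first cache: records are kept in capture order."""

    def __init__(self):
        self._records = []

    def store(self, record):
        """Append a captured record and return its address (its list index)."""
        self._records.append(record)
        return len(self._records) - 1

    def load(self, address):
        """Return the record stored at the given cache address."""
        return self._records[address]


cache = FirstCache()
# Records may arrive one at a time or as a batch captured within a time
# period; either way they are appended in capture order.
addresses = [cache.store(r) for r in (b"pkt-0", b"pkt-1", b"pkt-2")]
```

Because records are only ever appended, the address of a record also reflects its position in the capture sequence.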

To reorder the data from step S1, the cache positions of data having the same type of attribute must be recorded; the present invention records them with hash tables. A hash table stores key-value pairs: it holds a number of hash records, each with a key and a corresponding value. The hash table generated in the present invention is a nested, multi-layer hash table in which, from the outermost layer inward, the value of each record in the first through (N-1)-th layers is a hash table of the next layer. The number of layers equals the number of attribute types shared by the data, i.e. the N in step S1 equals the N in step S2. The keys in each layer correspond to one attribute type, so each layer groups the data by that attribute. A layer may contain one or more hash records; within a layer, all keys belong to the same attribute type but hold different attribute values. The values of the innermost (N-th) layer are the addresses of the data in the cache.

The first nested hash table is generated as follows. The data are stored in the cache, so each piece of data corresponds to an address in the cache. Because each piece of data has attributes of N types, its attribute values are used layer by layer to find the hash record whose key equals the data's value for that layer's attribute, until the record in the N-th layer is found; the data's cache address is then stored in the value of that record. Repeating this for every piece of data yields the first nested hash table.

The first nested hash table generated in step S2 contains multiple layers of hash tables. Like an array, a hash table has a preset traversal order, in which records are visited in order of their keys. Specifically, traversal starts at the first layer; if that layer contains multiple records, it starts from the first record in the preset order. If the value of a record is a hash table of the next layer, traversal enters that table and examines its first record; this descent repeats until the value of a record is a cache address, which is then obtained. This yields the value of the first hash record in the N-th layer, and the values of the other N-th layer records are obtained in the same way. When the traversal ends, the values obtained are the cache addresses of all the data, and the order in which they were obtained groups the data by attribute.
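The traversal described above can be sketched as a depth-first walk. This is an illustration under stated assumptions, not the patent's implementation: the nested hash table is modeled as nested Python dicts, the innermost values are lists of cache addresses, and sorted key order stands in for the "preset traversal order". The function name `iter_addresses` is hypothetical.

```python
def iter_addresses(table, depth):
    """Yield cache addresses by walking a nested dict depth-first.

    `table` has `depth` layers; the values of the innermost layer are
    lists of cache addresses. Keys are visited in sorted order, which is
    one possible choice of preset traversal order (an assumption).
    """
    for key in sorted(table):
        value = table[key]
        if depth == 1:
            # Innermost layer: the value holds cache addresses.
            yield from value
        else:
            # Outer layer: the value is the next layer's hash table.
            yield from iter_addresses(value, depth - 1)


nested = {
    "a2": {"b1": [1]},
    "a1": {"b2": [3], "b1": [0, 2]},
}
order = list(iter_addresses(nested, depth=2))
```

Here the addresses of all records with attribute value `a1` come out before those with `a2`, and within `a1` the `b1` records precede the `b2` records.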

In step S4, the data at each address value are stored to the hard disk in the order in which the address values were obtained. Data originally held in capture order are thus stored contiguously by attribute type. This storage mode gathers scattered data together by attribute type, so that data with the same attribute type can be found conveniently and quickly when reading, and data reading efficiency is high.

Steps S3 and S4 may be performed alternately: whenever step S3 obtains an address value, the data corresponding to that address value are located in the cache and step S4 then stores them to the hard disk.
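The alternating execution of S3 and S4 can be illustrated with an iterator that supplies one address at a time. In this sketch the cache is a plain list, `io.BytesIO` stands in for the hard disk, and `flush_interleaved` is a hypothetical name; all of these are assumptions made for illustration.

```python
import io


def flush_interleaved(addresses, cache, disk):
    """Write each record as soon as its address is obtained.

    `addresses` is any iterator of cache addresses (step S3); each
    address is resolved in `cache` and written to `disk` (step S4)
    before the next address is requested.
    """
    written = 0
    for address in addresses:
        disk.write(cache[address])
        written += 1
    return written


cache = [b"A1", b"B1", b"A2"]           # records in capture order
disk = io.BytesIO()                     # stand-in for the hard disk
count = flush_interleaved(iter([0, 2, 1]), cache, disk)
```

After the call, the two `A` records sit next to each other on the simulated disk even though they were captured apart.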

Preferably, in step S4, storing the data corresponding to each address value in the first cache to the hard disk in the order in which the address values were obtained includes: storing the data corresponding to each address value in the first cache to a second cache in that order, and then storing the data in the second cache to the hard disk in sequence.

Preferably, the method further includes generating and storing a second nested hash table, where the second nested hash table consists of N nested layers of second hash tables, the key of each layer is a triple, the values of the first through (N-1)-th layers are the second hash tables of the next layer, and the values of the N-th layer are the address values of the pieces of data on the hard disk.

Preferably, the first elements of the triples in the successive layers are, in order, the attributes of the N types; the second element of each layer's triple is the start address on the hard disk of the data covered by that layer's attribute; and the third element is the length on the hard disk of the data covered by that layer's attribute.

After the data have been stored to the hard disk, the stored data are read from the hard disk according to the second nested hash table. Reading includes: 1) when a data read command is received, extracting the attribute type and attribute value from the command; 2) determining, in the second nested hash table, the target triple corresponding to the attribute type and attribute value; 3) reading the stored data from the hard disk according to the second and third elements of the target triple.

A data storage device includes a first cache storage unit, a first hash table generation unit, a cache address value acquisition unit and a hard disk data storage unit. The first cache storage unit is configured to store the captured pieces of data in a first cache in the order in which they were captured, where each piece of data has attributes of the same N types. The first hash table generation unit is configured to generate a first nested hash table according to the positions of the pieces of data in the first cache, the first nested hash table consisting of N nested layers of first hash tables. The cache address value acquisition unit is configured to obtain the address values in the N-th layer first hash table one by one according to a preset traversal order. The hard disk data storage unit is configured to store, in the order in which the address values were obtained, the data corresponding to each address value in the first cache to the hard disk.

The keys of the successive layers of first hash tables are, in order, the attributes of the N types; the values of the first through (N-1)-th layers are the first hash tables of the next layer; and the values of the N-th layer are the address values of the pieces of data in the first cache.

Preferably, the hard disk data storage unit includes a second cache storage subunit and a hard disk data storage subunit. The second cache storage subunit is configured to store, in the order in which the address values were obtained, the data corresponding to each address value in the first cache to a second cache; the hard disk data storage subunit is configured to store the data in the second cache to the hard disk in sequence.

Preferably, the device further includes a second hash table generation unit configured to generate and store a second nested hash table, where the second nested hash table consists of N nested layers of second hash tables.

The key of each layer of second hash table is a triple; the values of the first through (N-1)-th layers are the second hash tables of the next layer; and the values of the N-th layer are the address values of the pieces of data on the hard disk.

The first elements of the triples in the successive layers are, in order, the attributes of the N types; the second element of each layer's triple is the start address on the hard disk of the data covered by that layer's attribute; and the third element is the length on the hard disk of the data covered by that layer's attribute.

After the data have been stored to the hard disk, a hard disk data reading unit reads the stored data from the hard disk according to the second nested hash table. The hard disk data reading unit includes: 1) a read command receiving subunit configured to extract the attribute type and attribute value from a data read command when the command is received; 2) a target triple determination subunit configured to determine, in the second nested hash table, the target triple corresponding to the attribute type and attribute value; 3) a hard disk data reading subunit configured to read the stored data from the hard disk according to the second and third elements of the target triple.

Compared with the prior art, the data storage method of the present invention has the following notable beneficial effects: captured data are buffered in a cache in chronological order of capture; the cache positions of data having the same type of attribute are then recorded; and, using these records, the cached data are sorted by attribute type and stored to the hard disk. Data with the same type of attribute are therefore stored contiguously and can be read contiguously by attribute condition, which improves data reading efficiency and overcomes the deficiency of the prior art, in which captured data are written to the hard disk in capture order and are consequently unfavorable for reading.

Description of the drawings

Fig. 1 is a flow chart of the data storage method of the present invention;

Fig. 2 is a flow chart of the method of reading stored data of the present invention;

Fig. 3 is a structural diagram of the data storage device of the present invention;

Fig. 4 is a structural diagram of the device for reading stored data of the present invention.

Specific embodiment

The data storage method and device of the present invention are described in further detail below with reference to the drawings and embodiments.

Embodiment

As shown in Fig. 1, in the data storage method of the present invention, captured data are buffered in a cache in chronological order of capture, the cache positions of data having the same type of attribute are recorded, and, using these records, the cached data are sorted by attribute type and stored to the hard disk, so that data with the same type of attribute are stored contiguously. The specific steps of the data storage method are:

S1: Store the captured pieces of data in a first cache in the order in which they were captured, where each piece of data has attributes of the same N types, N being an integer greater than 1.

When network data need to be analyzed, a capture tool captures data from the network and stores them in the cache in chronological order of capture. Within a preset time period, either each piece of network data is put into the cache as soon as the capture tool captures it, or the capture tool captures multiple pieces of network data within the period and then puts them into the cache in capture order. Network data have attributes, and the captured pieces of data share the same attribute types. The attribute types of the data may be time, physical address, data direction, IP session packet, TCP session packet, and so on. If the reading demand is all packets sent by a given physical address within a given time period, the attribute types stored in the first nested hash table are time, physical address and data direction; if the reading demand is all packets received by a given physical address within a given time period, the attribute types stored in the first nested hash table are likewise time, physical address and data direction; if the reading demand is all packets of a given IP session within a given time period, the attribute types are time and IP session packet; and if the reading demand is all packets of a given TCP session within a given time period, the attribute types are time and TCP session packet. These attribute types and reading demands are only examples; the actual situation may differ depending on the captured data, and the embodiments of the present invention impose no specific limitation.

Taking packets captured in a TCP session as an example, each packet includes a "source IP address", a "source port", a "target IP address" and a "target port"; every packet has these same four attribute types.

S2: Generate a first nested hash table according to the positions of the pieces of data in the first cache, where the first nested hash table consists of N nested layers of first hash tables; the keys of the successive layers are, in order, the attributes of the N types; the values of the first through (N-1)-th layers are the first hash tables of the next layer; and the values of the N-th layer are the address values of the pieces of data in the first cache.

To reorder the data from step S1, the cache positions of data having the same type of attribute must be recorded; the present invention records them with hash tables. A hash table stores key-value pairs: it holds a number of hash records, each with a key and a corresponding value. The hash table generated in the present invention is a nested, multi-layer hash table in which, from the outermost layer inward, the value of each record in the first through (N-1)-th layers is a hash table of the next layer. The number of layers equals the number of attribute types shared by the data, i.e. the N in step S1 equals the N in step S2. The keys in each layer correspond to one attribute type, so each layer groups the data by that attribute. A layer may contain one or more hash records; within a layer, all keys belong to the same attribute type but hold different attribute values. The values of the innermost (N-th) layer are the addresses of the data in the cache.

The form of the first nested hash table is shown in Table 1 below. For example, suppose the captured data have attributes of three types: attribute A, attribute B and attribute C. The hash table shown in Table 1 then has three layers. From the outside in, the keys of the first-layer hash table represent attribute A, with record keys a₁, a₂, …, a_n, and the value of each record is a second-layer hash table; the keys of the second-layer hash table represent attribute B, with record keys b₁, b₂, …, b_n, and the value of each record is a third-layer hash table; the keys of the third-layer hash table represent attribute C, with record keys c₁, c₂, …, c_n, and the value of each record is the cache addresses of the data.

Table 1：

Viewed from the innermost layer, the pieces of data in data set 1 have attribute values a₁, b₁ and c₁; those in data set 2 have attribute values a₂, b₂ and c₂; and those in data set n have attribute values a_n, b_n and c_n.

Viewed from the outermost layer, the data are aggregated by attribute A: all data are classified by their value of attribute A, and data with different values of attribute A belong to different hash records. Viewed from the second layer, on top of the first layer's classification, the data are aggregated by attribute B, so that data with different values of attribute B belong to different hash records; and so on, down to the innermost hash table. Note that the hash records in each layer have a storage order, which is the chronological order of capture.

As Table 1 shows, the first nested hash table uses the keys of each layer to record the cache addresses of data by attribute type, and when several pieces of data have identical values for every attribute type, their cache addresses are stored in the same hash record.

The detailed process of generating the first nested hash table shown in Table 1 is as follows:

Assume the captured data have attributes of three types: attribute A, attribute B and attribute C. The captured data are (d₁, d₂, d₃, …, d_n); once each piece of data is buffered in the cache, its address in the cache is known.

A hash table HA is created as the outermost hash table. The keys of HA store the values of attribute A of the captured data, and its values store the next-layer hash tables HB. The keys of HB store the values of attribute B of the captured data, and its values store the next-layer hash tables HC. The keys of HC store the values of attribute C of the captured data, and its values store the addresses of the data in the cache.

Step 1: Take the first captured piece of data d₁, whose value of attribute A is Ad. Look up the hash record with key Ad in hash table HA. If it is found, proceed to the next step; otherwise, first add to HA a record whose key is Ad and whose value is an empty hash table.

Step 2: Let rhb be the record found (or added) in HA in the previous step; the value of rhb is a hash table HB. The value of attribute B of d₁ is Bd. Look up the hash record with key Bd in HB. If it is found, proceed to the next step; otherwise, first add to HB a record whose key is Bd and whose value is an empty hash table.

Step 3: Let rhc be the record found (or added) in HB in the previous step; the value of rhc is a hash table HC. The value of attribute C of d₁ is Cd. Look up the hash record with key Cd in HC. If it is found, proceed to the next step; otherwise, first add to HC a record whose key is Cd and whose value is an empty data set.

Step 4: Let rds be the record found (or added) in HC in the previous step; the value of rds is a data set, and the cache address of d₁ is added to this data set. This completes the storage of one piece of data's cache address. Next, the following captured piece of data d₂ is taken and its cache address is stored by the same steps. Once the cache addresses of all captured data have been stored, the first nested hash table is complete.
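The four steps above amount to a find-or-add at each layer followed by an append at the innermost layer. A minimal sketch, assuming records are dicts of attribute values, a record's index in the list stands in for its cache address, and Python's `dict.setdefault` plays the role of "find the record, or add an empty one". The function name `build_first_table` is hypothetical.

```python
def build_first_table(records, attr_names):
    """Build an N-layer nested dict mapping attribute values to cache addresses.

    `records` models the first cache: a list of dicts in capture order.
    `attr_names` lists the N attribute types, outermost layer first.
    """
    root = {}
    for address, record in enumerate(records):
        table = root
        # Steps 1 to N-1: find the record for this attribute value, or add
        # one whose value is an empty hash table, then descend into it.
        for name in attr_names[:-1]:
            table = table.setdefault(record[name], {})
        # Step N: find or add the innermost record (an initially empty
        # data set) and store the cache address in it.
        table.setdefault(record[attr_names[-1]], []).append(address)
    return root


records = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a2", "B": "b1", "C": "c1"},
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a1", "B": "b2", "C": "c3"},
]
table = build_first_table(records, ["A", "B", "C"])
```

Records 0 and 2 share all three attribute values, so their addresses end up in the same innermost data set.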

The layers of the hash table are described below with Tables 2, 3 and 4.

Viewed from the outermost layer, the cache addresses of the data are grouped and ordered by the value of attribute A, as shown in Table 2:

Table 2：

Entering the next layer, the cache addresses of the data are grouped and ordered by the value of attribute B, as shown in Table 3:

Table 3：

Cache addresses of data whose "attribute B" value is b₁
Cache addresses of data whose "attribute B" value is b₂
…
Cache addresses of data whose "attribute B" value is b_n

Entering the next layer, the cache addresses of the data are grouped and ordered by the value of attribute C, as shown in Table 4:

Table 4：

Cache addresses of data whose "attribute C" value is c₁
Cache addresses of data whose "attribute C" value is c₂
…
Cache addresses of data whose "attribute C" value is c_n

S3: Obtain the address values in the N-th layer first hash table one by one according to a preset traversal order.

S4: In the order in which the address values were obtained, store the data corresponding to each address value in the first cache to the hard disk. The data in the first cache are first stored to a second cache, and the data in the second cache are then stored to the hard disk in sequence.

Specifically, the data in the first cache are stored to the second cache in the traversal order of the first nested hash table. Because the first nested hash table stores address values classified by attribute type, once the data corresponding to each address value have been stored to the second cache in that traversal order, data with the same type of attribute are stored contiguously in the second cache. The data in the second cache are then stored to the hard disk in sequence, so the data on the hard disk are also stored contiguously by attribute type. Using the second cache as a staging area avoids the pressure on the hard disk I/O interface that would result from writing to the hard disk many times when storing data directly from the first cache.
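A small sketch of the staging idea, assuming the second cache is a `bytearray` and the hard disk is an ordinary file written with a single call. The function name `flush_via_second_cache` and the file name `capture.bin` are hypothetical.

```python
import os
import tempfile


def flush_via_second_cache(first_cache, ordered_addresses, path):
    """Stage records in traversal order, then write them to disk in one call."""
    second_cache = bytearray()
    for address in ordered_addresses:
        # Copy each record into the second cache in the order given by
        # the traversal of the first nested hash table.
        second_cache += first_cache[address]
    with open(path, "wb") as disk:
        # One sequential write instead of one write per record.
        disk.write(second_cache)
    return len(second_cache)


first_cache = [b"a1-x", b"a2-y", b"a1-z"]   # capture order
path = os.path.join(tempfile.mkdtemp(), "capture.bin")
size = flush_via_second_cache(first_cache, [0, 2, 1], path)
with open(path, "rb") as f:
    stored = f.read()
```

Whether a single large write actually relieves the I/O interface depends on the operating system and device; the sketch only shows the ordering and staging.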

To facilitate reading, a second hash table is generated and stored; it records, for the data corresponding to each attribute type, the start address and data length on the hard disk. Specifically, the second nested hash table consists of N nested layers of second hash tables. The key of each layer is a triple; the values of the first through (N-1)-th layers are the second hash tables of the next layer; and the values of the N-th layer are the address values of the pieces of data on the hard disk. The first elements of the triples in the successive layers are, in order, the attributes of the N types; the second element of a triple is the start address on the hard disk of the data covered by that layer's attribute; and the third element is the length of those data on the hard disk. In other words, the second nested hash table uses the triple in each key to record an attribute value, the start address of the stored data, and the data length. The concrete form of the table is shown in Table 5.
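One way to derive such a table is to walk the first nested hash table in the same order used for writing and accumulate byte offsets. The sketch below assumes nested dicts, sorted key order as the traversal order, records stored back to back with no padding, and byte offsets as hard disk addresses; `build_second_table` is a hypothetical name.

```python
def build_second_table(first_table, depth, cache, offset=0):
    """Return (second_table, next_offset).

    Each key of the result is a triple (attribute value, start address,
    data length). Outer values are next-layer tables; innermost values
    are lists of per-record disk addresses. The walk must use the same
    key order as the one used when writing the data to disk.
    """
    second = {}
    for key in sorted(first_table):
        start = offset
        value = first_table[key]
        if depth == 1:
            child = []
            for address in value:
                child.append(offset)            # disk address of this record
                offset += len(cache[address])   # records are stored back to back
        else:
            child, offset = build_second_table(value, depth - 1, cache, offset)
        second[(key, start, offset - start)] = child
    return second, offset


cache = [b"aaaa", b"bb", b"cccccc"]
first_table = {"a1": {"b1": [0, 2]}, "a2": {"b1": [1]}}
second, end = build_second_table(first_table, 2, cache)
```

In this example the data for `a1` occupy bytes 0 to 9 and the data for `a2` occupy bytes 10 to 11.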

Table 5：

As Table 5 shows, the attribute values of attribute A are (a₁, a₂, …, a_n), those of attribute B are (b₁, b₂, …, b_n), and those of attribute C are (c₁, c₂, …, c_n). The contiguous data whose attribute value is a₁ start at address addr_a₁ and have length len_a₁; the contiguous data whose attribute values are a₁ and b₁ start at address addr_b₁ and have length len_b₁; the remaining data follow by analogy.

The second nested hash table can be used to read data, as detailed below. The captured data are stored to the hard disk according to their attribute types using the embodiments above so as to facilitate reading; that is, when a data read command is received, the stored data are read from the hard disk according to the generated second nested hash table. Specifically, as shown in Fig. 2, the reading process includes:

1) When a data read command is received, extract the attribute type and attribute value from the command.

The data read command includes an attribute type and an attribute value. For example, the attribute types may be a time period and a four-tuple, i.e. source IP address, source port, target IP address and target port.

2) In the second nested hash table, determine the target triple corresponding to the attribute type and attribute value.

Taking the second nested hash table shown in Table 5 as an example, look up in the table the triple whose attribute type is the extracted attribute type and whose attribute value is the extracted attribute value. If the attribute type is A and the attribute value is a₁, the triple found is (a₁, addr_a₁, len_a₁); if the attribute types are A and B with attribute values a₁ and b₂ respectively, the triple found is (b₂, addr_b₂, len_b₂).

3) Read the stored data from the hard disk according to the second and third elements of the target triple. That is, starting from the start address given by the second element of the target triple, read from the hard disk a run of contiguous data whose length is given by the third element; this run is the data to be read.
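The three reading steps can be sketched as a lookup followed by a seek and a read. The example assumes the second nested hash table is a nested dict keyed by (value, start, length) triples and uses `io.BytesIO` in place of the hard disk; `find_triple` and `read_range` are hypothetical names, and the table contents are made-up sample values.

```python
import io


def find_triple(second_table, values):
    """Follow one attribute value per layer and return the last matching triple."""
    table, found = second_table, None
    for wanted in values:
        # Keys are triples, so match on the first element (the attribute value).
        found = next((k for k in table if k[0] == wanted), None)
        if found is None:
            return None
        table = table[found]
    return found


def read_range(disk, triple):
    """Read `length` bytes starting at `start` from a seekable file object."""
    _, start, length = triple
    disk.seek(start)
    return disk.read(length)


second_table = {
    ("a1", 0, 10): {("b1", 0, 4): [0], ("b2", 4, 6): [4]},
    ("a2", 10, 2): {("b1", 10, 2): [10]},
}
disk = io.BytesIO(b"AAAABBBBBBCC")
triple = find_triple(second_table, ["a1", "b2"])
data = read_range(disk, triple)
```

Because the keys are whole triples, this sketch scans the keys of each layer to match the attribute value; an implementation that needs constant-time lookup might instead key each layer by the attribute value and keep the address and length alongside, which is a design choice the text does not specify.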

For example, suppose the attribute types of the captured data are time and TCP session packet, and the attribute type of a TCP session packet is the four-tuple, so that the stored data have the attribute types time and four-tuple. After the data are stored to the hard disk in the manner of the above embodiment, they have been rearranged by attribute type: data with the same time and/or the same four-tuple are stored contiguously, and the second nested hash table records the start address and data length of each contiguous run of data with the same time and/or the same four-tuple. To read the data of a given four-tuple within a given time period, it suffices to find the start address and data length of that run in the second nested hash table and read contiguous data of that length starting from the start address.

As shown in figure 3, the data storage device of the present invention includes that the first buffer memory unit, the first Hash table generate list Unit, buffer address value acquiring unit and hard disk data storage unit.

First buffer memory unit is used for the sequencing according to crawl data, by a plurality of data storage for grabbing to the One caching, wherein, pieces of data has the attribute of N number of same type.

First Hash table signal generating unit is used for the position according to many data in the first caching, generates the first nested Hash Table, the first nested Hash table are made up of N layer the first Hash table nesting.The key of the first Hash table of each layer is followed successively by N number of same type Attribute, and ground floor is followed successively by next layer of the first Hash table, the first Hash of n-th layer to the value of the first Hash table of N-1 layer The value of table is address value of the pieces of data in the first caching.

The cache address value acquisition unit is configured to obtain, in sequence and according to a preset traversal order, the address values in the N-th layer first hash table.

The hard disk data storage unit is configured to store the data corresponding to each address value in the first cache to the hard disk in sequence, in the order in which the address values were obtained.

The hard disk data storage unit includes a second cache storage subunit and a hard disk data storage subunit. The second cache storage subunit is configured to store the data corresponding to each address value in the first cache to a second cache in sequence, in the order in which the address values were obtained. The hard disk data storage subunit is configured to store the data in the second cache to the hard disk in sequence.
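The traversal and the two-stage write can be sketched as follows. The preset traversal order is assumed here to be sorted key order at every layer, the caches are plain Python lists, and an `io.BytesIO` object stands in for the hard disk; the patent does not fix any of these choices, and `collect_addresses` is a hypothetical helper.

```python
import io


def collect_addresses(table):
    """Return the innermost address values, visiting keys in sorted order."""
    if isinstance(table, list):  # N-th layer value: addresses in the first cache
        return list(table)
    addresses = []
    for key in sorted(table):
        addresses.extend(collect_addresses(table[key]))
    return addresses


# First cache in capture order and a matching first nested hash table
# (hypothetical data, N = 2: time slot, then flow).
first_cache = [b"a1", b"b1", b"a2", b"b2"]
first_table = {"t0": {"A": [0, 2], "B": [1]}, "t1": {"B": [3]}}

order = collect_addresses(first_table)              # [0, 2, 1, 3]
second_cache = [first_cache[a] for a in order]      # records regrouped by attribute

disk = io.BytesIO()                                 # stand-in for the hard disk
for record in second_cache:
    disk.write(record)                              # sequential write
```

After the loop the stand-in disk holds `b"a1a2b1b2"`: the two records of flow A in time slot t0 are now adjacent, although they were captured with another record between them.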

The data storage device further includes a second hash table generation unit, configured to generate and store a second nested hash table. The second nested hash table is formed by nesting N layers of second hash tables; the key of each layer of second hash table is a triple; the values of the second hash tables of the first through (N-1)-th layers are, in order, the second hash tables of the next layer; and the values of the N-th layer second hash table are the address values of the respective data items on the hard disk. The first elements of the triples of the respective layers are, in order, the attributes of the N types; the second element of each layer's triple is the start address on the hard disk of the data covered by the attribute of the type corresponding to that layer; and the third element of each layer's triple is the length on the hard disk of the data covered by the attribute of the type corresponding to that layer.
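A possible shape for the second nested hash table is sketched below, under the same assumptions as a dictionary-based model: keys are tuples `(attribute value, start address, data length)`, addresses are byte offsets, and layers are visited in sorted key order. The function `build_second_table` and the sample data are hypothetical and only illustrate how the triples could be derived while the data is laid out.

```python
def build_second_table(node, cache, offset=0):
    """Mirror a first nested table as a table keyed by triples.

    Returns (second-table node, end offset). Each key is
    (attribute value, start address on disk, data length on disk).
    """
    if isinstance(node, list):  # innermost layer: addresses in the first cache
        disk_addresses = []
        for address in node:
            disk_addresses.append(offset)      # where this record lands on disk
            offset += len(cache[address])
        return disk_addresses, offset
    result = {}
    for key in sorted(node):                   # assumed traversal order
        start = offset
        child, offset = build_second_table(node[key], cache, start)
        result[(key, start, offset - start)] = child
    return result, offset


first_cache = [b"a1", b"b1", b"a2", b"b2"]
first_table = {"t0": {"A": [0, 2], "B": [1]}, "t1": {"B": [3]}}

second_table, total_length = build_second_table(first_table, first_cache)
# second_table == {
#     ("t0", 0, 6): {("A", 0, 4): [0, 2], ("B", 4, 2): [4]},
#     ("t1", 6, 2): {("B", 6, 2): [6]},
# }
```

Each triple describes one contiguous region, so an outer-layer triple such as `("t0", 0, 6)` spans exactly the regions of the inner-layer triples nested beneath it.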

The data storage device further includes a hard disk data reading unit, configured to read the stored data from the hard disk according to the second nested hash table. The hard disk data reading unit includes a read instruction receiving subunit, a target triple determination subunit and a hard disk data reading subunit. The read instruction receiving subunit is configured to extract, when a data read instruction is received, the attribute type and attribute value in the data read instruction; the target triple determination subunit is configured to determine, in the second nested hash table, the target triple corresponding to the attribute type and attribute value; and the hard disk data reading subunit is configured to read the stored data from the hard disk according to the second element and the third element of the target triple.
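The cooperation of the three subunits might look like the sketch below. It assumes the second nested hash table is modelled as nested dictionaries keyed by `(attribute value, start address, data length)` tuples, that a read instruction carries one attribute value per layer from the outermost layer inward, and that an `io.BytesIO` object stands in for the hard disk; `find_target_triple` and `read_by_attributes` are hypothetical names.

```python
import io


def find_target_triple(table, attribute_values):
    """Walk the layers and return the triple matching the last attribute value."""
    node = table
    target = None
    for value in attribute_values:
        if not isinstance(node, dict):   # more values supplied than layers
            return None
        target = next((key for key in node if key[0] == value), None)
        if target is None:               # no data stored for this attribute value
            return None
        node = node[target]
    return target


def read_by_attributes(disk, table, attribute_values):
    """Read the contiguous segment described by the target triple."""
    target = find_target_triple(table, attribute_values)
    if target is None:
        return b""
    _, start_address, data_length = target
    disk.seek(start_address)
    return disk.read(data_length)


# Hypothetical stored data and matching second nested hash table.
disk = io.BytesIO(b"a1a2b1b2")
second_table = {
    ("t0", 0, 6): {("A", 0, 4): [0, 2], ("B", 4, 2): [4]},
    ("t1", 6, 2): {("B", 6, 2): [6]},
}

segment = read_by_attributes(disk, second_table, ["t0", "B"])
```

Supplying only `["t0"]` selects the outer-layer triple and returns everything stored for that time slot in a single read.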

The embodiments described above are merely preferred specific embodiments of the present invention; the usual variations and substitutions made by those skilled in the art within the scope of the technical solution of the present invention shall all fall within the protection scope of the present invention.

Claims

1. A data storage method, characterized in that: captured data is temporarily stored in a cache in chronological order; the positions in the cache of data having attributes of the same type are recorded; and, using the recorded content, the data in the cache is sorted by attribute type and then stored to a hard disk, so that data having attributes of the same type is stored contiguously; the specific steps of the data storage method are:

S1: In the order in which data is captured, store a plurality of captured data items in a first cache, wherein each of the data items has attributes of the same N types, N being an integer greater than 1;

S2: According to the positions of the plurality of data items in the first cache, generate a first nested hash table, wherein the first nested hash table is formed by nesting N layers of first hash tables, the keys of the first hash tables of the respective layers are, in order, the attributes of the N types, the values of the first hash tables of the first through (N-1)-th layers are, in order, the first hash tables of the next layer, and the values of the N-th layer first hash table are the address values of the respective data items in the first cache;

S3: According to a preset traversal order, obtain in sequence the address values in the N-th layer first hash table;

S4: In the order in which the address values were obtained, store the data corresponding to each of the address values in the first cache to the hard disk in sequence.

2. The data storage method according to claim 1, characterized in that: in step S4, storing the data corresponding to each of the address values in the first cache to the hard disk in sequence, in the order in which the address values were obtained, includes: storing the data corresponding to each of the address values in the first cache to a second cache in sequence, in the order in which the address values were obtained, and storing the data in the second cache to the hard disk in sequence.

3. The data storage method according to claim 2, characterized in that: the method further includes generating and storing a second nested hash table, wherein the second nested hash table is formed by nesting N layers of second hash tables, the key of each layer of second hash table is a triple, the values of the second hash tables of the first through (N-1)-th layers are, in order, the second hash tables of the next layer, and the values of the N-th layer second hash table are the address values of the respective data items on the hard disk.

4. The data storage method according to claim 3, characterized in that: the first elements of the triples of the respective layers are, in order, the attributes of the N types; the second element of each layer's triple is the start address on the hard disk of the data covered by the attribute of the type corresponding to that layer; and the third element of each layer's triple is the length on the hard disk of the data covered by the attribute of the type corresponding to that layer.

5. A data storage device, characterized in that: it includes a first cache storage unit, a first hash table generation unit, a cache address value acquisition unit and a hard disk data storage unit; the first cache storage unit is configured to store a plurality of captured data items into a first cache in the order in which they were captured, wherein each of the data items has attributes of the same N types; the first hash table generation unit is configured to generate a first nested hash table according to the positions of the plurality of data items in the first cache, the first nested hash table being formed by nesting N layers of first hash tables; the cache address value acquisition unit is configured to obtain, in sequence and according to a preset traversal order, the address values in the N-th layer first hash table; and the hard disk data storage unit is configured to store the data corresponding to each of the address values in the first cache to the hard disk in sequence, in the order in which the address values were obtained.

6. The data storage device according to claim 5, characterized in that: the hard disk data storage unit includes a second cache storage subunit and a hard disk data storage subunit; the second cache storage subunit is configured to store the data corresponding to each of the address values in the first cache to a second cache in sequence, in the order in which the address values were obtained; and the hard disk data storage subunit is configured to store the data in the second cache to the hard disk in sequence.

7. The data storage device according to claim 6, characterized in that: it further includes a second hash table generation unit, configured to generate and store a second nested hash table, wherein the second nested hash table is formed by nesting N layers of second hash tables.