CN102915373B - Data storage method and device - Google Patents

Data storage method and device

Info

Publication number
CN102915373B
Authority
CN
China
Prior art keywords
data
data set
sub
item
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210438962.6A
Other languages
Chinese (zh)
Other versions
CN102915373A (en)
Inventor
倪颖杰
姚建华
李祖华
张军
朱开颜
刘桂英
马飞
李弢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Jiangnan Computing Technology Institute filed Critical Wuxi Jiangnan Computing Technology Institute
Priority to CN201210438962.6A priority Critical patent/CN102915373B/en
Publication of CN102915373A publication Critical patent/CN102915373A/en
Application granted granted Critical
Publication of CN102915373B publication Critical patent/CN102915373B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application discloses a data storage method and device. The method obtains a data set to be analyzed and determines the data category of each sub-dataset in the data set; queries a preset correspondence between data categories and data classification rules to determine the data classification rule of the sub-dataset; divides the sub-dataset, according to its data classification rule, into a first sub-dataset and a second sub-dataset, and determines the storage mode of each data item in the first and second sub-datasets; and, according to the storage mode, stores each data item in the first sub-dataset to a structured data storage area and stores the second sub-dataset to an unstructured data storage area. Storing data with this method makes data query and statistics more convenient and allows the way data is stored to be adjusted more flexibly.

Description

Data storage method and device
Technical field
The present application relates to the technical field of big data processing, and in particular to a data storage method and device.
Background
With the automation of data generation and the acceleration of data generation speed, the volume of data to be processed is growing sharply, and the era of big data has arrived. Big data is characterized by large volume and diverse data types, and the analysis and mining of big data must quickly extract valuable data from data of many types.
Before big data can be analyzed and mined, the massive data obtained generally has to be collected and stored. Big data includes structured, semi-structured, and unstructured data. At present there are two common approaches to storing big data. One stores all collected data as structured data in a relational database system, but storing big data this way slows down queries on data such as text. The other stores all collected data as unstructured data in a file system, but with this approach more complex statistical analysis cannot be carried out. Moreover, because the architectures of relational databases and file systems differ considerably, the data storage approach cannot be adjusted flexibly when storage requirements change.
Summary of the invention
In view of this, the present application provides a data storage method and device. Storing data in this way mitigates the slow queries and inconvenient statistical analysis found in the prior art, and makes the way data is stored easier to adjust.
To achieve the above object, the present application provides the following technical solution: a data storage method, including:
Obtaining a data set to be analyzed;
Determining the data category of each sub-dataset in the data set;
Querying a preset correspondence between data categories and data classification rules to determine the data classification rule of the sub-dataset;
Dividing the sub-dataset into a first sub-dataset and a second sub-dataset according to the data classification rule of the sub-dataset, and determining the storage mode of each data item in the first sub-dataset and the second sub-dataset;
According to the storage mode, storing each data item in the first sub-dataset to a structured data storage area, and storing the second sub-dataset to an unstructured data storage area.
Preferably, before each data item in the first sub-dataset is stored to the structured data storage area and the second sub-dataset is stored to the unstructured data storage area according to the storage mode, the method further includes:
Building a feature code that uniquely identifies the sub-dataset;
Storing each data item in the first sub-dataset to the structured data storage area and the second sub-dataset to the unstructured data storage area according to the storage mode then includes:
According to the storage mode, storing the feature code in association with each data item in the first sub-dataset to the structured data storage area, and storing the feature code in association with each data item in the second sub-dataset to the unstructured data storage area.
Preferably, obtaining the data set to be analyzed includes:
Extracting a specified proportion of the collected raw data as the data set to be analyzed.
Preferably, determining the data category of each sub-dataset in the data set includes:
Analyzing the organization format of the data of each sub-dataset in the data set, and determining the data category corresponding to the organization format of the data of the sub-dataset.
Preferably, determining the data category of each sub-dataset in the data set includes:
Querying the data identifier contained in each sub-dataset in the data set, and determining the data category corresponding to the data identifier contained in the sub-dataset.
Preferably, the storage mode includes:
The data storage format of the data item, the data storage space of the data item, and/or index information.
Preferably, the method further includes:
Receiving an update request for the preset correspondence between data categories and data classification rules;
According to the update request, modifying or adding a correspondence between a data category and a data classification rule.
In another aspect, corresponding to the data storage method of the present application, the present application also provides a data storage device, including:
A data acquisition unit, configured to obtain a data set to be analyzed;
A category determination unit, configured to determine the data category of each sub-dataset in the data set;
A classification rule determination unit, configured to query a preset correspondence between data categories and data classification rules and determine the data classification rule of the sub-dataset;
A data classification unit, configured to divide the sub-dataset into a first sub-dataset and a second sub-dataset according to the data classification rule of the sub-dataset, and to determine the storage mode of each data item in the first sub-dataset and the second sub-dataset;
A storage unit, configured to store, according to the storage mode, each data item in the first sub-dataset to a structured data storage area and the second sub-dataset to an unstructured data storage area.
Preferably, the device further includes:
A feature code generation unit, configured to build a feature code that uniquely identifies the sub-dataset;
The storage unit is then specifically configured to store, according to the storage mode, the feature code in association with each data item in the first sub-dataset to the structured data storage area, and the feature code in association with each data item in the second sub-dataset to the unstructured data storage area.
Preferably, the data acquisition unit is specifically configured to extract a specified proportion of the collected raw data as the data set to be analyzed.
Preferably, the category determination unit includes:
A first category determination unit, configured to analyze the organization format of the data of each sub-dataset in the data set and determine the data category corresponding to the organization format of the data of the sub-dataset.
Preferably, the category determination unit includes:
A second category determination unit, configured to query the data identifier contained in each sub-dataset in the data set and determine the data category corresponding to the data identifier contained in the sub-dataset.
Preferably, the storage mode determined by the data classification unit includes: the data storage format of the data item, the data storage space of the data item, and/or index information.
Preferably, the device further includes:
An update request receiving unit, configured to receive an update request for the preset correspondence between data categories and data classification rules;
A rule update unit, configured to modify or add, according to the update request, a correspondence between a data category and a data classification rule.
As can be seen from the above technical solution, compared with the prior art, the present application provides a data storage method and device. The method sets different data classification rules for different data categories. After the data classification rule corresponding to the data category of a sub-dataset is determined, the data in the sub-dataset is divided into two parts, which makes it possible to adjust which data items are stored in the structured data storage area and which in the unstructured data storage area. Compared with using structured storage alone, this reduces the inconvenience of retrieval; compared with using unstructured storage alone, it reduces the cases in which complex statistics cannot be carried out. In addition, when storage requirements change, the data classification rule corresponding to a data category can be adjusted directly, so the way a given category of data is stored can be adjusted easily.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are merely embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a schematic flowchart of one embodiment of a data storage method of the present application;
Fig. 2 is a schematic flowchart of another embodiment of a data storage method of the present application;
Fig. 3 is a schematic structural diagram of one embodiment of a data storage device of the present application;
Fig. 4 is a schematic structural diagram of another embodiment of a data storage device of the present application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present application without creative effort fall within the scope of protection of the present application.
The present application can be used in numerous general-purpose or special-purpose computing environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor devices, and distributed computing environments that include any of the above devices or equipment.
The present application can be described in the general context of computer-executable instructions, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
Referring to Fig. 1, which shows a schematic flowchart of one embodiment of a data storage method of the present application, the method of this embodiment includes:
Step 101: obtain a data set to be analyzed.
Data generally has to be collected before big data mining is carried out: large amounts of data are collected from the target sources that generate them so that data mining can be performed later. Acquisition methods and the data collected may differ across application fields. For example, in scientific research and computer simulation, data is obtained through parallel computation on supercomputers; these supercomputers can serve as the target sources that generate data, and the data collected is generally massive computation data. In Internet applications, network data acquisition equipment can be deployed at gateways or data centers, and network packets are captured on fixed ports according to Internet protocols.
In the present application, the data set to be analyzed is the set of data, collected from the source data generated by the target sources, on which data mining is performed. Specifically, after source data is collected from the target sources, all of the collected source data can be used as the data set to be analyzed. In practice, however, the volume of collected source data is large, and using all of it as the data set to be analyzed makes data mining expensive and time-consuming. A specified proportion of sample data can therefore be extracted from the collected source data as the data set to be analyzed. When sample data is extracted from the source data, the sample should contain every type of data present in the source data. For example, for data mining on the Internet, the data set to be analyzed should contain application data from as many different Internet protocols as possible.
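The sampling described above can be sketched as follows. This is a minimal illustration, not the procedure of the application: the function name `sample_dataset`, the `kind_of` callback, and the rule of keeping at least one record of each kind are assumptions introduced here so that every type of data in the source stays represented in the sample.

```python
import math
import random
from collections import defaultdict


def sample_dataset(source, ratio, kind_of, seed=0):
    """Draw roughly `ratio` of `source`, keeping at least one record per kind.

    `kind_of` is a hypothetical callback that maps a record to its kind,
    for example the application protocol of a captured packet.
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    groups = defaultdict(list)
    for record in source:
        groups[kind_of(record)].append(record)
    sample = []
    for records in groups.values():
        # Sample each kind separately so that rare kinds are not lost.
        count = max(1, math.floor(len(records) * ratio))
        sample.extend(rng.sample(records, count))
    return sample


# Illustrative input: 90 HTTP records and 10 SMTP records.
packets = [{"protocol": "http", "id": i} for i in range(90)]
packets += [{"protocol": "smtp", "id": i} for i in range(10)]
subset = sample_dataset(packets, ratio=0.1, kind_of=lambda p: p["protocol"])
```

Sampling per kind rather than over the whole collection is one simple way to honor the requirement that the sample cover all data types; other stratification schemes would serve equally well.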
Step 102: determine the data category of each sub-dataset in the data set.
In the embodiment of the present application, after the data set to be analyzed is obtained, all of the data in it is not simply stored directly to a single fixed storage area.
Because the data set to be analyzed contains a large amount of data of many different types, the embodiment of the present application first analyzes the data content of each sub-dataset included in the data set and determines the data category of the data in each sub-dataset.
Here, a data category can be understood as the functional class of a piece of data; data of different categories expresses different meanings. Data of the same category shares a data organization format or attribute information that distinguishes it from other data. The division into data categories also differs between applications and can be decided by actual needs.
For ease of understanding, take a data set to be analyzed that is obtained from the Internet as an example. Such a data set contains network packets of various application protocols, captured from various protocol ports. Each network packet can be understood as one sub-dataset, and different network packets may correspond to different data categories. Specifically, the obtained data set may include sub-datasets of categories such as web page, mail, microblog, and instant messaging. Determining the data category of each sub-dataset in the obtained data set therefore means determining which of these categories the data content of each sub-dataset belongs to.
There are several ways to determine the data category of each sub-dataset in the data set. One way is to analyze the organization format of the data of each sub-dataset and determine the data category corresponding to that organization format. Generally, different data categories have different data organization formats, so analyzing the data content of a sub-dataset and determining its data organization format is enough to determine the data category to which the data belongs. For example, in Internet data, mail organizes its data in the EML format; that is, the general format of data whose category is mail is EML. When analysis shows that the data organization format in a sub-dataset is EML, the data category of that sub-dataset can be determined to be mail.
Another way to determine the data category of a sub-dataset is to query the data identifier contained in each sub-dataset in the data set and determine the data category of the data in the sub-dataset from the category that the identifier represents. In other words, when the collected data contains data identifiers that distinguish different data categories, the data category of a sub-dataset can also be determined by analyzing the data identifier it contains.
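The two ways of determining a data category can be sketched as below. Everything concrete here is an assumption made for illustration: the identifier table, the field names `identifier` and `payload`, the format heuristics, and the choice to try the identifier first are not specified by the application.

```python
# Hypothetical identifier table: maps a data identifier carried by a
# sub-dataset to a data category.
IDENTIFIER_TO_CATEGORY = {"0x01": "webpage", "0x02": "mail", "0x03": "microblog"}


def category_from_identifier(sub_dataset):
    """Identifier-based way: read a data identifier from the sub-dataset."""
    return IDENTIFIER_TO_CATEGORY.get(sub_dataset.get("identifier"))


def category_from_format(sub_dataset):
    """Format-based way: inspect how the payload is organized.

    The checks below are illustrative heuristics only; real detection of
    EML or HTML content would need to be far more careful.
    """
    payload = sub_dataset.get("payload", "")
    if payload.lstrip().lower().startswith(("<!doctype html", "<html")):
        return "webpage"
    # Treat the text before the first blank line as a header block.
    header_block = payload.split("\n\n", 1)[0].lower()
    header_names = {
        line.split(":", 1)[0].strip()
        for line in header_block.splitlines()
        if ":" in line
    }
    if {"from", "to", "subject"} <= header_names:
        return "mail"  # header layout typical of an EML message
    return None


def determine_category(sub_dataset):
    """Try the identifier first, then fall back to format analysis."""
    return (
        category_from_identifier(sub_dataset)
        or category_from_format(sub_dataset)
        or "unknown"
    )
```

The fallback order is a design choice of this sketch; a deployment could equally use only one of the two ways, as the text describes them as alternatives.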
In addition, in practical applications, other ways of determining the data category may exist depending on the characteristics of the data in a given application field; they are not listed one by one here.
Step 103: query the preset correspondence between data categories and data classification rules, and determine the data classification rule of the sub-dataset.
The embodiment of the present application presets classification rules for different data categories: a different data classification rule is set according to the characteristics of each data category. A data classification rule defines how the data items in a sub-dataset of that category are classified and in what manner each data item is stored. The preset correspondence between data categories and data classification rules can be set according to actual needs.
Step 104: according to the data classification rule of the sub-dataset, divide the sub-dataset into a first sub-dataset and a second sub-dataset, and determine the storage mode of each data item in the first sub-dataset and the second sub-dataset.
By querying the preset correspondence, the data classification rule corresponding to each sub-dataset can be determined. For any sub-dataset, the data items in it are assigned to the first sub-dataset or the second sub-dataset according to the data classification rule determined for it. At the same time, the storage mode of each data item in the first sub-dataset and the storage mode of each data item in the second sub-dataset are determined.
The storage mode of a data item defines, for the storage of that data item, one or more pieces of information such as the data storage format of the data item, the data storage space of the data item, whether index information is set, and whether the data item is stored compressed.
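One way to represent a storage mode and a per-category classification rule is sketched below. The class names, field names, and the `webpage` rule are hypothetical; the application only states that a storage mode may cover the storage format, the storage space, index information, and compression.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageMode:
    """How one data item is stored (field names are illustrative)."""
    storage_format: str                   # e.g. "text" or "timestamp"
    storage_space: Optional[int] = None   # reserved size in bytes, if any
    indexed: bool = False                 # whether index information is set
    compressed: bool = False              # whether the item is compressed


@dataclass(frozen=True)
class ItemRule:
    """Where one data item goes and how it is stored."""
    target: str        # "first" or "second" sub-dataset
    mode: StorageMode


# Preset correspondence between data categories and classification rules.
# The single "webpage" entry is an invented example.
RULES_BY_CATEGORY = {
    "webpage": {
        "url": ItemRule("first", StorageMode("text", 2048, indexed=True)),
        "fetch_time": ItemRule("first", StorageMode("timestamp", 8, indexed=True)),
        "html": ItemRule("second", StorageMode("text", compressed=True)),
    },
}


def plan_storage(category, sub_dataset, rules=RULES_BY_CATEGORY):
    """Return the storage mode of each item, grouped by target sub-dataset."""
    rule = rules[category]
    first, second = {}, {}
    for name in sub_dataset:
        item_rule = rule[name]  # KeyError if the rule does not cover the item
        target = first if item_rule.target == "first" else second
        target[name] = item_rule.mode
    return first, second
```

Keeping the rule as plain data keyed by category means that changing how a category is stored only requires editing the table, which matches the flexibility the method aims for.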
Step 105: according to the determined storage mode, store each data item in the first sub-dataset to the structured data storage area, and store the second sub-dataset to the unstructured data storage area.
Unlike the existing approach of storing all data in a data set directly to one fixed storage area, the embodiment of the present application combines the advantages of structured data storage and unstructured data storage. Every sub-dataset is divided into a first sub-dataset and a second sub-dataset. Each data item in the first sub-dataset is stored to the structured data storage area according to the storage mode determined for it, and each data item in the second sub-dataset is stored to the unstructured data storage area according to the storage mode determined for it.
Data in the structured data storage area exists in the form of database tables with a specific structure, so the structured data storage area can be understood as a relational database. In contrast, the data stored in the unstructured data storage area has no specific structural characteristics; specifically, the unstructured data storage area can be understood as a file system.
It should be noted that, for any sub-dataset, which data items are assigned to the first sub-dataset and which to the second sub-dataset is determined entirely by the data classification rule corresponding to that sub-dataset. When the storage approach needs to change, it is only necessary to change the correspondence between the data category and the data classification rule; resetting the data classification rule corresponding to a data category is enough to adjust how that data is stored.
To better exploit the respective advantages of database tables and file storage, optionally, when a sub-dataset is divided, the data items in the sub-dataset that are structured data can be assigned to the first sub-dataset and the unstructured data to the second sub-dataset, so that the data items stored in the structured data storage area are structured and those stored in the unstructured data storage area are unstructured.
For ease of understanding, take a sub-dataset whose data category is mail as an example. The data content of one mail message contains data items such as source address, source port, destination address, destination port, mail type, sender, recipient, mail header, message body, mail attachment, and sending time. Assume that the data classification rule corresponding to the mail category specifies that structured data such as the mail header, message body, and mail attachment is assigned to the first sub-dataset, and specifies the storage mode of each of these data items; and that unstructured data such as the sending time, source address, source port, destination address, destination port, mail type, sender, and recipient is assigned to the second sub-dataset. When storing, each data item in the first sub-dataset is stored to the structured data storage area according to its determined storage mode, and each data item in the second sub-dataset is stored to the unstructured data storage area according to its corresponding storage mode. Data is thus stored separately according to the characteristics of structured and unstructured data, which optimizes data storage and makes operations such as statistics and retrieval on the stored data easier. This example assumes that the first sub-dataset contains only structured data and the second sub-dataset only unstructured data; in practice, the rule can be adjusted as needed so that some structured data is assigned to the second sub-dataset or some unstructured data to the first sub-dataset.
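The mail example can be sketched as a rule table plus a split function. The item names are English renderings of the data items listed above, the assignment follows the example in the text, and the function name `split_mail` and the sample values are invented for illustration.

```python
# Assignment of mail data items to sub-datasets, following the example
# in the text. Item names are illustrative renderings.
MAIL_RULE = {name: "first" for name in ("mail_header", "message_body", "attachment")}
MAIL_RULE.update({
    name: "second"
    for name in (
        "send_time", "source_address", "source_port",
        "destination_address", "destination_port",
        "mail_type", "sender", "recipient",
    )
})


def split_mail(mail, rule=MAIL_RULE):
    """Divide one mail sub-dataset into first and second sub-datasets."""
    first, second = {}, {}
    for name, value in mail.items():
        target = rule.get(name)
        if target == "first":
            first[name] = value
        elif target == "second":
            second[name] = value
        else:
            raise KeyError(f"no rule for data item {name!r}")
    return first, second


# Invented sample message with a subset of the data items.
mail = {
    "mail_header": "Quarterly report",
    "message_body": "See attachment.",
    "attachment": "report.pdf",
    "send_time": "2012-11-06T09:00:00",
    "sender": "alice@example.com",
    "recipient": "bob@example.com",
}
first, second = split_mail(mail)
```

Passing a different `rule` mapping moves an item from one sub-dataset to the other without touching the split logic, which is the adjustment the last sentence of the example describes.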
In the data storage method of this embodiment, after the data set to be analyzed is obtained, the data category of the data in each sub-dataset of the data set is determined, and the data classification rule corresponding to the sub-dataset is determined from the preset correspondence between data categories and data classification rules. The sub-dataset is divided into a first sub-dataset and a second sub-dataset, the storage mode of each data item in the two sub-datasets is obtained, and the first and second sub-datasets are then stored, according to the determined storage modes, to the structured data storage area and the unstructured data storage area respectively. Because different classification rules are determined for different data categories at storage time, and data of the same category is divided into two parts stored separately in the structured and unstructured data storage areas, the method reduces the retrieval inconvenience of using structured storage alone and overcomes the inability to carry out complex statistics when using unstructured storage alone. In addition, when storage requirements change, the data classification rule corresponding to a data category can be adjusted directly, so the way a given category of data is stored can be adjusted quickly and conveniently.
Referring to Fig. 2, which shows a schematic flowchart of another embodiment of a data storage method of the present application, the method of this embodiment includes:
Step 201: obtain a data set to be analyzed.
Step 202: determine the data category of each sub-dataset in the data set.
Step 203: query the preset correspondence between data categories and data classification rules, and determine the data classification rule of the sub-dataset.
Step 204: according to the data classification rule of the sub-dataset, divide the sub-dataset into a first sub-dataset and a second sub-dataset, and determine the storage mode of each data item in the first sub-dataset and the second sub-dataset.
Steps 201 to 204 of this embodiment are similar to steps 101 to 104 of the embodiment shown in Fig. 1; see the description of that embodiment for details, which are not repeated here.
Step 205: build a feature code that uniquely identifies the sub-dataset.
In this embodiment, after the classification rule for the data in each sub-dataset is determined, a feature code is built for each sub-dataset in the data set. The feature code is an identifier of one sub-dataset; for example, it can be a 32-bit number. Feature codes and sub-datasets correspond one to one.
The feature code of a sub-dataset can be generated as soon as the data classification rule of that sub-dataset and the storage modes of its data have been determined. Alternatively, once the classification rules and corresponding storage modes of all sub-datasets in the data set have been determined, feature codes equal in number to the sub-datasets can be generated together.
It should be noted that the feature code of a sub-dataset may also contain an identifier of the data category of the data in that sub-dataset.
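A feature code of this kind can be sketched as a 32-bit number that embeds a category identifier. The bit layout used here (8 bits of category, 24 bits of serial number), the category table, and the class name are assumptions for illustration; the application only says the code may be a 32-bit number and may carry a category identifier.

```python
import itertools

# Hypothetical numeric identifiers for data categories.
CATEGORY_IDS = {"webpage": 1, "mail": 2, "microblog": 3, "instant_message": 4}


class FeatureCodeGenerator:
    """Issue 32-bit feature codes: high 8 bits category, low 24 bits serial."""

    def __init__(self):
        self._serial = itertools.count(1)

    def next_code(self, category):
        serial = next(self._serial)
        if serial >= 1 << 24:
            raise OverflowError("serial number space exhausted")
        return (CATEGORY_IDS[category] << 24) | serial


def category_of(feature_code):
    """Recover the data category embedded in a feature code."""
    names = {value: name for name, value in CATEGORY_IDS.items()}
    return names[feature_code >> 24]
```

A single shared serial counter keeps every code unique across categories; a per-category counter would also work because the category bits already differ.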
Step 206: according to the storage mode, store the feature code in association with each data item in the first sub-dataset to the structured data storage area, and store the feature code in association with each data item in the second sub-dataset to the unstructured data storage area.
In this embodiment, when each data item in the first sub-dataset is stored to the structured data storage area, the feature code of the sub-dataset to which the first sub-dataset belongs is stored to the structured data storage area as well, and every data item of the first sub-dataset in that storage area corresponds to the feature code. In other words, each data item of the first sub-dataset in the structured storage area is associated with the feature code of the sub-dataset stored alongside it. Correspondingly, the unstructured storage area stores each data item of the second sub-dataset together with the feature code, and the feature code is likewise associated with each data item in the second sub-dataset.
When the data of one sub-dataset, stored separately in the two storage areas, needs to be queried, the query can be made directly with the feature code of that sub-dataset, and all data items of the same sub-dataset can be found conveniently and quickly.
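Storing and querying by feature code across both areas can be sketched with an in-memory SQLite database standing in for the structured data storage area and a directory of JSON files standing in for the unstructured data storage area. The table layout, file naming scheme, and function names are assumptions for illustration only.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def open_structured_area():
    """Create an in-memory table keyed by feature code."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE items (feature_code INTEGER, name TEXT, value TEXT)")
    db.execute("CREATE INDEX idx_feature_code ON items (feature_code)")
    return db


def store(db, file_root, feature_code, first, second):
    """Store both sub-datasets, each associated with the same feature code."""
    db.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [(feature_code, name, str(value)) for name, value in first.items()],
    )
    db.commit()
    # The file name carries the feature code, tying the file to the rows.
    path = Path(file_root) / f"{feature_code:08x}.json"
    path.write_text(json.dumps(second), encoding="utf-8")


def load(db, file_root, feature_code):
    """Collect every data item of one sub-dataset from both areas."""
    rows = db.execute(
        "SELECT name, value FROM items WHERE feature_code = ?", (feature_code,)
    ).fetchall()
    merged = dict(rows)
    path = Path(file_root) / f"{feature_code:08x}.json"
    if path.exists():
        merged.update(json.loads(path.read_text(encoding="utf-8")))
    return merged


db = open_structured_area()
root = tempfile.mkdtemp()
store(db, root, 0x02000001, {"mail_header": "Hello"}, {"sender": "a@example.com"})
record = load(db, root, 0x02000001)
```

A single lookup key replaces the per-area search conditions that would otherwise be needed to reassemble one sub-dataset.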
As a example by still the data in Sub Data Set are an envelope mail, when determining each item data in this mail After division rule, generate and this mail characteristic of correspondence code M.And then by the mail header in mail, postal What the data item such as part text, Email attachment were corresponding with this feature code M is deposited into structural data memory block; By the transmission time of mail, source address, source port, destination address, destination interface, email type, send out What part people, addressee were corresponding with this feature code M is deposited into unstructured data memory block.Due to structuring Many envelopes that memory block and destructuring memory block store different mail address the most respectively, different time sends Mail, if the condition code of being not provided with, then needs to input multiple search conditions in structural data memory block The data message that this envelope mail is relevant can be retrieved, also need to input in destructuring memory block simultaneously Multiple search conditions search for the data message that this envelope mail is relevant, just can obtain the complete of this envelope mail Information.If the search condition of input is incorrect, it is also possible to there will be and simultaneously scans for sealing mail more Data message, in addition it is also necessary to user the most further retrieves and just can obtain required mail.By Data item relevant to this envelope mail in two memory blocks is corresponding store condition code after, when needs inquiry should When sealing the related data information of mail, then can be able to search and this envelope mail with direct basis this feature code Relevant all data item, decrease data processing amount, also improve the accuracy of data search.
In the present embodiment at each item number by the first Sub Data Set in Sub Data Set and the second Sub Data Set Before storing, generate a condition code for Sub Data Set, and by condition code and this first Sub Data Set In each item data correspondence be deposited into structural data memory block, and by each item data in the second Sub Data Set Corresponding with this feature code store to unstructured data memory block, so when data mining, the need to When inquiring about all data of same Sub Data Set, can by this feature code in two memory blocks efficiently Inquire all data corresponding with this feature code.
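The feature-code mechanism described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the two storage areas are modeled as in-memory lists, the feature code is assumed to be a random UUID (the text only requires that it uniquely identify the sub-data set), and the names `store_sub_dataset` and `query_by_feature_code` are hypothetical.

```python
import uuid

# Hypothetical in-memory stand-ins for the two storage areas.
# Each row is (feature_code, field_name, value).
structured_store = []
unstructured_store = []


def store_sub_dataset(first_subset, second_subset):
    """Store both subsets of one sub-data set under a single feature code."""
    # Assumption: any scheme that yields a unique identifier would do.
    feature_code = uuid.uuid4().hex
    for name, value in first_subset.items():
        structured_store.append((feature_code, name, value))
    for name, value in second_subset.items():
        unstructured_store.append((feature_code, name, value))
    return feature_code


def query_by_feature_code(feature_code):
    """Collect every data item of one sub-data set from both storage areas."""
    items = {}
    for store in (structured_store, unstructured_store):
        for code, name, value in store:
            if code == feature_code:
                items[name] = value
    return items


# Usage: one email split into two subsets, then reassembled by its code.
code_m = store_sub_dataset(
    {"header": "Quarterly report", "body": "See attachment."},
    {"sender": "a@example.com", "recipient": "b@example.com"},
)
mail = query_by_feature_code(code_m)
```

A single lookup key replaces the multiple search conditions that would otherwise have to be entered in each storage area.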
Further, in any of the data storage methods of the above embodiments, to make it easy to modify the preset correspondence between data categories and data classification rules, or to add the data classification rule for a new data category, the method may also include: receiving an update request for the preset correspondence between data categories and data classification rules; and, according to the update request, modifying or adding a correspondence between a data category and a data classification rule. When an update request is received, the data classification rule corresponding to the data category is modified or added according to the classification rule to be added, or the content to be modified, that the request contains.
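The update step can be pictured with a short sketch. Everything here is assumed for illustration: the correspondence is held in a dictionary, a rule is reduced to the set of field names assigned to the first sub-data set, and the request shape `{"category": ..., "rule": ...}` is hypothetical.

```python
# Hypothetical preset correspondence: data category -> classification rule.
# A rule is modeled as the set of field names that go to the first subset.
classification_rules = {
    "email": {"header", "body", "attachment"},
}


def handle_update_request(request):
    """Apply an update request of the assumed form
    {"category": str, "rule": iterable of field names}.

    Returns "modified" if the category already had a rule, else "added".
    """
    category = request["category"]
    action = "modified" if category in classification_rules else "added"
    classification_rules[category] = set(request["rule"])
    return action
```

Because the rules are data rather than code, a new category can be supported without changing the storage logic.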
Corresponding to the data storage method of the embodiments of this application, an embodiment of this application also provides a data storage device. Referring to Fig. 3, which shows a schematic structural diagram of one embodiment of the data storage device, the device of this embodiment includes: a data acquisition unit 301, a category determination unit 302, a classification rule determination unit 303, a data classification unit 304, and a storage unit 305.
The data acquisition unit 301 is configured to obtain a data set to be analyzed.
The category determination unit 302 is configured to determine the data category of each sub-data set in the data set.
The classification rule determination unit 303 is configured to query the preset correspondence between data categories and data classification rules and determine the data classification rule of the sub-data set.
The data classification unit 304 is configured to divide each sub-data set into a first sub-data set and a second sub-data set according to the data classification rule of the sub-data set, and to determine the storage mode of each data item in the first sub-data set and the second sub-data set.
The storage mode determined by the data classification unit includes: the data storage format of the data item, the data storage space of the data item, and/or index information.
The storage unit 305 is configured to store, according to the storage mode, each data item in the first sub-data set in the structured data storage area, and to store the second sub-data set in the unstructured data storage area.
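How units 301 to 305 fit together can be sketched as a small pipeline. This is only an illustration under stated assumptions: each unit is a plain function, the category is carried as a tag on the sub-data set, a rule is the set of field names belonging to the first sub-data set, and the `StorageMode` values ("text", "blob") are invented placeholders.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical preset correspondence between categories and rules.
RULES = {"email": {"header", "body", "attachment"}}


@dataclass
class StorageMode:
    """Assumed shape of a storage mode: format, space, optional index."""
    storage_format: str
    storage_space: str
    index: Optional[str] = None


def determine_category(sub_dataset):
    """Unit 302 (assumption: the category is carried as a tag)."""
    return sub_dataset["category"]


def determine_rule(category):
    """Unit 303: look up the rule in the preset correspondence."""
    return RULES[category]


def classify(sub_dataset, rule):
    """Unit 304: split the fields and pick a storage mode for each."""
    fields = sub_dataset["fields"]
    first = {k: v for k, v in fields.items() if k in rule}
    second = {k: v for k, v in fields.items() if k not in rule}
    modes = {k: StorageMode("text", "structured", index=k) for k in first}
    modes.update({k: StorageMode("blob", "unstructured") for k in second})
    return first, second, modes


def store(first, second, modes, structured, unstructured):
    """Unit 305: write each item to its area using its storage mode."""
    for name, value in first.items():
        structured.append((name, value, modes[name].storage_format))
    for name, value in second.items():
        unstructured.append((name, value, modes[name].storage_format))


def run(dataset):
    """Unit 301 is assumed to have supplied `dataset` already."""
    structured, unstructured = [], []
    for sub in dataset:
        rule = determine_rule(determine_category(sub))
        first, second, modes = classify(sub, rule)
        store(first, second, modes, structured, unstructured)
    return structured, unstructured
```

For example, `run([{"category": "email", "fields": {"header": "hi", "sender": "a@example.com"}}])` places `header` in the structured list and `sender` in the unstructured list.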
The data acquisition unit may obtain the data to be analyzed in several ways. Corresponding to one of them, the data acquisition unit 301 is specifically configured to extract a specified proportion of the collected raw data as the data set to be analyzed.
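Extracting a specified proportion of the raw data might look like the sketch below. The text does not say how the records are chosen; uniform random sampling is an assumption made here, and `sample_dataset` is a hypothetical name.

```python
import random


def sample_dataset(raw_records, proportion, seed=None):
    """Return a random sample holding `proportion` of the raw records.

    Assumption: sampling is uniform and without replacement; at least one
    record is returned whenever the input is non-empty.
    """
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    if not raw_records:
        return []
    count = max(1, round(len(raw_records) * proportion))
    return random.Random(seed).sample(raw_records, count)


# Usage: analyze 10% of 100 collected records.
to_analyze = sample_dataset(list(range(100)), 0.1, seed=0)
```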
In practice, the category determination unit may also determine the data category of a sub-data set in several ways. Corresponding to one of them, the category determination unit 302 includes:
A first category determination unit, configured to analyze the organizational format of the data of each sub-data set in the data set and determine the data category corresponding to that organizational format.
Corresponding to another way of determining the data category of a sub-data set, the category determination unit 302 may include:
A second category determination unit, configured to query the data identifier contained in each sub-data set in the data set and determine the data category corresponding to the data identifier contained in the sub-data set.
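The two ways of determining a category can be contrasted in a short sketch. The heuristics, the category names, and the identifier table are all invented for illustration; the text does not specify which formats or identifiers are used.

```python
import json

# Hypothetical mapping from a data identifier to a data category.
IDENTIFIER_TO_CATEGORY = {"MAIL": "email", "HTTP": "web_page"}


def category_by_format(raw_text):
    """First way: infer the category from how the data is organized.

    The checks below are assumed heuristics, not taken from the patent.
    """
    try:
        json.loads(raw_text)
        return "json_document"
    except ValueError:
        pass
    if raw_text.startswith("From:") or "\nSubject:" in raw_text:
        return "email"
    return "unknown"


def category_by_identifier(sub_dataset):
    """Second way: look up an identifier carried inside the sub-data set."""
    return IDENTIFIER_TO_CATEGORY.get(sub_dataset.get("identifier"), "unknown")
```

Format analysis needs no cooperation from the data source, while the identifier lookup is cheaper when the collected data already carries a tag.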
Referring to Fig. 4, which shows a schematic structural diagram of another embodiment of the storage device, the storage device of this embodiment differs from the embodiment shown in Fig. 3 in that it also includes a feature code generation unit 306.
The feature code generation unit 306 is configured to construct a feature code that uniquely identifies a sub-data set.
Feature codes correspond one-to-one with sub-data sets.
Accordingly, the storage unit 305 is specifically configured to store, according to the storage mode, the feature code in correspondence with each data item in the first sub-data set in the structured data storage area, and to store the feature code in correspondence with each data item in the second sub-data set in the unstructured data storage area.
In this embodiment, before the storage unit stores the data, the feature code generation unit generates for each sub-data set a feature code that uniquely identifies it. The storage unit then stores the feature code of the sub-data set together with each data item of its first sub-data set in the structured data storage area, and stores the feature code together with each data item of its second sub-data set in the unstructured data storage area. When data of the same sub-data set is queried, all the data of that sub-data set can therefore be retrieved easily from the feature code alone.
Further, any of the above device embodiments of this application may also include an update request receiving unit and a rule update unit.
The update request receiving unit is configured to receive an update request for the preset correspondence between data categories and data classification rules;
The rule update unit is configured to modify or add, according to the update request, a correspondence between a data category and a data classification rule.
For brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that this application is not limited by the described order of actions, because according to this application some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and that the actions and modules involved are not necessarily required by this application.
It should be noted that the embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments may be cross-referenced. Because the device embodiments are basically similar to the method embodiments, they are described relatively briefly; for the relevant parts, refer to the description of the method embodiments.
Finally, it should also be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants of them are intended to be non-exclusive, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
For convenience of description, the device above is described as divided by function into various units. Of course, when this application is implemented, the functions of the units may be realized in one or more pieces of software and/or hardware.
The data storage method and device provided by this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the description of the above embodiments is intended only to help in understanding the method of this application and its core idea. Meanwhile, for those of ordinary skill in the art, both the specific implementations and the scope of application may change in accordance with the idea of this application. In summary, the content of this specification should not be construed as limiting this application.

Claims (8)

1. A data storage method, characterized by comprising:
Obtaining a data set to be analyzed;
Determining the data category of each sub-data set in the data set;
Querying a preset correspondence between data categories and data classification rules to determine the data classification rule of the sub-data set;
Dividing the sub-data set into a first sub-data set and a second sub-data set according to the data classification rule of the sub-data set, and determining the storage mode of each data item in the first sub-data set and the second sub-data set;
According to the storage mode, storing each data item in the first sub-data set in a structured data storage area, and storing the second sub-data set in an unstructured data storage area;
Wherein determining the data category of each sub-data set in the data set comprises:
Analyzing the organizational format of the data of each sub-data set in the data set, and determining the data category corresponding to the organizational format of the data of the sub-data set;
Or,
Querying the data identifier contained in each sub-data set in the data set, and determining the data category corresponding to the data identifier contained in the sub-data set;
Wherein, before storing, according to the storage mode, each data item in the first sub-data set in the structured data storage area and storing the second sub-data set in the unstructured data storage area, the method further comprises:
Constructing a feature code that uniquely identifies the sub-data set;
And wherein storing, according to the storage mode, each data item in the first sub-data set in the structured data storage area and storing the second sub-data set in the unstructured data storage area comprises:
According to the storage mode, storing the feature code in correspondence with each data item in the first sub-data set in the structured data storage area, and storing the feature code in correspondence with each data item in the second sub-data set in the unstructured data storage area.
2. The method according to claim 1, characterized in that obtaining the data set to be analyzed comprises:
Extracting a specified proportion of the collected raw data as the data set to be analyzed.
3. The method according to claim 1, characterized in that the storage mode comprises:
The data storage format of the data item, the data storage space of the data item, and/or index information.
4. The method according to claim 1, characterized by further comprising:
Receiving an update request for the preset correspondence between data categories and data classification rules;
According to the update request, modifying or adding a correspondence between a data category and a data classification rule.
5. A data storage device, characterized by comprising:
A data acquisition unit, configured to obtain a data set to be analyzed;
A category determination unit, configured to determine the data category of each sub-data set in the data set;
A classification rule determination unit, configured to query a preset correspondence between data categories and data classification rules and determine the data classification rule of the sub-data set;
A data classification unit, configured to divide the sub-data set into a first sub-data set and a second sub-data set according to the data classification rule of the sub-data set, and to determine the storage mode of each data item in the first sub-data set and the second sub-data set;
A storage unit, configured to store, according to the storage mode, each data item in the first sub-data set in a structured data storage area, and to store the second sub-data set in an unstructured data storage area;
Wherein the category determination unit comprises:
A first category determination unit, configured to analyze the organizational format of the data of each sub-data set in the data set and determine the data category corresponding to the organizational format of the data of the sub-data set;
Or,
A second category determination unit, configured to query the data identifier contained in each sub-data set in the data set and determine the data category corresponding to the data identifier contained in the sub-data set;
Wherein the device further comprises:
A feature code generation unit, configured to construct a feature code that uniquely identifies the sub-data set;
The storage unit being specifically configured to store, according to the storage mode, the feature code in correspondence with each data item in the first sub-data set in the structured data storage area, and to store the feature code in correspondence with each data item in the second sub-data set in the unstructured data storage area.
6. The device according to claim 5, characterized in that the data acquisition unit is specifically configured to extract a specified proportion of the collected raw data as the data set to be analyzed.
7. The device according to claim 5, characterized in that the storage mode determined by the data classification unit comprises: the data storage format of the data item, the data storage space of the data item, and/or index information.
8. The device according to claim 5, characterized by further comprising:
An update request receiving unit, configured to receive an update request for the preset correspondence between data categories and data classification rules;
A rule update unit, configured to modify or add, according to the update request, a correspondence between a data category and a data classification rule.
CN201210438962.6A 2012-11-06 2012-11-06 A kind of date storage method and device Active CN102915373B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210438962.6A CN102915373B (en) 2012-11-06 2012-11-06 A kind of date storage method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210438962.6A CN102915373B (en) 2012-11-06 2012-11-06 A kind of date storage method and device

Publications (2)

Publication Number Publication Date
CN102915373A CN102915373A (en) 2013-02-06
CN102915373B true CN102915373B (en) 2016-08-10

Family

ID=47613739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210438962.6A Active CN102915373B (en) 2012-11-06 2012-11-06 A kind of date storage method and device

Country Status (1)

Country Link
CN (1) CN102915373B (en)

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440290A (en) * 2013-08-16 2013-12-11 曙光信息产业股份有限公司 Big data loading system and method
CN103440288A (en) * 2013-08-16 2013-12-11 曙光信息产业股份有限公司 Big data storage method and device
CN103440303A (en) * 2013-08-21 2013-12-11 曙光信息产业股份有限公司 Heterogeneous cloud storage system and data processing method thereof
CN103440130A (en) * 2013-08-26 2013-12-11 成都金山数字娱乐科技有限公司 Data processing method and device
CN104731800B (en) * 2013-12-20 2018-10-23 中国银联股份有限公司 Data analysis set-up
CN103745262A (en) * 2013-12-30 2014-04-23 远光软件股份有限公司 Data collection method and device
CN104090901B (en) * 2013-12-31 2017-06-13 腾讯数码(天津)有限公司 A kind of method that data are processed, device and server
CN103699694B (en) * 2014-01-13 2017-08-29 联想(北京)有限公司 A kind of data processing method and device
WO2015165112A1 (en) * 2014-04-30 2015-11-05 Pivotal Software, Inc. Validating analytics results
CN104102701B (en) * 2014-07-07 2017-10-13 浪潮(北京)电子信息产业有限公司 A kind of historical data based on hive is achieved and querying method
CN104462287B (en) * 2014-11-27 2018-10-12 华为技术服务有限公司 A kind of method, apparatus and system of data processing
CN104715040A (en) * 2015-03-23 2015-06-17 浪潮集团有限公司 Data classification method and device
CN106469195A (en) * 2016-08-31 2017-03-01 国信优易数据有限公司 Based on conforming data file Valuation Method and system
CN107103060B (en) * 2017-04-14 2021-02-26 湖南云智迅联科技发展有限公司 Storage method and system of sensing data
CN107453948A (en) * 2017-07-28 2017-12-08 北京邮电大学 The storage method and system of a kind of network measurement data
CN109598648A (en) * 2017-09-30 2019-04-09 北京国双科技有限公司 Information processing method and device
CN110020357B (en) * 2017-10-31 2021-08-24 北京国双科技有限公司 Data storage method, data storage device, storage medium and processor
CN108228101B (en) * 2017-12-28 2022-03-15 北京盛和大地数据科技有限公司 Method and system for managing data
CN110633315A (en) * 2018-06-20 2019-12-31 中国移动通信集团有限公司 Data processing method and device and computer storage medium
CN109522352A (en) * 2018-11-08 2019-03-26 内蒙古伊泰煤炭股份有限公司 Industrial data management system and method
CN109446204B (en) * 2018-11-27 2022-04-15 北京微播视界科技有限公司 Data storage method and device for instant messaging, electronic equipment and medium
CN110941640A (en) * 2018-12-25 2020-03-31 广州中软信息技术有限公司 Intelligent screening method, device, equipment, system and medium for problem clues
CN109947706A (en) * 2019-02-13 2019-06-28 上海泉涸信息科技有限公司 File management system and file management method
CN111192072B (en) * 2019-10-29 2023-08-04 腾讯科技(深圳)有限公司 User grouping method and device and storage medium
CN112783825B (en) * 2019-11-04 2024-01-02 富泰华工业(深圳)有限公司 Data archiving method, device, computer device and storage medium
CN111210879B (en) * 2020-01-06 2021-03-26 中国海洋大学 Hierarchical storage optimization method for super-large-scale drug data
CN111966645A (en) * 2020-08-12 2020-11-20 南方科技大学 Supercomputer data storage method, device, system and storage medium
CN113655968B (en) * 2021-08-24 2024-06-18 上海晋朔信息科技有限公司 Unstructured data storage method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101042747A (en) * 2006-03-24 2007-09-26 上海中经互联网络有限公司 Economic operation analysis system
CN101174957A (en) * 2007-10-09 2008-05-07 南京财经大学 Cooperation service platform facing different source data
CN101441629A (en) * 2007-11-19 2009-05-27 上海新纳广告传媒有限公司 Automatic acquiring method of non-structured web page information

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100274750A1 (en) * 2009-04-22 2010-10-28 Microsoft Corporation Data Classification Pipeline Including Automatic Classification Rules

Also Published As

Publication number Publication date
CN102915373A (en) 2013-02-06

Similar Documents

Publication Publication Date Title
CN102915373B (en) A kind of date storage method and device
CN102915347B (en) A kind of distributed traffic clustering method and system
US8655805B2 (en) Method for classification of objects in a graph data stream
CN110019486A (en) Collecting method, device, equipment and storage medium
CN108632100B (en) Method and system for discovering and presenting network application access information
CN105224606A (en) A kind of disposal route of user ID and device
Yan et al. Quegel: A general-purpose query-centric framework for querying big graphs
CN105912716A (en) Short text classification method and apparatus
CN103258049A (en) Association rule mining method based on mass data
WO2020219862A1 (en) Machine learning classifier for identifying internet service providers from website tracking
CN102567494B (en) Website classification method and device
CN108234233B (en) Log processing method and device
CN103544259B (en) Aggregating sorting TopK inquiry processing method and system
JP6756744B2 (en) Location information provision method and equipment
CN113031951B (en) Menu generation method, menu generation device, computer equipment and storage medium
CN101141370A (en) Gridding service based electric power enterprise real-time data processing method
KR20190108657A (en) Extracting similar group elements
CN105389330B (en) Across the community open source resources of one kind match correlating method
CN106599189A (en) Dynamic Skyline inquiry device based on cloud computing
CN109636682A (en) A kind of teaching resource auto-collection system
CN103984700B (en) A kind of isomeric data analysis method for scientific and technological information vertical search
KR101973328B1 (en) Correlation analysis and visualization method of Hadoop based machine tool environmental data
Wang et al. Research of massive web log data mining based on cloud computing
CN106406985A (en) A distributed computing frame and a distributed computing method
Czyzowicz et al. Enhancing hyperlink structure for improving web performance

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20130206

Assignee: Yangzhou Wanfang Electronic Technology Co., Ltd.

Assignor: Jiangnan Computing Technology Inst., Wuxi

Contract record no.: 2017320000002

Denomination of invention: Data storage method and device

Granted publication date: 20160810

License type: Exclusive License

Record date: 20170116

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model