CN102915373B - Data storage method and device - Google Patents
- Publication number: CN102915373B
- Application number: CN201210438962.6A
- Authority
- CN
- China
- Prior art keywords
- data
- data set
- sub
- item
- category
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application discloses a data storage method and device. The method obtains a data set to be analyzed and determines the data category of each sub data set in the data set. It then queries a preset correspondence between data categories and data classification rules to determine the data classification rule of the sub data set, divides the sub data set into a first sub data set and a second sub data set according to that rule, and determines the storage mode of each data item in the first and second sub data sets. According to the storage mode, each data item in the first sub data set is stored in a structured data storage area, and the second sub data set is stored in an unstructured data storage area. Storing data with this method makes data query and statistics more convenient and allows the storage mode of the data to be adjusted more flexibly.
Description
Technical field
The application relates to the technical field of big data processing, and in particular to a data storage method and device.
Background technology
As data generation becomes automated and faster, the volume of data to be processed is increasing sharply, and the era of big data has arrived. Big data is characterized by large volume and diverse data types, and big data analysis and mining aims to obtain valuable data quickly from data of many types.
Before big data can be analyzed and mined, the large volume of acquired data usually has to be collected and stored. Big data includes structured, semi-structured and unstructured data. At present there are two common ways of storing big data. In the first, all collected data is stored as structured data in a relational database system, but storing big data this way slows down queries on data such as text. In the second, all collected data is stored as unstructured data in a file system, but data stored this way cannot support more complex statistical analysis. In addition, because the architectures of relational databases and file systems differ considerably, the storage mode cannot be adjusted flexibly when the storage requirements change.
Summary of the invention
In view of this, the application provides a data storage method and device. Storing data with this method reduces the slow queries and inconvenient statistical analysis found in the prior art, and effectively improves the flexibility with which data storage can be adjusted.
To achieve the above object, the application provides the following technical solution: a data storage method, including:
Obtaining a data set to be analyzed;
Determining the data category of each sub data set in the data set;
Querying a preset correspondence between data categories and data classification rules to determine the data classification rule of the sub data set;
Dividing the sub data set into a first sub data set and a second sub data set according to the data classification rule of the sub data set, and determining the storage mode of each data item in the first sub data set and the second sub data set;
According to the storage mode, storing each data item in the first sub data set in a structured data storage area, and storing the second sub data set in an unstructured data storage area.
Preferably, before each data item in the first sub data set is stored in the structured data storage area and the second sub data set is stored in the unstructured data storage area according to the storage mode, the method further includes:
Building a feature code that uniquely identifies the sub data set;
In that case, storing each data item in the first sub data set in the structured data storage area and storing the second sub data set in the unstructured data storage area according to the storage mode includes:
According to the storage mode, storing the feature code in association with each data item in the first sub data set in the structured data storage area, and storing the feature code in association with each data item in the second sub data set in the unstructured data storage area.
Preferably, obtaining the data set to be analyzed includes:
Extracting a designated proportion of the collected raw data as the data set to be analyzed.
Preferably, determining the data category of each sub data set in the data set includes:
Analyzing the organization format of the data of each sub data set in the data set, and determining the data category corresponding to the organization format of the data of the sub data set.
Preferably, determining the data category of each sub data set in the data set includes:
Querying the data identifier included in each sub data set in the data set, and determining the data category corresponding to the data identifier included in the sub data set.
Preferably, the storage mode includes:
The data storage format of the data item, the data storage space of the data item and/or index information.
Preferably, the method further includes:
Receiving an update request for the preset correspondence between data categories and data classification rules;
Modifying or adding a correspondence between a data category and a data classification rule according to the update request.
In another aspect, corresponding to the data storage method of the application, the application also provides a data storage device, including:
A data acquisition unit, for obtaining a data set to be analyzed;
A category determination unit, for determining the data category of each sub data set in the data set;
A classification rule determination unit, for querying a preset correspondence between data categories and data classification rules to determine the data classification rule of the sub data set;
A data classification unit, for dividing the sub data set into a first sub data set and a second sub data set according to the data classification rule of the sub data set, and determining the storage mode of each data item in the first sub data set and the second sub data set;
A storage unit, for storing, according to the storage mode, each data item in the first sub data set in a structured data storage area and the second sub data set in an unstructured data storage area.
Preferably, the device further includes:
A feature code generation unit, for building a feature code that uniquely identifies the sub data set;
The storage unit is then specifically for storing, according to the storage mode, the feature code in association with each data item in the first sub data set in the structured data storage area, and the feature code in association with each data item in the second sub data set in the unstructured data storage area.
Preferably, the data acquisition unit is specifically for extracting a designated proportion of the collected raw data as the data set to be analyzed.
Preferably, the category determination unit includes:
A first category determination unit, for analyzing the organization format of the data of each sub data set in the data set, and determining the data category corresponding to the organization format of the data of the sub data set.
Preferably, the category determination unit includes:
A second category determination unit, for querying the data identifier included in each sub data set in the data set, and determining the data category corresponding to the data identifier included in the sub data set.
Preferably, the storage mode determined by the data classification unit includes: the data storage format of the data item, the data storage space of the data item and/or index information.
Preferably, the device further includes:
An update request receiving unit, for receiving an update request for the preset correspondence between data categories and data classification rules;
A rule updating unit, for modifying or adding a correspondence between a data category and a data classification rule according to the update request.
It can be seen from the above technical solution that, compared with the prior art, the application provides a data storage method and device. The method sets different data classification rules for different data categories. After the data classification rule corresponding to the data category of a sub data set is determined, the data in the sub data set is divided into two parts, so that the data items stored in the structured data storage area and in the unstructured data storage area can be adjusted. Compared with using structured storage alone, this reduces the inconvenience of retrieval; compared with using unstructured storage alone, it reduces the cases in which complex statistics cannot be performed. Moreover, when the storage requirements change, the data classification rule corresponding to a data category can be adjusted directly, so the storage mode of a given class of data can be adjusted conveniently.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the application, and a person of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 shows a schematic flow chart of one embodiment of a data storage method of the application;
Fig. 2 shows a schematic flow chart of another embodiment of a data storage method of the application;
Fig. 3 shows a schematic structural diagram of one embodiment of a data storage device of the application;
Fig. 4 shows a schematic structural diagram of another embodiment of a data storage device of the application.
Detailed description of the invention
The technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the application rather than all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in the application without creative effort fall within the scope of protection of the application.
The application can be used in numerous general-purpose or special-purpose computing device environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, multiprocessor devices, and distributed computing environments that include any of the above devices or equipment.
The application can be described in the general context of computer-executable instructions, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
Referring to Fig. 1, which shows a schematic flow chart of one embodiment of a data storage method of the application, the method of this embodiment includes:
Step 101: Obtain a data set to be analyzed.
Data generally has to be collected before big data mining is carried out: large amounts of data are collected from each target source that generates data so that data mining can be performed later. The way data is collected and the data that is collected may differ between application fields. For example, in scientific research and computer simulation, data is produced by parallel computation on supercomputers; these supercomputers can serve as the target sources that generate data, and the collected data is usually a massive amount of computation data. In Internet applications, network data collection equipment can be deployed at a gateway or data center, and network packets can be collected through fixed ports according to Internet protocols.
In this application, the data set to be analyzed is the collection of data, taken from the source data generated by the target sources, on which data mining is to be performed. Specifically, after the source data is collected from the target sources, all of the collected source data can be used as the data set to be analyzed. In practice, however, the collected source data is large, so using all of it as the data set to be analyzed makes the data mining workload heavy and time-consuming. Therefore, a designated proportion of the collected source data can instead be extracted as sample data and used as the data set to be analyzed. When sample data is extracted from the source data, the sample should contain the various types of data information present in the source data. For example, in Internet data mining, the data to be analyzed should include application data of as many different Internet protocols as possible.
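The sampling described above could be sketched as follows. This is only an illustration under stated assumptions, not the method of the application: the function name `sample_by_type`, the `type_of` callback and the rule of keeping at least one record per type are hypothetical choices made so that every type present in the source data also appears in the sample.

```python
import random
from collections import defaultdict


def sample_by_type(records, ratio, type_of, seed=None):
    """Draw roughly `ratio` of the records from each type group.

    `type_of` is a hypothetical callback that returns the type of a record,
    for example the Internet protocol a packet was captured from.
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    rng = random.Random(seed)

    # Group the source records by type so that no type is lost by sampling.
    groups = defaultdict(list)
    for record in records:
        groups[type_of(record)].append(record)

    sample = []
    for members in groups.values():
        # Assumption: keep at least one record of every type.
        count = max(1, round(len(members) * ratio))
        sample.extend(rng.sample(members, count))
    return sample
```

Sampling each group separately is one simple way to honor the requirement that the sample cover all data types; plain random sampling over the whole source could miss rare types.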
Step 102: Determine the data category of each sub data set in the data set.
In the embodiment of the application, after the data set to be analyzed is obtained, all the data in it is not simply stored directly in a single permanent storage area.
Because the data set to be analyzed contains a large amount of data information of many different types, the embodiment of the application first analyzes the data content of each sub data set included in the data set to be analyzed and determines the data category of the data in each sub data set.
A data category can be understood as the functional category of a piece of data; data of different categories expresses different meanings. Data of the same category has a data organization format or attribute information that distinguishes it from other data. The division into data categories also differs between application fields and can be decided according to actual needs.
For ease of understanding, take a data set to be analyzed that was obtained from the Internet as an example. Such a data set contains network packets of various application protocols obtained from various protocol ports. Each network packet can be understood as one sub data set, and different packets may correspond to different data categories. Specifically, the obtained data set may include data of categories such as web page, mail, microblog and instant messaging. Accordingly, determining the data category of each sub data set in the obtained data set means determining which of these categories the data content of each sub data set belongs to.
There are several ways to determine the data category of each sub data set in the data set. One way is to analyze the organization format of the data of each sub data set and determine the data category corresponding to that format. In general, different data categories have different data organization formats, so analyzing the data content of each sub data set and determining its data organization format is enough to determine the data category to which the data in the sub data set belongs. For example, in Internet data, mail organizes its data in the EML format; that is, the general format of data whose category is mail is the EML format. Therefore, when the data organization format of a sub data set is found to be EML, the data category of the sub data set can be determined to be mail.
Another way to determine the data category of a sub data set is to query the data identifier included in each sub data set in the data set and determine the data category of the data in the sub data set from the category that the identifier represents. In other words, when the data of the collected data set contains data identifiers that distinguish different data categories, the data category of a sub data set can also be determined by analyzing the data identifier it contains.
In addition, in practical applications there may be other ways of determining the data category, depending on the characteristics of the data in different application fields; they are not listed one by one here.
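The two ways of determining a data category described above could be sketched as follows. This is a minimal sketch under assumptions: a sub data set is modeled as a dictionary, the keys `format` and `data_id` are hypothetical, and both lookup tables are illustrative. Only the EML-to-mail entry comes from the description; every other entry is invented for the example.

```python
# Hypothetical lookup tables. Only "eml" -> "mail" is taken from the text.
FORMAT_TO_CATEGORY = {"eml": "mail", "html": "webpage"}
IDENTIFIER_TO_CATEGORY = {"MAIL": "mail", "IM": "instant_messaging"}

UNKNOWN = "unknown"


def category_by_format(sub_dataset):
    """Determine the category from the data organization format."""
    fmt = sub_dataset.get("format", "").lower()
    return FORMAT_TO_CATEGORY.get(fmt, UNKNOWN)


def category_by_identifier(sub_dataset):
    """Determine the category from a data identifier carried in the data."""
    return IDENTIFIER_TO_CATEGORY.get(sub_dataset.get("data_id"), UNKNOWN)


def determine_category(sub_dataset):
    """Try the identifier first, then fall back to the organization format.

    The order of the two checks is an assumption of this sketch.
    """
    category = category_by_identifier(sub_dataset)
    if category == UNKNOWN:
        category = category_by_format(sub_dataset)
    return category
```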
Step 103: Query the preset correspondence between data categories and data classification rules to determine the data classification rule of the sub data set.
The embodiment of the application presets classification rules for different data categories: a different data classification rule is set according to the characteristics of each data category. A data classification rule defines how the data items in a sub data set of that category are classified and how each data item is to be stored. The preset correspondence between data categories and data classification rules can be set according to actual needs.
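The preset correspondence can be pictured as a lookup table keyed by data category. The sketch below is an assumption about one possible representation, not the representation used by the application: the table `PRESET_RULES`, the `first`/`second` keys, the item names and the function `lookup_rule` are all hypothetical; the mail item names follow the mail example in this description.

```python
# Hypothetical preset correspondence: data category -> data classification rule.
# Each rule names the data items assigned to the first and second sub data sets.
PRESET_RULES = {
    "mail": {
        "first": {"mail_header", "mail_body", "mail_attachment"},
        "second": {"send_time", "source_address", "sender", "recipient"},
    },
    "webpage": {
        "first": {"title", "url"},
        "second": {"page_content"},
    },
}


def lookup_rule(category, table=PRESET_RULES):
    """Return the data classification rule preset for `category`."""
    rule = table.get(category)
    if rule is None:
        raise LookupError(f"no classification rule preset for {category!r}")
    return rule
```

Keeping the correspondence as data rather than code is what lets the storage behavior be changed later by editing the table alone.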
Step 104: Divide the sub data set into a first sub data set and a second sub data set according to the data classification rule of the sub data set, and determine the storage mode of each data item in the first sub data set and the second sub data set.
By querying the preset correspondence, the data classification rule corresponding to each sub data set can be determined. For any sub data set, the data items in it are assigned to the first sub data set or the second sub data set according to the data classification rule determined for that sub data set. At the same time, the storage mode of each data item in the first sub data set and of each data item in the second sub data set is determined.
The storage mode of a data item defines one or more pieces of information used when the data item is stored, such as its data storage format, its data storage space, whether index information is set, and whether it is stored compressed.
Step 105: According to the determined storage mode, store each data item in the first sub data set in the structured data storage area, and store the second sub data set in the unstructured data storage area.
Unlike the existing approach of storing all the data in a data set directly in a single storage area, the embodiment of the application combines the advantages of structured data storage and unstructured data storage. Every sub data set is divided into a first sub data set and a second sub data set. Each data item in the first sub data set is stored in the structured data storage area according to the storage mode determined for it, and each data item in the second sub data set is stored in the unstructured data storage area according to the storage mode determined for it.
Data in the structured data storage area exists in the form of database tables with a specific structure, so the structured data storage area can be understood as a relational database. In contrast, the data stored in the unstructured data storage area has no specific structural features; specifically, the unstructured data storage area can be understood as a file system.
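The two storage areas described above could be sketched with an SQLite table standing in for the relational database and a directory standing in for the file system. This is an illustrative sketch only: the function names, the one-JSON-file-per-record layout and the untyped table columns are assumptions, not details from the application.

```python
import json
import sqlite3
from pathlib import Path


def store_structured(conn, table, items):
    """Store data items as one row of a database table (structured area).

    Table and column names are interpolated into SQL, so they must come from
    a trusted classification rule, never from untrusted input.
    """
    columns = ", ".join(f'"{name}"' for name in items)
    placeholders = ", ".join("?" for _ in items)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    conn.execute(
        f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})',
        list(items.values()),
    )
    conn.commit()


def store_unstructured(directory, name, items):
    """Store data items as a file in a directory (unstructured area)."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{name}.json"
    path.write_text(json.dumps(items), encoding="utf-8")
    return path
```

A caller would pass the first sub data set to `store_structured` and the second sub data set to `store_unstructured`; which items land in which set is decided by the classification rule, not by these functions.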
It should be noted that, for any sub data set, which data items are assigned to the first sub data set and which to the second sub data set is determined by the data classification rule corresponding to that sub data set. When the storage mode needs to be changed, it is only necessary to change the correspondence between the data category and the data classification rule, that is, to reset the data classification rule corresponding to a given data category, in order to adjust how the data is stored.
To make better use of the respective advantages of storage in database tables and storage in files, a sub data set can optionally be divided so that the data items that are structured data go to the first sub data set and the unstructured data goes to the second sub data set. The data items stored in the structured data storage area are then structured data items, and the data items stored in the unstructured data storage area are unstructured data.
For ease of understanding, take a sub data set whose data category is mail as an example. The data content of one mail contains data items such as the source address, source port, destination address, destination port, mail type, sender, recipient, mail header, mail body, mail attachment and sending time. Assume the data classification rule corresponding to the mail category specifies that structured data such as the mail header, mail body and mail attachment is assigned to the first sub data set, and specifies the storage mode of each of these data items; and that unstructured data such as the sending time, source address, source port, destination address, destination port, mail type, sender and recipient is assigned to the second sub data set. When the data is stored, each data item in the first sub data set is stored in the structured data storage area according to its determined storage mode, and each data item in the second sub data set is stored in the unstructured data storage area according to its corresponding storage mode. Structured and unstructured data is thus stored separately according to its own characteristics, which optimizes data storage and makes operations such as statistics and retrieval on the stored data easier. Of course, this example assumes that the first sub data set contains only structured data and the second sub data set only unstructured data; in practice, some structured data may be assigned to the second sub data set, or some unstructured data to the first sub data set, as required.
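The mail example above could be sketched as follows. The assignment of items to the two sub data sets follows the example in the text; everything else is an assumption made for illustration, including the `StorageMode` fields, the concrete format and space names, and the choice to reject items the rule does not mention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageMode:
    """Hypothetical storage mode of one data item."""
    storage_format: str
    storage_space: str
    indexed: bool = False
    compressed: bool = False


TABLE = StorageMode("text", "mail_table", indexed=True)   # assumed values
FILES = StorageMode("raw", "mail_files", compressed=True)  # assumed values

# Item assignment as in the example; the storage modes are invented.
MAIL_RULE = {
    "first": {name: TABLE for name in
              ("mail_header", "mail_body", "mail_attachment")},
    "second": {name: FILES for name in
               ("send_time", "source_address", "source_port",
                "destination_address", "destination_port",
                "mail_type", "sender", "recipient")},
}


def split_sub_dataset(sub_dataset, rule):
    """Divide the data items into the first and second sub data sets.

    Each result maps an item name to (value, storage mode).
    """
    first, second = {}, {}
    for name, value in sub_dataset.items():
        if name in rule["first"]:
            first[name] = (value, rule["first"][name])
        elif name in rule["second"]:
            second[name] = (value, rule["second"][name])
        else:
            # Assumption: an item the rule does not cover is an error.
            raise ValueError(f"rule does not cover data item {name!r}")
    return first, second
```

Moving an item between the two sub data sets then only requires editing `MAIL_RULE`, which mirrors the remark that the division can be adjusted as required.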
In the data storage method of this embodiment, after the data set to be analyzed is obtained, the data category of the data in each sub data set is determined, and the data classification rule corresponding to that data is determined from the preset correspondence between data categories and data classification rules. The sub data set is divided into a first sub data set and a second sub data set, the storage mode of each data item in the two sub data sets is obtained, and the first and second sub data sets are then stored, according to the determined storage modes, in the structured data storage area and the unstructured data storage area respectively. Because different classification rules are determined for different data categories when data is stored, and data of the same category is divided into two parts that are stored in the structured data storage area and the unstructured data storage area respectively, the inconvenient retrieval of structured storage used alone is reduced and the inability of unstructured storage used alone to support complex statistics is overcome. Moreover, when the storage requirements change, the data classification rule corresponding to a data category can be adjusted directly, so the storage mode of a given class of data can be adjusted conveniently and quickly.
Referring to Fig. 2, which shows a schematic flow chart of another embodiment of a data storage method of the application, the method of this embodiment includes:
Step 201: Obtain a data set to be analyzed.
Step 202: Determine the data category of each sub data set in the data set.
Step 203: Query the preset correspondence between data categories and data classification rules to determine the data classification rule of the sub data set.
Step 204: Divide the sub data set into a first sub data set and a second sub data set according to the data classification rule of the sub data set, and determine the storage mode of each data item in the first sub data set and the second sub data set.
The operations of steps 201 to 204 of this embodiment are similar to those of steps 101 to 104 of the embodiment shown in Fig. 1. For details, refer to the description of the embodiment shown in Fig. 1; they are not repeated here.
Step 205: Build a feature code that uniquely identifies the sub data set.
In this embodiment, after the classification rule of the data in each sub data set is determined, a feature code is built for each sub data set in the data set. The feature code is an identifier of one sub data set; for example, it can be a 32-bit number. Feature codes correspond one-to-one with sub data sets.
The feature code of a sub data set can be generated as soon as the data classification rule of that sub data set and the storage mode of its data have been determined. Alternatively, once the classification rules and corresponding storage modes of all sub data sets in the data set have been determined, feature codes can be generated together, as many as there are sub data sets.
It should be noted that the feature code of each sub data set can also contain an identifier of the data category of the data in that sub data set.
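One way to build such a feature code is sketched below. The description only states that the code can be a 32-bit number, is unique per sub data set, and may carry a category identifier; the bit layout used here (upper 8 bits for a category identifier, lower 24 bits for a per-category sequence number), the `CATEGORY_IDS` table and the class name are assumptions of this sketch.

```python
# Hypothetical numeric identifiers for data categories (must fit in 8 bits).
CATEGORY_IDS = {"mail": 1, "webpage": 2, "microblog": 3}

SEQUENCE_BITS = 24
MAX_SEQUENCE = 1 << SEQUENCE_BITS


class FeatureCodeGenerator:
    """Issue unique 32-bit feature codes that embed the data category."""

    def __init__(self):
        self._next_sequence = {}

    def next_code(self, category):
        category_id = CATEGORY_IDS[category]
        sequence = self._next_sequence.get(category, 0)
        if sequence >= MAX_SEQUENCE:
            raise OverflowError(f"feature codes exhausted for {category!r}")
        self._next_sequence[category] = sequence + 1
        # Upper 8 bits: category identifier. Lower 24 bits: sequence number.
        return (category_id << SEQUENCE_BITS) | sequence


def category_id_of(code):
    """Recover the category identifier embedded in a feature code."""
    return code >> SEQUENCE_BITS
```

A counter guarantees the one-to-one correspondence within a single generator; a distributed deployment would need some other source of uniqueness, which this sketch does not address.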
Step 206: According to the storage mode, store the feature code in association with each data item in the first sub data set in the structured data storage area, and store the feature code in association with each data item in the second sub data set in the unstructured data storage area.
In this embodiment, when the data items in the first sub data set are stored in the structured data storage area, the feature code of the sub data set to which the first sub data set belongs is stored in the structured data storage area as well, and each data item of the first sub data set in that area corresponds to the feature code. In other words, every data item of the first sub data set stored in the structured storage area is associated with the feature code of the sub data set stored alongside it. Correspondingly, the unstructured storage area stores each data item of the second sub data set together with the feature code, and the feature code is likewise associated with each data item in the second sub data set.
When the data of one sub data set that has been stored separately in the two storage areas needs to be queried, the query can use the feature code of that sub data set directly, so that all data items of the same sub data set can be found conveniently and quickly.
Again take a sub data set that is one mail as an example. After the division rule for the data items in the mail is determined, a feature code M corresponding to the mail is generated. Data items such as the mail header, mail body and mail attachment are then stored in the structured data storage area in association with the feature code M, and the sending time, source address, source port, destination address, destination port, mail type, sender and recipient of the mail are stored in the unstructured data storage area in association with the feature code M. Both the structured storage area and the unstructured storage area hold many mails sent from different mail addresses at different times. Without a feature code, several search conditions would have to be entered in the structured data storage area to retrieve the data related to this mail, and several search conditions would also have to be entered in the unstructured storage area, before the complete information of the mail could be obtained. If the search conditions entered were not accurate, the data of several mails might be returned at once, and the user would have to search further to find the required mail. Once the feature code has been stored in association with the data items related to the mail in both storage areas, all data items related to the mail can be found directly from the feature code when its data needs to be queried, which reduces the amount of data processing and improves the accuracy of data lookup.
In this embodiment, before the data items of the first sub-data set and the second sub-data set of a sub-data set are stored, a feature code is generated for the sub-data set. The feature code is stored in the structured data storage area in association with each data item of the first sub-data set, and in the unstructured data storage area in association with each data item of the second sub-data set. During data mining, if all data of the same sub-data set needs to be queried, the feature code can be used to efficiently retrieve all corresponding data from both storage areas.
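As an illustration only, the idea of tagging both halves of a sub-data set with one shared unique code can be sketched in Python as follows. The two in-memory lists stand in for the two storage areas, and the use of a UUID as the code, the function names, and the record layout are all assumptions made for this sketch; the patent does not specify how the code is built.

```python
import uuid

structured_store = []    # stands in for the structured data storage area
unstructured_store = []  # stands in for the unstructured data storage area


def store_sub_data_set(first_subset, second_subset):
    """Store both halves of one sub-data set under a single unique code."""
    # Assumption: any identifier that is unique per sub-data set will do.
    feature_code = uuid.uuid4().hex
    for item in first_subset:            # items are dicts of structured fields
        structured_store.append({**item, "feature_code": feature_code})
    for item in second_subset:           # items are opaque payloads
        unstructured_store.append({"feature_code": feature_code, "payload": item})
    return feature_code


def query_by_feature_code(feature_code):
    """Collect every item of one sub-data set from both storage areas."""
    structured = [r for r in structured_store if r["feature_code"] == feature_code]
    unstructured = [r["payload"] for r in unstructured_store
                    if r["feature_code"] == feature_code]
    return structured, unstructured
```

A single call to `query_by_feature_code` replaces the two separate multi-condition searches described for the mail example.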
Further, in any of the data storage methods of the above embodiments, to make it easy to modify the preset correspondence between data categories and data classification rules, or to add a data classification rule for a new data category, the method may also include: receiving an update request for the preset correspondence between data categories and data classification rules; and, according to the update request, changing or adding a correspondence between a data category and a data classification rule. When an update request is received, the data classification rule corresponding to the data category is modified according to the classification rule to be added or the content to be modified that the request contains.
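A minimal sketch of such an update step, assuming the correspondence is held as a Python dictionary keyed by data category, might look like this. The request layout (`action`, `category`, `rule`) and the rule contents are hypothetical and chosen only for illustration.

```python
# Hypothetical preset correspondence: data category -> data classification rule.
classification_rules = {
    "mail": {"structured_fields": ["send_time", "sender", "recipient"]},
}


def handle_update_request(rules, request):
    """Apply an update request that adds or changes one category's rule."""
    category = request["category"]
    if request["action"] == "add":
        # Add a rule for a new data category.
        rules[category] = dict(request["rule"])
    elif request["action"] == "modify":
        if category not in rules:
            raise KeyError(f"no rule registered for category {category!r}")
        # Change only the parts of the rule named in the request.
        rules[category].update(request["rule"])
    else:
        raise ValueError(f"unknown action {request['action']!r}")
    return rules
```

Keeping the rules in a lookup table rather than in code is what allows the storage behaviour to be adjusted without changing the rest of the method.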
Corresponding to the data storage method of the embodiments of the present application, the embodiments also provide a data storage device. Fig. 3 shows a schematic structural diagram of one embodiment of a data storage device of the present invention. The device of this embodiment includes: a data acquisition unit 301, a category determination unit 302, a classification rule determination unit 303, a data classification unit 304, and a storage unit 305.
The data acquisition unit 301 is configured to obtain a data set to be analyzed.
The category determination unit 302 is configured to determine the data category of each sub-data set in the data set.
The classification rule determination unit 303 is configured to query the preset correspondence between data categories and data classification rules and to determine the data classification rule of the sub-data set.
The data classification unit 304 is configured to divide each sub-data set into a first sub-data set and a second sub-data set according to the data classification rule of the sub-data set, and to determine the storage mode of each data item in the first sub-data set and the second sub-data set.
The storage mode determined by the data classification unit includes: the data storage format of a data item, the data storage space of a data item, and/or index information.
The storage unit 305 is configured to store, according to the storage mode, each data item of the first sub-data set in the structured data storage area and the second sub-data set in the unstructured data storage area.
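How units 302 to 305 hand data to one another can be sketched roughly as below. This is a simplified illustration, not the patented implementation: the rule format, the `type` and `fields` keys, and the function names are assumptions, and the determination of a storage mode (format, space, index) is left out for brevity.

```python
# Hypothetical preset correspondence between data categories and rules.
RULES = {"mail": {"structured_fields": {"send_time", "sender", "recipient"}}}


def determine_category(sub_data_set):
    """Role of unit 302 (assumption: the category is carried in a 'type' key)."""
    return sub_data_set["type"]


def determine_rule(category):
    """Role of unit 303: look up the rule preset for the category."""
    return RULES[category]


def classify(sub_data_set, rule):
    """Role of unit 304: split the fields into first and second sub-data sets."""
    fields = sub_data_set["fields"]
    first = {k: v for k, v in fields.items() if k in rule["structured_fields"]}
    second = {k: v for k, v in fields.items() if k not in rule["structured_fields"]}
    return first, second


def process(data_set, structured_area, unstructured_area):
    """Run every sub-data set through units 302-305 in order."""
    for sub in data_set:
        rule = determine_rule(determine_category(sub))
        first, second = classify(sub, rule)
        structured_area.append(first)      # role of unit 305
        unstructured_area.append(second)
```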
The data acquisition unit can obtain the data to be analyzed in several ways. Corresponding to one of these ways, the data acquisition unit 301 is specifically configured to extract a specified proportion of data from the collected raw data as the data set to be analyzed.
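Extracting a specified proportion of the raw data could, for example, be done by random sampling, as in the sketch below. The text does not say how the proportion is selected, so random sampling, the `seed` parameter, and the function name are assumptions.

```python
import random


def extract_sample(raw_data, ratio, seed=None):
    """Return roughly `ratio` of the records in `raw_data` as the data set to analyze."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in the interval (0, 1]")
    if not raw_data:
        return []
    # Keep at least one record so a non-empty input never yields an empty sample.
    count = max(1, round(len(raw_data) * ratio))
    # A private Random instance keeps the sampling reproducible when a seed is given.
    return random.Random(seed).sample(raw_data, count)
```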
In practice, the category determination unit can likewise determine the data category of a sub-data set in several ways. Corresponding to one of these ways, the category determination unit 302 includes:
a first category determination unit, configured to analyze the organization format of the data of each sub-data set in the data set and to determine the data category corresponding to that organization format.
Corresponding to another way of determining the data category of a sub-data set, the category determination unit 302 may include:
a second category determination unit, configured to query the data identifier contained in each sub-data set in the data set and to determine the data category corresponding to that data identifier.
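The two alternatives can be illustrated with the following sketch. The identifier table, the category names, and the format checks (valid JSON, a "From:" header) are hypothetical examples chosen for illustration; the patent does not prescribe specific formats or identifiers.

```python
import json

# Hypothetical table mapping data identifiers to data categories.
IDENTIFIER_TO_CATEGORY = {"SMTP": "mail", "HTTP": "web"}


def category_by_identifier(sub_data_set):
    """Second approach: look up the identifier carried by the sub-data set."""
    return IDENTIFIER_TO_CATEGORY.get(sub_data_set.get("identifier"))


def category_by_format(raw_text):
    """First approach: infer the category from how the data is organized."""
    try:
        json.loads(raw_text)
        return "json_record"
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        pass
    if raw_text.startswith("From:") or "\nFrom:" in raw_text:
        return "mail"
    return "unknown"
```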
Fig. 4 shows a schematic structural diagram of another embodiment of a storage device of the present invention. The storage device of this embodiment differs from the embodiment shown in Fig. 3 in that it also includes a feature code generation unit 306.
The feature code generation unit 306 is configured to build a feature code that uniquely identifies a sub-data set.
Feature codes and sub-data sets are in one-to-one correspondence.
Accordingly, the storage unit 305 is specifically configured to store, according to the storage mode, the feature code in association with each data item of the first sub-data set in the structured data storage area, and the feature code in association with each data item of the second sub-data set in the unstructured data storage area.
In this embodiment, before the storage unit stores any data, the feature code generation unit generates for each sub-data set a feature code that uniquely identifies it. The storage unit then stores the feature code of the sub-data set together with each data item of its first sub-data set in the structured data storage area, and the feature code together with each data item of its second sub-data set in the unstructured data storage area. When data of the same sub-data set is later queried, all data of that sub-data set can be retrieved conveniently using the feature code alone.
Further, any of the above device embodiments of the present application may also include an update request receiving unit and a rule update unit.
The update request receiving unit is configured to receive an update request for the preset correspondence between data categories and data classification rules.
The rule update unit is configured to change or add, according to the update request, a correspondence between a data category and a data classification rule.
For brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and that the actions and modules involved are not necessarily required by the present application.
It should be noted that the embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and for parts that are the same or similar the embodiments may be referred to one another. Because the device embodiments are substantially similar to the method embodiments, they are described briefly; for the relevant parts, see the description of the method embodiments.
Finally, it should also be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants of them are intended to be non-exclusive, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes it.
For convenience of description, the above device is described in terms of units divided by function. Of course, when the present application is implemented, the functions of the units may be realized in one or more pieces of software and/or hardware.
The data storage method and device provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the application, and the description of the above embodiments is intended only to help in understanding the method of the application and its core idea. Those of ordinary skill in the art may, following the idea of the application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the application.
Claims (8)
1. A data storage method, characterized by comprising:
obtaining a data set to be analyzed;
determining the data category of each sub-data set in the data set;
querying a preset correspondence between data categories and data classification rules to determine the data classification rule of the sub-data set;
dividing the sub-data set, according to the data classification rule of the sub-data set, into a first sub-data set and a second sub-data set, and determining the storage mode of each data item in the first sub-data set and the second sub-data set;
storing, according to the storage mode, each data item of the first sub-data set in a structured data storage area and the second sub-data set in an unstructured data storage area;
wherein determining the data category of each sub-data set in the data set comprises:
analyzing the organization format of the data of each sub-data set in the data set, and determining the data category corresponding to the organization format of the data of the sub-data set;
or,
querying the data identifier contained in each sub-data set in the data set, and determining the data category corresponding to the data identifier contained in the sub-data set;
wherein, before storing, according to the storage mode, each data item of the first sub-data set in the structured data storage area and the second sub-data set in the unstructured data storage area, the method further comprises:
building a feature code that uniquely identifies the sub-data set;
and wherein storing, according to the storage mode, each data item of the first sub-data set in the structured data storage area and the second sub-data set in the unstructured data storage area comprises:
storing, according to the storage mode, the feature code in association with each data item of the first sub-data set in the structured data storage area, and the feature code in association with each data item of the second sub-data set in the unstructured data storage area.
2. The method according to claim 1, characterized in that obtaining the data set to be analyzed comprises:
extracting a specified proportion of data from the collected raw data as the data set to be analyzed.
3. The method according to claim 1, characterized in that the storage mode comprises:
the data storage format of a data item, the data storage space of a data item, and/or index information.
4. The method according to claim 1, characterized by further comprising:
receiving an update request for the preset correspondence between data categories and data classification rules;
changing or adding, according to the update request, a correspondence between a data category and a data classification rule.
5. A data storage device, characterized by comprising:
a data acquisition unit, configured to obtain a data set to be analyzed;
a category determination unit, configured to determine the data category of each sub-data set in the data set;
a classification rule determination unit, configured to query a preset correspondence between data categories and data classification rules and to determine the data classification rule of the sub-data set;
a data classification unit, configured to divide the sub-data set, according to the data classification rule of the sub-data set, into a first sub-data set and a second sub-data set, and to determine the storage mode of each data item in the first sub-data set and the second sub-data set;
a storage unit, configured to store, according to the storage mode, each data item of the first sub-data set in a structured data storage area and the second sub-data set in an unstructured data storage area;
wherein the category determination unit comprises:
a first category determination unit, configured to analyze the organization format of the data of each sub-data set in the data set and to determine the data category corresponding to the organization format of the data of the sub-data set;
or,
a second category determination unit, configured to query the data identifier contained in each sub-data set in the data set and to determine the data category corresponding to the data identifier contained in the sub-data set;
wherein the device further comprises:
a feature code generation unit, configured to build a feature code that uniquely identifies the sub-data set;
and the storage unit is specifically configured to store, according to the storage mode, the feature code in association with each data item of the first sub-data set in the structured data storage area, and the feature code in association with each data item of the second sub-data set in the unstructured data storage area.
6. The device according to claim 5, characterized in that the data acquisition unit is specifically configured to extract a specified proportion of data from the collected raw data as the data set to be analyzed.
7. The device according to claim 5, characterized in that the storage mode determined by the data classification unit comprises: the data storage format of a data item, the data storage space of a data item, and/or index information.
8. The device according to claim 5, characterized by further comprising:
an update request receiving unit, configured to receive an update request for the preset correspondence between data categories and data classification rules;
a rule update unit, configured to change or add, according to the update request, a correspondence between a data category and a data classification rule.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210438962.6A CN102915373B (en) | 2012-11-06 | 2012-11-06 | A kind of date storage method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102915373A CN102915373A (en) | 2013-02-06 |
CN102915373B true CN102915373B (en) | 2016-08-10 |
Family
ID=47613739
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210438962.6A Active CN102915373B (en) | 2012-11-06 | 2012-11-06 | A kind of date storage method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102915373B (en) |
Families Citing this family (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103440290A (en) * | 2013-08-16 | 2013-12-11 | 曙光信息产业股份有限公司 | Big data loading system and method |
CN103440288A (en) * | 2013-08-16 | 2013-12-11 | 曙光信息产业股份有限公司 | Big data storage method and device |
CN103440303A (en) * | 2013-08-21 | 2013-12-11 | 曙光信息产业股份有限公司 | Heterogeneous cloud storage system and data processing method thereof |
CN103440130A (en) * | 2013-08-26 | 2013-12-11 | 成都金山数字娱乐科技有限公司 | Data processing method and device |
CN104731800B (en) * | 2013-12-20 | 2018-10-23 | 中国银联股份有限公司 | Data analysis set-up |
CN103745262A (en) * | 2013-12-30 | 2014-04-23 | 远光软件股份有限公司 | Data collection method and device |
CN104090901B (en) * | 2013-12-31 | 2017-06-13 | 腾讯数码(天津)有限公司 | A kind of method that data are processed, device and server |
CN103699694B (en) * | 2014-01-13 | 2017-08-29 | 联想(北京)有限公司 | A kind of data processing method and device |
WO2015165112A1 (en) * | 2014-04-30 | 2015-11-05 | Pivotal Software, Inc. | Validating analytics results |
CN104102701B (en) * | 2014-07-07 | 2017-10-13 | 浪潮(北京)电子信息产业有限公司 | A kind of historical data based on hive is achieved and querying method |
CN104462287B (en) * | 2014-11-27 | 2018-10-12 | 华为技术服务有限公司 | A kind of method, apparatus and system of data processing |
CN104715040A (en) * | 2015-03-23 | 2015-06-17 | 浪潮集团有限公司 | Data classification method and device |
CN106469195A (en) * | 2016-08-31 | 2017-03-01 | 国信优易数据有限公司 | Based on conforming data file Valuation Method and system |
CN107103060B (en) * | 2017-04-14 | 2021-02-26 | 湖南云智迅联科技发展有限公司 | Storage method and system of sensing data |
CN107453948A (en) * | 2017-07-28 | 2017-12-08 | 北京邮电大学 | The storage method and system of a kind of network measurement data |
CN109598648A (en) * | 2017-09-30 | 2019-04-09 | 北京国双科技有限公司 | Information processing method and device |
CN110020357B (en) * | 2017-10-31 | 2021-08-24 | 北京国双科技有限公司 | Data storage method, data storage device, storage medium and processor |
CN108228101B (en) * | 2017-12-28 | 2022-03-15 | 北京盛和大地数据科技有限公司 | Method and system for managing data |
CN110633315A (en) * | 2018-06-20 | 2019-12-31 | 中国移动通信集团有限公司 | Data processing method and device and computer storage medium |
CN109522352A (en) * | 2018-11-08 | 2019-03-26 | 内蒙古伊泰煤炭股份有限公司 | Industrial data management system and method |
CN109446204B (en) * | 2018-11-27 | 2022-04-15 | 北京微播视界科技有限公司 | Data storage method and device for instant messaging, electronic equipment and medium |
CN110941640A (en) * | 2018-12-25 | 2020-03-31 | 广州中软信息技术有限公司 | Intelligent screening method, device, equipment, system and medium for problem clues |
CN109947706A (en) * | 2019-02-13 | 2019-06-28 | 上海泉涸信息科技有限公司 | File management system and file management method |
CN111192072B (en) * | 2019-10-29 | 2023-08-04 | 腾讯科技(深圳)有限公司 | User grouping method and device and storage medium |
CN112783825B (en) * | 2019-11-04 | 2024-01-02 | 富泰华工业(深圳)有限公司 | Data archiving method, device, computer device and storage medium |
CN111210879B (en) * | 2020-01-06 | 2021-03-26 | 中国海洋大学 | Hierarchical storage optimization method for super-large-scale drug data |
CN111966645A (en) * | 2020-08-12 | 2020-11-20 | 南方科技大学 | Supercomputer data storage method, device, system and storage medium |
CN113655968B (en) * | 2021-08-24 | 2024-06-18 | 上海晋朔信息科技有限公司 | Unstructured data storage method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101042747A (en) * | 2006-03-24 | 2007-09-26 | 上海中经互联网络有限公司 | Economic operation analysis system |
CN101174957A (en) * | 2007-10-09 | 2008-05-07 | 南京财经大学 | Cooperation service platform facing different source data |
CN101441629A (en) * | 2007-11-19 | 2009-05-27 | 上海新纳广告传媒有限公司 | Automatic acquiring method of non-structured web page information |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100274750A1 (en) * | 2009-04-22 | 2010-10-28 | Microsoft Corporation | Data Classification Pipeline Including Automatic Classification Rules |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract | Application publication date: 20130206; Assignee: Yangzhou Wanfang Electronic Technology Co., Ltd.; Assignor: Jiangnan Computing Technology Inst., Wuxi; Contract record no.: 2017320000002; Denomination of invention: Data storage method and device; Granted publication date: 20160810; License type: Exclusive License; Record date: 20170116 |
LICC | Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model | |