Onion-style data organization method and system for big data analysis
Technical Field
The present invention relates to the technical field of big data analysis, and in particular to an onion-style data organization method and system for big data analysis.
Background
With the continuous development of information technology, and of network technology in particular, networking represented by the Internet has spread into every aspect of society and every industry and has broken through the limits of time and space. The global volume of data is growing at an astonishing rate, and human society is moving from the information technology (IT) era into the data technology (DT) era. Digitization is changing the behavior patterns and values of consumers, as well as the business models and operating models of enterprises. Research by Gartner shows that more than 2.5 EB of data are produced every day, so that we have already entered the exabyte era, and experts predict that the global data total will exceed 40 ZB by 2020. According to statistics, Google processes more than 24 PB of data per day, thousands of times the amount of data contained in all the paper publications of the national library of the United States; Facebook receives more than 10 million new photos per day, and its users click the "Like" button or post comments more than 3 billion times per day; the YouTube video site receives up to 800 million visitors per month, and on average more than one hour of video is uploaded every second; WeChat, the social medium most widely used around us, has reached 549 million monthly active users covering more than 200 countries and more than 20 languages, and its voice chat alone accounts for more than 280 million minutes per day; Sina Weibo gained nearly 500 million registered users in little more than three years.
The emergence and development of big data have brought great convenience to our lives. At the same time, the diversity, complexity and sheer volume of the data confront data analysis and processing with unprecedented challenges, and how to manage and use big data better has become a topic of common concern. In recent years a number of big data platforms represented by Hadoop, together with related parallel processing technologies, have appeared, but an efficient form of data organization has been lacking throughout, which greatly hinders the analysis and use of data. The characteristics of big data, namely volume, variety, velocity and value, pose ever greater challenges for data analysis. The first is the challenge of data complexity: the types and schemas of big data are more diverse, the relationships among data are more intricate, and data quality is uneven, so that data are considerably harder to understand, compute on and represent; semantic analysis and the recognition of sentiment also become extremely complex, which strongly affects the design and construction of new data organization models. The second is the challenge of computational complexity: because of the above characteristics, conventional machine learning, information retrieval and data collection cannot effectively support current big data and cannot perform global data analysis and computation, so computation must break away from the constraints of traditional computing where appropriate. The third is the challenge of system complexity: at present, even big data processing platforms such as Hadoop suffer from long computation cycles and high difficulty when the data are large and complexly structured. This poses sharper challenges not only to the overall architecture, computing mechanisms and computing methods of big data processing systems, but also to their running speed and power consumption.
Therefore, quantifying the complex nature of big data, identifying the intrinsic problems that the data contain, sorting out the internal relationships among data items, and effectively decomposing complex model systems to reduce their complexity can, to a certain extent, help us understand complex big data models and their essential characteristics, and thereby obtain abstract knowledge more effectively. In solving big data problems, one should focus on the data life cycle and take the data as the center. On the basis of the above quantification of data complexity, one should study correspondingly effective computation models, reasonably improve data computation modes, establish more standardized data schemas, study the relevant theory of big data in depth, and continue to explore sufficient data for hierarchical, classified computation.
The present invention proposes an onion-style data organization method for big data analysis. For a specific subject target, the attributes of the subject target are arranged into layers, and data are quantified with respect to the subject target and then managed layer by layer. This provides a reference for organizing and classifying data under big data conditions and effectively addresses the organization and management of massive data.
Summary of the Invention
In view of the defects in the prior art, an object of the present invention is to provide an onion-style data organization method and system for big data analysis.
The onion-style data organization method for big data analysis provided by the present invention comprises the following steps:
Step 1: Establish an onion-style hierarchical description model for a target object, where the target object includes an objectively existing individual, organization or department;
Step 2: According to the onion-style hierarchical description model established in step 1, set a corresponding weight for each layer of the target object;
Step 3: Quantify the target object;
Step 4: Determine the importance of the data to the target object by calculating the onion value of the data, the importance being equivalent to the layer position of the data in the onion-style hierarchical description model; the higher the onion value, the higher the importance relative to the subject target;
Step 5: Store the data by category according to the onion value and establish data retrieval based on the onion value. This reduces the data search space, speeds up classified retrieval of data, and improves the efficiency of data mining and analysis.
Preferably, the onion-style hierarchical description model for the target object in step 1 includes n layers, which from the inside out are the core layer, the kernel layer and the outer layer; the closer a layer is to the inside, the higher its relevance to the target object. The outer layer may itself include several sub-layers, and n is a natural number greater than or equal to 2.
Preferably, step 2 includes: denoting the weights of the layers of the onion-style hierarchical description model, from the inside out, as $\lambda_1, \lambda_2, \ldots, \lambda_i, \ldots, \lambda_n$. The size of a weight represents importance relative to the target object, and the closer a layer is to the core, the larger its weight. Define
$$\sum_{i=1}^{n} \lambda_i = M$$
where $\lambda_i$ is the weight of the i-th layer and $M$ is a constant representing the sum of the weights of all layers in the hierarchical model.
Preferably, step 3 includes: quantifying the i-th layer of the onion-style hierarchical description model, with $\alpha_i$ denoting the quantized value of the i-th layer of the model for the target object, so that the quantized values of the n layers are obtained and denoted $\alpha_1, \alpha_2, \ldots, \alpha_i, \ldots, \alpha_n$. Define
$$\sum_{i=1}^{n} \alpha_i = V$$
where $V$ is a constant representing the total quantized value of the data relative to the target object.
Preferably, the onion value $N$ in step 4 is calculated as follows:
$$N = \sum_{i=1}^{n} \lambda_i \alpha_i$$
Define $Y_i$, $0 \le i \le n$, with $Y_0 > Y_1 > \ldots > Y_i > \ldots > Y_n$,
where $Y_i$ denotes the onion value border of the i-th layer.
If $Y_{i-1} > N \ge Y_i$, $1 \le i \le n$, then the data belong to the i-th layer of the onion-style hierarchical description model.
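The calculation and the layer criterion can be sketched in a few lines of Python. This is a minimal illustration and not the patented implementation: the function names `onion_value` and `layer_of` are hypothetical, and the sample weights and borders are the ones given for the social-individual embodiment. Under the criterion as stated, a value exactly equal to $Y_0$ (or below $Y_n$) falls outside every interval, so the sketch returns `None` in that case.

```python
from typing import Optional, Sequence


def onion_value(weights: Sequence[float], quantized: Sequence[float]) -> float:
    """Weighted sum N = sum(lambda_i * alpha_i) over the n layers."""
    if len(weights) != len(quantized):
        raise ValueError("weights and quantized values must cover the same layers")
    return sum(w * a for w, a in zip(weights, quantized))


def layer_of(n_value: float, borders: Sequence[float]) -> Optional[int]:
    """Return the 1-based layer i with Y_{i-1} > N >= Y_i, or None if none matches.

    `borders` is [Y_0, Y_1, ..., Y_n] and must be strictly decreasing.
    """
    if any(hi <= lo for hi, lo in zip(borders, borders[1:])):
        raise ValueError("borders must be strictly decreasing")
    for i in range(1, len(borders)):
        if borders[i - 1] > n_value >= borders[i]:
            return i
    return None  # N lies outside the half-open range [Y_n, Y_0)


# Sample values taken from the social-individual embodiment (three layers).
weights = [0.6, 0.3, 0.1]
borders = [60, 46, 27, 10]
n_value = onion_value(weights, [80, 10, 10])
print(n_value, layer_of(n_value, borders))
```

Because the weights are floating-point numbers, a value that lands exactly on a border may need rounding before comparison in practice.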
Preferably, step 5 includes: building a data index on the onion value $N$ and storing the data by category as described in step 4, so that data with similar onion values are arranged next to one another in order. The index is therefore simple to build and fast to use, which improves data processing efficiency.
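One way to picture an index in which data with similar onion values sit next to each other is a list kept sorted by onion value and searched by bisection. The sketch below is an assumption for illustration only: the class `OnionIndex` and its methods are hypothetical, and a production system would more likely rely on the ordered index of its storage engine than on an in-memory list.

```python
import bisect
from typing import Any, List, Tuple


class OnionIndex:
    """Keeps (onion value, record) pairs sorted by onion value (hypothetical helper)."""

    def __init__(self) -> None:
        self._keys: List[float] = []   # onion values in ascending order
        self._records: List[Any] = []  # records, parallel to _keys

    def insert(self, n_value: float, record: Any) -> None:
        # Find the position that keeps _keys sorted, then insert in both lists.
        pos = bisect.bisect_right(self._keys, n_value)
        self._keys.insert(pos, n_value)
        self._records.insert(pos, record)

    def layer_range(self, upper: float, lower: float) -> List[Tuple[float, Any]]:
        """Return entries with upper > N >= lower, i.e. one layer's interval."""
        start = bisect.bisect_left(self._keys, lower)
        end = bisect.bisect_left(self._keys, upper)
        return list(zip(self._keys[start:end], self._records[start:end]))


index = OnionIndex()
for n_value, record in [(52, "record A"), (33, "record B"), (12, "record C"), (47, "record D")]:
    index.insert(n_value, record)

# All records whose onion value lies in the layer interval 60 > N >= 46.
core_records = index.layer_range(upper=60, lower=46)
```

Retrieving one layer then touches only a contiguous slice of the index instead of scanning every record, which is the reduction in search space the method aims for.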
The onion-style data organization system for big data analysis provided by the present invention comprises the following modules:
a model building module, for establishing an onion-style hierarchical description model for a target object, where the target object includes an objectively existing individual, organization or department;
a weight setting module, for setting, according to the established onion-style hierarchical description model, a corresponding weight for each layer of the target object;
a quantization module, for quantifying the target object;
an onion value calculation module, for determining the importance of the data to the target object by calculating the onion value of the data, the importance being equivalent to the layer position of the data in the onion-style hierarchical description model; the higher the onion value, the higher the importance relative to the subject target;
a retrieval module, for storing the data by category according to the onion value and establishing data retrieval based on the onion value.
Compared with the prior art, the present invention has the following beneficial effects:
The onion-style data organization method and system for big data analysis provided by the present invention solve the problem that massive data are difficult to organize and classify under big data conditions. The onion value serves as the criterion for classified storage of data, and a data index based on the onion value is established, which increases data retrieval speed and improves the efficiency of data mining and analysis.
Brief Description of the Drawings
Other features, objects and advantages of the present invention will become more apparent from the detailed description of non-limiting embodiments given with reference to the following drawings:
Fig. 1 is a framework diagram of the onion-style hierarchical description model for a subject target;
Fig. 2 is a framework diagram of the onion-style hierarchical description model for an individual;
Fig. 3 is a framework diagram of the onion-style hierarchical description model for a social organization;
Fig. 4 is a framework diagram of the onion-style hierarchical description model for a departmental agency.
Detailed Description
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit it in any way. It should be noted that those of ordinary skill in the art can make a number of changes and improvements without departing from the concept of the invention, all of which fall within the scope of protection of the present invention.
The onion-style data organization method for big data analysis provided by the present invention offers a reference for organizing and classifying massive data under big data conditions, and thereby lays a foundation for in-depth data mining and analysis directed at a subject target. First, an onion-style hierarchical description model for the subject target is established: the attributes of the target are divided into several layers, which can be labeled from the inside out as the core layer, the kernel layer, the outer layer and so on. Each layer is assigned a corresponding weight that represents the importance of that class of attributes to the subject target; the larger the weight, the higher the importance. Second, under big data conditions, the massive data associated with the subject target are summarized and analyzed, the meaning and data attributes of each layer in the above onion-style hierarchical model are defined, and the attributes of the data at each layer are described quantitatively according to the category and importance of the data. Finally, the layer of the data with respect to the subject target is determined: the onion value of the data is calculated from the layer weights of the model and the quantized values of the data, and the layer to which the data belong in the description model is judged from it.
The onion-style hierarchical description model based on a subject target is as follows:
(1) Establish the onion-style hierarchical description model for the target object, as shown in Fig. 1. The model divides the subject target object into n layers, which can be labeled from the inside out as the core layer, the kernel layer, the outer layer and so on. Core-layer data mainly influence or reflect the distinguishing traits of the subject target, kernel-layer data mostly influence or reflect the intrinsic meaning of the subject target, and outer-layer data mainly reflect the features, extensions and similar aspects of the subject target.
(2) For this onion-style hierarchical model, set a corresponding weight for each layer of the target object, denoted from the inside out as $\lambda_1, \lambda_2, \ldots, \lambda_i, \ldots, \lambda_n$. The size of a weight represents importance, and the closer a layer is to the core, the higher its importance. Define
$$\sum_{i=1}^{n} \lambda_i = M$$
(3) With respect to the target object, summarize and analyze the data, define the meaning and data attributes of each layer in the above onion-style hierarchical model, and describe the attributes of the data at each layer quantitatively according to the category and importance of the data. The quantized values are denoted $\alpha_1, \alpha_2, \ldots, \alpha_i, \ldots, \alpha_n$. Define
$$\sum_{i=1}^{n} \alpha_i = V$$
where $V$ is a constant representing the total quantized value of the data for the target object.
(4) Calculate the onion value of the data. The onion value $N$ is calculated as
$$N = \sum_{i=1}^{n} \lambda_i \alpha_i$$
The layer to which the data belong in the description model is obtained from the onion value $N$. Define $Y_i$ ($0 \le i \le n$) as the onion value border of each layer, with $Y_0 > Y_1 > \ldots > Y_i > \ldots > Y_n$. The criterion is as follows: if $Y_{i-1} > N \ge Y_i$ ($1 \le i \le n$), then the data belong to the i-th layer.
(5) Store the data according to the onion value and build an index based on the onion value.
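One consequence of these definitions may be worth spelling out, because it indicates where the outermost borders can sensibly be placed. Assuming the quantized values are non-negative (an assumption, since the text does not state it) and the weights decrease from the inside out, the onion value is bounded by the smallest and largest weights:

```latex
\lambda_n V \;=\; \lambda_n \sum_{i=1}^{n} \alpha_i
\;\le\; \sum_{i=1}^{n} \lambda_i \alpha_i \;=\; N
\;\le\; \lambda_1 \sum_{i=1}^{n} \alpha_i \;=\; \lambda_1 V
```

In Embodiment 1, for instance, the example quantized values sum to 100 and the weights are 0.6, 0.3 and 0.1, which gives 10 ≤ N ≤ 60 and coincides with the borders Y0 = 60 and Y3 = 10 used there.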
The technical solution of the present invention is explained in more detail below with reference to specific embodiments.
Embodiment 1: Social individual
An onion-style data organization method based on a social individual under big data conditions.
1. The onion-style hierarchical description model based on a social individual is as follows:
(1) Establish the onion-style hierarchical description model for a social individual. Following the definition of the "onion-style hierarchical description model based on a subject target" above, the model for a social individual can be divided into 3 layers, labeled from the inside out as the core layer, the kernel layer and the outer layer, as shown in Fig. 2. Core-layer data mainly influence or reflect the personality and distinguishing traits of the individual, such as social relationships and life experience; kernel-layer data mostly influence or reflect the individual's way of thinking and values (world view, outlook on life and values), such as learning experience and occupation; outer-layer data mainly reflect the individual's knowledge and expertise, hobbies, living habits, health status and similar information. See Table 1 for details.
(2) For this onion-style hierarchical model, set a corresponding weight for each layer of the target, denoted from the inside out as $\lambda_1$, $\lambda_2$, $\lambda_3$. The size of a weight represents importance. In the model for a social individual, assume $\lambda_1 = 0.6$, $\lambda_2 = 0.3$, $\lambda_3 = 0.1$, so that $\lambda_1 + \lambda_2 + \lambda_3 = 1$.
2. As a further aspect of this embodiment, based on the above onion-style hierarchical description model for a social individual, a data organization method for a social individual is proposed as follows:
(1) With the social individual as the target, analyze the collected big data. Using the layers and data defined in Table 1 as metadata, extract data attributes according to the metadata definitions, define the category and importance of the data with respect to each layer of the target, and describe the attributes of the data at each layer quantitatively. The quantized values are denoted $\alpha_1$, $\alpha_2$, $\alpha_3$, where $\alpha_1$ is the quantized value of the core layer in the data, $\alpha_2$ that of the kernel layer, and $\alpha_3$ that of the outer layer. Under this condition, let $\alpha_1 + \alpha_2 + \alpha_3 = 100$.
(2) Calculate the onion value $N$ of the data, which represents the importance of the data to the target:
$$N = \sum_{i=1}^{3} \lambda_i \alpha_i$$
(3) Judge from the onion value $N$ the layer to which the data belong in the description model. Define $Y_i$, $0 \le i \le 3$; in this model, for example, let $Y_0 = 60$, $Y_1 = 46$, $Y_2 = 27$, $Y_3 = 10$. The criterion is as follows: if $Y_{i-1} > N \ge Y_i$ ($1 \le i \le 3$), then the data belong to the i-th layer.
Suppose that for a certain item of data $\alpha_1 = 80$, $\alpha_2 = 10$, $\alpha_3 = 10$. Its onion value is $N = 0.6 \times 80 + 0.3 \times 10 + 0.1 \times 10 = 52$, so the data are in the first layer (core-layer data). Suppose that for another item of data $\alpha_1 = 30$, $\alpha_2 = 40$, $\alpha_3 = 30$. Its onion value is $N = 0.6 \times 30 + 0.3 \times 40 + 0.1 \times 30 = 33$, so the data are in the second layer (kernel-layer data).
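The embodiment does not say how the quantized values of a data item are obtained from its extracted attributes. Purely as an illustration, the sketch below assumes that each extracted attribute is mapped to one layer and that the total of 100 is split in proportion to the number of attributes found per layer. The mapping `ATTRIBUTE_LAYER`, the attribute names and the function `quantize` are all hypothetical; the attribute categories only loosely follow the examples given for the three layers.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical mapping from a metadata attribute to its layer
# (0 = core, 1 = kernel, 2 = outer); not taken from Table 1.
ATTRIBUTE_LAYER: Dict[str, int] = {
    "social_relationship": 0,
    "life_experience": 0,
    "learning_experience": 1,
    "occupation": 1,
    "expertise": 2,
    "hobby": 2,
    "living_habit": 2,
    "health_status": 2,
}


def quantize(attributes: Iterable[str], total: float = 100.0, layers: int = 3) -> List[float]:
    """Split `total` across layers in proportion to the attributes found per layer."""
    counts = Counter(ATTRIBUTE_LAYER[a] for a in attributes if a in ATTRIBUTE_LAYER)
    found = sum(counts.values())
    if found == 0:
        raise ValueError("no recognized attributes to quantize")
    return [total * counts.get(layer, 0) / found for layer in range(layers)]


# A data item dominated by social-relationship attributes.
alphas = quantize(["social_relationship"] * 8 + ["occupation"] + ["hobby"])
```

Any other quantization rule can be substituted, as long as the resulting values add up to the chosen total.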
Embodiment 2: Social organization
An onion-style data organization method based on a social organization under big data conditions.
1. The onion-style hierarchical description model based on a social organization target is as follows:
(1) Establish the onion-style hierarchical description model for a social organization. Following the definition of the "onion-style hierarchical description model based on a target" above, the model for a social organization can be divided into several layers (for example, 3 layers), labeled from the inside out as the core layer, the kernel layer and the outer layer, as shown in Fig. 3. For a social organization, core-layer data mainly reflect the nature and purpose of the organization, such as the industry it belongs to and its positioning and mission; kernel-layer data mostly reflect the routine affairs of the organization, such as its activities and the services it provides; outer-layer data mainly reflect the organization's outward-facing information, such as notices and announcements and contact details. See Table 2 for details.
(2) For this onion-style hierarchical model, set a corresponding weight for each layer of the target, denoted from the inside out as $\lambda_1$, $\lambda_2$, $\lambda_3$. The size of a weight represents importance. Assume $\lambda_1 = 0.5$, $\lambda_2 = 0.4$, $\lambda_3 = 0.1$, so that $\lambda_1 + \lambda_2 + \lambda_3 = 1$.
2. As a further aspect of this embodiment, based on the above onion-style hierarchical description model for a social organization, a data organization method for a social organization is proposed as follows:
(1) With the social organization as the target, analyze the collected big data. Using the layers and data defined in Table 2 as metadata, extract data attributes according to the metadata definitions, define the category and importance of the data with respect to each layer of the target, and describe the attributes of the data at each layer quantitatively. The quantized values are denoted $\alpha_1$, $\alpha_2$, $\alpha_3$, where $\alpha_i$ is the quantized value of the i-th layer, and we define $\alpha_1 + \alpha_2 + \alpha_3 = 100$.
(2) Calculate the onion value $N$ of the data, which represents the importance of the data to the target:
$$N = \sum_{i=1}^{3} \lambda_i \alpha_i$$
(3) Judge from the onion value $N$ the layer to which the data belong in the description model. Define $Y_i$, $0 \le i \le 3$, with $Y_0 = 50$, $Y_1 = 42$, $Y_2 = 29$, $Y_3 = 10$. The criterion is as follows: if $Y_{i-1} > N \ge Y_i$ ($1 \le i \le 3$), then the data belong to the i-th layer.
Suppose that for an item of data about a certain social organization $\alpha_1 = 70$, $\alpha_2 = 20$, $\alpha_3 = 10$. Its onion value is $N = 0.5 \times 70 + 0.4 \times 20 + 0.1 \times 10 = 44$, so the data are in the first layer (core-layer data). Suppose that for another item of data $\alpha_1 = 20$, $\alpha_2 = 50$, $\alpha_3 = 30$. Its onion value is $N = 0.5 \times 20 + 0.4 \times 50 + 0.1 \times 30 = 33$, so the data are in the second layer (kernel-layer data).
Embodiment 3: Departmental agency
An onion-style data organization method based on a departmental agency under big data conditions.
1. The onion-style hierarchical description model based on a departmental agency target is as follows:
(1) Establish the onion-style hierarchical description model for a government agency. Following the definition of the "onion-style hierarchical description model based on a target" above, the model for a departmental agency can be divided into several layers (for example, 3 layers), labeled from the inside out as the core layer, the kernel layer and the outer layer, as shown in Fig. 4. Core-layer data mainly reflect the functions and tasks of the agency, such as its functions and organizational structure; kernel-layer data mostly reflect the routine work of the department, such as authoritative services and announcements on matters of public welfare; outer-layer data mainly reflect the department's outward-facing information, such as its location and contact details. See Table 3 for details.
(2) For this onion-style hierarchical model, set a corresponding weight for each layer of the target, denoted from the inside out as $\lambda_1$, $\lambda_2$, $\lambda_3$. The size of a weight represents importance. Assume $\lambda_1 = 0.7$, $\lambda_2 = 0.2$, $\lambda_3 = 0.1$, so that $\lambda_1 + \lambda_2 + \lambda_3 = 1$.
2. As a further aspect of this embodiment, based on the above onion-style hierarchical description model for a departmental agency, a data organization method for a departmental agency is proposed as follows:
(1) With the departmental agency as the target, analyze the collected big data. Using the layers and data defined in Table 3 as metadata, extract data attributes according to the metadata definitions, define the category and importance of the data with respect to each layer of the target, and describe the attributes of the data at each layer quantitatively. The quantized values are denoted $\alpha_1$, $\alpha_2$, $\alpha_3$, where $\alpha_i$ is the quantized value of the i-th layer, and we define $\alpha_1 + \alpha_2 + \alpha_3 = 100$.
(2) Calculate the onion value $N$ of the data, which represents the importance of the data to the target:
$$N = \sum_{i=1}^{3} \lambda_i \alpha_i$$
(3) Judge from the onion value $N$ the layer to which the data belong in the description model. Define $Y_i$, $0 \le i \le 3$, with $Y_0 = 70$, $Y_1 = 54$, $Y_2 = 23$, $Y_3 = 10$. The criterion is as follows: if $Y_{i-1} > N \ge Y_i$ ($1 \le i \le 3$), then the data belong to the i-th layer.
Suppose that for an item of data about a certain departmental agency $\alpha_1 = 80$, $\alpha_2 = 10$, $\alpha_3 = 10$. Its onion value is $N = 0.7 \times 80 + 0.2 \times 10 + 0.1 \times 10 = 59$, so the data are in the first layer (core-layer data). Suppose that for another item of data $\alpha_1 = 20$, $\alpha_2 = 60$, $\alpha_3 = 20$. Its onion value is $N = 0.7 \times 20 + 0.2 \times 60 + 0.1 \times 20 = 28$, so the data are in the second layer (kernel-layer data).
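The three embodiments use the same calculation and differ only in their weights and borders, so their worked examples can be checked together. The following is a small table-driven sketch, assuming the onion value is the weighted sum of the quantized values; the names `EMBODIMENTS`, `classify` and `check_all` are hypothetical. Rounding to nine decimals is an added safeguard against floating-point noise near a border and is not part of the method as described.

```python
from typing import List, Optional, Sequence, Tuple

# (name, weights, borders [Y_0..Y_3], [(quantized values, layer stated in the text)])
EMBODIMENTS = [
    ("social individual", [0.6, 0.3, 0.1], [60, 46, 27, 10],
     [([80, 10, 10], 1), ([30, 40, 30], 2)]),
    ("social organization", [0.5, 0.4, 0.1], [50, 42, 29, 10],
     [([70, 20, 10], 1), ([20, 50, 30], 2)]),
    ("departmental agency", [0.7, 0.2, 0.1], [70, 54, 23, 10],
     [([80, 10, 10], 1), ([20, 60, 20], 2)]),
]


def classify(weights: Sequence[float], alphas: Sequence[float],
             borders: Sequence[float]) -> Tuple[float, Optional[int]]:
    """Return (onion value, 1-based layer), with layer None if no interval matches."""
    n_value = round(sum(w * a for w, a in zip(weights, alphas)), 9)
    for i in range(1, len(borders)):
        if borders[i - 1] > n_value >= borders[i]:
            return n_value, i
    return n_value, None


def check_all() -> List[Tuple[str, float, Optional[int]]]:
    """Recompute every worked example and fail loudly on any mismatch."""
    results = []
    for name, weights, borders, cases in EMBODIMENTS:
        for alphas, expected_layer in cases:
            n_value, layer = classify(weights, alphas, borders)
            if layer != expected_layer:
                raise AssertionError(f"{name}: got layer {layer}, expected {expected_layer}")
            results.append((name, n_value, layer))
    return results


results = check_all()
```

Keeping the weights and borders in one table also makes it straightforward to add a further target type without touching the calculation itself.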
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to the particular embodiments described; those skilled in the art can make various changes or modifications within the scope of the claims without affecting the substance of the invention. Where no conflict arises, the embodiments of the present application and the features in the embodiments can be combined with one another in any way.