CN106156125A

CN106156125A - A replica strategy for a virtual identity management system based on different data organization forms

Info

Publication number: CN106156125A
Application number: CN201510163158.5A
Authority: CN
Inventors: 傅翔; 朱伟辉; 贾焰; 韩伟红; 李树栋; 李爱平; 周斌; 杨树强; 黄九鸣; 全拥; 邓璐; 刘斐
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2015-04-08
Filing date: 2015-04-08
Publication date: 2016-11-23
Anticipated expiration: 2035-04-08
Also published as: CN106156125B

Abstract

The invention discloses a replica strategy for a virtual identity management system based on different data organization forms, which mainly comprises virtual identity data partitioning, the data organization of replica 1, the data organization of replica 2, replica distribution, and data query. The invention improves the replica strategy of the Cassandra database: the number of replicas is set to 2; after the consistent hashing algorithm of the Cassandra database has partitioned the virtual identity data, the replicas of the data are reorganized, a partitioning method favorable to querying is chosen to repartition the virtual identity data, and the replicas are then placed following the rule that replicas of the same data are not stored on the same physical machine. Different replicas use different data organization methods to serve different query requests, which reduces query time, reduces network cost, and maximizes system efficiency. The strategy is applicable to the data replica placement problem of virtual identity management systems.

Description

A replica strategy for a virtual identity management system based on different data organization forms

Technical field

The invention belongs to the technical field of the Internet, and specifically relates to a replica strategy for a virtual identity management system based on different data organization forms.

Background art

eID (electronic IDentity), in full the citizen network electronic identity, is an authoritative electronic information file for remotely proving an individual's true identity on the network. When an eID is used remotely on the network, the true identity is verified against the public security population database and the eID service platform, so that the authenticity and validity of the personal identity are confirmed while the citizen's identity privacy is protected. eID is authoritative, secure, traceable, and convenient to use. On the Internet, a user has a one-to-many relationship with the virtual identities held under various applications and platforms; in an eID-based network environment, all of these correspondences can be based on the unique eID identifier. Virtual identity data refers to all the data that an eID user has under different applications.

The consistent hashing algorithm in document [2] is a special hash algorithm: when the size of the hash table is adjusted, on average only K/n data items need to be remapped, where K is the amount of data and n is the number of buffers. By contrast, in most other hash tables a change in the buffer array causes nearly all of the data to be remapped.
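The remapping property described above can be illustrated with a minimal sketch. It is not taken from the cited documents: the `HashRing` class, the `stable_hash` helper, the MD5-based hash, and the node and key names are all assumptions chosen for illustration. The sketch grows a ring from 10 to 11 nodes and compares the fraction of keys that change owner with plain modulo placement.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic 64-bit hash; Python's built-in hash() is salted per process."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Minimal ring: a key is owned by the first node point clockwise from its hash."""

    def __init__(self, nodes):
        points = sorted((stable_hash(n), n) for n in nodes)
        self._hashes = [h for h, _ in points]
        self._owners = [n for _, n in points]

    def lookup(self, key: str) -> str:
        i = bisect.bisect_right(self._hashes, stable_hash(key)) % len(self._hashes)
        return self._owners[i]


def moved_fraction(keys, before, after) -> float:
    """Fraction of keys whose owner differs between two placement functions."""
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"key-{i}" for i in range(2000)]          # hypothetical keys
old_nodes = [f"node-{i}" for i in range(10)]      # hypothetical node names
ring_old = HashRing(old_nodes)
ring_new = HashRing(old_nodes + ["node-10"])

# Growing the ring from 10 to 11 nodes: only keys captured by the new node move.
ring_moved = moved_fraction(keys, ring_old.lookup, ring_new.lookup)
# Modulo placement for comparison: most keys change owner when n changes.
mod_moved = moved_fraction(
    keys, lambda k: stable_hash(k) % 10, lambda k: stable_hash(k) % 11
)
print(f"ring: {ring_moved:.2%} moved, modulo: {mod_moved:.2%} moved")
```

With a single point per node, the share taken by the new node can deviate noticeably from K/n in any one run; the K/n figure is an average.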

The distributed consistent hashing algorithm in document [3] adds virtual nodes on top of the consistent hashing algorithm. Its purpose is to distribute the hash results as evenly as possible over all buffers, so that all of the buffer space is used.
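The effect of virtual nodes can be sketched in the same spirit. The code below is an illustrative assumption, not the method of the cited document: `build_ring`, `owner`, `load`, the `#vn` naming of virtual nodes, and the choice of 200 virtual nodes per physical node are hypothetical.

```python
import bisect
import hashlib
from collections import Counter


def stable_hash(value: str) -> int:
    """Deterministic 64-bit hash derived from MD5."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def build_ring(nodes, vnodes_per_node):
    """Place vnodes_per_node points on the ring for every physical node."""
    points = sorted(
        (stable_hash(f"{node}#vn{i}"), node)
        for node in nodes
        for i in range(vnodes_per_node)
    )
    return [h for h, _ in points], [n for _, n in points]


def owner(ring, key):
    """Physical node that owns the first ring point clockwise from the key."""
    hashes, owners = ring
    return owners[bisect.bisect_right(hashes, stable_hash(key)) % len(hashes)]


def load(ring, keys):
    """Number of keys owned by each physical node."""
    return Counter(owner(ring, k) for k in keys)


nodes = ["node-1", "node-2", "node-3", "node-4"]   # hypothetical cluster
keys = [f"eid-{i}" for i in range(5000)]           # hypothetical keys

plain = load(build_ring(nodes, 1), keys)       # one point per node
virtual = load(build_ring(nodes, 200), keys)   # 200 virtual nodes per node
print("without virtual nodes:", dict(plain))
print("with virtual nodes:   ", dict(virtual))
```

Each physical node owns many small arcs instead of one large arc, so the per-node key counts tend toward an even split as the number of virtual nodes grows.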

Document [3] uses the Cassandra database to store virtual identity data, improves query efficiency by building external indexes, relies on Cassandra's built-in replica strategy for data backup, and makes reasonable use of the techniques of documents [1] and [2], achieving relatively high storage efficiency for massive virtual identity data. However, that method improves query efficiency by building a large number of external indexes, which requires considerable storage space and makes the method rather complex. On the replica side, it simply carries over Cassandra's built-in replica strategy and treats replicas only as storage redundancy, without making effective use of them.

[1] JiaKui Zhao, PingFei Zhu, LiangHuai Yang. Effective Data Localization Using Consistent Hashing in Cloud Time-Series Databases [J]. Applied Mechanics and Materials, 2013, 347: 2246-2251.

[2] Improvements to consistent hashing [EB/OL]. http://blog.163.com/lin_guoqian@126/blog/static/1693687432012151010409/.

[3] Deng Lu. Research on and implementation of key technologies for the storage management of massive virtual identity data, 2010.

Summary of the invention

In view of the above problems, the invention follows the method of reference [3] in choosing the Cassandra database to store the virtual identity data, and provides a replica strategy for a virtual identity management system based on different data organization forms, which is applicable to the data replica placement problem of virtual identity management systems.

The technical solution is as follows:

A replica strategy for a virtual identity management system based on different data organization forms mainly comprises the following steps:

(1) Virtual identity data partitioning: the column-oriented storage idea of the Cassandra column database is applied to the virtual identity data, which are partitioned horizontally and vertically; the horizontal partitioning is by eID and the vertical partitioning is by application;

(2) Data organization of replica 1: all data objects of the same user are stored together; once all data of one user have been stored, the next user is stored. In the storage order of users, users in the same region are stored together and sorted by registration time;

(3) Data organization of replica 2;

(4) Replica distribution;

(5) Data query: when a client wants to query data, the query statement is first analyzed, and the instruction is then sent to replica 1 or replica 2 according to the analysis result.

Further, step (3) comprises the following steps:

1) partitioning the data objects by application platform;

2) within each application platform, partitioning by the region where the user is located;

3) sorting the users of one region within an application platform in the same way as in replica 1;

4) if a user is not registered on a platform, skipping that user directly.

Further, step (4) comprises the following steps:

1) storing replica 1 and replica 2 separately, with replica 1 stored on the odd-numbered nodes of the cluster and replica 2 stored on the even-numbered nodes of the cluster;

2) odd-numbered nodes are adjacent to even-numbered nodes, and node 1 is adjacent to node 2n;

3) setting two different hash functions so that the computed hash values of the data objects are ordered according to the data organization methods of replica 1 and replica 2 respectively, and then mapping the data to nodes with the consistent hashing algorithm.

Further, the hash function in step 3) maps one object to another object; the objects can first be sorted, and the hash function is then set according to the sort order of the objects.

Further, when the consistent hashing algorithm in step 3) computes the hash value of a data object, the basic unit of a data object is all the data of one user under one application, and an eID account together with an application name forms the primary key of a data object.

Further, step (5) comprises the following steps:

1) analyzing the query statement and judging whether it is a user-centric query;

2) if it is a user-centric query, sending the instruction to replica 1 and performing the query;

3) if it is not a user-centric query, analyzing the query statement and judging whether it is an application-platform-centric query;

4) if it is an application-platform-centric query, sending the instruction to replica 2 and performing the query;

5) if it is not an application-platform-centric query, selecting a replica according to the current system load and performing the query.
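The five routing steps above can be sketched as a small function. The source does not specify how a query statement is classified, so the `Query` type, its `eid` and `app` fields, the `load` mapping, and the tie-breaking rule are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Query:
    """Hypothetical result of analyzing a query statement."""
    eid: Optional[str] = None   # set when the query targets one user
    app: Optional[str] = None   # set when the query targets one application platform


def choose_replica(query: Query, load: Dict[int, float]) -> int:
    """Return 1 or 2, following the five routing steps in order."""
    if query.eid is not None:    # steps 1-2: user-centric query
        return 1
    if query.app is not None:    # steps 3-4: application-platform-centric query
        return 2
    # Step 5: neither kind; pick the less loaded replica (assumed: ties go to replica 1).
    return 1 if load[1] <= load[2] else 2


current_load = {1: 0.9, 2: 0.1}   # hypothetical load figures per replica
print(choose_replica(Query(eid="eid-001"), current_load))
print(choose_replica(Query(app="Taobao"), current_load))
print(choose_replica(Query(), current_load))
```

Because the user check comes first, a query that names both a user and a platform goes to replica 1, which matches the order of the steps.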

The beneficial effects of the invention are as follows. A traditional replica strategy requires that replicas of the same data not be placed on the same physical machine, but imposes no requirement on the placement of replicas of different data. The invention mainly improves the replica strategy of the Cassandra database: the number of replicas is set to 2; after the consistent hashing algorithm of the Cassandra database has partitioned the virtual identity data, the replicas of the data are reorganized, a partitioning method favorable to querying is chosen to repartition the virtual identity data, and the replicas are then placed following the rule that replicas of the same data are not stored on the same physical machine. The organization of replica 1 facilitates region-by-region management of eID user data. When data need to be read, the replica to operate on is selected according to the characteristics of the requested data, which improves data access efficiency. Replicas are thus no longer merely redundant storage kept for disaster recovery; instead, different replicas use different data organization methods to serve different query requests, which reduces query time, reduces network cost, and maximizes system efficiency. The strategy is applicable to the data replica placement problem of virtual identity management systems.

Brief description of the drawings

Fig. 1 is the data query flow chart of the invention.

Fig. 2 is the replica distribution diagram of the invention.

Fig. 3 is the virtual identity information diagram in Embodiment 1 of the invention.

Fig. 4 is the organization diagram of replica 1 in Embodiment 1 of the invention.

Fig. 5 is the virtual identity data distribution diagram in Embodiment 1 of the invention.

Fig. 6 is the organization diagram of replica 2 in Embodiment 1 of the invention.

Detailed description of the invention

To facilitate understanding, the invention is further described below with reference to the drawings and an embodiment.

The invention provides a replica strategy for a virtual identity management system based on different data organization forms, which mainly comprises the following steps:

(1) Virtual identity data partitioning: the data model of the Cassandra column database is applied to the virtual identity data, which are partitioned horizontally and vertically; the horizontal partitioning is by eID and the vertical partitioning is by application;

(2) Data organization of replica 1: all data objects of the same user are stored together; once all data of one user have been stored, the next user is stored. In the storage order of users, users in the same region are stored together and sorted in a fixed order;

(3) Data organization of replica 2;

(4) Replica distribution;

Development environment of the invention: an x86 platform running the Linux operating system, JDK 1.7, written in the Java language. The data servers need the Cassandra database software, version 1.0 or higher, installed to provide data support for the system.

Running environment of the invention: the server side runs on multiple machine nodes, each an x86 platform with the Linux operating system and JDK 1.7 or higher installed; the client is an ordinary personal computer.

The following is an exemplary embodiment of the invention:

Embodiment 1:

(1) Virtual identity data partitioning: in the network domain space, users register accounts on different application platforms according to their own needs; these platforms include e-commerce, social networks, online games, and so on. The virtual identity management system unifies this information through eID: a user has a unique eID identifier and holds different virtual accounts on different application platforms. The structures of these data partly differ, their sizes differ, and the data volume is huge. The data model of the Cassandra column database is applied to the virtual identity data, as shown in Table 1:

Table 1 virtual identity data model

Based on the storage model above, the invention partitions the virtual identity data horizontally and vertically: horizontally by eID and vertically by application. When the consistent hashing algorithm computes the hash value of a data object, the unit of a data object is all the data of one user under one application, and an eID account together with an application name forms the "primary key" of a data object. The invention is not concerned with how data are organized inside a data object; it mainly addresses how the data objects of the different replicas are organized.
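As a minimal sketch of the data object unit described above, the code below groups flat records into data objects keyed by (eID, application name). The record layout, the sample values, and the helper names `partition`, `horizontal_slice`, and `vertical_slice` are assumptions for illustration; the source does not give the contents of Table 1.

```python
from collections import defaultdict

# Hypothetical flat records: (eID, application, column name, value).
records = [
    ("eid-001", "Taobao", "nickname", "zs_shop"),
    ("eid-001", "Taobao", "address", "Changsha"),
    ("eid-001", "social-network", "nickname", "zhangsan"),
    ("eid-002", "Taobao", "nickname", "lisi88"),
]


def partition(records):
    """Group columns into data objects keyed by the (eID, application) primary key."""
    objects = defaultdict(dict)
    for eid, app, column, value in records:
        objects[(eid, app)][column] = value
    return dict(objects)


def horizontal_slice(objects, eid):
    """All data objects of one user (partitioning by eID)."""
    return {key: obj for key, obj in objects.items() if key[0] == eid}


def vertical_slice(objects, app):
    """All data objects of one application (partitioning by application)."""
    return {key: obj for key, obj in objects.items() if key[1] == app}


objects = partition(records)
print(sorted(horizontal_slice(objects, "eid-001")))
print(sorted(vertical_slice(objects, "Taobao")))
```

A user-centric query touches one horizontal slice and a platform-centric query touches one vertical slice, which is why the two replicas described next are ordered differently.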

(2) Method of Data Organization of copy 1: in actual data request operation, often asks certain Virtual identity information under all application platforms for the user, for example, the eID data of user Zhang San occur in that different Often, there is stolen possibility, be now accomplished by checking all virtual identity data of Zhang San.Assume that user opens Three have applied for altogether including 8 application platforms such as social networks, e-commerce website, online game, then His all virtual identity informations in domain space are as it is shown on figure 3, one of them rectangle frame represents a number According to object.

If these data objects were distributed over the data nodes at random, completing the query above might span many different nodes, which not only slows the query but also consumes network bandwidth. Therefore, in the invention, replica 1 stores all data objects of the same user together; once all data of one user have been stored, the next user is stored, and so on. In the storage order of users, users in the same region are stored together, and the users within one region are sorted in a fixed order. This facilitates region-by-region management of eID user data. The resulting organization of replica 1 is shown in Fig. 4.
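The ordering of replica 1 described above can be expressed as a sort key. This is a sketch under stated assumptions: the `User` type, the sample users and keys, and the use of registration time as the fixed order within a region are hypothetical choices, not details confirmed by the embodiment.

```python
from typing import List, NamedTuple, Tuple

Key = Tuple[str, str]  # (eID, application name)


class User(NamedTuple):
    eid: str
    region: str
    registered: int  # assumed fixed order within a region: registration time


def replica1_order(users: List[User], keys: List[Key]) -> List[Key]:
    """Order data objects by region, then user, so one user's objects are contiguous."""
    by_eid = {u.eid: u for u in users}

    def sort_key(key: Key):
        eid, app = key
        user = by_eid[eid]
        return (user.region, user.registered, eid, app)

    return sorted(keys, key=sort_key)


users = [
    User("eid-A", "Beijing", 2),
    User("eid-B", "Beijing", 1),
    User("eid-C", "Hunan", 1),
]
keys = [
    ("eid-A", "shop"), ("eid-B", "game"), ("eid-C", "shop"),
    ("eid-A", "game"), ("eid-B", "shop"),
]
replica1 = replica1_order(users, keys)
print(replica1)
```

In the resulting order, all objects of one user are adjacent and users of the same region are adjacent, so a user-centric query reads one contiguous run.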

(3) Data organization of replica 2: in actual data request operations, the data of a particular application platform are also commonly operated on, for example to check Taobao's users in each region. This requires all the virtual user data of the Taobao application; the distribution of Taobao's virtual identity data is shown in Fig. 5.

If all data objects were distributed over the data nodes at random by consistent hashing, or laid out in the manner of replica 1, this operation would likewise span many nodes, consume bandwidth, and lower the efficiency of the cluster; moreover, replica 2 would serve only as a redundant copy of replica 1 and its potential would not be fully exploited. Based on these considerations, the invention organizes replica 2 as follows: (1) the data objects are first partitioned by application platform; (2) within each application platform, they are partitioned by the region where the user is located; (3) the users of one region within an application platform are sorted in the same way as in replica 1; (4) if a user is not registered on a platform, that user is skipped directly, as shown in Fig. 6.
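The four rules for replica 2 can likewise be sketched as a sort key over the same data objects. As before, the `User` type, the sample data, and registration time as the within-region order are illustrative assumptions. A user who is not registered on a platform has no data object for it, so the skip rule needs no extra code.

```python
from typing import List, NamedTuple, Tuple

Key = Tuple[str, str]  # (eID, application name)


class User(NamedTuple):
    eid: str
    region: str
    registered: int  # assumed fixed order within a region: registration time


def replica2_order(users: List[User], keys: List[Key]) -> List[Key]:
    """Order data objects by platform, then region, then the replica 1 user order."""
    by_eid = {u.eid: u for u in users}

    def sort_key(key: Key):
        eid, app = key
        user = by_eid[eid]
        return (app, user.region, user.registered, eid)

    return sorted(keys, key=sort_key)


users = [
    User("eid-A", "Beijing", 2),
    User("eid-B", "Beijing", 1),
    User("eid-C", "Hunan", 1),
]
# eid-C has no "game" object: that user is simply absent from the game run.
keys = [
    ("eid-A", "shop"), ("eid-B", "game"), ("eid-C", "shop"),
    ("eid-A", "game"), ("eid-B", "shop"),
]
replica2 = replica2_order(users, keys)
print(replica2)
```

Here all objects of one platform form a contiguous run, subdivided by region, so a platform-centric query such as listing a platform's users per region reads adjacent data.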

(4) Replica distribution: the invention stores replica 1 and replica 2 separately, that is, replica 1 is stored on one half of the cluster's nodes and replica 2 is stored on the other half; the distribution is shown in Fig. 2. Odd-numbered nodes are adjacent to even-numbered nodes, and node 1 is adjacent to node 2n. Suitable hash functions are selected so that the hash values of the data objects are ordered according to the organization forms of replica 1 and replica 2, and the consistent hashing algorithm then maps the data to nodes. At this point, all data replicas are distributed in the cluster according to the specified method.

The hash function maps one object to another object; the objects can first be sorted, and the hash function is then set according to the sort order of the objects.
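One way to read the distribution step is sketched below. It is a simplification and an assumption, not the patented mapping: the rank of an object in its replica's sort order stands in for the order-preserving hash value, and contiguous rank ranges stand in for the ring arcs assigned by consistent hashing. The function name `place` and the sample orders are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (eID, application name)


def place(ordered_keys: List[Key], replica: int, n: int) -> Dict[Key, int]:
    """Map each key to a node id in 1..2n.

    Replica 1 uses the odd-numbered nodes (1, 3, ..., 2n-1) and replica 2 the
    even-numbered nodes (2, 4, ..., 2n). Keys that are adjacent in the sort
    order land on the same or on neighbouring nodes of that replica.
    """
    if replica not in (1, 2):
        raise ValueError("replica must be 1 or 2")
    total = len(ordered_keys)
    placement = {}
    for rank, key in enumerate(ordered_keys):
        slot = rank * n // total          # 0..n-1, contiguous ranks share a slot
        placement[key] = 2 * slot + replica
    return placement


# Hypothetical orders for the two replicas over the same five data objects.
order1 = [("eid-B", "game"), ("eid-B", "shop"), ("eid-A", "game"),
          ("eid-A", "shop"), ("eid-C", "shop")]
order2 = [("eid-B", "game"), ("eid-A", "game"), ("eid-B", "shop"),
          ("eid-A", "shop"), ("eid-C", "shop")]

nodes1 = place(order1, replica=1, n=2)   # cluster of 2n = 4 nodes
nodes2 = place(order2, replica=2, n=2)
print(nodes1)
print(nodes2)
```

Since one replica only ever uses odd node ids and the other only even node ids, the two replicas of any data object can never share a node, which is the placement rule the strategy must respect.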

(5) Data query: when a client wants to query data, the query statement is first analyzed, and the instruction is then sent to replica 1 or replica 2 according to the analysis result; the flow is shown in Fig. 1.

A traditional replica strategy requires that replicas of the same data not be placed on the same physical machine, but imposes no requirement on the placement of replicas of different data. The invention mainly improves the replica strategy of the Cassandra database: the number of replicas is set to 2; after the consistent hashing algorithm of the Cassandra database has partitioned the virtual identity data, the replicas of the data are reorganized, a partitioning method favorable to querying is chosen to repartition the virtual identity data, and the replicas are then placed following the rule that replicas of the same data are not stored on the same physical machine. The organization of replica 1 facilitates region-by-region management of eID user data. When data need to be read, the replica to operate on is selected according to the characteristics of the requested data, which improves data access efficiency. Replicas are thus no longer merely redundant storage kept for disaster recovery; instead, different replicas use different data organization methods to serve different query requests, which reduces query time, reduces network cost, and maximizes system efficiency. The strategy is applicable to the data replica placement problem of virtual identity management systems.

The invention has been described above by way of example. Obviously, the implementation of the invention is not limited by the manner described above; any improvement made using the technical solution of the invention, or any direct application of the concept and technical solution of the invention to other situations without improvement, falls within the scope of protection of the invention.

Claims

1. A replica strategy for a virtual identity management system based on different data organization forms, characterized by comprising the following steps:

Step one: virtual identity data partitioning: the column-oriented storage of the Cassandra column database is applied to the virtual identity data, which are partitioned horizontally and vertically; the horizontal partitioning is by eID and the vertical partitioning is by application;

Step two: data organization of replica 1: all data objects of the same user are stored together; once all data of one user have been stored, the next user is stored; in the storage order of users, users in the same region are stored together and sorted by registration time;

Step three: data organization of replica 2;

Step four: replica distribution;

Step five: data query: when a client wants to query data, the query statement is first analyzed, and the instruction is then sent to replica 1 or replica 2 according to the analysis result.

2. The replica strategy for a virtual identity management system based on different data organization forms according to claim 1, characterized in that step three further comprises the following steps:

Step A: partitioning the data objects by application platform;

Step B: within each application platform, partitioning by the region where the user is located;

Step C: sorting the users of one region within an application platform in the same way as in replica 1;

Step D: if a user is not registered on a platform, skipping that user directly.

3. The replica strategy for a virtual identity management system based on different data organization forms according to claim 1, characterized in that step four further comprises the following steps:

Step a: storing replica 1 and replica 2 separately, with replica 1 stored on the odd-numbered nodes of the cluster and replica 2 stored on the even-numbered nodes of the cluster;

Step b: odd-numbered nodes are adjacent to even-numbered nodes, and node 1 is adjacent to node 2n;

Step c: setting two different hash functions so that the hash values of the data objects are ordered according to the data organization methods of replica 1 and replica 2 respectively, and then mapping the data to nodes with the consistent hashing algorithm.

4. The replica strategy for a virtual identity management system based on different data organization forms according to claim 1, characterized in that step five further comprises the following steps:

Step 1: analyzing the query statement and judging whether it is a user-centric query;

Step 2: if it is a user-centric query, sending the instruction to replica 1 and performing the query;

Step 3: if it is not a user-centric query, analyzing the query statement and judging whether it is an application-platform-centric query;

Step 4: if it is an application-platform-centric query, sending the instruction to replica 2 and performing the query;

Step 5: if it is not an application-platform-centric query, selecting a replica according to the current system load and performing the query.

5. The replica strategy for a virtual identity management system based on different data organization forms according to claim 3, characterized in that, when the consistent hashing algorithm in step c computes the hash value of a data object, the basic unit of a data object is all the data of one user under one application, and an eID account together with an application name forms the primary key of a data object.