CN106156125B - A method for managing copies in a virtual identity management system based on different data organization forms

Info

Publication number: CN106156125B
Application number: CN201510163158.5A
Authority: CN
Inventors: 傅翔; 朱伟辉; 贾焰; 韩伟红; 李树栋; 李爱平; 周斌; 杨树强; 黄九鸣; 全拥; 邓璐; 刘斐
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2015-04-08
Filing date: 2015-04-08
Publication date: 2019-08-23
Anticipated expiration: 2035-04-08
Also published as: CN106156125A

Abstract

The present invention discloses a method for managing copies in a virtual identity management system based on different data organization forms. The method mainly comprises virtual identity data division, data organization of copy 1, data organization of copy 2, copy distribution, and data query. The invention improves the replication strategy of the Cassandra database: the number of copies is set to 2, and after the Cassandra database has partitioned the virtual identity data with the consistent hashing algorithm, the copy of the data is reorganized by re-partitioning the virtual identity data with a division method that favors queries, while still obeying the rule that copies of the same data are not placed on the same physical machine. Because different copies use different data organization methods to serve different query requests, query time and network overhead are reduced and system efficiency is maximized. The method is suited to the data copy placement problem of virtual identity management systems.

Description

A method for managing copies in a virtual identity management system based on different data organization forms

Technical field

The invention belongs to the field of Internet technology, and in particular relates to a method for managing copies in a virtual identity management system based on different data organization forms.

Background art

eID (electronic IDentity), in full the citizen's network electronic identity, is an authoritative electronic information file that remotely proves an individual's true identity over the network. When an eID is used remotely, the true identity is verified over the network against the public security population database and the eID service platform. This makes it possible to establish the authenticity and validity of a personal identity while protecting the citizen's identity privacy, and gives eID the characteristics of authority, security, traceability, and ease of use. On the Internet, there is a one-to-many relationship between a user and the virtual identities that user holds under various applications and platforms. In an eID-based network environment, all of these correspondences can be anchored to the unique eID identifier, and virtual identity data refers to all the data that an eID user holds under different applications.

The consistent hashing algorithm in document [2] is a special kind of hashing: when the hash table is resized, on average only K/n of the data needs to be remapped, where K is the amount of data (the number of keys) and n is the number of caches (slots). By contrast, in most other hash tables a change in the number of slots causes nearly all of the data to be remapped.
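As a rough illustration of the K/n property, the following minimal Python sketch builds a small hash ring and counts how many keys change owner when one node is added. The `HashRing` class, the `stable_hash` helper, the MD5-based ring positions, and the node and key names are all assumptions made for illustration; they are not taken from document [2] or from Cassandra.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Map a string to a ring position (MD5 is an illustrative choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring with one position per node."""

    def __init__(self, nodes):
        self._ring = sorted((stable_hash(n), n) for n in nodes)

    def add_node(self, node: str) -> None:
        bisect.insort(self._ring, (stable_hash(node), node))

    def lookup(self, key: str) -> str:
        # The owner is the first node clockwise from the key's position.
        positions = [pos for pos, _ in self._ring]
        idx = bisect.bisect_right(positions, stable_hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"eid-{i}" for i in range(2000)]
ring = HashRing([f"node-{i}" for i in range(8)])
before = {k: ring.lookup(k) for k in keys}

ring.add_node("node-8")
after = {k: ring.lookup(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Keys that changed owner can only have moved to the newly added node.
moved_to_new = all(after[k] == "node-8" for k in moved)
print(len(moved), "of", len(keys), "keys moved")
```

With 8 nodes growing to 9, roughly one ninth of the keys would be expected to move, although the exact count depends on where the hash happens to place the nodes on the ring.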

The distributed consistent hashing algorithm in document [3] adds virtual nodes on top of the consistent hashing algorithm. Its purpose is to spread the hash results as evenly as possible across all caches, so that all cache space is utilized.
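The effect of virtual nodes can be sketched in Python as follows. This is only an illustrative sketch: the `build_ring` and `load_per_node` helpers, the `#vn` naming of virtual nodes, and the counts of 1 and 200 virtual nodes per physical node are assumptions, not details from the cited documents.

```python
import bisect
import hashlib
from collections import Counter


def ring_position(value: str) -> int:
    """Map a string to a ring position (MD5 is an illustrative choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def build_ring(nodes, vnodes_per_node):
    """Place each physical node on the ring vnodes_per_node times."""
    ring = []
    for node in nodes:
        for replica in range(vnodes_per_node):
            ring.append((ring_position(f"{node}#vn{replica}"), node))
    ring.sort()
    return ring


def load_per_node(nodes, vnodes_per_node, keys):
    """Count how many keys each physical node ends up owning."""
    ring = build_ring(nodes, vnodes_per_node)
    positions = [pos for pos, _ in ring]
    counts = Counter({node: 0 for node in nodes})
    for key in keys:
        idx = bisect.bisect_right(positions, ring_position(key)) % len(ring)
        counts[ring[idx][1]] += 1
    return counts


nodes = [f"node-{i}" for i in range(1, 5)]
keys = [f"eid-{i}" for i in range(4000)]

plain = load_per_node(nodes, 1, keys)      # one ring position per node
virtual = load_per_node(nodes, 200, keys)  # 200 virtual nodes per node

print("without virtual nodes:", dict(plain))
print("with virtual nodes:   ", dict(virtual))
```

Printing both counters side by side typically shows a noticeably narrower spread of keys per node once virtual nodes are used.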

Document [3] uses the Cassandra database to store virtual identity data, improves query efficiency by building external indexes, and backs up data with Cassandra's built-in replication strategy. It makes sound use of the techniques of documents [1] and [2] and stores massive virtual identity data with relatively high efficiency. However, that method improves query efficiency by building a large number of external indexes, so it requires a large amount of storage space and its algorithm is rather complex. On the copy problem, it simply adopts the Cassandra database's built-in replication strategy and treats copies only as storage redundancy, without making reasonable use of them.

[1] JiaKui Zhao, PingFei Zhu, LiangHuai Yang. Effective Data Localization Using Consistent Hashing in Cloud Time-Series Databases [J]. Applied Mechanics and Materials, 2013, 347: 2246-2251.

[2] Improvements to consistent hashing [EB/OL]. http://blog.163.com/lin_guoqian@126/blog/static/1693687432012151010409/.

[3] Deng Lu. Research and implementation of key technologies for storage management of massive virtual identity data, 2010.

Summary of the invention

In view of the above problems, the present invention follows the method of reference [3] in choosing the Cassandra database to store virtual identity data, and provides a method for managing copies in a virtual identity management system based on different data organization forms, suited to the data copy placement problem of virtual identity management systems.

The technical solution of the present invention is as follows:

A method for managing copies in a virtual identity management system based on different data organization forms, mainly comprising the following steps:

(1) Virtual identity data division: the column-oriented storage idea of the Cassandra column database is applied to the virtual identity data, which is divided both horizontally and vertically; the horizontal division is by eID and the vertical division is by application;

(2) Data organization of copy 1: all data objects of the same user are stored together, and only after all data of one user has been stored is the next user stored; in addition, in the storage order of users, users in the same region are stored together and sorted by registration time;

(3) Data organization of copy 2;

(4) Copy distribution;

(5) Data query: when a client wants to query data, the query statement is first analyzed, and the instruction is then sent to copy 1 or copy 2 according to the result of the analysis.

Further, step (3) comprises the following steps:

1) the data objects are divided by application platform;

2) within each application platform, the data objects are divided by the region where the user is located;

3) the users of one region within an application platform are sorted in the same way as in data copy 1;

4) if a user is not registered on a platform, that user is skipped directly.

Further, step (4) comprises the following steps:

1) copy 1 and copy 2 are stored separately: copy 1 is stored on the odd-numbered nodes of the cluster, and copy 2 is stored on the even-numbered nodes of the cluster;

2) odd-numbered nodes are adjacent to even-numbered nodes, and node 1 is adjacent to node 2n;

3) two different hash functions are set so that the computed hash values of the data objects are ordered according to the data organization of copy 1 and of copy 2 respectively, and the data is then mapped to nodes with the consistent hashing algorithm.

Further, the hash function in step 3) maps one object to another object; the objects may first be sorted, and the hash function is then set according to the sort order of the objects.

Further, when the consistent hashing algorithm computes the hash value of a data object in step 3), the basic unit of a data object is set to all the data of one user under one application, and an eID account together with an application name forms the primary key of a data object.

Further, step (5) comprises the following steps:

1) the query statement is analyzed to determine whether it is a user-based query;

2) if it is a user-based query, the instruction is sent to copy 1 and the query is executed;

3) if it is not a user-based query, the query statement is analyzed to determine whether it is an application-platform-based query;

4) if it is an application-platform-based query, the instruction is sent to copy 2 and the query is executed;

5) if it is not an application-platform-based query, a copy is selected according to the current system load and the query is executed.
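The routing decision in steps 1) to 5) can be sketched in Python as follows. The sketch assumes, purely for illustration, that a parsed query is a dictionary, that the presence of an `eid` filter marks a user-based query and an `application` filter marks a platform-based query, and that load is reported as a number per copy. The names `QueryKind`, `classify`, and `choose_copy` are hypothetical and are not part of the patent or of Cassandra.

```python
from enum import Enum


class QueryKind(Enum):
    USER = "user"
    APPLICATION = "application"
    OTHER = "other"


def classify(query: dict) -> QueryKind:
    """Check for a user-based query first, then a platform-based one."""
    if "eid" in query:
        return QueryKind.USER
    if "application" in query:
        return QueryKind.APPLICATION
    return QueryKind.OTHER


def choose_copy(query: dict, load: dict) -> str:
    """Return the copy that should serve the query.

    load maps a copy name to its current load (hypothetical metric).
    """
    kind = classify(query)
    if kind is QueryKind.USER:
        return "copy1"  # copy 1 keeps each user's data together
    if kind is QueryKind.APPLICATION:
        return "copy2"  # copy 2 keeps each platform's data together
    # Neither kind of query: fall back to the less loaded copy.
    return min(load, key=load.get)


load = {"copy1": 0.7, "copy2": 0.3}
user_target = choose_copy({"eid": "eID-001"}, load)
platform_target = choose_copy({"application": "shop"}, load)
other_target = choose_copy({"registered_after": "2014-01-01"}, load)
```

Because the user check comes first, a query that filters on both an eID and an application is treated as user-based in this sketch, which matches the order of the steps above.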

The beneficial effects of the present invention are as follows. Traditional replication strategies require that copies of the same data not be placed on the same physical machine, but impose no requirement on the placement of copies of different data. The present invention mainly improves the replication strategy of the Cassandra database: the number of copies is set to 2, and after the Cassandra database has partitioned the virtual identity data with the consistent hashing algorithm, the copy of the data is reorganized by re-partitioning the virtual identity data with a division method that favors queries, while still obeying the rule that copies of the same data are not placed on the same physical machine. The organization of data copy 1 facilitates managing eID user data region by region. When data needs to be read, the copy to operate on is selected according to the characteristics of the requested data, which improves data access efficiency. Data copies are thus no longer redundant storage kept only for disaster recovery; instead, different copies use different data organization methods to serve different query requests, which reduces query time and network overhead and maximizes system efficiency. The method is suited to the data copy placement problem of virtual identity management systems.

Brief description of the drawings

Fig. 1 is data query flow chart of the invention.

Fig. 2 is copy distribution map of the invention.

Fig. 3 is a diagram of the virtual identity information in Embodiment 1 of the present invention.

Fig. 4 is a diagram of the organization of copy 1 in Embodiment 1 of the present invention.

Fig. 5 is a diagram of the virtual identity data distribution in Embodiment 1 of the present invention.

Fig. 6 is a diagram of the organization of copy 2 in Embodiment 1 of the present invention.

Detailed description of the embodiments

To facilitate understanding of the present invention, it is further described below with reference to the accompanying drawings and embodiments.

The present invention provides a method for managing copies in a virtual identity management system based on different data organization forms, mainly comprising the following steps:

(1) Virtual identity data division: the data model of the Cassandra column database is applied to the virtual identity data, which is divided both horizontally and vertically; the horizontal division is by eID and the vertical division is by application;

(2) Data organization of copy 1: all data objects of the same user are stored together, and only after all data of one user has been stored is the next user stored; in addition, in the storage order of users, users in the same region are stored together and sorted in a fixed order;

(3) Data organization of copy 2;

(4) Copy distribution;

Development environment of the invention: an x86 platform running the Linux operating system, with JDK 1.7 and code written in the Java language. The data server needs the Cassandra database software, version 1.0 or later, installed to provide data support for the system.

Running environment of the invention: the server side runs on multiple machine nodes, each an x86 platform with the Linux operating system and JDK 1.7 or later installed; the client is an ordinary personal computer.

The following is an exemplary embodiment of the invention:

Embodiment 1:

(1) Virtual identity data division: in cyberspace, users register accounts on different application platforms according to their own needs; these platforms include e-commerce, social networks, online games, and so on. The virtual identity management system unifies this information through eID: a user has one unique eID identifier and holds different virtual accounts under different application platforms. This data is partly heterogeneous in structure, varies in size, and is huge in volume. The data model of the Cassandra column database is applied to the virtual identity data, as shown in Table 1:

Table 1. Virtual identity data model

Based on the storage model above, the present invention divides the virtual identity data horizontally and vertically: the horizontal division is by eID and the vertical division is by application. When the consistent hashing algorithm computes the hash value of a data object, the unit of a data object is set to all the data of one user under one application, and an eID account together with an application name forms the "primary key" of a data object. The present invention is not concerned with how data is organized inside a data object; it mainly addresses how data objects are organized across the different copies.
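The division into data objects can be sketched in Python as follows. The flat record layout with `eid`, `application`, `field`, and `value` entries, and the names `ObjectKey` and `partition`, are assumptions made for illustration; the patent does not prescribe a concrete record format.

```python
from collections import defaultdict
from typing import NamedTuple


class ObjectKey(NamedTuple):
    """Primary key of a data object: one eID account plus one application name."""
    eid: str
    application: str


def partition(records):
    """Group flat records into data objects keyed by (eID, application)."""
    objects = defaultdict(dict)
    for record in records:
        key = ObjectKey(record["eid"], record["application"])
        objects[key][record["field"]] = record["value"]
    return dict(objects)


# Hypothetical sample records.
records = [
    {"eid": "eID-001", "application": "social", "field": "nickname", "value": "zs"},
    {"eid": "eID-001", "application": "social", "field": "friends", "value": 120},
    {"eid": "eID-001", "application": "shop", "field": "orders", "value": 7},
    {"eid": "eID-002", "application": "social", "field": "nickname", "value": "ls"},
]

data_objects = partition(records)
for key, content in sorted(data_objects.items()):
    print(key, content)
```

Each resulting key identifies one cell of the horizontal (eID) by vertical (application) grid, and the dictionary stored under it stands for the contents of that data object.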

(2) Data organization of copy 1: in actual data request operations, the virtual identity information of one user across all application platforms is often requested. For example, if the eID data of user Zhang San shows an anomaly and may have been stolen, all of Zhang San's virtual identity data needs to be checked. Suppose Zhang San has registered on 8 application platforms in total, including social networks, e-commerce sites, and online games; all of his virtual identity information in cyberspace is then as shown in Fig. 3, where each rectangle represents one data object.

If these data objects were distributed across the data nodes at random, completing the above query could span many different nodes, which not only slows the query but also consumes network bandwidth. Therefore, in the present invention, data copy 1 is organized so that all data objects of the same user are stored together: only after all data of one user has been stored is the next user stored, and so on. In the storage order of users, users in the same region are stored together, and the users within one region are sorted in a fixed order. This facilitates managing eID user data region by region. The resulting organization of copy 1 is shown in Fig. 4.
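One way to express the copy 1 ordering is as a sort key, sketched below in Python. The `DataObject` fields, the use of registration date as the fixed order within a region (as in the summary of the invention), and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    eid: str          # user's eID account
    application: str  # application platform name
    region: str       # region the user belongs to (hypothetical field)
    registered: str   # user's registration date, ISO format (hypothetical field)


def copy1_sort_key(obj: DataObject):
    # Region first, then users in registration order, then each user's applications.
    return (obj.region, obj.registered, obj.eid, obj.application)


objects = [
    DataObject("eID-002", "shop", "south", "2014-03-01"),
    DataObject("eID-001", "game", "north", "2014-01-15"),
    DataObject("eID-003", "social", "north", "2013-11-30"),
    DataObject("eID-001", "social", "north", "2014-01-15"),
    DataObject("eID-002", "social", "south", "2014-03-01"),
]

copy1_layout = sorted(objects, key=copy1_sort_key)
copy1_order = [(o.eid, o.application) for o in copy1_layout]
print(copy1_order)
```

In the sorted layout, all data objects of one user are adjacent, and users of the same region form one contiguous block, so a query for a single user touches one contiguous range.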

(3) Organization of copy 2: in actual data request operations, the data of a particular application platform is also commonly operated on, for example to view Taobao's users in each region. This requires obtaining all virtual user data of the Taobao application; the distribution of Taobao's virtual identity data is shown in Fig. 5.

If all data objects were distributed across the data nodes at random by consistent hashing, or distributed in the manner of copy 1, this operation would likewise span many nodes, consume bandwidth, and lower the efficiency of the cluster; at the same time, data copy 2 would serve merely as redundancy for copy 1 and would not be used to full effect. Based on these considerations, the present invention organizes data copy 2 as follows: 1) the data objects are first divided by application platform; 2) within each application platform, they are divided by the region where the user is located; 3) the users of one region within an application platform are sorted in the same way as in data copy 1; 4) if a user is not registered on a platform, that user is skipped directly, as shown in Fig. 6.
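The copy 2 ordering can be sketched as a different sort key over the same kind of data objects. As before, the `DataObject` fields, the use of registration date as the order within a region, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    eid: str          # user's eID account
    application: str  # application platform name
    region: str       # region the user belongs to (hypothetical field)
    registered: str   # user's registration date, ISO format (hypothetical field)


def copy2_sort_key(obj: DataObject):
    # Application platform first, then region, then users ordered as in copy 1.
    return (obj.application, obj.region, obj.registered, obj.eid)


# A user with no account on a platform simply has no data object for it,
# which corresponds to "skipping" that user in step 4).
objects = [
    DataObject("eID-002", "shop", "south", "2014-03-01"),
    DataObject("eID-001", "game", "north", "2014-01-15"),
    DataObject("eID-003", "social", "north", "2013-11-30"),
    DataObject("eID-001", "social", "north", "2014-01-15"),
    DataObject("eID-002", "social", "south", "2014-03-01"),
]

copy2_layout = sorted(objects, key=copy2_sort_key)
copy2_order = [(o.application, o.eid) for o in copy2_layout]
social_users = [o.eid for o in copy2_layout if o.application == "social"]
print(copy2_order)
```

Here all objects of one platform form one contiguous block, so a per-platform query such as the Taobao example reads a single range instead of scattered objects.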

(4) Copy distribution: the present invention stores copy 1 and copy 2 separately, that is, copy 1 is stored on one half of the nodes of a cluster and copy 2 is stored on the other half of the nodes of the cluster, with the distribution shown in Fig. 2. Odd-numbered nodes are adjacent to even-numbered nodes, and node 1 is adjacent to node 2n. Suitable hash functions are selected so that the hash values of the data objects are ordered according to the organization of copy 1 and of copy 2, and the data is then mapped to nodes with the consistent hashing algorithm. At this point, all data copies have been distributed in the cluster according to the specified method.

The hash function maps one object to another object; the objects may first be sorted, and the hash function is then set according to the sort order of the objects.
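The placement can be illustrated with the following Python sketch. It makes a simplifying assumption: instead of a full consistent hashing ring, the rank of an object in the sorted order is used as its order-preserving hash value, and each node receives one contiguous range of ranks. The `ranged_placement` helper, the tuple layout of the objects, and the choice of n = 2 are hypothetical.

```python
def ranged_placement(ordered_keys, nodes):
    """Give each node one contiguous range of the ordered key sequence.

    The rank of a key in ordered_keys plays the role of an
    order-preserving hash value (an illustrative simplification).
    """
    total = len(ordered_keys)
    return {
        key: nodes[rank * len(nodes) // total]
        for rank, key in enumerate(ordered_keys)
    }


# (eid, application, region) triples standing in for data objects.
objects = [
    ("eID-001", "game", "north"),
    ("eID-001", "social", "north"),
    ("eID-002", "shop", "south"),
    ("eID-002", "social", "south"),
    ("eID-003", "social", "north"),
    ("eID-004", "game", "south"),
]

n = 2
all_nodes = list(range(1, 2 * n + 1))              # nodes 1..2n
odd_nodes = [i for i in all_nodes if i % 2 == 1]   # hold copy 1
even_nodes = [i for i in all_nodes if i % 2 == 0]  # hold copy 2

# Copy 1 order: region, then user, then application.
copy1_order = sorted(objects, key=lambda o: (o[2], o[0], o[1]))
# Copy 2 order: application, then region, then user.
copy2_order = sorted(objects, key=lambda o: (o[1], o[2], o[0]))

copy1_nodes = ranged_placement(copy1_order, odd_nodes)
copy2_nodes = ranged_placement(copy2_order, even_nodes)

for obj in objects:
    print(obj, "copy 1 on node", copy1_nodes[obj], "copy 2 on node", copy2_nodes[obj])
```

Because copy 1 only ever lands on odd-numbered nodes and copy 2 only on even-numbered nodes, the two copies of any data object are always on different nodes, which preserves the rule that copies of the same data do not share a physical machine.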

(5) Data query: when a client wants to query data, the query statement is first analyzed, and the instruction is then sent to copy 1 or copy 2 according to the result of the analysis; the process is shown in Fig. 1.

Traditional replication strategies require that copies of the same data not be placed on the same physical machine, but impose no requirement on the placement of copies of different data. The present invention mainly improves the replication strategy of the Cassandra database: the number of copies is set to 2, and after the Cassandra database has partitioned the virtual identity data with the consistent hashing algorithm, the copy of the data is reorganized by re-partitioning the virtual identity data with a division method that favors queries, while still obeying the rule that copies of the same data are not placed on the same physical machine. The organization of data copy 1 facilitates managing eID user data region by region. When data needs to be read, the copy to operate on is selected according to the characteristics of the requested data, which improves data access efficiency. Data copies are thus no longer redundant storage kept only for disaster recovery; instead, different copies use different data organization methods to serve different query requests, which reduces query time and network overhead and maximizes system efficiency. The method is suited to the data copy placement problem of virtual identity management systems.

The above is an exemplary description of the invention. Clearly, the implementation of the invention is not limited to the manner described above; any improvement made using the technical solution of the invention, or any direct application of the concept and technical solution of the invention to other settings without improvement, falls within the scope of protection of the invention.

Claims

1. A method for managing copies in a virtual identity management system based on different data organization forms, characterized by comprising the following steps:

Step 1: virtual identity data division: the column-oriented storage of the Cassandra column database is applied to the virtual identity data, which is divided both horizontally and vertically; the horizontal division is by eID and the vertical division is by application;

Step 2: data organization of copy 1: all data objects of the same user are stored together, and only after all data of one user has been stored is the next user stored; in addition, in the storage order of users, users in the same region are stored together and sorted by registration time;

Step 3: data organization of copy 2;

Step 4: copy distribution;

Step 5: data query: when a client wants to query data, the query statement is first analyzed, and the instruction is then sent to copy 1 or copy 2 according to the result of the analysis;

wherein Step 3 further comprises the following steps:

Step A: the data objects are divided by application platform;

Step B: within each application platform, the data objects are divided by the region where the user is located;

Step C: the users of one region within an application platform are sorted in the same way as in data copy 1;

Step D: if a user is not registered on a platform, that user is skipped directly.

2. The method for managing copies in a virtual identity management system based on different data organization forms according to claim 1, characterized in that Step 4 further comprises the following steps:

Step a: copy 1 and copy 2 are stored separately: copy 1 is stored on the odd-numbered nodes of the cluster, and copy 2 is stored on the even-numbered nodes of the cluster;

Step b: odd-numbered nodes are adjacent to even-numbered nodes, and node 1 is adjacent to node 2n;

Step c: two different hash functions are set so that the hash values of the data objects are ordered according to the data organization of copy 1 and of copy 2 respectively, and the data is then mapped to nodes with the consistent hashing algorithm.

3. The method for managing copies in a virtual identity management system based on different data organization forms according to claim 2, characterized in that, when the consistent hashing algorithm computes the hash value of a data object in Step c, the basic unit of a data object is set to all the data of one user under one application, and an eID account together with an application name forms the primary key of a data object.

4. The method for managing copies in a virtual identity management system based on different data organization forms according to claim 1, characterized in that Step 5 further comprises the following steps:

Step 1: the query statement is analyzed to determine whether it is a user-based query;

Step 2: if it is a user-based query, the instruction is sent to copy 1 and the query is executed;

Step 3: if it is not a user-based query, the query statement is analyzed to determine whether it is an application-platform-based query;

Step 4: if it is an application-platform-based query, the instruction is sent to copy 2 and the query is executed;

Step 5: if it is not an application-platform-based query, a copy is selected according to the current system load and the query is executed.