CN111914044B - Verification method and system for cloud data geographic position - Google Patents

Verification method and system for cloud data geographic position

Info

Publication number
CN111914044B
CN111914044B (application CN202010582761.8A)
Authority
CN
China
Prior art keywords
challenge
sub
active
file
landmark
Prior art date
Legal status
Active
Application number
CN202010582761.8A
Other languages
Chinese (zh)
Other versions
CN111914044A (en)
Inventor
刘丽敏
张阳
贾世杰
陈天宇
林璟锵
贾东征
钱文飞
赵欣怡
吕娜
Current Assignee
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Publication of CN111914044A
Application granted
Publication of CN111914044B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Remote Sensing (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and system for verifying the geographic location of cloud data, comprising the following steps: the data owner selects a third party audit, uploads the stored file and the homomorphic tag set of its file blocks to the cloud service provider, and sends a request for the geographic location of the data to the third party audit; the third party audit splits a random challenge into sub-challenges and sends them to the active landmarks; each active landmark forwards its sub-challenge to the cloud service provider and records the sending timestamp; the cloud service provider broadcasts each sub-challenge evidence to every active landmark and sends the random challenge evidence to the third party audit; each active landmark returns its sub-challenge evidence, sending timestamp, and receiving timestamp to the third party audit; the third party audit then computes the distance from each active landmark to the cloud service provider and the geographic location of the cloud data. The method eliminates the excessive response delay caused by the large computation delay in existing schemes for determining the geographic location of cloud data, improving both efficiency and accuracy.

Description

Verification method and system for cloud data geographic position
Technical Field
The invention belongs to the technical field of cloud service security attributes, and in particular relates to a method and system for verifying the geographic location of cloud data.
Background
Cloud services offer high convenience and high availability, provide efficient storage, and have accelerated the information technology revolution and social progress. With their popularization, more and more individuals and organizations migrate data from local storage to the cloud. However, the cloud service provider (CSP) is not a fully trusted entity, and introduces multiple security risks while providing efficient services to users. Among these, the geographic location of cloud data has attracted attention in recent years, since it typically affects data availability, data security, legal and policy compliance, and so on. For example, a white paper on data centers under the Health Insurance Portability and Accountability Act notes that knowing the geographic location of data is a primary consideration: once the data is stored overseas, it may be placed at serious risk.
Generally, a data holder migrates data to the cloud under one of two situations related to the geographic location attribute. In the first, when the data holder rents a cloud service, the geographic location of the cloud data is agreed upon in a service level agreement (SLA) to constrain the CSP's behavior. In practical deployment, however, a rational, profit-seeking CSP may violate the SLA and migrate cloud data to cheaper remote data centers in other countries or regions. Furthermore, although a CSP may claim to enhance the security of cloud data through off-site disaster-recovery backups, it may not actually deploy them in order to save cost; in that case cloud data may be permanently lost through unexpected server downtime or natural disasters. In the second situation, the data holder and the CSP make no agreement at all on the geographic location of the cloud data. For example, when a mobile user uses a cloud storage service (e.g., iCloud as operated in China), there is no contract with the CSP concerning the geographic location of the data, so the location of the cloud data is unclear to the user. In both situations, providing the data holder with a general scheme for determining the geographic location of cloud data is a problem to be solved.
At present, techniques for proving the geographic location of cloud data fall into two classes. The first verifies the geographic location of cloud data but applies only when the data holder and the cloud service provider have agreed on a location in an SLA in advance, so it lacks generality. The second determines the geographic location of cloud data regardless of whether such an agreement exists. However, some existing schemes of this kind must transmit a large amount of data while determining the location, imposing a huge communication overhead on the data holder, increasing the holder's burden, and making the scheme inefficient. Other existing schemes avoid that overhead, but the excessive delay of computing the evidence on the cloud side inevitably inflates the response delay used for localization, which in turn produces larger errors and reduces positioning accuracy.
After the response delay is obtained, a delay-based or machine-learning-based method is used to determine the geographic location of the cloud data. Delay-based schemes are very sensitive to the response delay: a small change in delay can have a significant impact on the estimated distance and lead to an erroneous geolocation. Schemes based on the naive Bayes algorithm can combine multiple features, including the response delay, and are more practical than purely delay-based methods; however, naive Bayes requires the features to be mutually independent, whereas in practice features are often correlated. In addition, naive Bayes introduces a large amount of computation to estimate the prior probabilities, so its computational cost is high.
For convenience of description, some related techniques used in this application are introduced below.
Bilinear mapping:
Let G1, G2, GT be cyclic multiplicative groups of the same prime order p, and let g1, g2 be generators of G1 and G2 respectively. A map e: G1 × G2 → GT is a bilinear map if it satisfies: 1) Bilinearity: for all u ∈ G1, v ∈ G2 and a, b ∈ Zp, e(u^a, v^b) = e(u, v)^(ab); 2) Non-degeneracy: there exist u ∈ G1, v ∈ G2 such that e(u, v) ≠ 1; 3) Computability: there is an efficient algorithm to compute e for all elements of G1 and G2.
Random forest algorithm:
random forests are a machine learning algorithm that is widely used for data classification and regression. The random forest algorithm has the advantages of small calculated amount, capability of efficiently processing various characteristics and the like. Specifically, the random forest is composed of a plurality of decision trees, and a specific algorithm for constructing a random forest model is as follows: 1) The initial training dataset D and the number of decision trees in the random forest are determined. Randomly selecting N pieces of data from the training set D with a put-back as the training set D i The method comprises the steps of constructing an ith decision tree in a random forest; 2) When the node of the ith decision tree is to be split, m features are selected from all features randomly and not put back, the optimal feature is taken as the splitting feature of the node, and the node is split into left and right branches based on the feature. This process is repeated until all feature combinations are used, at which point the construction of the ith decision tree in the random forest is completed. It is noted that in the process of constructing the decision tree, the decision tree does not need to be pruned, so that each tree grows as much as possible; 3) Repeating the above two steps until all decision trees are built, and building a complete random forest model.
Optimized triangulation:
triangulation may determine the geographic location of cloud data based on three landmarks whose geographic locations are known. As shown in fig. 1A, if three landmarks L are known in geographic location 1 ,L 2 ,L 3 The distance between the CSP and the CSP is r respectively 1 ,r 2 ,r 3 Then in L 1 ,L 2 ,L 3 As the center of a circle, r 1 ,r 2 ,r 3 Three circular areas of radius are cloud data for landmark L 1 ,L 2 ,L 3 In the region of the body. In this case, the cloud data is located in the overlapping area of the three circles, and the centroid of the overlapping area is considered as the geographic location of the cloud data. Because of the error in the estimated distance between the landmark and the CSP, the unavoidable geographical positioning error is brought about by the large overlapping area of the triangulation method. The optimized triangulation method can well solve the problem. As shown in fig. 1B, it converts the circular region in triangulation into a ring with positive and negative constraint distances, and the centroid of the ring intersection region is considered the geolocation of the cloud data. Compared with the common triangulation method, the optimized triangulation has great improvement on the positioning accuracy.
Disclosure of Invention
Aiming at the security problem of the geographic location of cloud data in cloud services, the invention provides a method and system for verifying the geographic location of cloud data, which address the low efficiency and large errors of existing techniques for determining that location.
The technical scheme of the invention is as follows:
a verification method of cloud data geographic position comprises the following steps:
1) The data owner divides the stored file into multiple file blocks, generates the file block homomorphic tag set, uploads the stored file and the tag set to the cloud service provider, splits a generated random challenge into l sub-challenges, and sends them to l active landmarks respectively, where l ≥ 3;
2) Each active landmark forwards the received sub-challenges to a cloud service provider and records a sending timestamp;
3) The cloud service provider generates a plurality of sub-challenge evidences according to the stored file and each sub-challenge, generates random challenge evidences through the file block homomorphic tag set and each sub-challenge, broadcasts each sub-challenge evidence to each active landmark, and sends the random challenge evidences to the data owner;
4) Each active landmark records a receiving time stamp of the received sub-challenge evidence, and returns the sub-challenge evidence, the sending time stamp and the receiving time stamp to the data owner;
5) The data owner verifies the random challenge evidence against the sub-challenge evidences, and computes the distance from each active landmark to the cloud service provider from each landmark's sending and receiving timestamps, obtaining the geographic location of the cloud data.
Further, the original file is encoded into a stored file using a redundancy code.
Further, the file block homomorphic tag set is computed as follows:
1) Randomly generate a signing key pair (spk, ssk) based on the security parameter λ;
2) Randomly select an element α ∈ Zp and compute v ← g^α, giving the data holder's private key sk = (α, ssk) and public key pk = (v, spk), where Zp is a finite field, g is a generator of G, G and GT are multiplicative cyclic groups of large prime order p, and e: G × G → GT is a bilinear map;
3) Select s random elements u_1, u_2, …, u_s ∈ G and compute the stored-file tag t = name || n || u_1 || u_2 || … || u_s || SSig_ssk(name || n || u_1 || u_2 || … || u_s), where s is the number of sectors per file block, G is the multiplicative cyclic group of prime order p, SSig_ssk(·) denotes signing with the private signing key ssk, and name is the name of file F;
4) Compute the homomorphic tag of each file block, σ_i = (H(name || i) · ∏_{j=1}^{s} u_j^{m_ij})^α, obtaining the file block homomorphic tag set Φ, where 1 ≤ i ≤ n, 1 ≤ j ≤ s, n is the number of file blocks, m_ij is the j-th sector of the i-th block, and H(·) is a hash function.
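The aggregation property these tags provide can be demonstrated with a toy sketch. The code below works in the multiplicative group Zp* with small fixed constants instead of the pairing-friendly groups the scheme actually requires, so it is illustrative only; it checks the identity ∏ σ_i^{v_i} = (∏ H(name||i)^{v_i} · ∏ u_j^{μ_j})^α with μ_j = Σ v_i · m_ij, which is exactly the relation the verification step relies on.

```python
import hashlib

p = 2**127 - 1  # a Mersenne prime; toy stand-in for the group modulus

def H(name, i):
    """Hash H(name || i) into the toy group Z_p*."""
    d = hashlib.sha256(f"{name}||{i}".encode()).digest()
    return int.from_bytes(d, "big") % p or 1

def tag(name, i, sectors, u, alpha):
    """sigma_i = (H(name||i) * prod_j u_j^{m_ij})^alpha, all mod p."""
    base = H(name, i)
    for uj, mij in zip(u, sectors):
        base = base * pow(uj, mij, p) % p
    return pow(base, alpha, p)
```

The homomorphic identity holds in any abelian group, so the toy check below mirrors the real pairing-based verification in structure even though Zp* offers none of its security.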
Further, the generated random challenge is split into l sub-challenges as follows:
1) Randomly select c distinct elements to form the challenged block index set I = {n_1, n_2, …, n_c}, where c ∈ [1, n] and c is an integer multiple of l;
2) For every i ∈ I, generate a random number v_i ∈ Zp, obtaining the random challenge chal = {(i, v_i)};
3) Split chal into l sub-challenges chal_x by partitioning the index values and random numbers in order, where 1 ≤ x ≤ l.
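A minimal sketch of steps 1) to 3), under the assumption that the sub-challenges are formed by cutting the ordered (index, v_i) list into l equal slices; the function name `gen_chal` and the 16-bit range for v_i are illustrative choices, not values from the patent.

```python
import random

def gen_chal(n, c, l, seed=0):
    """Build chal = {(i, v_i)} over c distinct block indices (c a multiple
    of l) and split it into l equally sized sub-challenges."""
    assert c % l == 0 and 1 <= c <= n
    rng = random.Random(seed)
    idx = rng.sample(range(1, n + 1), c)                 # c distinct block indices
    chal = [(i, rng.randrange(1, 2**16)) for i in idx]   # pair each with random v_i
    size = c // l
    return [chal[x * size:(x + 1) * size] for x in range(l)]
```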
Further, the sub-challenge evidences are generated by the evidence generation algorithm ProofμGen(F, chal_x) → {μ_xj}, where μ_xj is the evidence corresponding to sub-challenge chal_x; the random challenge evidence σ is generated by the evidence generation algorithm ProofσGen(Φ, chal) → σ.
Further, the distance of each active landmark to the cloud service provider is calculated by:
1) Calculating the response time delay of each sub-challenge of each active landmark by using the sending time stamp and the receiving time stamp returned by each active landmark;
2) And inputting the response time delay of each sub-challenge of each active landmark into a trained random forest model to obtain the distance from each active landmark to the cloud service provider.
Further, the geographic location of the cloud data is determined using an optimized triangulation method.
A verification method of cloud data geographic position comprises the following steps:
1) The data owner selects a trusted entity as the third party audit, uploads the stored file and the file block homomorphic tag set to the cloud service provider, and sends a request for the geographic location of the data to the third party audit;
2) After receiving the request, the third party audit splits a generated random challenge into l sub-challenges and sends them to l active landmarks respectively, where l ≥ 3;
3) Each active landmark forwards the received sub-challenges to a cloud service provider and records a sending timestamp;
4) The cloud service provider generates a plurality of sub-challenge evidences according to the stored file and each sub-challenge, generates random challenge evidences through the file block homomorphic tag set and each sub-challenge, broadcasts each sub-challenge evidence to each active landmark, and sends the random challenge evidences to a third party audit;
5) Each active landmark records a receiving time stamp of the received sub-challenge evidence, and returns the sub-challenge evidence, the sending time stamp and the receiving time stamp to the third party audit;
6) And the third party audit verifies according to the random challenge evidence and each sub-challenge evidence, calculates the distance from each active landmark to the cloud service provider through the sending timestamp and the receiving timestamp of each active landmark, and returns the obtained cloud data geographic position to the data owner.
Further, the third party audit maintains a list storing information about the active landmarks, including each landmark's serial number, geographic location, IP address, Internet service provider, and response delay to the cloud service provider.
A verification system for cloud data geographic locations, comprising:
the data owner, configured to encode the original file into a stored file, divide the stored file into multiple file blocks, and generate the file block homomorphic tag set; upload the stored file and the tag set to the cloud service provider; split a generated random challenge into l sub-challenges and send them to the l active landmarks respectively; and verify the random challenge evidence against the sub-challenge evidences and compute the distance from each active landmark to the cloud service provider from each landmark's sending and receiving timestamps, obtaining the geographic location of the cloud data, where l ≥ 3;
at least three active landmarks, configured to forward the received sub-challenges to the cloud service provider and record the sending timestamps; and to record the receiving timestamp of each received sub-challenge evidence and return the sub-challenge evidence, sending timestamp, and receiving timestamp to the data owner;
the cloud service provider is used for generating a plurality of sub-challenge evidences according to the stored file and each sub-challenge, generating random challenge evidences through the file block homomorphic tag set and each sub-challenge, broadcasting each sub-challenge evidence to each active landmark, and sending the random challenge evidences to the data owner.
Compared with the prior art, the invention has the beneficial effects that:
the method solves the problem of overlarge response time delay caused by large calculation time delay in the existing cloud data geographic position determining scheme, and improves the efficiency and accuracy of the existing cloud data geographic position determining scheme. According to the scheme, according to the generation characteristics of homomorphic labels and response evidences in the CPOR model, the method is designed, the characteristics of low communication cost, no backup, no state and infinite number of audits in the CPOR model are reserved, efficiency is improved, the calculation cost in single response is reduced by only considering conventional operation through splitting challenges and evidences, further calculation time delay in response delay is reduced, and finally accurate response time delay is obtained. The invention uses random forest algorithm and optimized triangle positioning method, which can use various characteristics except response time delay to reflect network condition, and improves positioning accuracy.
Drawings
FIG. 1A is a schematic diagram of a prior art triangulation method.
FIG. 1B is a schematic diagram of an optimized triangulation method.
Fig. 2 is a system model diagram.
Fig. 3 is a scheme timing diagram.
FIG. 4 is a schematic diagram of the sub-challenges chal_x (1 ≤ x ≤ l).
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail by examples and drawings.
The invention adopts a challenge-response mechanism. Specifically, the data holder or a third party audit (Third Party Auditor, TPA) initiates a challenge to the CSP through the active landmarks; the CSP generates evidence from the stored cloud data and homomorphic tags and responds to the challenge; and the geographic location of the cloud data is finally determined from features such as the response delay. The invention reduces the CSP's computation delay by splitting the challenges and evidences in each location-determination round, and considers only the conventional operations in evidence computation, ignoring time-consuming exponentiations, so as to obtain an accurate response delay. It determines the geographic location of cloud data by combining a random forest algorithm with an optimized triangulation method, which needs no linear mathematical model when estimating the distance from a landmark to the CSP, can also use features other than the response delay, and improves positioning accuracy.
The cloud data geographic location is determined on the basis of Compact Proofs of Retrievability (CPOR) public audit data integrity model. In order to obtain the most accurate response time delay, a split method is designed to reduce the calculation cost in single response, so that the calculation time delay is reduced. In particular, the present scheme breaks each complete challenge into multiple sub-challenges before sending the challenge to reduce the number of file blocks involved in generating the proof. In addition, the scheme only considers conventional operations (namely addition and multiplication operations) with small calculation cost, and omits huge calculation cost and response delay caused by time-consuming exponentiation operations. Finally, when generating the certificate at the CSP side, the data unit in one file block is divided into a plurality of smaller sectors to reduce the amount of data involved in each certificate calculation.
The invention aims to determine the geographic location of cloud data, and therefore combines a random forest algorithm with an optimized triangulation method. First, the random forest algorithm takes the average of the distances estimated by all decision trees in the forest as the distance between a landmark and the CSP, which makes the method robust to small fluctuations in the response delay. Second, besides the response delay, the random forest algorithm can exploit a large number of other features to achieve higher decision accuracy. Finally, the optimized triangulation method applies the given positive and negative constraints to the estimated distances to suppress, as far as possible, the effect of distance-estimation errors on positioning accuracy, and the centroid of the overlapping region is taken as the geographic location of the cloud data.
Fig. 2 is a diagram of the system model of the invention, consisting of a data holder (Data Owner), a third party audit (Third Party Auditor, TPA), a cloud service provider (Cloud Service Provider, CSP), and active landmarks (Active Landmarks).
1) Data holder (Data Owner): any person or organization that enjoys the cloud's convenient, highly available storage services and stores data with the CSP. The data holder keeps no local copy of the data; it entrusts the TPA to send challenges to the CSP to determine the geographic location of the cloud data.
2) Third party audit (Third Party Auditor, TPA): a trusted entity authorized by the user to determine the geographic location of cloud data on behalf of the data holder; after obtaining it, the TPA returns the result to the data holder.
3) Cloud service provider (Cloud Service Provider, CSP): CSPs provide powerful computing, network, and storage resources for data holders, who store data in a data center.
4) Active landmarks (Active Landmarks): a set of trusted hosts with fixed, known geographic locations. They act as intermediaries between the TPA and the CSP when determining the geographic location of cloud data, and are responsible for forwarding challenges and responses.
Specifically, the technical scheme adopted by the invention is as follows:
the invention can be divided into five phases, namely an initialization phase (Setup), a selection-Parsing phase (selection-Parsing), a Challenge-Response phase (Challenge-Response), a Verification phase (Verification), and a cloud data geographic location determination phase (geographic). Scheme timing diagram as shown in fig. 3, each stage contains a plurality of polynomial time algorithms. The specific algorithm involved in the invention is as follows:
1. Key generation algorithm KeyGen(1^λ) → (sk, pk), executed by the data holder. The input of KeyGen(·) is the security parameter λ and the output is the key pair (sk, pk).
2. File block signature algorithm SigGen(sk, F) → (t, Φ), executed by the data holder to generate the authentication tags of file F. The inputs of SigGen(·) are the private key sk and the file F; its output comprises the file tag t and the homomorphic tag set Φ of the file blocks.
3. Challenge generation algorithm GenChal(list) → {chal_x}, run by the TPA to generate a complete challenge chal and split it into multiple sub-challenges. The input of GenChal(·) is the selected active landmark set list; the output is the sub-challenges chal_x, 1 ≤ x ≤ l, where l is the number of selected active landmarks.
4. Evidence μ generation algorithm ProofμGen(F, chal_x) → {μ_xj}, run by the CSP to generate the sub-challenge evidence corresponding to each sub-challenge. The inputs of ProofμGen(·) are the file F stored by the CSP and the sub-challenge chal_x; the output is the partial evidence {μ_xj}, 1 ≤ j ≤ s, corresponding to chal_x.
5. Evidence σ generation algorithm ProofσGen(Φ, chal) → σ, run by the CSP to generate the random challenge evidence corresponding to the complete challenge chal. The inputs of ProofσGen(·) are the homomorphic tag set Φ stored by the CSP and the complete challenge chal (i.e., {chal_x}); the output is the evidence σ.
6. Evidence verification algorithm VerifyProof(pk, chal, σ, {μ_xj}) → {0, 1}, run by the TPA to verify the correctness and consistency of the evidence in the CSP's response. The inputs of VerifyProof(·) include the data holder's public key pk, the evidence σ from the CSP's response, and the evidences {μ_xj}, 1 ≤ x ≤ l, 1 ≤ j ≤ s, received from the active landmark list. If the evidence passes verification, the algorithm outputs 1 and the ComDelay(·) algorithm is executed; otherwise it outputs 0 and the procedure terminates.
7. Delay computation algorithm ComDelay(D) → delay, run by the TPA to compute the response delay from each active landmark to the CSP. The algorithm takes the timestamp set D as input and outputs the response delay set delay.
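Under the assumption that the timestamp set D pairs each active landmark with its recorded (send, receive) timestamps, ComDelay reduces to a per-landmark subtraction; a one-function sketch (the dict layout is illustrative):

```python
def com_delay(timestamps):
    """ComDelay sketch: given {landmark: (send_ts, recv_ts)} recorded by each
    active landmark, return the per-landmark response delay set."""
    return {lm: recv - send for lm, (send, recv) in timestamps.items()}
```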
8. Cloud data geolocation algorithm GeoData(data, B, e, list) → region, run by the TPA to determine the geographic location of the cloud data. Its inputs comprise the training dataset data, the training parameter B, the constraint parameter e, and the selected active landmark information set list; the output is the geographic region where the cloud data is located.
First, we assume: 1) e: G × G → GT is a bilinear map, where G and GT are multiplicative cyclic groups of large prime order p and g is a generator of G; 2) H: {0,1}* → G is a hash function; 3) the data holder encodes the original file F′ into a file F using a redundancy code and divides it equally into n file blocks, i.e., F = (m_1, m_2, …, m_n) with m_i ∈ Zp; 4) each file block is further divided equally into s sectors, each denoted m_ij ∈ Zp, where 1 ≤ i ≤ n and 1 ≤ j ≤ s. The main tasks of each phase of the invention are as follows:
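Assumptions 3) and 4) amount to a two-level split of the file, which can be sketched directly; `split_file` is a hypothetical helper, and the zero-padding and big-endian sector encoding are assumptions of this sketch rather than details from the patent.

```python
def split_file(data: bytes, n: int, s: int):
    """Split a byte string into n blocks of s sectors each (zero-padded),
    mirroring F = (m_1, ..., m_n) with each block holding sectors m_ij."""
    sector_len = -(-len(data) // (n * s))          # ceil division
    padded = data.ljust(n * s * sector_len, b"\0")
    blocks = []
    for i in range(n):
        block = padded[i * s * sector_len:(i + 1) * s * sector_len]
        sectors = [int.from_bytes(block[j * sector_len:(j + 1) * sector_len], "big")
                   for j in range(s)]
        blocks.append(sectors)
    return blocks
```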
In the initialization stage (Setup), the data holder first executes the KeyGen(·) algorithm to generate a public-private key pair, then runs the SigGen(·) algorithm to generate a file tag and homomorphic tags for the file F, and uploads the relevant information to the TPA and the CSP. Specifically: 1) the data holder randomly generates a signing key pair (spk, ssk) based on the security parameter λ, then selects a random element α ∈ Z_p and calculates v ← g^α. Thus the data holder's private key is sk = (α, ssk) and the public key is pk = (v, spk); 2) the data holder selects s random elements u_1, u_2, …, u_s ∈ G, and then calculates the file tag for file F: t = name||n||u_1||u_2||…||u_s||SSig_ssk(name||n||u_1||u_2||…||u_s), where name is the name of file F; 3) for each data block m_i in file F, the data holder calculates the homomorphic tag σ_i = (H(i) · ∏_{j=1}^{s} u_j^{m_ij})^α; the set of homomorphic tags is denoted Φ = {σ_i | 1 ≤ i ≤ n}; 4) the data holder sends {F, Φ} to the CSP and {pk, t} to the TPA, and deletes the local copy in order to reduce the user's storage overhead.
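As a rough illustration of the Setup stage, the sketch below replaces the pairing group G with a toy multiplicative group modulo a prime (the moduli, generator, and hash are stand-ins, not a secure instantiation), but follows the same tag formula σ_i = (H(i) · ∏_j u_j^{m_ij})^α:

```python
import hashlib

# Toy sketch of KeyGen and SigGen (illustrative only): the pairing group G
# is modeled as integers mod q, so this shows the data flow, not security.
q = 2**127 - 1     # stand-in group modulus (a Mersenne prime; illustrative)
g = 3              # stand-in generator
p = 2**61 - 1      # stand-in sector field order

def H(i: int) -> int:
    """Stand-in for the hash-to-group function H: {0,1}* -> G."""
    return int.from_bytes(hashlib.sha256(str(i).encode()).digest(), "big") % q

def keygen(alpha: int):
    """sk component alpha and public value v = g^alpha."""
    return alpha, pow(g, alpha, q)

def sig_gen(blocks, u, alpha):
    """Homomorphic tag sigma_i = (H(i) * prod_j u_j^{m_ij})^alpha mod q."""
    tags = []
    for i, sectors in enumerate(blocks, start=1):
        acc = H(i)
        for u_j, m_ij in zip(u, sectors):
            acc = acc * pow(u_j, m_ij, q) % q
        tags.append(pow(acc, alpha, q))
    return tags
```

The tags are homomorphic: a product of tags raised to challenge coefficients equals the tag of the correspondingly aggregated sectors, which is what the later verification equation exploits.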
In the Select-Parse (Selection-Parsing) phase, the TPA retrieves the verification metadata of file F from the file tag t and selects an active landmark list for determining the geographic location of the cloud data. Specifically: 1) the TPA uses the public key pk = (v, spk) to verify the signature in the file tag t. If the signature is invalid, the process of locating the cloud data is terminated; otherwise, the TPA recovers the verification metadata n and u_1, u_2, …, u_s and executes the next step; 2) the TPA selects l active landmarks, with sequence numbers 1, 2, 3, …, l. Notably, the list contains information about each active landmark, such as its landmark number and geographic location.
In the Challenge-Response phase, after the TPA has successfully parsed the file tag and selected the active landmarks, it first runs the GenChallenge(·) algorithm to generate a complete challenge chal and split it into l sub-challenges. Specifically: 1) the TPA randomly selects c different elements, each representing a randomly chosen challenge block index; the set of index values is I = {n_1, n_2, …, n_c}, c ∈ [1, n]. Note that the number of challenge blocks c should be an integer multiple of the number of selected active landmarks l; 2) for each i ∈ I, the TPA generates a random number v_i ∈ Z_p. The complete challenge generated by the TPA is thus chal = {(i, v_i)}, where i ∈ I; 3) the TPA splits the challenge chal into l sub-challenges, each denoted chal_x, 1 ≤ x ≤ l; that is, chal = {chal_x | 1 ≤ x ≤ l}. Each sub-challenge contains c/l challenge-block index values and the corresponding random numbers, assigned in sequence.
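The challenge generation and splitting described above can be sketched as follows; the field order p is an illustrative stand-in:

```python
import secrets

# Sketch of GenChallenge: pick c distinct block indices (c a multiple of l),
# attach a random coefficient v_i to each, and split the (i, v_i) pairs into
# l sub-challenges in sequence.
p = 2**61 - 1   # stand-in field order

def gen_challenge(n: int, c: int, l: int):
    assert c % l == 0, "challenge block count c must be a multiple of l"
    idx = secrets.SystemRandom().sample(range(1, n + 1), c)   # I = {n_1..n_c}
    chal = [(i, secrets.randbelow(p - 1) + 1) for i in idx]   # (i, v_i)
    step = c // l
    subs = [chal[x * step:(x + 1) * step] for x in range(l)]  # chal_1..chal_l
    return chal, subs

chal, subs = gen_challenge(n=100, c=12, l=3)
```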
The TPA then sends the l sub-challenges to the corresponding active landmarks; each landmark forwards the received sub-challenge to the CSP and records a timestamp at the moment forwarding completes. The CSP computes the evidence immediately after receiving a sub-challenge and returns it. As shown in fig. 4, taking sub-challenge chal_x (1 ≤ x ≤ l) as an example, the challenge-response between the TPA and the CSP proceeds as follows: 1) the TPA sends sub-challenge chal_x to the active landmark with sequence number x; 2) active landmark x forwards chal_x to the CSP and records the time as t_{x,xstart}; 3) after receiving the sub-challenge, the CSP runs the ProofμGen(·) algorithm to calculate the evidence μ_xj (1 ≤ j ≤ s) according to formula (1). As soon as each calculation completes, the CSP immediately broadcasts μ_xj to all active landmarks in the list; 4) when an active landmark in the list receives the evidence μ_xj broadcast by the CSP, it records the timestamp of that moment. For example, when the active landmark with sequence number k (1 ≤ k ≤ l) receives evidence μ_xj, the timestamp is denoted t_{x,kjend}; 5) after the receipt of evidence completes, all active landmarks forward the received evidence μ_xj (1 ≤ x ≤ l, 1 ≤ j ≤ s) and the corresponding timestamps to the TPA; 6) after all sub-challenges have been forwarded, the CSP runs the ProofσGen(·) algorithm, calculates the evidence σ for the complete challenge chal (i.e., the set of all sub-challenges) according to formula (2), and sends it directly to the TPA.
μ_xj = Σ_{(n_r, v_r) ∈ chal_x} v_r · m_{n_r, j}, for 1 ≤ j ≤ s  (1)

σ = ∏_{(n_r, v_r) ∈ chal} σ_{n_r}^{v_r}  (2)

where r denotes the sequence number of a challenge block of sub-challenge chal_x within the total challenge chal, v_r is the random number corresponding to that challenge block, and n_r is its index value.
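Formulas (1) and (2) reduce to a weighted sum over the challenged sectors and a weighted product over the corresponding tags. A minimal sketch in plain integer arithmetic, where p and q are stand-ins for the sector field order and the group order:

```python
# Sketch of the CSP's evidence computation. `blocks[i-1][j]` is sector m_ij,
# `tags[i-1]` is the homomorphic tag sigma_i; p and q are illustrative moduli.

def proof_mu(sub_chal, blocks, s, p):
    """Formula (1): mu_xj = sum over (n_r, v_r) in chal_x of v_r * m_{n_r, j}."""
    return [sum(v_r * blocks[n_r - 1][j] for n_r, v_r in sub_chal) % p
            for j in range(s)]

def proof_sigma(chal, tags, q):
    """Formula (2): sigma = prod over (n_r, v_r) in chal of sigma_{n_r}^{v_r}."""
    sigma = 1
    for n_r, v_r in chal:
        sigma = sigma * pow(tags[n_r - 1], v_r, q) % q
    return sigma
```

Note that a single μ_xj response involves only additions and multiplications, which is what keeps the CSP-side computation delay small in the delay measurements later.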
In the Verification stage, the TPA receives the evidence of the CSP's response and invokes the VerifyProof(·) algorithm to verify the consistency and correctness of the evidence from the active landmarks and the CSP, as follows: 1) the TPA checks that the evidence sets {μ_xj | 1 ≤ x ≤ l, 1 ≤ j ≤ s} for each sub-challenge, as forwarded by the active landmarks in the list, are identical. If they are not, the process of determining the geographic location of the cloud data terminates; 2) otherwise, the TPA audits the correctness of the evidence according to equation (3). If equation (3) holds, the TPA continues to determine the geographic location of the cloud data; otherwise, the process terminates.
e(σ, g) = e(∏_{i ∈ I} H(i)^{v_i} · ∏_{j=1}^{s} u_j^{μ_j}, v), where μ_j = Σ_{x=1}^{l} μ_xj  (3)
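The consistency check and the per-sector aggregation feeding equation (3) can be sketched as below; since a bilinear pairing library is assumed but not shown, the pairing check itself is only indicated in a comment:

```python
# Sketch of Verification step 1: the l copies of the evidence forwarded by
# the landmarks must be identical; if so, aggregate mu_j = sum_x mu_xj,
# which is the exponent vector used in equation (3).

def check_and_aggregate(copies, p):
    """copies[k] is the full evidence [[mu_xj for j] for x] landmark k sent."""
    first = copies[0]
    if any(c != first for c in copies[1:]):
        return None                      # inconsistent evidence: abort
    s = len(first[0])
    mu = [sum(mu_x[j] for mu_x in first) % p for j in range(s)]
    # With a pairing library, equation (3) would now be checked as
    #   e(sigma, g) == e(prod_i H(i)**v_i * prod_j u_j**mu[j], v)
    return mu
```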
In the determine-cloud-data-geographic-location (GeoLocation) stage, the TPA first calculates the response delay between the landmarks and the CSP. Specifically: 1) using the timestamps returned by the active landmarks, the TPA calculates the response delay from landmark x via the CSP to landmark k as t_xCk ← median(t_{x,kjend} − t_{x,xstart}), where 1 ≤ x ≤ l, 1 ≤ j ≤ s. Note that t_xCk includes both the network transmission delay and the delay of computing the evidence μ_xj (1 ≤ x ≤ l, 1 ≤ j ≤ s); because the splitting method is used, the CSP only needs to perform conventional operations such as addition and multiplication in a single response, which greatly reduces the computation delay at the CSP side and yields a more accurate response delay; 2) the TPA calculates the response delay T_x from landmark x (1 ≤ x ≤ l) to the CSP and back to landmark x using the delays from step 1): T_x ← median(t_xCk + t_xCh − t_kCh). The set of response delays from each active landmark in the list to the CSP is defined as delay ← {T_x | 1 ≤ x ≤ l}, and each landmark's response delay to the CSP is added to the list. Notably, the list includes all relevant information of the selected active landmarks, such as response delay, geographic location, and information that may represent network status, e.g., city population density and network information service provider. The list will be used to estimate the straight-line distance from each active landmark to the CSP.
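The delay combination works because, writing t_xCk ≈ d(x→CSP) + d(CSP→k), the terms through the auxiliary landmarks k and h cancel: t_xCk + t_xCh − t_kCh ≈ 2·d(x→CSP). A sketch of ComDelay under that model, with hypothetical timestamp arrays:

```python
from statistics import median

# Sketch of ComDelay: t[x][k] is the median over sectors j of the interval
# from landmark x's forwarding timestamp to landmark k's receipt timestamp;
# T_x combines three one-way measurements so the paths through the other
# landmarks cancel, leaving roughly twice the landmark-x-to-CSP delay.

def com_delay(t_start, t_end, l, s):
    """t_start[x] = t_{x,xstart}; t_end[x][k][j] = t_{x,kjend}."""
    t = [[median(t_end[x][k][j] - t_start[x] for j in range(s))
          for k in range(l)] for x in range(l)]          # t[x][k] = t_xCk
    T = []
    for x in range(l):
        combos = [t[x][k] + t[x][h] - t[k][h]
                  for k in range(l) for h in range(l) if k != x and h != x]
        T.append(median(combos))                         # T_x ~ 2 * d(x, CSP)
    return T
```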
After the TPA has calculated the response delays, it calls the GeoData(·) algorithm to determine the geographic location of the cloud data, in the following two steps: 1) estimate the distance between each selected active landmark and the CSP; 2) determine the cloud data geographic location using geometric knowledge. The specific steps are as follows:
To estimate the distance from the selected active landmarks to the CSP, the TPA first builds a random forest model containing B decision trees from a training set. The training set is obtained in advance through other measurement means; assume the training set Data contains N records in total, where each record contains M features (for example, response delay) and a training target (i.e., the straight-line distance between the two hosts in each measurement). The TPA then estimates the distance from each active landmark to the CSP using the data set list, which contains the M feature values of all active landmarks, and the constructed random forest model. The whole process is as follows: 1) the TPA first randomly selects N records from the training set Data with replacement, as the training set for constructing the b-th tree in the random forest; 2) when constructing the b-th decision tree Tree_b and a node of the tree needs to be split, the TPA randomly selects m sub-features from all M features without replacement, takes the feature with the minimum mean square error as the splitting feature, and splits the node into left and right branches. The TPA repeats this step until all feature combinations have been used, at which point the decision tree is complete. Notably, no pruning is performed while constructing the decision trees; each tree grows as large as possible; 3) steps 1) and 2) are repeated until the random forest model is complete; 4) the TPA estimates the distance from each active landmark to the CSP using the data set list containing the feature values of all active landmarks. That is, for the active landmark with sequence number x, 1 ≤ x ≤ l, the landmark's corresponding information in list is input to the random forest model; each decision tree in the random forest produces a prediction, and the average of all predictions is taken as the distance from active landmark x to the CSP, denoted d_x, 1 ≤ x ≤ l.
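A minimal stdlib stand-in for the forest described above (depth-1 trees for brevity, whereas the invention grows unpruned full trees; all data and parameter values are hypothetical) shows the bootstrap sampling, random feature subset, minimum-MSE split, and prediction averaging:

```python
import random
from statistics import mean

# Toy random-forest regressor: B stumps, each fit on a bootstrap sample
# over m randomly chosen features, split by minimum mean squared error.

def fit_stump(X, y, feats):
    best = None
    for f in feats:
        for thr in sorted({row[f] for row in X}):
            left = [y[i] for i, r in enumerate(X) if r[f] <= thr]
            right = [y[i] for i, r in enumerate(X) if r[f] > thr]
            if not left or not right:
                continue
            mse = sum((v - mean(left)) ** 2 for v in left) + \
                  sum((v - mean(right)) ** 2 for v in right)
            if best is None or mse < best[0]:
                best = (mse, f, thr, mean(left), mean(right))
    return best  # (mse, feature, threshold, left_mean, right_mean)

def fit_forest(X, y, B, m, rng):
    trees = []
    for _ in range(B):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap, with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        feats = rng.sample(range(len(X[0])), m)               # m sub-features, without replacement
        trees.append(fit_stump(Xb, yb, feats))
    return trees

def predict(trees, row):
    preds = [(t[3] if row[t[1]] <= t[2] else t[4]) for t in trees if t]
    return mean(preds)  # average of all trees' decisions = estimated d_x
```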
Finally, the TPA uses an optimized triangulation method to determine the geographic location of the cloud data. Because the estimated distances d_x (1 ≤ x ≤ l) from the active landmarks to the CSP contain errors, a constraint parameter ε is used to reduce the influence of the error. That is, the positive distance from each landmark to the CSP is the estimated distance d_x increased by ε, and the negative distance is the estimate decreased by ε. The positive and negative distances are respectively the maximum and minimum radii of the estimated region of the cloud data's geographic location. Thus, for each selected active landmark, the cloud data lies in an annular region determined by the positive and negative distances, and the centroid of the overlapping portion of the annular regions of all selected active landmarks is the geographic location of the cloud data.
Furthermore, in another embodiment of the present invention, the TPA related functionality may be implemented by the data owner's client.
The above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and those skilled in the art may modify or substitute the technical solution of the present invention without departing from the spirit and scope of the present invention, and the protection scope of the present invention shall be defined by the claims.

Claims (9)

1. A verification method of cloud data geographic position comprises the following steps:
1) The data holder divides the storage file into a plurality of file blocks, generates a file block homomorphic tag set, uploads the storage file and the file block homomorphic tag set to a cloud service provider, splits the generated random challenge into l sub-challenges, and respectively sends the l sub-challenges to l active landmarks, wherein l is more than or equal to 3;
2) Each active landmark forwards the received sub-challenges to a cloud service provider and records a sending timestamp;
3) The cloud service provider generates a plurality of sub-challenge evidences according to the stored file and each sub-challenge, generates random challenge evidences through the file block homomorphic tag set and each sub-challenge, broadcasts each sub-challenge evidence to each active landmark, and sends the random challenge evidences to the data holder;
4) Each active landmark records a receiving time stamp of the received sub-challenge evidence, and returns the sub-challenge evidence, the sending time stamp and the receiving time stamp to the data holder;
5) And the data holder verifies according to the random challenge evidence and each sub-challenge evidence, and calculates the distance from each active landmark to the cloud service provider through the sending timestamp and the receiving timestamp of each active landmark to obtain the geographic position of the cloud data.
2. The method of claim 1, wherein the original file is encoded as the stored file using a redundancy code.
3. The method of claim 1, wherein the set of file block homomorphic labels is calculated by:
1) Randomly generating a signing key pair (spk, ssk) based on the security parameter λ;
2) Selecting a random element α ∈ Z_p and calculating v ← g^α, obtaining the data holder private key sk = (α, ssk) and public key pk = (v, spk), wherein Z_p is a finite field, g is a generator of G, G and G_T are multiplicative cyclic groups of large prime order p, and e: G × G → G_T is a bilinear map;

3) Selecting s random elements u_1, u_2, …, u_s ∈ G and calculating the storage file tag t = name||n||u_1||u_2||…||u_s||SSig_ssk(name||n||u_1||u_2||…||u_s), wherein s is the number of sectors of each file block, G is the multiplicative cyclic group of prime order p, SSig_ssk(·) denotes signing with the signature private key ssk, and name is the name of file F;

4) Calculating the homomorphic label of each file block, σ_i = (H(i) · ∏_{j=1}^{s} u_j^{m_ij})^α, obtaining the file block homomorphic label set Φ = {σ_i | 1 ≤ i ≤ n}, wherein 1 ≤ i ≤ n, 1 ≤ j ≤ s, n is the number of file blocks, m_ij is the j-th sector of file block m_i, and H(·) is a hash function.
4. A method according to claim 3, characterized in that the generated random challenge is split into l sub-challenges by:
1) Randomly selecting c different elements, the set of challenge block index values being I = {n_1, n_2, …, n_c}, c ∈ [1, n], wherein c is an integer multiple of l;

2) For any i ∈ I, generating a random number v_i ∈ Z_p, obtaining the random challenge chal = {(i, v_i)};

3) Splitting the challenge chal into l sub-challenges chal_x, the challenge block index values and the random numbers being assigned to the sub-challenges in sequence, wherein 1 ≤ x ≤ l.
5. The method of claim 1, wherein the distance of each active landmark to the cloud service provider is calculated by:
1) Calculating the response time delay of each sub-challenge of each active landmark by using the sending time stamp and the receiving time stamp returned by each active landmark;
2) And inputting the response time delay of each sub-challenge of each active landmark into a trained random forest model to obtain the distance from each active landmark to the cloud service provider.
6. The method of claim 1, wherein the cloud data geographic location is obtained by steps comprising:
selecting three active landmarks L1, L2, L3 based on the distances from each active landmark to the cloud service provider, the distances from the active landmarks L1, L2 and L3 to the cloud service provider being r1, r2 and r3 respectively;

based on the set constraint parameter ε, obtaining respectively the positive constraint distance r1+ε and negative constraint distance r1−ε of the active landmark L1, the positive constraint distance r2+ε and negative constraint distance r2−ε of the active landmark L2, and the positive constraint distance r3+ε and negative constraint distance r3−ε of the active landmark L3;

drawing a first annular region with the position of the active landmark L1 as the center and the positive constraint distance r1+ε and negative constraint distance r1−ε as radii;

drawing a second annular region with the position of the active landmark L2 as the center and the positive constraint distance r2+ε and negative constraint distance r2−ε as radii;

drawing a third annular region with the position of the active landmark L3 as the center and the positive constraint distance r3+ε and negative constraint distance r3−ε as radii;

and taking the centroid of the overlapping portion of the first annular region, the second annular region and the third annular region as the cloud data geographic location.
7. A verification method of cloud data geographic position comprises the following steps:
1) A data holder selects a trusted entity as a third party audit, uploads the stored file and file block homomorphic label set obtained by any one of the methods of claims 1-6 to a cloud service provider, and sends a request for obtaining the geographic position of the data to the third party audit;
2) After the third party audits the verification request, splitting the generated random challenge into l sub-challenges, and respectively sending the l sub-challenges to l active landmarks, wherein l is more than or equal to 3;
3) Each active landmark forwards the received sub-challenges to a cloud service provider and records a sending timestamp;
4) The cloud service provider generates a plurality of sub-challenge evidences according to the stored file and each sub-challenge, generates random challenge evidences through the file block homomorphic tag set and each sub-challenge, broadcasts each sub-challenge evidence to each active landmark, and sends the random challenge evidences to a third party audit;
5) Each active landmark records a receiving time stamp of the received sub-challenge evidence, and returns the sub-challenge evidence, the sending time stamp and the receiving time stamp to the third party audit;
6) And the third party audit verifies according to the random challenge evidence and each sub-challenge evidence, calculates the distance from each active landmark to the cloud service provider through the sending timestamp and the receiving timestamp of each active landmark, and returns the obtained cloud data geographic position to the data holder.
8. The method of claim 7, wherein the third party audit is provided with a list storing information related to active landmarks; the active landmark related information includes landmark serial numbers, geographic locations of landmarks, IP addresses of landmarks, information service providers of landmarks, and response delays for each landmark to a cloud service provider.
9. A verification system for cloud data geographic locations, comprising:
the data holder is used for encoding the original file into a storage file, dividing the storage file into a plurality of file blocks and generating a file block homomorphic tag set; uploading the storage file and the file block homomorphic label set to a cloud service provider; splitting the generated random challenge into l sub-challenges, and respectively transmitting the l sub-challenges to the l active landmarks; verifying according to the random challenge evidence and each sub-challenge evidence, and calculating the distance from each active landmark to a cloud service provider through the sending time stamp and the receiving time stamp of each active landmark to obtain the geographic position of cloud data, wherein l is more than or equal to 3;
at least three active landmarks for forwarding the received sub-challenges to a cloud service provider, recording a transmission timestamp; receiving the receiving time stamp of the sub-challenge evidence, and returning the sub-challenge evidence, the sending time stamp and the receiving time stamp to the data holder;
the cloud service provider is used for generating a plurality of sub-challenge evidences according to the stored file and each sub-challenge, generating random challenge evidences through the file block homomorphic tag set and each sub-challenge, broadcasting each sub-challenge evidence to each active landmark, and sending the random challenge evidences to the data holder.
CN202010582761.8A 2020-06-11 2020-06-23 Verification method and system for cloud data geographic position Active CN111914044B (en)
