CN113098843B - High-speed random sampling encryption method for geological and geographical big data - Google Patents

Info

Publication number
CN113098843B
Authority
CN
China
Prior art keywords
data
file
block
data block
data blocks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110249541.8A
Other languages
Chinese (zh)
Other versions
CN113098843A (en)
Inventor
宋军
杨帆
王增鹏
徐衡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Geosciences
Original Assignee
China University of Geosciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Geosciences
Priority to CN202110249541.8A
Publication of CN113098843A
Application granted
Publication of CN113098843B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H04L63/0435 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply symmetric encryption, i.e. same key used for encryption and decryption
    • H04L63/0457 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply dynamic encryption, e.g. stream encryption
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases

Abstract

The invention provides a high-speed random sampling encryption method for geological and geographical big data, comprising: obtaining geological and geographic data files; dividing each file into data blocks of random sizes with a file random division module; randomly sampling the data blocks by reservoir sampling to obtain the sampled data blocks; encrypting the sampled data blocks with the AES encryption method; encrypting the unsampled data blocks with the ZUC stream cipher; and storing the encrypted data blocks together with the Hash value corresponding to each data block. The beneficial effects of the invention are that it encrypts big data files quickly and efficiently while preserving security, and fills the gap left by traditional data encryption technologies on the current market, which struggle to meet the requirements of big data in encryption speed, processing performance, and security.

Description

High-speed random sampling encryption method for geological and geographical big data
Technical Field
The invention relates to the field of random sampling and data encryption, in particular to a high-speed random sampling encryption method for geological and geographical big data.
Background
The rapid development of technologies such as cloud computing, the Internet of Things, and 5G has driven explosive growth in data. Big data brings convenience to production and daily life, but it also makes security and privacy protection problems increasingly prominent. The value of geological and geographic big data lies mainly in data mining, and malicious or excessive data mining and uncontrolled data abuse can leak sensitive information or expose private data to theft. Encrypting big data is therefore especially important.
At present, first, the encryption speed of some traditional encryption algorithms is limited by their design and cannot be significantly improved; second, the performance and load of a single computer have almost reached their limits, so faster encryption cannot be obtained from hardware alone; finally, big data is characterized by large volume, diversity, and rapid growth, yet few high-speed secure encryption methods target it. There are currently few patented technologies for fast encryption of big data. Among them, the patent "a big data encryption method" (201410258583.8) proposes encrypting plaintext blocks and then secondarily encrypting the intermediate plaintext, and the patent "hybrid encryption method and a device for implementing the method" (201510472098.5) proposes encrypting message data with the AES encryption algorithm and then encrypting the session key with the SM2 encryption algorithm. Both technologies rely on secondary encryption, which meets the security requirements of big data files but not their transmission efficiency requirements.
Disclosure of Invention
To address the above shortcomings, the invention provides a high-speed random sampling encryption method and system for geological and geographical big data. The system balances algorithm security against encryption speed, and improves the processing speed of the encryption scheme while ensuring the security of big data.
The invention provides a high-speed random sampling encryption method for geological and geographical big data, which specifically comprises the following steps:
S101: acquiring geological and geographic data files;
S102: dividing the file into data blocks of random sizes with a file random division module;
S103: randomly sampling the data blocks by reservoir sampling to obtain the sampled data blocks;
S104: encrypting the sampled data blocks with the AES encryption method; encrypting the unsampled data blocks with the ZUC stream cipher;
S105: storing the encrypted data blocks and the Hash value corresponding to each data block.
Further, the geological and geographic data files include pictures, tables and text formats.
Further, step S102 specifically includes:
S201: setting the basic block size l of the file through the file random segmentation module; l determines the granularity of the blocks;
S202: randomly generating a variable parameter r_i and continuously dividing the file until the size of the last data block is smaller than l; the size of each data block is:
L(i) = l + r_i
where i is the number corresponding to the data block;
S203: creating a corresponding Chunk Index and Chunk Size, and storing the generated data content and data block index parameters together with the corresponding file name; the Chunk Index stores the number i corresponding to the data block, and the Chunk Size stores the size L(i) of the data block corresponding to that Chunk Index.
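Steps S201 to S203 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `split_variable_length`, the `max_extra` bound, and the uniform distribution of r_i are all assumptions, since the text does not say how r_i is drawn. A seeded `random.Random` keeps the example reproducible; a real system would more plausibly draw r_i from a cryptographically secure source.

```python
import random

def split_variable_length(data: bytes, base: int, max_extra: int, rng: random.Random):
    """Split data into blocks of size L(i) = base + r_i.

    r_i is drawn uniformly from [0, max_extra] (an assumption; the source
    does not specify the distribution). Returns the block contents and a
    list of (Chunk Index, Chunk Size) pairs.
    """
    chunks = []   # block contents, in file order
    index = []    # (Chunk Index i, Chunk Size L(i)) pairs
    pos = 0
    i = 1
    # Keep cutting blocks while at least one basic block remains.
    while len(data) - pos >= base:
        r_i = rng.randint(0, max_extra)
        size = min(base + r_i, len(data) - pos)  # never read past the end
        chunks.append(data[pos:pos + size])
        index.append((i, size))
        pos += size
        i += 1
    # Whatever is left is shorter than the basic block and becomes the last block.
    if pos < len(data):
        chunks.append(data[pos:])
        index.append((i, len(data) - pos))
    return chunks, index

if __name__ == "__main__":
    payload = bytes(range(256)) * 1000  # 256,000 bytes of sample data
    blocks, chunk_index = split_variable_length(
        payload, base=32 * 1024, max_extra=8 * 1024, rng=random.Random(7)
    )
    for number, size in chunk_index:
        print(f"Chunk Index {number}: Chunk Size {size}")
```

Because the block sizes depend on the random draws, the number of blocks n also varies from run to run unless the generator is seeded.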
Further, step S103 specifically includes:
S301: putting the first k data blocks into the reservoir; k is a value preset according to actual requirements;
S302: starting from the (k+1)-th data block, i.e. for j = k+1 onward, selecting the j-th data block with probability k/j; if the j-th data block is selected, it replaces one of the data blocks currently in the reservoir, chosen with equal probability; this continues until all data blocks have been traversed;
S303: after the traversal is finished, the data blocks in the reservoir are the sampled data blocks, and the rest are the unsampled data blocks.
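Steps S301 to S303 describe standard reservoir sampling, which can be sketched as below. The function name `reservoir_sample` and the choice to pass block numbers rather than block contents are illustrative assumptions; the generator is seeded only to make the example repeatable.

```python
import random

def reservoir_sample(block_ids, k: int, rng: random.Random) -> set:
    """Pick k block numbers from a stream in a single pass.

    The first k blocks fill the reservoir (S301). The j-th block, j > k,
    is selected with probability k/j and then replaces a uniformly chosen
    reservoir entry (S302). The reservoir at the end is the sample (S303).
    """
    reservoir = []
    for j, block_id in enumerate(block_ids, start=1):
        if j <= k:
            reservoir.append(block_id)
        elif rng.random() < k / j:
            reservoir[rng.randrange(k)] = block_id
    return set(reservoir)

if __name__ == "__main__":
    all_blocks = range(1, 9)  # eight blocks numbered 1..8
    sampled = reservoir_sample(all_blocks, k=4, rng=random.Random(0))
    unsampled = set(all_blocks) - sampled
    print("sampled:", sorted(sampled))
    print("unsampled:", sorted(unsampled))
```

The function consumes the block stream once and never needs to know the total number of blocks in advance, which is what allows sampling to run alongside chunking.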
The process of decrypting the encrypted data specifically comprises the following steps:
S401: extracting the encrypted data blocks and decrypting each one with the decryption method corresponding to the algorithm that encrypted it, AES or the ZUC stream cipher;
S402: splicing and recovering the decrypted data according to the stored Chunk Index and Chunk Size to obtain a spliced file;
S403: calculating the Hash value of the decrypted spliced file and verifying it; if the verification passes, decryption has succeeded and the spliced file is the original geological and geographic data file.
The beneficial effects provided by the invention are as follows: the invention balances the security and the encryption performance of the algorithm, encrypts big data files quickly and efficiently while ensuring security, addresses the efficiency and security problems of big data encryption to a certain extent, and has application prospects and engineering value.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a large data file being randomly partitioned;
FIG. 3 is a schematic diagram of reservoir sampling of a partitioned file;
FIG. 4 is a schematic diagram of AES encryption algorithm encryption and ZUC stream cipher algorithm encryption of a file block;
FIG. 5 is a schematic diagram of decrypting an encrypted file.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be further described with reference to the accompanying drawings.
A high-speed random sampling encryption method for geological and geographical big data comprises the following steps:
S101: acquiring geological and geographic data files; the geological and geographic data files comprise pictures, tables and text formats;
S102: dividing the file into data blocks of random sizes with a file random division module;
Step S102 specifically includes:
S201: setting the basic block size l of the file through the file random segmentation module; l determines the granularity of the blocks; preferably, l is set according to the security level, for example 32 KB, 48 KB, or 64 KB; the larger l is, the lower the security level;
S202: randomly generating a variable parameter r_i and continuously dividing the file until the size of the last data block is smaller than l; the size of each data block is:
L(i) = l + r_i
where i is the number corresponding to the data block; the size of the last data block is L_D = length − Σ L(i), where length is the total size of the file and the sum runs over the preceding data blocks; because the file size is not necessarily an integer multiple of the data block size, the final remainder of length less than l is treated as one file block, and the random parameter r_i makes the final number of file blocks n random;
Referring to FIG. 2, assume the length of the source big data file is length. When the variable r_i in the figure is always equal to 0, the blocking strategy is fixed-length blocking. The fixed-length blocking strategy divides a file into blocks of fixed length, i.e. the number of file blocks is n = length / l, where l is the size of the fixed-length blocks; however, this strategy lets an attacker attack the key by executing the plaintext encryption process multiple times, which reduces the security of high-speed random sampling encryption of big data. If r_i in FIG. 2 is not always equal to 0, the blocking strategy is variable-length blocking, which divides the file into data blocks of different sizes, thereby adding randomness to the high-speed random sampling encryption scheme and making it harder for an attacker to attack the key.
S203: creating a corresponding Chunk Index and Chunk Size, and storing the generated data content and data block index parameters together with the corresponding file name; the Chunk Index stores the number i corresponding to the data block, and the Chunk Size stores the size L(i) of the data block corresponding to that Chunk Index.
S103: randomly sampling the data blocks by a reservoir sampling method to obtain the sampled data blocks;
Step S103 specifically includes:
S301: putting the first k data blocks into the reservoir; k is a value preset according to actual requirements;
S302: starting from the (k+1)-th data block, i.e. for j = k+1 onward, selecting the j-th data block with probability k/j; if the j-th data block is selected, it replaces one of the data blocks currently in the reservoir, chosen with equal probability; this continues until all data blocks have been traversed;
S303: after the traversal is finished, the data blocks in the reservoir are the sampled data blocks, and the rest are the unsampled data blocks.
Referring to FIG. 3, an extraction ratio p is set according to the user's security requirements (the actual number of extracted data blocks is k = n × p, where n is the number of file blocks). The variable-length blocking algorithm and the reservoir sampling algorithm run at the same time, so all operations complete with a single read of the big data file. Different sampling ratios can be set according to the security level of the big data, which improves the flexibility of the high-speed random sampling encryption scheme; and because the data blocks extracted through the reservoir are mutually independent, attacks become more difficult, which protects the confidentiality of the segmented source big data file to a certain degree. In FIG. 3, A_1, A_3, A_5, A_7 are the data blocks that are not extracted, and A_2, A_4, A_6, A_8 are the data blocks that are extracted.
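The link between the reservoir size k and the extraction ratio p can be checked with a short derivation. It is a standard property of reservoir sampling rather than something stated in the patent text: every one of the n blocks ends up in the reservoir with the same probability k/n, which equals p when k = n × p. Here m indexes the blocks processed after block j.

```latex
% Block j > k is selected with probability k/j, then must survive every later step m.
% At step m a replacement happens with probability k/m and hits a given slot with probability 1/k.
\[
P(\text{block } j \text{ is sampled})
  = \frac{k}{j}\prod_{m=j+1}^{n}\left(1 - \frac{k}{m}\cdot\frac{1}{k}\right)
  = \frac{k}{j}\prod_{m=j+1}^{n}\frac{m-1}{m}
  = \frac{k}{j}\cdot\frac{j}{n}
  = \frac{k}{n}, \qquad j > k.
\]
% A block among the first k starts in the reservoir and only needs to survive steps k+1..n.
\[
P(\text{block } j \text{ is sampled})
  = \prod_{m=k+1}^{n}\frac{m-1}{m}
  = \frac{k}{n}, \qquad j \le k.
\]
```

Both products telescope, so the position of a block in the file does not affect its chance of being AES-encrypted.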
S104: encrypting the sampled data blocks with the AES encryption method; encrypting the unsampled data blocks with the ZUC stream cipher;
Referring to FIG. 4, a big data file to be encrypted is first segmented by the variable-length blocking algorithm; the resulting data blocks are then sampled by the reservoir sampling algorithm; the sampled data blocks are encrypted with the AES encryption algorithm and the unsampled data blocks with the ZUC stream cipher algorithm; and the encrypted data blocks and the Hash value of the source file are stored separately. In FIG. 4, A_1, A_3, A_5, A_7 are encrypted with the ZUC stream cipher, and A_2, A_4, A_6, A_8 are encrypted with AES;
S105: storing the encrypted data blocks and the Hash value corresponding to each data block.
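The routing in S104 and the bookkeeping in S105 can be sketched as follows. Neither AES nor ZUC ships with the Python standard library, so this sketch substitutes a placeholder cipher (`standin_cipher`, a SHAKE-256 keystream XOR) for both. The placeholder is not AES, not ZUC, and not suitable for real use; it only lets the dispatch logic run. The record layout, the use of SHA-256, and hashing the plaintext block are likewise assumptions, since the text does not name a Hash function.

```python
import hashlib

def standin_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Placeholder cipher: XOR with a SHAKE-256 keystream.

    Stands in for AES and ZUC only so the example is runnable;
    it is neither algorithm and offers no vetted security.
    """
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

def encrypt_blocks(chunks, sampled, strong_key: bytes, fast_key: bytes):
    """Route sampled blocks to the 'AES' path and the rest to the 'ZUC' path."""
    records = []
    for i, chunk in enumerate(chunks, start=1):
        is_sampled = i in sampled
        key = strong_key if is_sampled else fast_key
        records.append({
            "index": i,                                   # Chunk Index
            "size": len(chunk),                           # Chunk Size
            "method": "AES" if is_sampled else "ZUC",     # cipher the scheme assigns
            "ciphertext": standin_cipher(key, i.to_bytes(8, "big"), chunk),
            "sha256": hashlib.sha256(chunk).hexdigest(),  # Hash value of the block
        })
    return records

if __name__ == "__main__":
    blocks = [b"block-one", b"block-two", b"block-three", b"block-four"]
    for rec in encrypt_blocks(blocks, sampled={2, 4}, strong_key=b"key-a", fast_key=b"key-b"):
        print(rec["index"], rec["method"], rec["size"], rec["sha256"][:16])
```

In a real deployment the two branches would call vetted AES and ZUC implementations, and the speed gain described in the text would come from most blocks taking the stream cipher path.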
The process of decrypting the encrypted data specifically comprises the following steps:
S401: extracting the encrypted data blocks and decrypting each one with the decryption method corresponding to the algorithm that encrypted it, AES or the ZUC stream cipher;
S402: splicing and recovering the decrypted data according to the stored Chunk Index and Chunk Size to obtain a spliced file;
S403: calculating the Hash value of the decrypted spliced file and verifying it; if the verification passes, decryption has succeeded and the spliced file is the original geological and geographic data file.
Referring to FIG. 5, when a file needs to be decrypted, the data blocks stored in different containers are first extracted and then decrypted with the AES algorithm or the ZUC stream cipher algorithm, matching the algorithm used to encrypt each block; the decrypted data blocks are spliced using the data block index parameters stored in the system. Finally, the Hash value of the decrypted file is calculated to verify the integrity of the data. Encrypting the sampled data blocks with AES and the unsampled data blocks with the fast stream cipher ZUC speeds up data block encryption, and on top of the reservoir random sampling algorithm it further increases the randomness of data block encryption, which helps ensure the security of big data files.
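The decryption flow of S401 to S403 can be sketched as below. As with any standard-library-only example, the cipher is a placeholder: `standin_cipher` is a SHAKE-256 keystream XOR that stands in for both AES and ZUC decryption and is neither of them. The record fields, the `keys` mapping, and SHA-256 as the Hash function are assumptions made for illustration.

```python
import hashlib

def standin_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Placeholder for AES/ZUC: XOR with a SHAKE-256 keystream (self-inverse)."""
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

def decrypt_and_verify(records, keys, expected_sha256: str) -> bytes:
    """Decrypt each block, splice by Chunk Index, and check the file Hash."""
    parts = []
    # S402: the stored Chunk Index gives the splice order.
    for rec in sorted(records, key=lambda r: r["index"]):
        # S401: pick the key for the algorithm that encrypted this block.
        key = keys[rec["method"]]
        plain = standin_cipher(key, rec["index"].to_bytes(8, "big"), rec["ciphertext"])
        if len(plain) != rec["size"]:
            raise ValueError(f"block {rec['index']} does not match its Chunk Size")
        parts.append(plain)
    restored = b"".join(parts)
    # S403: verify the Hash value of the spliced file.
    if hashlib.sha256(restored).hexdigest() != expected_sha256:
        raise ValueError("Hash verification failed")
    return restored

if __name__ == "__main__":
    keys = {"AES": b"key-a", "ZUC": b"key-b"}
    plain_blocks = [(1, "ZUC", b"geo-"), (2, "AES", b"logical "), (3, "ZUC", b"data")]
    stored = [
        {"index": i, "size": len(p), "method": m,
         "ciphertext": standin_cipher(keys[m], i.to_bytes(8, "big"), p)}
        for i, m, p in plain_blocks
    ]
    digest = hashlib.sha256(b"".join(p for _, _, p in plain_blocks)).hexdigest()
    print(decrypt_and_verify(stored[::-1], keys, digest))
```

Sorting by Chunk Index means the blocks can be fetched from their containers in any order before splicing.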
The beneficial effects provided by the invention are as follows: addressing the scarcity of secure, fast big data encryption in the domestic market, the invention provides a big data-oriented high-speed random sampling encryption method and system. In this way, big data files can be encrypted quickly and efficiently while security is ensured, filling the gap left by traditional data encryption technologies on the current market, which struggle to meet the requirements of big data in encryption speed, processing performance, and security.
The invention has the following advantages. First, setting random parameters produces variable-length blocking of the file, so the number of file blocks is random. Second, the reservoir sampling algorithm samples the blocks at random; the sampled file blocks are encrypted with AES and the remaining file blocks with the fast stream cipher. Because relatively few file blocks are sampled and most are encrypted by the fast stream cipher, encryption efficiency is greatly improved; at the same time, the randomness of the sampling, the randomness of the file blocks, and the use of AES make it far more difficult for an attacker to crack the file and obtain the big data. In addition, the user can set the sample count of the reservoir sampling algorithm according to the required security level, so the system adapts to a variety of real-world scenarios, meets varied user requirements, and is user-friendly.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (2)

1. A high-speed random sampling encryption method for geological and geographical big data is characterized by comprising the following steps: the method specifically comprises the following steps:
S101: acquiring geological and geographic data files;
S102: dividing the file into data blocks of random sizes with a file random division module;
S103: randomly sampling the data blocks by reservoir sampling to obtain the sampled data blocks;
S104: encrypting the sampled data blocks with the AES encryption method; encrypting the unsampled data blocks with the ZUC stream cipher;
S105: storing the encrypted data blocks and the Hash value corresponding to each data block;
step S102 specifically includes:
S201: setting the basic block size l of the file through the file random segmentation module; l determines the granularity of the blocks;
S202: randomly generating a variable parameter r_i and continuously dividing the file until the size of the last data block is smaller than l; the size of each data block is:
L(i) = l + r_i
wherein i is the number corresponding to the data block;
S203: creating a corresponding Chunk Index and Chunk Size, and storing the generated data content and data block index parameters together with the corresponding file name; wherein the Chunk Index stores the number i corresponding to the data block, and the Chunk Size stores the size L(i) of the data block corresponding to that Chunk Index;
step S103 specifically includes:
S301: putting the first k data blocks into the reservoir; k is a value preset according to actual requirements;
S302: starting from the (k+1)-th data block, i.e. for j = k+1 onward, selecting the j-th data block with probability k/j; if the j-th data block is selected, it replaces one of the data blocks currently in the reservoir, chosen with equal probability; this continues until all data blocks have been traversed;
S303: after the traversal, the data blocks in the reservoir are the sampled data blocks, and the rest are the unsampled data blocks;
the process of decrypting the encrypted data specifically comprises the following steps:
S401: extracting the encrypted data blocks and decrypting each one with the decryption method corresponding to the algorithm that encrypted it, AES or the ZUC stream cipher;
S402: splicing and recovering the decrypted data according to the stored Chunk Index and Chunk Size to obtain a spliced file;
S403: calculating the Hash value of the decrypted spliced file and verifying it; if the verification passes, decryption has succeeded and the spliced file is the original geological and geographic data file.
2. The geological and geographical big data oriented high-speed random sampling encryption method as claimed in claim 1, characterized in that: the geological and geographic data files include pictures, tables and text formats.
CN202110249541.8A 2021-03-08 2021-03-08 High-speed random sampling encryption method for geological and geographical big data Active CN113098843B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110249541.8A CN113098843B (en) 2021-03-08 2021-03-08 High-speed random sampling encryption method for geological and geographical big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110249541.8A CN113098843B (en) 2021-03-08 2021-03-08 High-speed random sampling encryption method for geological and geographical big data

Publications (2)

Publication Number Publication Date
CN113098843A CN113098843A (en) 2021-07-09
CN113098843B true CN113098843B (en) 2022-06-14

Family

ID=76667752

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110249541.8A Active CN113098843B (en) 2021-03-08 2021-03-08 High-speed random sampling encryption method for geological and geographical big data

Country Status (1)

Country Link
CN (1) CN113098843B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117560233B (en) * 2024-01-12 2024-04-05 深圳市金飞杰信息技术服务有限公司 Method and system based on data interaction encryption

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104205117A (en) * 2014-04-10 2014-12-10 华为技术有限公司 Device file encryption and decryption method and device
CN105260668A (en) * 2015-10-10 2016-01-20 北京搜狗科技发展有限公司 File encryption method and electronic device
CN106788982A (en) * 2017-02-22 2017-05-31 郑州云海信息技术有限公司 A kind of sectional encryption transmission method and device
WO2017166856A1 (en) * 2016-03-31 2017-10-05 北京金山安全软件有限公司 Method, device and equipment for file encryption

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108551434B (en) * 2015-08-26 2019-04-12 华为技术有限公司 The method and apparatus for transmitting HE-LTF sequence

Also Published As

Publication number Publication date
CN113098843A (en) 2021-07-09

Similar Documents

Publication Publication Date Title
CN109474423B (en) Data encryption and decryption method, server and storage medium
US11709948B1 (en) Systems and methods for generation of secure indexes for cryptographically-secure queries
US11637689B2 (en) Efficient encrypted data management system and method
Maitri et al. Secure file storage in cloud computing using hybrid cryptography algorithm
CN110213354B (en) Cloud storage data confidentiality protection method
CN113641648B (en) Distributed cloud secure storage method, system and storage medium
CN107609418A (en) Desensitization method, device, storage device and the computer equipment of text data
CN102693398A (en) Data encryption method and system
Bala et al. Secure File Storage In Cloud Computing Using Hybrid Cryptography Algorithm.
CN113098843B (en) High-speed random sampling encryption method for geological and geographical big data
CN111310222A (en) File encryption method
Thakkar et al. A survey for comparative analysis of various cryptographic algorithms used to secure data on cloud
Zhang et al. A dynamic searchable symmetric encryption scheme for multiuser with forward and backward security
Mohd et al. Enhanced AES algorithm based on 14 rounds in securing data and minimizing processing time
CN104794243B (en) Third party's cipher text retrieval method based on filename
KAREEM Secure Cloud Approach Based on Okamoto-Uchiyama Cryptosystem.
CN112818404A (en) Data access permission updating method, device, equipment and readable storage medium
Hoang et al. A multi-server oblivious dynamic searchable encryption framework
Santos et al. Enhancing data security in cloud using random pattern fragmentation and a distributed nosql database
CN111798236A (en) Transaction data encryption and decryption method, device and equipment
Saini A survey on watermarking web contents for protecting copyright
CN115865461A (en) Method and system for distributing data in high-performance computing cluster
Sankari et al. Proposed iPrivacy based image encryption in mobile cloud
Lee et al. A study of practical proxy reencryption with a keyword search scheme considering cloud storage structure
Lanke et al. Cloud Cryptography: Mechanism of Different Encryption Standards

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant