CN113098843B - High-speed random sampling encryption method for geological and geographical big data - Google Patents
High-speed random sampling encryption method for geological and geographical big data
- Publication number
- CN113098843B (application CN202110249541.8A; also published as CN113098843A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
- H04L63/0435—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply symmetric encryption, i.e. same key used for encryption and decryption
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
- H04L63/0457—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply dynamic encryption, e.g. stream encryption
Abstract
The invention provides a high-speed random sampling encryption method for geological and geographical big data, comprising: obtaining a geological and geographical data file; dividing the file into data blocks of random sizes with a random file segmentation module; randomly sampling the data blocks by reservoir sampling to obtain the sampled data blocks; encrypting the sampled data blocks with the AES encryption method; encrypting the unsampled data blocks with the ZUC stream cipher method; and storing the encrypted data blocks together with the corresponding Hash values. The beneficial effects of the invention are that big data files can be encrypted quickly and efficiently while security is maintained, filling the gap left by traditional data encryption technologies on the current market, which struggle to meet the requirements of big data for encryption speed, processing performance and security.
Description
Technical Field
The invention relates to the field of random sampling and data encryption, in particular to a high-speed random sampling encryption method for geological and geographical big data.
Background
The rapid development of technologies such as cloud computing, the Internet of Things and 5G has driven explosive growth in data. Big data brings convenience to production and daily life, but security and privacy protection problems are becoming increasingly prominent. The value of geological and geographic big data lies mainly in data mining, and malicious or excessive data mining and uncontrolled data abuse can leak sensitive information or expose private data to theft. Encrypting big data has therefore become especially important.
At present, some traditional encryption algorithms are constrained by their design and cannot be made significantly faster; moreover, the performance and load of a single computer have almost reached their limit, so faster encryption cannot be obtained that way; finally, big data is characterized by large volume, diversity and rapid growth, and high-speed secure encryption methods for big data are rarely addressed. There are currently few patented technologies on the market for fast encryption of big data. Among them, the patent "a big data encryption method" (201410258583.8) proposes encrypting plaintext blocks and then encrypting the intermediate plaintext a second time, and the patent "hybrid encryption method and a device for implementing the method" (201510472098.5) proposes encrypting message data with the AES encryption algorithm and then encrypting the session key with the SM2 encryption algorithm. Both technologies use two-stage encryption, which meets the security requirements of big data files but not their transmission efficiency requirements.
Disclosure of Invention
Aiming at the defects, the invention provides a high-speed random sampling encryption method and system for geological and geographical big data. The system gives consideration to algorithm security and encryption speed, and can improve the processing speed of the encryption scheme while ensuring the security of big data.
The invention provides a high-speed random sampling encryption method for geological and geographical big data, which specifically comprises the following steps:
S101: acquiring a geological and geographic data file;
S102: dividing the file into data blocks of random sizes with a random file segmentation module;
S103: randomly sampling the data blocks by a reservoir sampling method to obtain the sampled data blocks;
S104: encrypting the sampled data blocks with the AES encryption method; encrypting the unsampled data blocks with the ZUC stream cipher method;
S105: storing the encrypted data blocks and the Hash values corresponding to the data blocks.
Further, the geological and geographic data files include pictures, tables and text formats.
Further, step S102 specifically includes:
S201: setting the basic block size l of the file through the random file segmentation module; l determines the granularity of the blocks;
S202: randomly generating a variable parameter r_i and repeatedly dividing the file until the size of the last data block is smaller than l; the size of each data block is:
L(i) = l + r_i
where i is the number of the data block.
S203: creating a corresponding Chunk Index and Chunk Size, and storing the generated data content and data Chunk Index parameters together with the corresponding file name; the block index stores the number i corresponding to the data block, and the block size stores the size L (i) of the data block corresponding to the block index.
Further, step S103 specifically includes:
S301: placing the first k data blocks into a reservoir; k is a value preset according to actual requirements;
S302: for each data block j, starting from j = k + 1, selecting the j-th data block with probability k/j; if the j-th data block is selected, it replaces one of the data blocks currently in the reservoir, chosen with equal probability; this continues until all data blocks have been traversed;
S303: after the traversal is finished, the data blocks in the reservoir are the sampled data blocks, and the rest are the unsampled data blocks.
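As a check on steps S301 to S303, the short derivation below (a standard argument for reservoir sampling, not taken from the patent text; n denotes the total number of data blocks and is assumed to satisfy n ≥ k) suggests that every data block ends up in the reservoir with the same probability k/n.

```latex
% A block j <= k starts in the reservoir. At step m it is evicted with
% probability (k/m)(1/k) = 1/m, so it survives steps k+1, ..., n with
\[
P(\text{block } j \le k \text{ kept}) = \prod_{m=k+1}^{n} \frac{m-1}{m} = \frac{k}{n}.
\]
% A block j > k enters with probability k/j and must survive steps j+1, ..., n:
\[
P(\text{block } j > k \text{ kept}) = \frac{k}{j} \prod_{m=j+1}^{n} \frac{m-1}{m}
  = \frac{k}{j} \cdot \frac{j}{n} = \frac{k}{n}.
\]
```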
The process of decrypting the encrypted data specifically comprises the following steps:
S401: extracting the encrypted data blocks and decrypting each with the decryption method corresponding to the algorithm that encrypted it, namely the AES encryption algorithm or the ZUC stream cipher encryption algorithm;
S402: splicing the decrypted data according to the stored Chunk Index and Chunk Size to obtain a spliced file;
S403: calculating the Hash value of the decrypted spliced file and verifying it; if the verification passes, decryption has succeeded and the spliced file is the original geological and geographic data file.
The beneficial effects provided by the invention are as follows: the invention gives consideration to the safety and the encryption performance of the algorithm, can realize the quick and efficient encryption of the big data file while ensuring the safety, solves the problems of the efficiency and the safety of the big data encryption to a certain extent, and has certain application prospect and engineering value.
Drawings
FIG. 1 is a flow chart of a real-time detection method of the present invention;
FIG. 2 is a schematic diagram of a large data file being randomly partitioned;
FIG. 3 is a schematic diagram of reservoir sampling of a partitioned file;
FIG. 4 is a schematic diagram of AES encryption algorithm encryption and ZUC stream cipher algorithm encryption of a file block;
FIG. 5 is a schematic diagram of decrypting an encrypted file.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be further described with reference to the accompanying drawings.
A high-speed random sampling encryption method for geological and geographical big data comprises the following steps:
S101: acquiring a geological and geographic data file; the geological and geographic data files include picture, table and text formats;
S102: dividing the file into data blocks of random sizes with a random file segmentation module;
Step S102 specifically includes:
S201: setting the basic block size l of the file through the random file segmentation module; l determines the granularity of the blocks; preferably, l is set according to the security level, for example 32 KB, 48 KB or 64 KB; the larger l is, the lower the security level;
S202: randomly generating a variable parameter r_i and repeatedly dividing the file until the size of the last data block is smaller than l; the size of each data block is:
L(i) = l + r_i
where i is the number of the data block; the size of the last data block is L_D = length - Σ L(i), where length is the total size of the file. Because the file size is not necessarily an integer multiple of the data block size, the final remainder of length less than l is treated as a file block of its own, and the random parameter r_i makes the final number of file blocks n random;
Referring to FIG. 2, assume the source big data file has total size length. When the variable r_i in the figure is always equal to 0, the partitioning strategy is fixed-length blocking. Fixed-length blocking divides the file into blocks of fixed length, so the number of file blocks is n = length / l, where l is the size of the fixed-length blocks. However, this strategy lets an attacker attack the key by running the plaintext encryption process many times, which reduces the security of high-speed random sampling encryption of big data. If r_i in FIG. 2 is not always equal to 0, the strategy is variable-length blocking, which divides the big data file into data blocks of different sizes, adds randomness to the high-speed random sampling encryption scheme, and makes it harder for an attacker to attack the key.
S203: creating a corresponding Chunk Index and Chunk Size, and storing the generated data content and data Chunk Index parameters together with the corresponding file name; the block index stores the number i corresponding to the data block, and the block size stores the size L (i) of the data block corresponding to the block index.
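Steps S201 to S203 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_random`, the `ChunkRecord` type, and the choice to draw r_i uniformly from [0, max_extra] are assumptions, since the text does not specify the distribution of r_i.

```python
import io
import random
from dataclasses import dataclass
from typing import BinaryIO, Iterator, Tuple


@dataclass
class ChunkRecord:
    index: int  # Chunk Index: the number i of the data block
    size: int   # Chunk Size: L(i), the number of bytes in block i


def split_random(stream: BinaryIO, base: int, max_extra: int,
                 rng: random.Random) -> Iterator[Tuple[ChunkRecord, bytes]]:
    """Yield (record, data) pairs whose sizes follow L(i) = base + r_i.

    r_i is drawn uniformly from [0, max_extra] (an assumption).
    The final block holds whatever bytes remain and may be shorter.
    """
    i = 1
    while True:
        size = base + rng.randint(0, max_extra)  # L(i) = l + r_i
        data = stream.read(size)
        if not data:
            break
        yield ChunkRecord(index=i, size=len(data)), data
        i += 1
```

Setting `max_extra=0` reduces the sketch to the fixed-length strategy described for FIG. 2; a cryptographically secure source of randomness would likely be preferable to `random.Random` in practice.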
S103: randomly sampling the data blocks by a reservoir sampling method to obtain the sampled data blocks;
Step S103 specifically includes:
S301: placing the first k data blocks into a reservoir; k is a value preset according to actual requirements;
S302: for each data block j, starting from j = k + 1, selecting the j-th data block with probability k/j; if the j-th data block is selected, it replaces one of the data blocks currently in the reservoir, chosen with equal probability; this continues until all data blocks have been traversed;
S303: after the traversal is finished, the data blocks in the reservoir are the sampled data blocks, and the rest are the unsampled data blocks.
Referring to FIG. 3, a sampling ratio p is set according to the user's security requirement (the actual number of sampled data blocks is k = n × p, where n is the number of file blocks), and the variable-length blocking algorithm and the reservoir sampling algorithm run simultaneously, so the big data file needs to be read only once to complete all operations. Different sampling ratios can be set according to the security level of the big data, which improves the flexibility of the high-speed random sampling encryption scheme; and the data blocks sampled through the reservoir are mutually independent, which makes an attack more difficult and ensures, to a certain degree, the confidentiality of the source big data file after segmentation. In FIG. 3, A1, A3, A5 and A7 are the unsampled data blocks; A2, A4, A6 and A8 are the sampled data blocks;
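The sampling in steps S301 to S303 can be sketched in a few lines. The function name `reservoir_sample` and the choice to sample block numbers rather than block contents are assumptions made for illustration; k is passed in as the preset value described in S301.

```python
import random
from typing import Iterable, List


def reservoir_sample(block_ids: Iterable[int], k: int,
                     rng: random.Random) -> List[int]:
    """Return the numbers of the k sampled blocks after a single pass."""
    reservoir: List[int] = []
    for j, block_id in enumerate(block_ids, start=1):
        if j <= k:
            # S301: the first k blocks fill the reservoir.
            reservoir.append(block_id)
        elif rng.random() < k / j:
            # S302: block j is selected with probability k/j and replaces
            # a reservoir slot chosen uniformly at random.
            reservoir[rng.randrange(k)] = block_id
    # S303: the reservoir holds the sampled blocks; all others are unsampled.
    return reservoir
```

Because the function consumes any iterable, it can be fed block numbers as they are produced by the chunking step, which matches the single-pass reading described above.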
S104: encrypting the sampled data blocks with the AES encryption method; encrypting the unsampled data blocks with the ZUC stream cipher method;
Referring to FIG. 4, a big data file to be encrypted is first segmented by the variable-length blocking algorithm; the resulting data blocks are then sampled by the reservoir sampling algorithm; the sampled data blocks are encrypted with the AES encryption algorithm and the unsampled data blocks with the ZUC stream cipher algorithm; and the encrypted data blocks and the Hash value of the source file are stored separately. In FIG. 4, A1, A3, A5 and A7 are encrypted with the ZUC stream cipher method; A2, A4, A6 and A8 are encrypted with the AES encryption method;
S105: storing the encrypted data blocks and the Hash values corresponding to the data blocks.
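The routing in S104 and the bookkeeping in S105 might look like the sketch below. Python's standard library ships neither AES nor ZUC, so the two ciphers are passed in as callables; `encrypt_blocks`, `aes_encrypt` and `zuc_encrypt` are hypothetical names, and the use of SHA-256 over the whole source file is an assumption, since the text does not name a hash function.

```python
import hashlib
from typing import Callable, Dict, Iterable, Set, Tuple

Cipher = Callable[[bytes], bytes]


def encrypt_blocks(blocks: Iterable[Tuple[int, bytes]], sampled: Set[int],
                   aes_encrypt: Cipher, zuc_encrypt: Cipher
                   ) -> Tuple[Dict[int, Tuple[str, bytes]], str]:
    """Encrypt sampled blocks with AES and the rest with ZUC.

    `blocks` must arrive in file order so the running digest equals
    the hash of the source file. Returns the tagged ciphertexts keyed
    by block number, plus the hex digest of the plaintext.
    """
    digest = hashlib.sha256()  # assumed hash function
    stored: Dict[int, Tuple[str, bytes]] = {}
    for index, data in blocks:
        digest.update(data)
        if index in sampled:
            stored[index] = ("AES", aes_encrypt(data))
        else:
            stored[index] = ("ZUC", zuc_encrypt(data))
    return stored, digest.hexdigest()
```

The algorithm tag stored with each ciphertext is one simple way to let the decryption side pick the matching method; in a real system the callables would wrap vetted AES and ZUC implementations with proper key and nonce handling.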
The process of decrypting the encrypted data specifically comprises the following steps:
S401: extracting the encrypted data blocks and decrypting each with the decryption method corresponding to the algorithm that encrypted it, namely the AES encryption algorithm or the ZUC stream cipher encryption algorithm;
S402: splicing the decrypted data according to the stored Chunk Index and Chunk Size to obtain a spliced file;
S403: calculating the Hash value of the decrypted spliced file and verifying it; if the verification passes, decryption has succeeded and the spliced file is the original geological and geographic data file.
Referring to FIG. 5, when a file needs to be decrypted, the data blocks stored in different containers are first extracted and then decrypted with the AES algorithm or the ZUC stream cipher algorithm as appropriate, and the decrypted data blocks are spliced using the data block index parameters stored in the system. Finally, the Hash value of the decrypted file is calculated to verify the integrity of the data. Using the advanced encryption algorithm AES for the sampled data blocks and the fast stream cipher algorithm ZUC for the unsampled data blocks speeds up data block encryption and, on top of the randomness introduced by the reservoir sampling algorithm, further increases the randomness of data block encryption, which helps ensure the security of big data files.
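Steps S402 and S403 can be illustrated with the sketch below, which assumes the blocks have already been decrypted. The function name `reassemble_and_verify`, the dictionary layout of the stored Chunk Index and Chunk Size, and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib
from typing import Dict


def reassemble_and_verify(decrypted: Dict[int, bytes],
                          chunk_sizes: Dict[int, int],
                          expected_sha256: str) -> bytes:
    """Splice decrypted blocks by Chunk Index and check the file hash."""
    parts = []
    for index in sorted(chunk_sizes):  # S402: restore the original order
        data = decrypted[index]
        if len(data) != chunk_sizes[index]:
            raise ValueError(f"block {index} has unexpected size")
        parts.append(data)
    restored = b"".join(parts)
    # S403: the spliced file is accepted only if its hash matches.
    if hashlib.sha256(restored).hexdigest() != expected_sha256:
        raise ValueError("hash mismatch: file failed integrity check")
    return restored
```

Checking each block against its stored Chunk Size before splicing gives an early, block-level error when something is wrong, ahead of the whole-file hash comparison.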
The beneficial effects provided by the invention are as follows: the invention provides a big-data-oriented high-speed random sampling encryption method and system in response to the scarcity of secure and fast big data encryption in the domestic market. In this way, big data files can be encrypted quickly and efficiently while security is maintained, filling the gap left by traditional data encryption technologies on the current market, which struggle to meet the requirements of big data for encryption speed, processing performance and security.
The advantages of the invention are as follows: first, setting random parameters achieves variable-length blocking of the file, so the number of file blocks is random; second, the file blocks are randomly sampled with the reservoir sampling algorithm, the sampled blocks are encrypted with the advanced encryption algorithm, and the remaining blocks are encrypted with the fast stream cipher algorithm. Because relatively few blocks are sampled and most blocks are encrypted by the fast stream cipher, encryption efficiency is greatly improved; at the same time, the randomness of the sampling, the randomness of the file blocks and the use of the advanced encryption algorithm greatly increase the difficulty for anyone attempting to crack the big data file. In addition, the user can set the number of blocks sampled by the reservoir sampling algorithm according to the required security level, so the system can adapt to a variety of real-world scenarios and user requirements, which makes it user-friendly.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (2)
1. A high-speed random sampling encryption method for geological and geographical big data is characterized by comprising the following steps: the method specifically comprises the following steps:
S101: acquiring geological and geographic data files;
S102: dividing the file into data blocks with random sizes by a file random division module;
S103: randomly sampling the data blocks by a reservoir sampling method to obtain the sampled data blocks;
S104: encrypting the extracted data block by adopting an AES encryption method; encrypting the data blocks which are not extracted by adopting a ZUC stream cipher method;
S105: storing the encrypted data block and the Hash value corresponding to the data block;
step S102 specifically includes:
S201: setting the size l of a basic block of a file through a file random segmentation module; l is used to determine the granularity of the block;
S202: randomly generating a variable parameter r_i and continuously dividing the file until the size of the last data block is smaller than l; the size of the data block is:
L(i) = l + r_i
wherein i is a number corresponding to the data block;
s203: creating a corresponding Chunk Index and Chunk Size, and storing the generated data content and data Chunk Index parameters together with the corresponding file name; wherein, the block index stores the serial number i corresponding to the data block, and the block size stores the size L (i) of the data block corresponding to the block index;
step S103 specifically includes:
S301: taking the first k data blocks and putting them into a reservoir; k is a preset value according to actual requirements;
S302: for each data block j, starting from j = k + 1, extracting the j-th data block with probability k/j, and if the j-th data block is selected, replacing with equal probability any data block previously selected into the reservoir, until all data blocks are traversed;
S303: after traversing, the data blocks in the reservoir are the data blocks which are extracted, and the rest are the data blocks which are not extracted;
the process of decrypting the encrypted data specifically comprises the following steps:
S401: extracting the encrypted data block, and decrypting by adopting a decryption method corresponding to an AES encryption algorithm and a ZUC stream cipher encryption algorithm;
S402: splicing and recovering the decrypted data according to the stored Chunk Index and the Chunk Size to obtain a spliced file;
S403: calculating the Hash value of the decrypted spliced file and verifying it; if the verification is passed, the spliced file is successfully decrypted to obtain the original geological and geographic data file.
2. The geological and geographical big data oriented high-speed random sampling encryption method as claimed in claim 1, characterized in that: the geological and geographic data files include pictures, tables and text formats.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110249541.8A CN113098843B (en) | 2021-03-08 | 2021-03-08 | High-speed random sampling encryption method for geological and geographical big data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113098843A | 2021-07-09 |
CN113098843B | 2022-06-14 |
Family
ID=76667752
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110249541.8A Active CN113098843B (en) | 2021-03-08 | 2021-03-08 | High-speed random sampling encryption method for geological and geographical big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113098843B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117560233B (en) * | 2024-01-12 | 2024-04-05 | 深圳市金飞杰信息技术服务有限公司 | Method and system based on data interaction encryption |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104205117A (en) * | 2014-04-10 | 2014-12-10 | 华为技术有限公司 | Device file encryption and decryption method and device |
CN105260668A (en) * | 2015-10-10 | 2016-01-20 | 北京搜狗科技发展有限公司 | File encryption method and electronic device |
CN106788982A (en) * | 2017-02-22 | 2017-05-31 | 郑州云海信息技术有限公司 | A kind of sectional encryption transmission method and device |
WO2017166856A1 (en) * | 2016-03-31 | 2017-10-05 | 北京金山安全软件有限公司 | Method, device and equipment for file encryption |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108551434B (en) * | 2015-08-26 | 2019-04-12 | 华为技术有限公司 | The method and apparatus for transmitting HE-LTF sequence |
Also Published As
Publication number | Publication date |
---|---|
CN113098843A (en) | 2021-07-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||