CN111291046A - Computer big data storage control system and method - Google Patents
Computer big data storage control system and method
- Publication number
- CN111291046A (application number CN202010046920.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- module
- information
- file
- storage
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2219—Large Object storage; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/64—Protecting data integrity, e.g. using checksums, certificates or signatures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/12—Applying verification of the received information
- H04L63/123—Applying verification of the received information received data contents, e.g. message integrity
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/06—Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0861—Generation of secret information including derivation or calculation of cryptographic keys or passwords
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computer Hardware Design (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Biomedical Technology (AREA)
- Bioethics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the technical field of computer applications and discloses a computer big data storage control system and method. A key parameter generation module generates an encryption key through the Keyall algorithm from the input security parameters and generates tag information for each stored file; a data storage module selects a storage mode according to the read frequency or size of the data file; and an integrity verification module sends a detection request to the server, the server computes detection information from the tag information and the request information, and the server checks the storage result using the detection information and the tag information. Using a redundancy-first, coding-later storage strategy, the data storage module improves data processing capability so that it can keep up with the current explosive growth in data generation. The integrity verification module verifies whether the stored data is complete, guarding against deletion, loss, tampering, and the like.
Description
Technical Field
The invention belongs to the technical field of computer application, and particularly relates to a computer big data storage control system and method.
Background
Currently, the closest prior art is as follows. With the rapid development of information technology, people's daily activities generate a great deal of data. As this data accumulates day by day, its volume grows and its types multiply, forming big data. Traditional computer information processing technology can handle only a limited amount of data and cannot keep up with the current explosive growth in data generation.
Innovating computer information processing technology and improving its effectiveness allows data to be better collected, sorted, processed, and applied. To ensure that the technology is applied efficiently, data needs to be classified in advance. At the same time, the security of information processing needs to be improved when the technology is applied, and the security of data processing can be ensured only if the technology itself stays up to date. Moreover, massive data and different types of data information are related to one another, so ensuring the storage security of information during data processing is difficult. Only by continuously analyzing and upgrading information security technology can data be put to full use and its integrity and stability be ensured.
In summary, the problems of the prior art are as follows:
(1) Existing computer information processing technology can handle only a limited amount of data and cannot keep up with the current explosive growth in data generation.
(2) The existing computer information processing technology cannot ensure the safety of data processing.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a computer big data storage control system and a computer big data storage control method.
The invention is realized as follows. A computer big data storage control system includes:
the data acquisition module, connected with the central main control module, which acquires data uploaded by a user through a computer network terminal and transmits the data to the data coding module; it establishes a corresponding sample for the acquired data and preprocesses the data in the sample; after preprocessing, it compresses and transmits the corresponding data; during preprocessing, missing data is handled as follows: samples with missing data are deleted using a corresponding deletion function; after deletion, substitute values are assigned in place of the remaining missing values; the completed data sets are generated by random simulation and stored in imp, and linear regression is performed on imp;
the data coding module, connected with the central main control module, which calculates a coding array from the acquired data information according to the parameters of the configuration center and performs coded storage according to the array; it selects a suitable neural network model and extracts the corresponding data features from the input data information; it builds a corresponding multilayer neural network from the extracted feature information; the whole deep neural network is trained layer by layer using a corresponding training method; the parameters of the first-layer network are calculated, and the output of its hidden layer is taken as the input of the next layer; this process is repeated, training the network parameters of each layer in sequence, to realize the coding of the data;
the key parameter generation module, connected with the central main control module, which generates an encryption key through the Keyall algorithm according to the input security parameters and generates tag information for the stored file; the user inputs the security parameters to generate a key pair and an encryption key; the generated public key is made public and is used to generate tag information for the file holding the storage result, and the private key is kept by the user; when the encryption key is generated by the Keyall algorithm, two strong primes p and q are generated first, and M = pq and f(M) = (p-1)(q-1) are calculated; then an odd positive integer a is generated such that gcd(a, f(M)) = 1; p and q are taken as the private key, and a and M as the public key;
the data storage module is connected with the central main control module and selects a corresponding storage mode according to the reading frequency or the file size of the data file; the data storage module stores data files by adopting a storage strategy of redundancy first and coding later, small files and frequently used large files are stored in a redundancy backup mode, and large files which are not used for a long time are stored in an RS (Reed-Solomon) coding mode;
the integrity verification module, connected with the central main control module, which sends a detection request to the server; the server computes detection information from the tag information and the request information and checks the storage result using the detection information and the tag information; the user sends key information to the server to issue a detection request; the server performs the calculation on the stored data according to the user's detection request; after receiving the information returned by the server, the user decrypts it; the user then verifies whether the returned information is complete; if it is complete, the verification succeeds and the big data storage result is correct; otherwise the big data storage result is wrong, and the storage results need to be verified one by one to find the erroneous one;
the data recovery module, connected with the central main control module, which recovers data according to the feedback information of the integrity verification module, the failed-node information, the data recovery parameters, and the parsed position of the failed node in the coding array; it monitors the time taken to exchange data between the master node and the slave nodes, and if no reply is received from a node within the set time, the node is judged to have failed; it reads the data recovery parameters from the configuration center, determines the position of the lost node or block in the coding array from the node or block failure information and those parameters, and sends the position to a decoding unit; it reads the load balancing parameters from the configuration center and selects a new node list according to those parameters and the load state of each node; it selects a decoding scheme according to the position of the lost node or block in the coding array and reads the required remaining block data from the server; it performs the decoding calculation on the remaining block data to obtain the data of the lost blocks, and stores the recovered blocks on the new nodes in the server according to the selected new node list;
the configuration module is connected with the central main control module and is used for presetting and configuring various parameters in the system and extracting corresponding configuration information according to the control instruction;
the data management module is connected with the central main control module and is used for adding, deleting, modifying and backing up the stored data content;
the data classification module is connected with the central main control module and classifies the stored contents by using a data classification method;
the query module is connected with the central main control module and searches corresponding contents through voice input or keyboard input;
the central main control module is respectively connected with the data acquisition module, the data coding module, the key parameter generation module, the data storage module, the integrity verification module, the data recovery module, the configuration module, the data management module, the data classification module, the query module, the wireless signal transceiving module and the cloud server and is used for coordinating the normal operation of each module;
the wireless signal transceiver module is connected with the central main control module and is connected with the cloud server through the wireless signal transceiver to realize data transmission;
and the cloud server, connected with the central main control module, whose host service configuration and service scale can be configured according to user needs, and which is used for data sharing.
Further, the data management module includes:
the adding and deleting module, through which a corresponding delete or add instruction is input according to user requirements, whereupon the data management system deletes or adds the corresponding content;
the modification module inputs a corresponding modification command according to the user requirement, and the data management system modifies the corresponding content;
and the backup module monitors and tracks the data uploaded by the user and the update of the important target file to be tracked, transmits the update log to the backup system in real time through a network, and the backup system updates the disk according to the log.
Another object of the present invention is to provide a computer big data storage control method of the computer big data storage control system, the computer big data storage control method comprising:
step one, the data acquisition module acquires data uploaded by a user through a computer network terminal; the data classification module classifies the stored content using a data classification method;
step two, after the data classification is finished, the data is transmitted to the data coding module; the data coding module calculates a coding array from the acquired data information according to the parameters of the configuration center and codes the data according to the array; after coding, the data storage module selects a storage mode according to the read frequency or size of the data file and stores the file;
step three, the integrity verification module sends a detection request to the server, and the server computes detection information from the tag information and the request information; the server checks the storage result using the detection information and the tag information; the data recovery module recovers the data according to the feedback information of the integrity verification module, the failed-node information, the data recovery parameters, and the parsed position of the failed node in the coding array;
step four, after the data recovery is finished, the key parameter generation module generates an encryption key through the Keyall algorithm according to the input security parameters and generates tag information for the stored file;
step five, in the storage process, the configuration module performs preset configuration on various parameters in the system and extracts corresponding configuration information according to the control instruction; the data management module performs addition, deletion, modification and backup on the stored data content; meanwhile, corresponding contents are searched by a query module through voice input or keyboard input;
step six, the wireless signal transceiver module is connected with the cloud server through the wireless signal transceiver to realize data transmission; the host service configuration and the business scale in the cloud server can be configured according to the needs of users and are used for realizing data sharing.
Further, in the first step, the data obtaining module processes the obtained data as follows:
establishing a corresponding sample for the acquired data, and preprocessing the data in the sample; after the preprocessing is finished, carrying out compression transmission on corresponding data;
in the data preprocessing process, the missing data processing process comprises the following steps:
samples with missing data are deleted using a corresponding deletion function; after deletion, substitute values are assigned in place of the remaining missing values; the completed data sets are generated by random simulation and stored in imp, and linear regression is performed on imp.
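The missing-data steps above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the source does not say which deletion function or imputation method is used, so the sketch assumes that samples with a missing target are deleted, that missing predictor values are replaced by random draws from the observed values, and that the regression fits are averaged across the completed data sets. The names `fit_line` and `preprocess` are hypothetical; only `imp` comes from the text.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def preprocess(samples, n_imputations=5, seed=0):
    """samples: list of (x, y) pairs in which x or y may be None."""
    rng = random.Random(seed)
    # 1. Delete samples whose target value y is missing (assumed deletion rule).
    kept = [(x, y) for x, y in samples if y is not None]
    observed_x = [x for x, _ in kept if x is not None]
    # 2. Replace each missing x with a random draw from the observed values,
    #    producing several completed data sets collected in `imp`.
    imp = []
    for _ in range(n_imputations):
        completed = [(x if x is not None else rng.choice(observed_x), y)
                     for x, y in kept]
        imp.append(completed)
    # 3. Fit a linear regression on each completed data set and average the fits.
    fits = [fit_line([x for x, _ in d], [y for _, y in d]) for d in imp]
    slope = mean(s for s, _ in fits)
    intercept = mean(b for _, b in fits)
    return imp, (slope, intercept)

samples = [(1.0, 2.0), (2.0, 4.0), (None, 6.0), (4.0, 8.0), (5.0, None)]
imp, (slope, intercept) = preprocess(samples)
```

Averaging the per-data-set fits is one simple way to pool the results; the source does not state how the regression on imp is summarized.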
Further, in the second step, the process of encoding the data by the data encoding module is as follows:
selecting a suitable neural network model and extracting the corresponding data features from the input data information;
building a corresponding multilayer neural network from the extracted feature information; the whole deep neural network is trained layer by layer using a corresponding training method;
calculating the parameters of the first-layer network and taking the output of its hidden layer as the input of the next layer;
and repeating this process, training the network parameters of each layer in sequence, to realize the coding of the data.
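The layer-by-layer procedure described above resembles greedy layer-wise training of stacked autoencoders. The source does not name the network model or the training method, so the following is only a sketch under that assumption: each layer is a small tied-weight autoencoder with tanh units trained by plain gradient descent, and the function names, layer sizes, learning rate, and epoch count are all hypothetical.

```python
import math
import random

def encode(x, layer):
    """Hidden-layer output of one trained layer."""
    W, b = layer
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]

def train_autoencoder(data, hidden, epochs=200, lr=0.05, seed=0):
    """Train one tied-weight autoencoder layer; return (weights, encoder bias)."""
    rng = random.Random(seed)
    n_in = len(data[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b = [0.0] * hidden   # encoder bias
    c = [0.0] * n_in     # decoder bias
    for _ in range(epochs):
        for x in data:
            h = encode(x, (W, b))
            # Decode with the transposed weights and measure reconstruction error.
            y = [sum(W[j][i] * h[j] for j in range(hidden)) + c[i]
                 for i in range(n_in)]
            err = [yi - xi for yi, xi in zip(y, x)]
            # Gradient of the squared error with respect to the hidden pre-activations.
            dh = [sum(err[i] * W[j][i] for i in range(n_in)) * (1 - h[j] ** 2)
                  for j in range(hidden)]
            for j in range(hidden):
                for i in range(n_in):
                    W[j][i] -= lr * (err[i] * h[j] + dh[j] * x[i])
                b[j] -= lr * dh[j]
            for i in range(n_in):
                c[i] -= lr * err[i]
    return W, b

def train_stack(data, layer_sizes):
    """Train layers in sequence; each hidden output feeds the next layer."""
    layers, current = [], data
    for size in layer_sizes:
        layer = train_autoencoder(current, size)
        layers.append(layer)
        current = [encode(x, layer) for x in current]
    return layers, current

data = [[0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]]
layers, codes = train_stack(data, [3, 2])
```

Here `codes` holds the final two-dimensional code of each input sample; a production system would more likely use an established numerical library than hand-written loops.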
Further, in the second step, the data storage module stores the data files by adopting a storage strategy of redundancy first and coding second, and for small files and frequently used large files, a redundancy backup mode is adopted for storage, and for large files which are not used for a long time, an RS coding mode is adopted for storage.
Further, the redundancy-first and coding-second storage strategy specifically includes:
when a certain file is uploaded to a server, storing the file according to a redundant backup mode, newly adding 'latest reading time' in file meta information and setting the latest reading time as a current timestamp;
the server checks the 'file size' and 'latest reading time' in the meta information of each file, skipping files already stored with RS coding and files smaller than 100 MB; for a file larger than 100 MB, if it was last read within the past 3 days, it is regarded as hot data and skipped; otherwise it is judged not to have been used for a long time, is stored with RS coding, and its previous redundant backups are deleted;
when the read file is stored in a redundant backup mode, updating the latest reading time;
when the read file is stored according to the RS coding mode, if the file is intact, no operation is performed; if the file is damaged, RS decoding is carried out on the residual data blocks of the file to obtain source data, and the restored source data are stored again according to a redundant backup mode.
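The redundancy-first, coding-later policy above can be sketched as a small state machine over file metadata. The 100 MB threshold and the 3-day window come from the text; everything else is an assumption made for illustration, including the names `FileMeta`, `upload`, `scan`, and `read`, the treatment of a file of exactly 100 MB, and resetting the read time after a damaged file is restored. Actual replication and RS encoding are replaced by a mode flag.

```python
from dataclasses import dataclass

SIZE_THRESHOLD = 100 * 1024 * 1024   # 100 MB, from the text
HOT_WINDOW = 3 * 24 * 3600           # 3 days in seconds, from the text

@dataclass
class FileMeta:
    name: str
    size: int                # bytes
    last_read: float         # the "latest reading time" timestamp
    mode: str = "replica"    # "replica" (redundant backup) or "rs"

def upload(name, size, now):
    """New files are always stored as redundant backups first."""
    return FileMeta(name, size, last_read=now)

def scan(files, now):
    """Periodic server check: move large files that have gone cold to RS coding."""
    converted = []
    for f in files:
        if f.mode == "rs" or f.size < SIZE_THRESHOLD:
            continue                      # already coded, or a small file
        if now - f.last_read <= HOT_WINDOW:
            continue                      # hot data stays replicated
        f.mode = "rs"                     # stands in for RS encoding and replica deletion
        converted.append(f.name)
    return converted

def read(f, now, intact=True):
    """Apply the read-time rules and return the file's storage mode afterwards."""
    if f.mode == "replica":
        f.last_read = now
    elif not intact:
        # Stands in for RS decoding of the remaining blocks; the restored
        # file goes back to redundant backup (read time reset is an assumption).
        f.mode = "replica"
        f.last_read = now
    return f.mode

MB, DAY = 1024 * 1024, 24 * 3600
files = [upload("small.txt", 10 * MB, 0), upload("hot.bin", 200 * MB, 0),
         upload("cold.bin", 200 * MB, 0)]
read(files[1], 3.5 * DAY)
converted = scan(files, 4 * DAY)
```

Keeping the decision in one `scan` pass mirrors the text, where the server inspects the size and read time of every file and skips those that do not qualify.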
Further, in the third step, the specific detection steps adopted by the integrity verification module include:
firstly, the user sends key information to the server to issue a detection request;
secondly, the server performs the calculation on the stored data according to the user's detection request;
thirdly, after receiving the information returned by the server, the user decrypts it;
fourthly, the user verifies whether the returned information is complete; if it is complete, the verification succeeds and the big data storage result is correct; otherwise the big data storage result is wrong, and the storage results need to be verified one by one to find the erroneous one;
the algorithm is as follows: input the data file H to be checked and select file blocks m_i (1 < i < n) in H; generate a random number r for the check; then calculate a_r = a^r mod M and compute detection data 1, R' [formula missing in the source]; next, select the tag information T_i corresponding to the file blocks m_i (1 < i < n) and compute S [formula missing in the source]; finally, calculate detection data 2, R = S^r mod M, and verify whether R and R' are equal; if they are equal, return T, otherwise return F.
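The source text does not preserve the formulas for the tags, for detection data 1, or for S, so the following sketch fills them in with one common RSA-style construction and should be read as an assumption, not as the patent's definition: tags are taken as T_i = a^(m_i) mod M, the server's reply as a_r^(sum of m_i) mod M, and S as the product of the selected tags mod M. Under those assumptions the check works because (prod a^(m_i))^r = (a^r)^(sum m_i) mod M. The function names and the toy parameters are hypothetical, and file blocks are represented as integers.

```python
import random

def make_tags(blocks, a, M):
    """Assumed tag rule: T_i = a^(m_i) mod M for each block m_i."""
    return [pow(a, m, M) for m in blocks]

def challenge(a, M, rng):
    """User side: pick a random r and send a_r = a^r mod M to the server."""
    r = rng.randrange(2, M)
    return r, pow(a, r, M)

def server_proof(stored_blocks, indices, a_r, M):
    """Server side (assumed rule): a_r raised to the sum of the selected blocks."""
    return pow(a_r, sum(stored_blocks[i] for i in indices), M)

def verify(tags, indices, r, proof, M):
    """User side: S = product of the selected tags, then compare S^r mod M with the proof."""
    S = 1
    for i in indices:
        S = (S * tags[i]) % M
    return pow(S, r, M) == proof

# Toy parameters for illustration only; real moduli are far larger.
M, a = 61 * 53, 17
blocks = [int.from_bytes(b, "big") for b in (b"block-0", b"block-1", b"block-2")]
tags = make_tags(blocks, a, M)
indices = [0, 2]
r, a_r = challenge(a, M, random.Random(0))
ok = verify(tags, indices, r, server_proof(blocks, indices, a_r, M), M)
```

The user keeps only r and the tags, so the check does not require downloading the file blocks themselves.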
Further, in the third step, the data recovery process adopted by the data recovery module specifically includes:
monitoring the time for sending data information between the master node and the slave node, and if the return information of the node is not received within the set time, judging that the node is invalid;
reading parameters related to data recovery in the configuration center, analyzing the corresponding position of a lost node or a block in the coding array according to the node failure information or the block failure information and the related parameters of the data recovery, and sending the position to a decoding unit;
reading related load balancing parameters in the configuration center, and selecting a new node list according to the parameters and the load state of each node;
selecting a decoding scheme according to the corresponding position of the lost node or the block in the coding array, and reading the residual block data required in the server;
and performing decoding calculation according to the residual block data to obtain the data of the lost blocks, and storing the recovered blocks in the new nodes in the server according to the selected new node list.
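The recovery flow above can be sketched in three small pieces: timeout-based failure detection, load-aware selection of a new node, and reconstruction of a lost block from the remaining blocks. Reed-Solomon decoding is too long to show here, so the sketch swaps in a single XOR parity block, which tolerates only one lost block per stripe; it is a stand-in for the RS decoding described in the text, not an implementation of it. All function names, node names, and load values are hypothetical.

```python
from functools import reduce

def failed_nodes(last_reply, now, timeout):
    """Nodes whose last reply is older than the set time are judged to have failed."""
    return sorted(n for n, t in last_reply.items() if now - t > timeout)

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def encode_stripe(data_blocks):
    """Append one parity block (stand-in for the RS coding array)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def recover(stripe, lost_index):
    """Rebuild the block at lost_index from all remaining blocks of the stripe."""
    remaining = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(remaining)

def pick_new_node(load, exclude):
    """Choose the least-loaded node that has not failed."""
    candidates = {n: l for n, l in load.items() if n not in exclude}
    return min(candidates, key=candidates.get)

stripe = encode_stripe([b"abcd", b"efgh", b"ijkl"])
down = failed_nodes({"n1": 100.0, "n2": 90.0, "n3": 99.0}, now=101.0, timeout=5.0)
restored = recover(stripe, lost_index=1)
target = pick_new_node({"n1": 0.7, "n2": 0.1, "n3": 0.4}, exclude=set(down))
```

With a real RS code the `recover` step would select a decoding scheme based on which positions in the coding array were lost, as the text describes.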
Further, in the fourth step, the key parameter generating module specifically includes:
the user inputs the security parameters to generate a key pair and an encryption key, the generated public key is public and is used for generating label information for the file storing the result, and the private key is stored by the user;
when the encryption key is generated by the Keyall algorithm, two strong primes p and q are generated first, and M = pq and f(M) = (p-1)(q-1) are calculated; then an odd positive integer a is generated such that gcd(a, f(M)) = 1; p and q are taken as the private key, and a and M as the public key.
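The key generation steps can be illustrated with a toy sketch. It reads G(a, f(M)) = 1 as a greatest-common-divisor condition and uses small ordinary primes found by trial division, not the strong primes the text calls for, so it is unsuitable for real use. The function names and the 16-bit default are assumptions made for illustration.

```python
import math
import random

def is_prime(n):
    """Trial division; adequate only for the small toy sizes used here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

def random_prime(rng, bits):
    """Draw random odd candidates of the given bit length until one is prime."""
    while True:
        candidate = rng.randrange(2 ** (bits - 1), 2 ** bits) | 1
        if is_prime(candidate):
            return candidate

def generate_keys(bits=16, seed=None):
    """Return ((a, M), (p, q)): the public key and the private key."""
    rng = random.Random(seed)
    p = random_prime(rng, bits)
    q = random_prime(rng, bits)
    while q == p:
        q = random_prime(rng, bits)
    M = p * q
    f_M = (p - 1) * (q - 1)
    while True:
        a = rng.randrange(3, f_M, 2)      # odd candidates only
        if math.gcd(a, f_M) == 1:
            return (a, M), (p, q)

public_key, private_key = generate_keys(seed=1)
```

In practice the security parameter would set the bit length of p and q, and the primes would come from a vetted cryptographic library.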
In summary, the advantages and positive effects of the invention are as follows. The data storage module selects a storage mode according to the read frequency or size of the data file and, using a redundancy-first, coding-later storage strategy, improves data processing capability so that it can keep up with the current explosive growth in data generation. The integrity verification module verifies whether the stored data is complete, guarding against deletion, loss, tampering, and the like, and thereby ensures the security of the stored data.
In the invention, the data coding module calculates a coding array from the acquired data information according to the parameters of the configuration center and performs coded storage according to the array; because it applies neural network coding to the data, it has good fault tolerance, self-organization, and adaptability. During data compression, the neural network can complete image coding and compression autonomously according to the characteristics of the information, without relying on a predetermined data coding algorithm. Meanwhile, deleting samples with missing data during preprocessing helps ensure the authenticity and reliability of the data.
Drawings
Fig. 1 is a schematic structural diagram of a computer big data storage control system according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a data management module according to an embodiment of the present invention.
FIG. 3 is a flowchart of a method for controlling big data storage of a computer according to an embodiment of the present invention.
Fig. 4 is a schematic diagram illustrating comparison results of detection execution times of the integrity verification module according to the embodiment of the present invention.
In the figures: 1. data acquisition module; 2. data encoding module; 3. key parameter generation module; 4. data storage module; 5. integrity verification module; 6. data recovery module; 7. configuration module; 8. data management module; 9. data classification module; 10. query module; 11. central main control module; 12. wireless signal transceiving module; 13. cloud server.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, a computer big data storage control system provided by an embodiment of the present invention includes:
The data acquisition module 1 is connected with the central main control module 11, acquires data uploaded by a user through a computer network terminal, and transmits the data to the data coding module.
The data coding module 2 is connected with the central main control module 11, calculates a coding array from the acquired data information according to the parameters of the configuration center, and performs coded storage according to the array.
The key parameter generation module 3 is connected with the central main control module 11, generates an encryption key through the Keyall algorithm according to the input security parameters, and generates tag information for the stored file.
The data storage module 4 is connected with the central main control module 11 and selects a storage mode according to the read frequency or size of the data file.
The integrity verification module 5 is connected with the central main control module 11 and sends a detection request to the server; the server computes detection information from the tag information and the request information and checks the storage result using the detection information and the tag information.
The data recovery module 6 is connected with the central main control module 11 and recovers data according to the feedback information of the integrity verification module, the failed-node information, the data recovery parameters, and the parsed position of the failed node in the coding array.
The configuration module 7 is connected with the central main control module 11 and is used for presetting the various parameters of the system and extracting the corresponding configuration information according to control instructions.
The data management module 8 is connected with the central main control module 11 and is used for adding, deleting, modifying, and backing up the stored data content.
The data classification module 9 is connected with the central main control module 11 and classifies the stored content using a data classification method.
The query module 10 is connected with the central main control module 11 and searches for the corresponding content through voice input or keyboard input.
The central main control module 11 is connected with the data acquisition module 1, the data encoding module 2, the key parameter generation module 3, the data storage module 4, the integrity verification module 5, the data recovery module 6, the configuration module 7, the data management module 8, the data classification module 9, the query module 10, the wireless signal transceiving module 12, and the cloud server 13, and coordinates the normal operation of each module.
The wireless signal transceiver module 12 is connected with the central main control module 11 and connects to the cloud server through a wireless signal transceiver to realize data transmission.
The cloud server 13 is connected with the central main control module 11; its host service configuration and service scale can be configured according to user needs, and it is used for data sharing.
As shown in fig. 2, the data management module provided in the embodiment of the present invention includes:
The adding and deleting module receives a delete or add instruction input according to user requirements, and the data management system deletes or adds the corresponding content.
The modification module receives a modification command input according to user requirements, and the data management system modifies the corresponding content.
The backup module monitors and tracks the data uploaded by the user and updates to the important target files being tracked, and transmits the update log to the backup system in real time over the network; the backup system updates the disk according to the log.
The data encoding module 2 provided by the embodiment of the invention is connected to the central main control module 11; it calculates a coding array from the acquired data information according to the parameters of the configuration center, and encodes and stores the data according to that array. The process comprises the following steps:
selecting a suitable neural network model, and extracting the corresponding data features from the input data information;
establishing a corresponding multilayer neural network according to the extracted data feature information; the whole deep neural network is trained layer by layer, each layer using a corresponding training method;
calculating the parameters of the first-layer neural network, and taking the output of the hidden layer of the first-layer neural network as the input of the next layer;
and repeating the above process, training the network parameters of each layer in sequence, to realize the encoding of the data.
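The layer-by-layer procedure above can be sketched as follows. This is only an illustrative sketch: the text does not name the network model, so the example assumes a stack of small autoencoders (sigmoid hidden layer, linear reconstruction, plain stochastic gradient descent). The function names `train_autoencoder`, `train_stack` and `encode_stack`, the layer sizes and the learning rate are all hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def encode(layer, x):
    """Hidden-layer output of one trained layer (W: hidden x inputs, b: hidden)."""
    W, b = layer
    return [sigmoid(sum(w * xi for w, xi in zip(Wj, x)) + bj) for Wj, bj in zip(W, b)]

def train_autoencoder(data, hidden, epochs=200, lr=0.1, seed=0):
    """Train one layer to reconstruct its own input; return encoder weights only."""
    rng = random.Random(seed)
    n_in = len(data[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b = [0.0] * hidden
    V = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(n_in)]
    c = [0.0] * n_in
    for _ in range(epochs):
        for x in data:
            h = encode((W, b), x)
            y = [sum(v * hj for v, hj in zip(V[k], h)) + c[k] for k in range(n_in)]
            err = [yk - xk for yk, xk in zip(y, x)]
            # Back-propagate the reconstruction error to the hidden layer
            # before the decoder weights are changed.
            dh = [sum(err[k] * V[k][j] for k in range(n_in)) * h[j] * (1.0 - h[j])
                  for j in range(hidden)]
            for k in range(n_in):
                for j in range(hidden):
                    V[k][j] -= lr * err[k] * h[j]
                c[k] -= lr * err[k]
            for j in range(hidden):
                for i in range(n_in):
                    W[j][i] -= lr * dh[j] * x[i]
                b[j] -= lr * dh[j]
    return W, b

def train_stack(data, hidden_sizes):
    """Greedy layer-wise training: each layer is fed the previous hidden output."""
    layers, current = [], data
    for hidden in hidden_sizes:
        layer = train_autoencoder(current, hidden)
        layers.append(layer)
        current = [encode(layer, x) for x in current]
    return layers

def encode_stack(layers, x):
    for layer in layers:
        x = encode(layer, x)
    return x

samples = [[0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]]
stack = train_stack(samples, [3, 2])
code = encode_stack(stack, samples[0])  # 2-value code for the first sample
```

Each call to `train_autoencoder` sees only the hidden output of the layer trained before it, which is the "output of the hidden layer as the input of the next layer" step in the text.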
The data acquisition module 1 provided by the embodiment of the present invention is connected to the central main control module 11 and acquires data uploaded by a user through a computer network terminal. It processes the acquired data as follows:
establishing a corresponding sample for the acquired data, and preprocessing the data in the sample; after the preprocessing is finished, compressing and transmitting the corresponding data.
In the data preprocessing, missing data is handled as follows:
deleting the samples with missing values using the corresponding deletion functions; after the deletion is completed, assigning substitute values to replace the missing values; randomly simulating the completed data set, storing the result in imp, and performing linear regression on imp.
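A minimal sketch of the missing-data handling described above, under stated assumptions: the text names neither the deletion function nor the imputation method, so the example assumes listwise deletion, random draws from a normal distribution fitted to the complete rows, and a simple two-column least-squares fit averaged over several imputed copies. The names `drop_missing`, `impute`, `fit_line` and `pooled_fit` are hypothetical; only `imp` comes from the text.

```python
import random
from statistics import mean, stdev

def drop_missing(rows):
    """Listwise deletion: keep only rows without a missing (None) value."""
    return [r for r in rows if None not in r]

def impute(rows, rng):
    """Replace each None with a random draw based on the complete rows."""
    columns = list(zip(*drop_missing(rows)))
    mu = [mean(col) for col in columns]
    sd = [stdev(col) for col in columns]
    return [tuple(v if v is not None else rng.gauss(mu[j], sd[j])
                  for j, v in enumerate(r)) for r in rows]

def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept on (x, y) rows."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def pooled_fit(rows, copies=5, seed=0):
    """Build several completed data sets (imp), regress on each, average the fits."""
    rng = random.Random(seed)
    imp = [impute(rows, rng) for _ in range(copies)]
    fits = [fit_line(completed) for completed in imp]
    return mean(s for s, _ in fits), mean(i for _, i in fits)

rows = [(1.0, 3.0), (2.0, 5.0), (3.0, None), (4.0, 9.0), (None, 11.0)]
slope, intercept = pooled_fit(rows)
```

Averaging the regression coefficients over several randomly completed copies is one common way to keep a single random draw from dominating the result; the text itself does not say how the regression on imp is summarized.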
The key parameter generation module 3 provided by the embodiment of the present invention specifically operates as follows:
the user inputs the security parameters to generate a key pair and an encryption key; the generated public key is made public and is used to generate label information for the files of the storage result, and the private key is kept by the user;
when the encryption key is generated by the Keyall algorithm, two strong prime numbers p and q are generated first, and then $M = pq$ and $f(M) = (p-1)(q-1)$ are calculated; next, an odd positive integer $a$ is generated such that $\gcd(a, f(M)) = 1$; p and q are then taken as the private key, and a and M as the public key.
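The key generation step can be illustrated with the sketch below. It is an assumption-laden example, not the Keyall algorithm itself: ordinary probable primes from a Miller-Rabin test stand in for the "strong primes" of the text, and the function names and the default bit length are hypothetical.

```python
import math
import secrets

def is_probable_prime(n, rounds=20):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % small == 0:
            return n == small
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        base = secrets.randbelow(n - 3) + 2
        x = pow(base, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def random_prime(bits):
    """Random odd probable prime with the top bit set (not a 'strong' prime)."""
    while True:
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate

def keygen(bits=512):
    """Return ((p, q), (a, M)): private primes and public exponent/modulus."""
    p = random_prime(bits)
    q = random_prime(bits)
    while q == p:
        q = random_prime(bits)
    M = p * q
    f = (p - 1) * (q - 1)
    while True:
        a = (secrets.randbelow(f - 3) + 3) | 1  # odd positive integer below f
        if math.gcd(a, f) == 1:
            return (p, q), (a, M)

private_key, public_key = keygen(bits=128)  # short keys only for a quick demo
```

A production system would use a vetted cryptographic library and much larger primes; the sketch only mirrors the relations M = pq, f(M) = (p-1)(q-1) and gcd(a, f(M)) = 1.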
The data storage module 4 provided by the embodiment of the invention stores data files using a redundancy-first, coding-second storage strategy: small files and frequently used large files are stored as redundant backups, and large files that have not been used for a long time are stored with RS coding.
The redundancy-first, coding-second storage strategy specifically comprises:
when a file is uploaded to the server, storing it as a redundant backup, adding a 'latest reading time' field to the file meta information and setting it to the current timestamp;
the server checks the 'file size' and 'latest reading time' in the meta information of each file, skipping files already stored with RS coding and files smaller than 100MB; for files larger than 100MB, if the file was last read within 3 days of the current time, it is regarded as hot data and skipped; otherwise the file is judged not to have been used for a long time, is stored with RS coding, and its previous redundant backup is deleted;
when a file that is read is stored as a redundant backup, its latest reading time is updated;
when a file that is read is stored with RS coding, no operation is performed if the file is intact; if the file is damaged, RS decoding is performed on the remaining data blocks of the file to obtain the source data, and the restored source data is stored again as a redundant backup.
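The tiering rules above can be sketched as a small state machine over file metadata. The sketch does not perform any replication or RS coding; it only tracks which mode each file would be in. The names `FileMeta`, `on_upload`, `scan` and `on_read` are hypothetical, and treating a file of exactly 100MB as large is an assumption, since the text only covers "less than" and "more than".

```python
from dataclasses import dataclass

LARGE_FILE_BYTES = 100 * 1024 * 1024   # 100MB threshold from the text
HOT_WINDOW_SECONDS = 3 * 24 * 3600     # "within 3 days" from the text

@dataclass
class FileMeta:
    name: str
    size: int
    last_read: float          # the 'latest reading time' timestamp
    mode: str = "replica"     # "replica" (redundant backup) or "rs" (RS coded)

def on_upload(name, size, now):
    """New files always start as redundant backups, stamped with the upload time."""
    return FileMeta(name, size, now)

def scan(files, now):
    """Periodic check: move cold large files to RS coding; return their names."""
    converted = []
    for f in files:
        if f.mode == "rs" or f.size < LARGE_FILE_BYTES:
            continue                      # already coded, or small file
        if now - f.last_read <= HOT_WINDOW_SECONDS:
            continue                      # hot data stays replicated
        f.mode = "rs"                     # encode, then drop the replicas
        converted.append(f.name)
    return converted

def on_read(f, now, damaged=False):
    """Apply the read-time rules for both storage modes."""
    if f.mode == "replica":
        f.last_read = now
    elif damaged:
        # After RS decoding of the surviving blocks the file is stored as a
        # redundant backup again. The text does not say whether the reading
        # time is refreshed here, so the sketch leaves it unchanged.
        f.mode = "replica"

now = 1_000_000.0
files = [on_upload("notes.txt", 10, now), on_upload("archive.bin", 200 * 1024 * 1024, now)]
cold = scan(files, now + 4 * 24 * 3600)   # -> ["archive.bin"]
```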
The detection steps adopted by the integrity verification module 5 provided by the embodiment of the invention include:
firstly, the user sends key information to the server to make a detection request;
secondly, the server performs a calculation on the stored data according to the user's detection request;
thirdly, after receiving the information returned by the server, the user decrypts it;
fourthly, the user verifies whether the returned information is complete; if it is complete, the verification succeeds and the big data storage result is correct; otherwise the big data storage result is wrong, and the storage results need to be verified one by one to find the wrong one.
The algorithm is as follows: input the data file H whose integrity is to be checked, select file blocks $m_i$ ($1 < i < n$) from H, and generate a random number $r$ for the detection. Then compute $a_r = a^r \bmod M$ and compute detection data 1: [formula missing in source]. Next, for the tag information $T_i$ corresponding to the file blocks $m_i$ ($1 < i < n$), select and compute [formula missing in source]. Finally, compute detection data 2, $R' = S^r \bmod M$, and verify whether $R$ and $R'$ are equal. If they are equal, T is returned (the data is complete); otherwise F is returned (the data is incomplete). With this algorithm, the big data storage result can be checked on the basis of integrity verification.
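The challenge-response check can be illustrated with the sketch below. The text does not give the expressions for detection data 1 or for S, so the sketch fills them with one common RSA-style construction purely as an assumption: tags $T_i = a^{m_i} \bmod M$, server response $R = a_r^{\sum m_i} \bmod M$, and $S = \prod T_i \bmod M$. Under that assumption $R' = S^r \bmod M$ equals $R$ exactly when the server still holds the original blocks. All function names are hypothetical, and the tiny modulus is for demonstration only.

```python
import secrets

def block_value(block):
    """Interpret a file block as a non-negative integer m_i."""
    return int.from_bytes(block, "big")

def make_tags(blocks, a, M):
    """Assumed tag form: T_i = a^(m_i) mod M, computed when the file is stored."""
    return [pow(a, block_value(b), M) for b in blocks]

def challenge(a, M):
    """User side: pick a random r and send a_r = a^r mod M to the server."""
    r = secrets.randbelow(M - 2) + 2
    return r, pow(a, r, M)

def server_response(blocks, a_r, M):
    """Server side (assumed form of detection data 1): R = a_r^(sum m_i) mod M."""
    return pow(a_r, sum(block_value(b) for b in blocks), M)

def verify(tags, r, R, M):
    """User side: S = product of tags mod M, R' = S^r mod M, accept if R == R'."""
    S = 1
    for t in tags:
        S = (S * t) % M
    return pow(S, r, M) == R

# Toy public key (a, M) with M = 61 * 53; real keys would be far larger.
a, M = 17, 3233
blocks = [b"hello", b"world"]
tags = make_tags(blocks, a, M)
r, a_r = challenge(a, M)
ok = verify(tags, r, server_response(blocks, a_r, M), M)
```

The design point the sketch shows is that the user never needs the file blocks themselves during verification, only the tags and the random number r.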
The data recovery process adopted by the data recovery module 6 provided by the embodiment of the present invention specifically includes:
monitoring the time at which data information is exchanged between the master node and the slave nodes, and judging a node to have failed if no return information is received from it within the set time;
reading parameters related to data recovery in the configuration center, analyzing the corresponding position of a lost node or a block in the coding array according to the node failure information or the block failure information and the related parameters of the data recovery, and sending the position to a decoding unit;
reading related load balancing parameters in the configuration center, and selecting a new node list according to the parameters and the load state of each node;
selecting a decoding scheme according to the corresponding position of the lost node or the block in the coding array, and reading the residual block data required in the server;
and performing decoding calculation according to the residual block data to obtain the data of the lost blocks, and storing the recovered blocks in the new nodes in the server according to the selected new node list.
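The recovery flow above can be sketched in three small pieces: timeout-based failure detection, selection of replacement nodes by load, and decoding of a lost block from the surviving ones. The sketch is an assumption in several respects: a single XOR parity block stands in for the coding array (the actual decoding scheme depends on the code in use), and the function names, node names and load values are hypothetical.

```python
def failed_nodes(last_seen, now, timeout):
    """Nodes whose last reply is older than the set time are judged failed."""
    return sorted(n for n, t in last_seen.items() if now - t > timeout)

def pick_targets(load, exclude, count):
    """Choose the least-loaded healthy nodes to hold the recovered blocks."""
    candidates = [n for n in load if n not in exclude]
    return sorted(candidates, key=lambda n: load[n])[:count]

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

def recover(stripe, lost_index):
    """Rebuild one lost block of a stripe (data blocks + one XOR parity block)."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)

# Usage: node n2 stops answering, its block is rebuilt and re-homed.
last_seen = {"n1": 100.0, "n2": 91.0, "n3": 99.5}
load = {"n1": 0.7, "n2": 0.9, "n3": 0.2, "n4": 0.4}
down = failed_nodes(last_seen, now=101.0, timeout=5.0)
targets = pick_targets(load, exclude=set(down), count=1)
data = [b"abcd", b"efgh", b"ijkl"]
stripe = data + [xor_blocks(data)]
rebuilt = recover(stripe, lost_index=1)
```

A single parity block tolerates only one lost block per stripe; an RS code, as used elsewhere in the text, tolerates several, but the control flow of detect, select targets, decode and store is the same.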
To evaluate the performance of the integrity verification module 5, a detection-efficiency comparison experiment was carried out between the detection scheme provided by the integrity verification module 5 and a conventional detection system. The experiment is based on the PBC library and runs on an udubm4.16 system with an i7-4600M CPU at 2.50 GHz and 8 GB of memory; the programming language is C, and the big data storage results range from 1 to 5 GB in size. Fig. 4 is a schematic diagram comparing the detection execution times of the integrity verification module provided by the embodiment of the present invention.
As shown in fig. 3, a method for controlling a computer big data storage device according to an embodiment of the present invention includes:
S101: the data acquisition module acquires data uploaded by a user through a computer network terminal; the data classification module classifies the stored content using a data classification method.
S102: after the data classification is finished, the data is transmitted to the data encoding module; the data encoding module calculates a coding array from the acquired data information according to the parameters of the configuration center and encodes the data according to that array; after the encoding is finished, the data storage module selects a storage mode according to the reading frequency or the file size of the data file and stores it.
S103: the integrity verification module sends a detection request to the server, and the server calculates detection information from the label information and the request information; the server checks the storage result using the detection information and the label information; the data recovery module recovers data according to the feedback information from the integrity verification module, the failed-node information, the data-recovery-related parameters, and the resolved position of the failed node in the array.
S104: after the data recovery is finished, the key parameter generation module generates an encryption key through the Keyall algorithm according to the input security parameters, and generates label information for the stored file.
S105: during storage, the configuration module presets the various parameters of the system and extracts the corresponding configuration information according to the control instruction; the data management module adds, deletes, modifies and backs up the stored data content; meanwhile, the query module searches for the corresponding content through voice input or keyboard input.
S106: the wireless signal transceiver module connects to the cloud server through a wireless signal transceiver to transmit data; the host service configuration and business scale of the cloud server can be configured according to user needs, and the cloud server is used for data sharing.
The working principle of the invention is as follows: the data acquisition module 1 acquires data uploaded by a user through a computer network terminal, and the data classification module 9 classifies the stored content using a data classification method. After the data classification is finished, the data is transmitted to the data encoding module 2, which calculates a coding array from the acquired data information according to the parameters of the configuration center and encodes the data according to that array; after the encoding is finished, the data storage module 4 selects a storage mode according to the reading frequency or the file size of the data file and stores it. The integrity verification module 5 sends a detection request to the server, and the server calculates detection information from the label information and the request information; the server checks the storage result using the detection information and the label information; the data recovery module 6 recovers data according to the feedback information from the integrity verification module, the failed-node information, the data-recovery-related parameters, and the resolved position of the failed node in the array. After the data recovery is finished, the key parameter generation module 3 generates an encryption key through the Keyall algorithm according to the input security parameters, and generates label information for the stored file.
During storage, the configuration module 7 presets the various parameters of the system and extracts the corresponding configuration information according to the control instruction; the data management module 8 adds, deletes, modifies and backs up the stored data content; meanwhile, the query module 10 searches for the corresponding content through voice input or keyboard input. The wireless signal transceiver module 12 connects to the cloud server through a wireless signal transceiver to transmit data; the host service configuration and business scale of the cloud server 13 can be configured according to user needs, and the cloud server is used for data sharing.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (10)
1. A computer big data storage control system, comprising:
the data acquisition module is connected with the central main control module, acquires data uploaded by a user through a computer network terminal, and transmits the data to the data coding module; establishing a corresponding sample for the acquired data, and preprocessing the data in the sample; after the preprocessing is finished, compressing and transmitting the corresponding data; in the data preprocessing, the missing data is handled as follows: deleting the samples with missing values by using corresponding deletion functions; after the deletion is completed, assigning substitute values to replace the missing values; randomly simulating the completed data set, storing the result in imp, and performing linear regression on imp;
the data coding module is connected with the central main control module, calculates a coding array from the acquired data information according to the parameters of the configuration center, and performs coding and storage according to the array; selecting a suitable neural network model, and extracting corresponding data features from the input data information; establishing a corresponding multilayer neural network according to the extracted data feature information; the whole deep neural network is trained layer by layer, each layer using a corresponding training method; calculating the parameters of the first-layer neural network, and taking the output of the hidden layer of the first-layer neural network as the input of the next layer; repeating the above process, and training the network parameters of each layer in sequence to realize the coding of the data;
the key parameter generation module is connected with the central main control module, generates an encryption key through a Keyall algorithm according to the input security parameters, and generates label information for the storage file; the user inputs the security parameters to generate a key pair and an encryption key, the generated public key is public and is used for generating label information for the file storing the result, and the private key is stored by the user; when the encryption key is generated by the Keyall algorithm, two strong prime numbers p and q are generated first, and then $M = pq$ and $f(M) = (p-1)(q-1)$ are calculated; next, an odd positive integer $a$ is generated such that $\gcd(a, f(M)) = 1$; p and q are then regarded as the private key, and a and M as the public key;
the data storage module is connected with the central main control module and selects a corresponding storage mode according to the reading frequency or the file size of the data file; the data storage module stores data files by adopting a storage strategy of redundancy first and coding later, small files and frequently used large files are stored in a redundancy backup mode, and large files which are not used for a long time are stored in an RS (Reed-Solomon) coding mode;
the integrity verification module is connected with the central main control module and used for sending a detection request to the server, and the server calculates through the label information and the request information to obtain detection information; the server detects the storage result through the detection information and the label information; the user sends the key information to the server to provide a detection request; the server calculates the stored data according to the detection request of the user; after receiving the information returned by the server, the user decrypts the information; the user verifies the returned information to verify whether the returned information is complete, if the returned information is complete, the verification is successful, and the big data storage result is correct; otherwise, if the big data storage result is wrong, the storage results need to be verified one by one to find out the wrong big data storage result;
the data recovery module is connected with the central main control module and recovers data according to the feedback information from the integrity verification module, the failed-node information, the data-recovery-related parameters, and the resolved position of the failed node in the array; monitoring the time at which data information is exchanged between the master node and the slave nodes, and judging a node to have failed if no return information is received from it within the set time; reading the parameters related to data recovery in the configuration center, resolving the corresponding position of the lost node or block in the coding array according to the node failure information or the block failure information and the data-recovery-related parameters, and sending the position to a decoding unit; reading the related load balancing parameters in the configuration center, and selecting a new node list according to the parameters and the load state of each node; selecting a decoding scheme according to the corresponding position of the lost node or block in the coding array, and reading the required remaining block data from the server; performing decoding calculation on the remaining block data to obtain the data of the lost blocks, and storing the recovered blocks on the new nodes in the server according to the selected new node list;
the configuration module is connected with the central main control module and is used for presetting and configuring various parameters in the system and extracting corresponding configuration information according to the control instruction;
the data management module is connected with the central main control module and is used for adding, deleting, modifying and backing up the stored data content;
the data classification module is connected with the central main control module and classifies the stored contents by using a data classification method;
the query module is connected with the central main control module and searches corresponding contents through voice input or keyboard input;
the central main control module is respectively connected with the data acquisition module, the data coding module, the key parameter generation module, the data storage module, the integrity verification module, the data recovery module, the configuration module, the data management module, the data classification module, the query module, the wireless signal transceiver module and the cloud server and is used for coordinating the normal operation of each module;
the wireless signal transceiver module is connected with the central main control module and is connected with the cloud server through the wireless signal transceiver to realize data transmission;
and the cloud server is connected with the central main control module, and the host service configuration and the service scale can be configured according to the needs of users and are used for realizing data sharing.
2. The computer big data storage control system of claim 1, wherein the data management module comprises:
the adding and deleting module, which inputs a corresponding deletion or addition instruction according to the user requirement, whereupon the data management system deletes or adds the corresponding content;
the modification module inputs a corresponding modification command according to the user requirement, and the data management system modifies the corresponding content;
and the backup module monitors and tracks the data uploaded by the user and the update of the important target file to be tracked, transmits the update log to the backup system in real time through a network, and the backup system updates the disk according to the log.
3. A computer big data storage control method of a computer big data storage control system according to claims 1-2, wherein the computer big data storage control method comprises:
step one, the data acquisition module acquires data uploaded by a user through a computer network terminal; the data classification module classifies the stored content using a data classification method;
step two, after the data classification is finished, the data are transmitted to a data coding module; the data information acquired by the data coding module calculates a coded array according to the parameters of the configuration center and codes according to the array; after the coding is finished, the data storage module selects a corresponding storage mode for storage according to the reading frequency or the file size of the data file;
step three, the integrity verification module sends a detection request to the server, and the server calculates through the label information and the request information to obtain detection information; the server detects the storage result through the detection information and the label information; the data recovery module recovers the data according to the feedback information from the integrity verification module, the failed-node information, the data-recovery-related parameters, and the resolved position of the failed node in the array;
step four, after the data recovery is finished, the key parameter generation module generates an encryption key through a Keyall algorithm according to the input security parameters, and generates label information for the storage file;
step five, in the storage process, the configuration module performs preset configuration on various parameters in the system and extracts corresponding configuration information according to the control instruction; the data management module performs addition, deletion, modification and backup on the stored data content; meanwhile, corresponding contents are searched by a query module through voice input or keyboard input;
step six, the wireless signal transceiver module is connected with the cloud server through the wireless signal transceiver to realize data transmission; the host service configuration and the business scale in the cloud server can be configured according to the needs of users and are used for realizing data sharing.
4. The computer big data storage control method according to claim 3, wherein in the first step, the data acquisition module processes the acquired data as follows:
establishing a corresponding sample for the acquired data, and preprocessing the data in the sample; after the preprocessing is finished, carrying out compression transmission on corresponding data;
in the data preprocessing process, the missing data processing process comprises the following steps:
deleting the samples with missing values by using corresponding deletion functions; after the deletion is completed, assigning substitute values to replace the missing values; randomly simulating the completed data set, storing the result in imp, and performing linear regression on imp.
5. The method for controlling big data storage of a computer according to claim 3, wherein in the second step, the data encoding module encodes the data by:
selecting a proper neural network model, and extracting corresponding data characteristics from the input data information;
establishing a corresponding multilayer neural network according to the extracted data characteristic information; each layer of neural network trains the whole deep neural network by using a corresponding training method;
calculating parameters of the neural network of the first layer, and taking the output of the hidden layer of the neural network of the first layer as the input of the next layer;
and continuously repeating the process, and training the network parameters of each layer in sequence to realize the coding of the data.
6. The computer big data storage control method according to claim 3, wherein in the second step, the data storage module stores the data files by using a storage strategy of redundancy first and coding second, and for small files and frequently used big files, the data files are stored in a redundancy backup manner, and for big files which are not used for a long time, the data files are stored in an RS coding manner.
7. The method for controlling big data storage of a computer according to claim 6, wherein the redundancy-first and coding-second storage strategy specifically comprises:
when a certain file is uploaded to a server, storing the file according to a redundant backup mode, newly adding 'latest reading time' in file meta information and setting the latest reading time as a current timestamp;
the server checks the 'file size' and the 'latest reading time' in the meta information of each file, skipping files already stored with RS coding and files smaller than 100MB; for files larger than 100MB, if the file was last read within 3 days of the current time, the file is regarded as hot data and skipped; otherwise, the file is judged not to have been used for a long time, RS coding storage is performed on the file, and the previous redundant backup of the file is deleted;
when the read file is stored in a redundant backup mode, updating the latest reading time;
when the read file is stored according to the RS coding mode, if the file is intact, no operation is performed; if the file is damaged, RS decoding is carried out on the residual data blocks of the file to obtain source data, and the restored source data are stored again according to a redundant backup mode.
8. The computer big data storage control method according to claim 3, wherein in the third step, the specific detection step adopted by the integrity verification module comprises:
firstly, a user sends key information to a server to provide a detection request;
secondly, the server calculates the stored data according to the detection request of the user;
thirdly, after receiving the information returned by the server, the user decrypts the information;
fourthly, the user verifies the returned information to verify whether the returned information is complete, if the returned information is complete, the verification is successful, and the big data storage result is correct; otherwise, if the big data storage result is wrong, the storage results need to be verified one by one to find out the wrong big data storage result;
the algorithm is as follows: inputting a data file H to be detected, selecting file blocks $m_i$ ($1 < i < n$) in the data file H, and generating a random number $r$ for the detection; then calculating $a_r = a^r \bmod M$ and calculating detection data 1: [formula missing in source]; secondly, for the tag information $T_i$ corresponding to the file blocks $m_i$ ($1 < i < n$), selecting and calculating [formula missing in source]; finally, calculating detection data 2, $R' = S^r \bmod M$, and verifying whether $R$ and $R'$ are equal; if the two are equal, T is returned, otherwise F is returned.
9. The method for controlling big data storage of a computer according to claim 3, wherein in the third step, the data recovery process adopted by the data recovery module specifically includes:
monitoring the time for sending data information between the master node and the slave node, and if the return information of the node is not received within the set time, judging that the node is invalid;
reading parameters related to data recovery in the configuration center, analyzing the corresponding position of a lost node or a block in the coding array according to the node failure information or the block failure information and the related parameters of the data recovery, and sending the position to a decoding unit;
reading related load balancing parameters in the configuration center, and selecting a new node list according to the parameters and the load state of each node;
selecting a decoding scheme according to the corresponding position of the lost node or the block in the coding array, and reading the residual block data required in the server;
and performing decoding calculation according to the residual block data to obtain the data of the lost blocks, and storing the recovered blocks in the new nodes in the server according to the selected new node list.
10. The method for controlling big data storage of a computer according to claim 3, wherein in the fourth step, the key parameter generating module specifically comprises:
the user inputs the security parameters to generate a key pair and an encryption key, the generated public key is public and is used for generating label information for the file storing the result, and the private key is stored by the user;
when the encryption key is generated by the Keyall algorithm, two strong prime numbers p and q are generated first, and then $M = pq$ and $f(M) = (p-1)(q-1)$ are calculated; next, an odd positive integer $a$ is generated such that $\gcd(a, f(M)) = 1$; p and q are then regarded as the private key, and a and M as the public key.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010046920.2A CN111291046B (en) | 2020-01-16 | 2020-01-16 | Computer big data storage control system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111291046A (en) | 2020-06-16 |
CN111291046B (en) | 2023-07-14 |
Family
ID=71023092
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010046920.2A Active CN111291046B (en) | 2020-01-16 | 2020-01-16 | Computer big data storage control system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111291046B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN111291046B (en) | 2023-07-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111291046B (en) | Computer big data storage control system and method | |
CN110162414B (en) | Method and device for realizing artificial intelligent service based on micro-service architecture | |
CN113961759B (en) | Abnormality detection method based on attribute map representation learning | |
CN110929840A (en) | Continuous learning neural network system using rolling window | |
CN103548003A (en) | Processes and methods for client-side fingerprint caching to improve deduplication system backup performance | |
CN103838847A (en) | Data organization method oriented to sea-cloud collaboration network computing network | |
CN112380067B (en) | Metadata-based big data backup system and method in Hadoop environment | |
CN111708794B (en) | Data comparison method and device based on big data platform and computer equipment | |
Zhang et al. | Log sequence anomaly detection based on local information extraction and globally sparse transformer model | |
CN107391557B (en) | Block chain serial query method and system for setting out-of-chain fault table | |
CN113127633A (en) | Intelligent conference management method and device, computer equipment and storage medium | |
CN116414948A (en) | Abnormal data mining method and software product based on cloud data and artificial intelligence | |
CN107451177B (en) | Query method and system for single error-surveying block chain of increased blocks | |
CN114564726A (en) | Software vulnerability analysis method and system based on big data office | |
CN107463596B (en) | Block chain parallel query method and system for setting out-of-chain fault table | |
CN115408350A (en) | Log compression method, log recovery method, log compression device, log recovery device, computer equipment and storage medium | |
CN115796229A (en) | Graph node embedding method, system, device and storage medium | |
CN114816468A (en) | Cloud edge coordination system, data processing method, electronic device and storage medium | |
CN110399485B (en) | Data tracing method and system based on word vector and machine learning | |
CN117435999A (en) | Risk assessment method, apparatus, device and medium | |
CN104025088B (en) | Data block is separated into method and system of multiple streams for compression | |
CN114760328A (en) | Data storage method, system, electronic equipment and storage medium | |
CN111783133B (en) | Network resource management method based on block chain technology | |
CN112052674A (en) | Entity definition extraction method, system, storage medium and server | |
Liu et al. | Secure and controllable data management mechanism for multi‐sensor fusion in internet of things |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||