CN111291046A - Computer big data storage control system and method - Google Patents


Info

Publication number
CN111291046A
Authority
CN
China
Prior art keywords
data
module
information
file
storage
Prior art date
Legal status
Granted
Application number
CN202010046920.2A
Other languages
Chinese (zh)
Other versions
CN111291046B (en)
Inventor
付媛媛
Current Assignee
Hongfujin Precision Industry Shenzhen Co Ltd
Original Assignee
Hongfujin Precision Industry Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Hongfujin Precision Industry Shenzhen Co Ltd
Priority to CN202010046920.2A
Publication of CN111291046A
Application granted
Publication of CN111291046B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2219 Large Object storage; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/12 Applying verification of the received information
    • H04L63/123 Applying verification of the received information received data contents, e.g. message integrity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08 Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0861 Generation of secret information including derivation or calculation of cryptographic keys or passwords

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of computer applications and discloses a computer big data storage control system and method. A key parameter generation module generates an encryption key with the Keyall algorithm from input security parameters and generates tag information for stored files; a data storage module selects a storage mode according to the read frequency or size of each data file; and an integrity verification module sends a detection request to the server, which computes detection information from the tag information and the request information and checks the storage result against the detection information and the tag information. The data storage module's redundancy-first, encoding-later storage strategy improves data processing capacity, so that the system can keep up with the current explosive growth in data generation. The integrity verification module verifies whether stored data is complete, protecting it against deletion, loss, tampering and similar damage.

Description

Computer big data storage control system and method
Technical Field
The invention belongs to the technical field of computer application, and particularly relates to a computer big data storage control system and method.
Background
The closest prior art is as follows. With the rapid development of information technology, people's daily activities generate a great deal of data. As this data accumulates day by day, its volume keeps growing and its types keep multiplying, forming big data. Traditional computer information processing technology can handle only a limited amount of data and cannot keep up with today's explosive growth in data generation.
Innovating computer information processing technology improves its effectiveness, so that data can be better collected, sorted, processed and applied. To apply the technology efficiently, data needs to be classified in advance. At the same time, the security of information processing must be improved as the technology is applied, and the security of data processing can be guaranteed only if the processing technology itself keeps advancing. Moreover, massive data is interrelated with many different types of data information, which makes ensuring the storage security of information during processing a difficult problem. Only by continuously analyzing and upgrading information security technology can the value of the data be fully realized and its integrity and stability be ensured.
In summary, the problems of the prior art are as follows:
(1) Current computer information processing technology can handle only a limited amount of data and cannot keep up with the explosive growth in data generation at the present stage.
(2) Existing computer information processing technology cannot ensure the security of data processing.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a computer big data storage control system and a computer big data storage control method.
The invention is implemented as follows. A computer big data storage control system includes:
the data acquisition module, connected to the central main control module, acquires data uploaded by users through a computer network terminal and transmits it to the data encoding module. It builds samples from the acquired data, preprocesses the data in each sample, and compresses the data for transmission once preprocessing is complete. During preprocessing, missing data is handled as follows: samples with missing values are deleted using the corresponding deletion functions; after deletion, imputed values are assigned in place of the remaining missing values; the completed data set is then randomly simulated and stored in imp, and a linear regression is performed on imp;
the data encoding module, connected to the central main control module, calculates an encoding array from the acquired data according to the parameters in the configuration center and encodes and stores the data according to that array. It selects a suitable neural network model and extracts data features from the input data; builds a multilayer neural network from the extracted features; and trains the whole deep network layer by layer, each layer with the corresponding training method: the parameters of the first layer are computed, the output of its hidden layer becomes the input of the next layer, and the process repeats until every layer's parameters have been trained in turn, which completes the encoding of the data;
the key parameter generation module, connected to the central main control module, generates an encryption key with the Keyall algorithm from the input security parameters and generates tag information for stored files. The user inputs the security parameters to generate a key pair and an encryption key; the public key is published and used to generate tag information for the files holding the results, while the private key is kept by the user. When Keyall generates the encryption key, two strong primes $p$ and $q$ are generated first, then $M = pq$ and $\varphi(M) = (p-1)(q-1)$ are computed; next, an odd positive integer $a$ is generated such that $\gcd(a, \varphi(M)) = 1$; finally, $p$ and $q$ serve as the private key, and $a$ and $M$ as the public key;
the data storage module, connected to the central main control module, selects a storage mode according to the read frequency or size of each data file. It follows a redundancy-first, encoding-later storage strategy: small files and frequently used large files are stored as redundant backups, while large files that have not been used for a long time are stored with RS (Reed-Solomon) coding;
the integrity verification module, connected to the central main control module, sends a detection request to the server; the server computes detection information from the tag information and the request information and checks the storage result against the detection information and the tag information. Specifically, the user sends key information to the server to issue a detection request; the server performs a computation over the stored data according to the request; the user decrypts the information returned by the server and checks whether it is complete. If it is, verification succeeds and the big data storage result is correct; otherwise the storage result is wrong, and the stored results must be checked one by one to locate the erroneous ones;
the data recovery module, connected to the central main control module, recovers data using the feedback from the integrity verification module, the failed-node information, the data recovery parameters, and the position of each failed node in the encoding array. It monitors the time taken to send data between the master node and the slave nodes and judges a node to have failed if no reply is received from it within the set time; reads the data recovery parameters from the configuration center, determines from the node or block failure information and those parameters where the lost node or block sits in the encoding array, and sends that position to the decoding unit; reads the load-balancing parameters from the configuration center and selects a new node list based on them and on each node's load; selects a decoding scheme according to the position of the lost node or block in the encoding array and reads the required remaining blocks from the server; and decodes the remaining block data to obtain the lost blocks, storing the recovered blocks on the new nodes in the server according to the selected node list;
the configuration module is connected with the central main control module and is used for presetting and configuring various parameters in the system and extracting corresponding configuration information according to the control instruction;
the data management module is connected with the central main control module and is used for adding, deleting, modifying and backing up the stored data content;
the data classification module is connected with the central main control module and classifies the stored contents by using a data classification method;
the query module is connected with the central main control module and searches corresponding contents through voice input or keyboard input;
the central main control module is respectively connected with the data acquisition module, the data coding module, the key parameter generation module, the data storage module, the integrity verification module, the data recovery module, the configuration module, the data management module, the data classification module, the query module, the wireless signal transceiving module and the cloud server and is used for coordinating the normal operation of each module;
the wireless signal transceiver module is connected with the central main control module and is connected with the cloud server through the wireless signal transceiver to realize data transmission;
and the cloud server is connected with the central main control module, and the host service configuration and the service scale can be configured according to the needs of users and are used for realizing data sharing.
Further, the data management module includes:
the adding and deleting module: a delete or add instruction is entered according to the user's requirements, and the data management system deletes or adds the corresponding content;
the modification module inputs a corresponding modification command according to the user requirement, and the data management system modifies the corresponding content;
and the backup module monitors and tracks the data uploaded by the user and the update of the important target file to be tracked, transmits the update log to the backup system in real time through a network, and the backup system updates the disk according to the log.
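The backup module's log-shipping behaviour can be illustrated with a minimal Python sketch. The patent does not specify a log format, so the `LogEntry` record, its `put`/`delete` operations and the `PrimaryStore`/`BackupStore` classes below are hypothetical; the sketch only shows the idea that the backup system replays the update log in sequence order, ignoring duplicate deliveries and refusing to skip over a gap.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LogEntry:
    seq: int                      # monotonically increasing sequence number
    op: str                       # hypothetical operation names: "put" or "delete"
    key: str
    value: Optional[bytes] = None


class PrimaryStore:
    """Tracked store that records every update in an append-only log."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.log: List[LogEntry] = []

    def _append(self, op: str, key: str, value: Optional[bytes] = None) -> LogEntry:
        entry = LogEntry(len(self.log) + 1, op, key, value)
        self.log.append(entry)
        return entry  # in a real system this entry would be shipped over the network

    def put(self, key: str, value: bytes) -> LogEntry:
        self.data[key] = value
        return self._append("put", key, value)

    def delete(self, key: str) -> LogEntry:
        self.data.pop(key, None)
        return self._append("delete", key)


class BackupStore:
    """Backup replica that applies log entries strictly in sequence order."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.applied_seq = 0

    def apply(self, entry: LogEntry) -> None:
        if entry.seq <= self.applied_seq:
            return  # duplicate delivery: already applied
        if entry.seq != self.applied_seq + 1:
            raise ValueError(f"gap in log: expected {self.applied_seq + 1}, got {entry.seq}")
        if entry.op == "put":
            self.data[entry.key] = entry.value
        elif entry.op == "delete":
            self.data.pop(entry.key, None)
        self.applied_seq = entry.seq
```

Each call on the primary returns the entry to forward, so replaying `put("a")`, `put("b")`, `delete("a")` on the backup leaves both stores holding only `"b"`.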
Another object of the present invention is to provide a computer big data storage control method of the computer big data storage control system, the computer big data storage control method comprising:
Step one: the data acquisition module acquires data uploaded by users through a computer network terminal, and the data classification module classifies the stored content using a data classification method;
Step two: after classification, the data is transmitted to the data encoding module, which calculates an encoding array from the acquired data according to the parameters in the configuration center and encodes the data according to that array; after encoding, the data storage module selects a storage mode according to the read frequency or size of each data file and stores the data;
Step three: the integrity verification module sends a detection request to the server; the server computes detection information from the tag information and the request information and checks the storage result against the detection information and the tag information; the data recovery module then recovers data using the feedback from the integrity verification module, the failed-node information, the data recovery parameters, and the positions of the failed nodes in the encoding array;
Step four: after data recovery is complete, the key parameter generation module generates an encryption key with the Keyall algorithm from the input security parameters and generates tag information for the stored files;
Step five: during storage, the configuration module presets the system parameters and extracts the corresponding configuration information according to control instructions; the data management module adds, deletes, modifies and backs up the stored data; meanwhile, the query module searches for content via voice or keyboard input;
Step six: the wireless signal transceiving module connects to the cloud server through the wireless signal transceiver for data transmission; the host service configuration and business scale of the cloud server can be configured to users' needs to enable data sharing.
Further, in the first step, the data obtaining module processes the obtained data as follows:
establishing a corresponding sample for the acquired data, and preprocessing the data in the sample; after the preprocessing is finished, carrying out compression transmission on corresponding data;
in the data preprocessing process, the missing data processing process comprises the following steps:
delete the samples with missing values using the corresponding deletion functions; after deletion, assign imputed values in place of the missing values; then randomly simulate the completed data set, store it in imp, and perform a linear regression on imp.
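The missing-data steps are terse, so the following Python sketch rests on stated assumptions: "deleting the missing samples" is read as dropping samples whose response value is missing, the imputation is a simple random hot-deck draw from the observed values (a stand-in for whatever imputation model the patent intends), and the regression is a one-variable ordinary least squares fit. The list `imp` holds the randomly completed data sets, mirroring the `imp` named in the text, and the per-set fits are pooled by averaging; all function names are hypothetical.

```python
import random
from statistics import mean
from typing import List, Optional, Tuple

Sample = Tuple[Optional[float], Optional[float]]  # (x, y); None marks a missing value


def drop_missing_targets(rows: List[Sample]) -> List[Sample]:
    """Deletion step (assumed rule): discard samples whose response y is missing."""
    return [(x, y) for x, y in rows if y is not None]


def impute_once(rows: List[Sample], rng: random.Random) -> List[Tuple[float, float]]:
    """Replace each missing x with a random draw from the observed x values (hot-deck)."""
    observed = [x for x, _ in rows if x is not None]
    return [(x if x is not None else rng.choice(observed), y) for x, y in rows]


def fit_line(rows: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variance; slope is undefined")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def multiple_imputation_regression(rows: List[Sample], m: int = 5, seed: int = 0):
    """Build m completed data sets (imp), regress on each, and pool by averaging."""
    rng = random.Random(seed)
    kept = drop_missing_targets(rows)
    imp = [impute_once(kept, rng) for _ in range(m)]
    fits = [fit_line(d) for d in imp]
    slope = mean(s for s, _ in fits)
    intercept = mean(b for _, b in fits)
    return slope, intercept, imp
```

Averaging the per-set estimates is the usual pooled point estimate for multiple imputation; a fuller treatment would also combine the within- and between-set variances.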
Further, in the second step, the process of encoding the data by the data encoding module is as follows:
selecting a proper neural network model, and extracting corresponding data characteristics from the input data information;
build a multilayer neural network from the extracted features; the whole deep network is trained layer by layer, each layer with the corresponding training method;
calculating parameters of the neural network of the first layer, and taking the output of the hidden layer of the neural network of the first layer as the input of the next layer;
and continuously repeating the process, and training the network parameters of each layer in sequence to realize the coding of the data.
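Training "each layer in sequence, feeding the first layer's hidden output to the next layer" matches greedy layer-wise pretraining of stacked autoencoders. The patent names no model, so the Python sketch below assumes sigmoid autoencoder layers with tied weights, squared reconstruction error and plain per-sample gradient descent; all class and function names are hypothetical, and pure Python is used only to keep the example dependency-free.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class AutoencoderLayer:
    """One sigmoid autoencoder with tied weights: the decoder reuses W transposed."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden  # hidden-layer bias
        self.c = [0.0] * n_in      # reconstruction bias

    def encode(self, x: Vector) -> Vector:
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def decode(self, h: Vector) -> Vector:
        return [sigmoid(sum(self.W[j][i] * h[j] for j in range(len(h))) + self.c[i])
                for i in range(len(self.c))]

    def loss(self, data: List[Vector]) -> float:
        """Mean over samples of 0.5 * squared reconstruction error."""
        total = 0.0
        for x in data:
            y = self.decode(self.encode(x))
            total += 0.5 * sum((yi - xi) ** 2 for yi, xi in zip(y, x))
        return total / len(data)

    def train(self, data: List[Vector], epochs: int = 300, lr: float = 0.5) -> None:
        """Per-sample gradient descent on the reconstruction loss."""
        for _ in range(epochs):
            for x in data:
                h = self.encode(x)
                y = self.decode(h)
                # Gradient w.r.t. the decoder pre-activations.
                d_out = [(yi - xi) * yi * (1 - yi) for yi, xi in zip(y, x)]
                # Backpropagate through the tied decoder weights to the hidden layer.
                d_hid = [sum(d_out[i] * self.W[j][i] for i in range(len(x))) * h[j] * (1 - h[j])
                         for j in range(len(h))]
                for j in range(len(h)):
                    for i in range(len(x)):
                        # A tied weight gets gradient from both the decoder and encoder paths.
                        self.W[j][i] -= lr * (d_out[i] * h[j] + d_hid[j] * x[i])
                    self.b[j] -= lr * d_hid[j]
                for i in range(len(x)):
                    self.c[i] -= lr * d_out[i]


def greedy_layerwise_encode(data: List[Vector], layer_sizes: List[int],
                            epochs: int = 300, lr: float = 0.5
                            ) -> Tuple[List[AutoencoderLayer], List[Vector]]:
    """Train one layer at a time; each layer's hidden output is the next layer's input."""
    layers: List[AutoencoderLayer] = []
    current = data
    for depth, size in enumerate(layer_sizes):
        layer = AutoencoderLayer(len(current[0]), size, seed=depth)
        layer.train(current, epochs, lr)
        layers.append(layer)
        current = [layer.encode(x) for x in current]
    return layers, current
```

Calling `greedy_layerwise_encode(data, [3, 2])` on 4-dimensional inputs returns the two trained layers and a 2-dimensional code for each sample, i.e. the encoded form of the data.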
Further, in the second step, the data storage module stores the data files by adopting a storage strategy of redundancy first and coding second, and for small files and frequently used large files, a redundancy backup mode is adopted for storage, and for large files which are not used for a long time, an RS coding mode is adopted for storage.
Further, the redundancy-first and coding-second storage strategy specifically includes:
when a file is uploaded to the server, it is stored as a redundant backup, and a "latest reading time" field is added to the file's meta information and set to the current timestamp;
the server checks the "file size" and "latest reading time" in each file's meta information. It skips files already stored with RS coding and files smaller than 100MB. For files larger than 100MB, if the file was last read within the past 3 days, it is treated as hot data and skipped; otherwise it is judged not to have been used for a long time, so it is stored with RS coding and its earlier redundant backups are deleted;
when a file stored as a redundant backup is read, its latest reading time is updated;
when a file stored with RS coding is read and the file is intact, no operation is performed; if the file is damaged, its remaining data blocks are RS-decoded to obtain the source data, and the restored source data is stored again as a redundant backup.
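The redundancy-first, encoding-later rules can be expressed as a small bookkeeping policy. In this Python sketch the 100MB threshold is assumed to mean 100 × 2^20 bytes, a file of exactly 100MB is treated as large, "within 3 days" is taken as inclusive, and the actual replication, RS encoding and decoding are replaced by comments; only the state changes described in the text are modelled, and the class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

MB = 1024 * 1024               # assumption: "MB" read as 2**20 bytes
SIZE_THRESHOLD = 100 * MB      # files below this always stay replicated
HOT_WINDOW = 3 * 24 * 3600     # "read within 3 days" => hot data (seconds)


@dataclass
class FileMeta:
    name: str
    size: int               # bytes
    latest_read: float      # the "latest reading time" field (Unix seconds)
    mode: str = "replica"   # "replica" (redundant backup) or "rs" (RS-coded)


class RedundancyFirstPolicy:
    """Bookkeeping for the redundancy-first, encoding-later strategy (no real I/O)."""

    def __init__(self) -> None:
        self.files: Dict[str, FileMeta] = {}

    def upload(self, name: str, size: int, now: float) -> None:
        # New files are always stored as redundant backups first.
        self.files[name] = FileMeta(name, size, latest_read=now)

    def scan(self, now: float) -> List[str]:
        """One pass over the meta information; returns the files moved to RS coding."""
        converted = []
        for meta in self.files.values():
            if meta.mode == "rs" or meta.size < SIZE_THRESHOLD:
                continue  # already RS-coded, or a small file
            if now - meta.latest_read <= HOT_WINDOW:
                continue  # large but hot: keep the replicas
            # Placeholder for: RS-encode the file, then delete its old replicas.
            meta.mode = "rs"
            converted.append(meta.name)
        return converted

    def read(self, name: str, now: float, intact: bool = True) -> str:
        """Apply the read rules and return the file's storage mode afterwards."""
        meta = self.files[name]
        if meta.mode == "replica":
            meta.latest_read = now
        elif not intact:
            # Placeholder for: RS-decode the surviving blocks, re-store as replicas.
            meta.mode = "replica"
            meta.latest_read = now  # assumption: treated like a fresh upload
        # An intact RS-coded file needs no action.
        return meta.mode
```

For example, a 500MB file uploaded at time 0 and never read is converted by a scan five days later, while a 10MB file stays replicated regardless of how long it goes unread.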
Further, in the third step, the specific detection steps adopted by the integrity verification module include:
firstly, a user sends key information to a server to provide a detection request;
secondly, the server calculates the stored data according to the detection request of the user;
thirdly, after receiving the information returned by the server, the user decrypts the information;
fourthly, the user checks whether the returned information is complete; if it is, verification succeeds and the big data storage result is correct; otherwise the storage result is wrong, and the stored results must be checked one by one to locate the erroneous ones;
the algorithm is as follows: input the data file H to be checked and select file blocks $m_i$ ($1<i<n$) in H; generate a random challenge number $r$; then compute $a_r = a^r \bmod M$ and detection data 1 ($R'$):
[Equation image not recovered: detection data 1, $R'$]
next, select the tag information $T_i$ corresponding to each file block $m_i$ ($1<i<n$) and compute:
[Equation image not recovered: $S$, computed from the tags $T_i$]
finally, compute detection data 2, $R = S^r \bmod M$, and verify whether $R$ and $R'$ are equal; if they are, return T (true), otherwise return F (false).
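Because the equation images for detection data 1 and for $S$ did not survive extraction, the following Python sketch fills them with an assumed, textbook-style RSA-modulus construction rather than the patent's own formulas: tags $T_i = a^{m_i} \bmod M$, server response $R' = a_r^{\sum m_i} \bmod M$, and $S = \prod T_i \bmod M$. Under these assumptions $R = S^r = a^{r \sum m_i} = R'$ for an honest server. The function names are hypothetical, and the toy parameters in the usage offer no security.

```python
from typing import List, Sequence


def make_tags(blocks: Sequence[int], a: int, M: int) -> List[int]:
    """Owner side, assumed tag form: T_i = a^{m_i} mod M."""
    return [pow(a, m, M) for m in blocks]


def make_challenge(a: int, M: int, r: int) -> int:
    """a_r = a^r mod M, as in the text."""
    return pow(a, r, M)


def server_proof(blocks: Sequence[int], indices: Sequence[int], a_r: int, M: int) -> int:
    """Detection data 1, assumed form: R' = a_r^(sum of challenged m_i) mod M."""
    return pow(a_r, sum(blocks[i] for i in indices), M)


def verify(tags: Sequence[int], indices: Sequence[int], r: int, M: int, proof: int) -> bool:
    """S = prod T_i mod M (assumed form); detection data 2: R = S^r mod M; True iff R == R'."""
    S = 1
    for i in indices:
        S = S * tags[i] % M
    return pow(S, r, M) == proof
```

With $M = 1009 \cdot 1013$, $a = 65537$, $r = 12345$ and blocks `[10, 20, 30, 40]`, challenging indices `[0, 1, 3]` verifies for the honest blocks and fails if block 1 is changed to 21. Only the challenged blocks are covered, so tampering with an unchallenged block goes unnoticed in that round; file blocks would be mapped to integers first, for example with `int.from_bytes(chunk, "big")`.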
Further, in the third step, the data recovery process adopted by the data recovery module specifically includes:
monitoring the time for sending data information between the master node and the slave node, and if the return information of the node is not received within the set time, judging that the node is invalid;
reading parameters related to data recovery in the configuration center, analyzing the corresponding position of a lost node or a block in the coding array according to the node failure information or the block failure information and the related parameters of the data recovery, and sending the position to a decoding unit;
reading related load balancing parameters in the configuration center, and selecting a new node list according to the parameters and the load state of each node;
selecting a decoding scheme according to the corresponding position of the lost node or the block in the coding array, and reading the residual block data required in the server;
and performing decoding calculation according to the residual block data to obtain the data of the lost blocks, and storing the recovered blocks in the new nodes in the server according to the selected new node list.
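The recovery steps (timeout-based failure detection, then decoding the lost blocks from the remaining ones) can be sketched in Python. To stay readable, this assumes a systematic Reed-Solomon code built by polynomial evaluation over the prime field GF(257) instead of the GF(2^8) arithmetic that production RS codecs normally use, with one symbol per block; `failed_nodes`, `rs_encode` and `rs_recover` are hypothetical names, and node selection and load balancing are not modelled.

```python
from typing import Dict, List, Optional, Sequence, Tuple

P = 257  # prime field modulus; every byte value 0..255 is a field element


def failed_nodes(last_seen: Dict[str, float], now: float, timeout: float) -> List[str]:
    """A node counts as failed if no reply has arrived within `timeout` seconds."""
    return [node for node, t in last_seen.items() if now - t > timeout]


def _lagrange_eval(points: Sequence[Tuple[int, int]], x: int) -> int:
    """Evaluate, at x, the polynomial of degree < len(points) through `points` (mod P)."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for k, (xk, _) in enumerate(points):
            if k != j:
                num = num * (x - xk) % P
                den = den * (xj - xk) % P
        total = (total + yj * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total


def rs_encode(data: Sequence[int], n: int) -> List[int]:
    """Systematic RS code: shares 0..k-1 are the data, shares k..n-1 are parity."""
    k = len(data)
    if not 0 < k <= n <= P:
        raise ValueError("need 0 < k <= n <= P")
    points = list(enumerate(data))
    return list(data) + [_lagrange_eval(points, x) for x in range(k, n)]


def rs_recover(shares: Sequence[Optional[int]], k: int) -> List[int]:
    """Rebuild the k data symbols from any k surviving shares (None = lost block)."""
    survivors = [(x, y) for x, y in enumerate(shares) if y is not None]
    if len(survivors) < k:
        raise ValueError("too many lost blocks to recover")
    return [_lagrange_eval(survivors[:k], x) for x in range(k)]
```

Encoding 5 data blocks into 8 shares tolerates any 3 lost blocks; re-running `rs_encode` on the recovered data regenerates the lost parity blocks, which could then be written to the newly selected nodes.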
Further, in the fourth step, the key parameter generation module works as follows:
the user inputs the security parameters to generate a key pair and an encryption key; the public key is published and used to generate tag information for the files holding the results, and the private key is kept by the user;
when Keyall generates the encryption key, two strong primes $p$ and $q$ are generated first, then $M = pq$ and $\varphi(M) = (p-1)(q-1)$ are computed; next, an odd positive integer $a$ is generated such that $\gcd(a, \varphi(M)) = 1$; finally, $p$ and $q$ serve as the private key, and $a$ and $M$ as the public key.
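The Keyall steps can be sketched in Python as follows. The function names are hypothetical, primality is tested by trial division so only toy key sizes are practical, the primes are not checked for the "strong prime" property the text requires, and `random.Random` is not a cryptographically secure generator (the `secrets` module would be needed in practice); a real deployment would use a vetted cryptographic library and far larger primes.

```python
import math
import random
from typing import Dict, Optional, Tuple


def is_prime(n: int) -> bool:
    """Trial division; adequate only for the toy sizes used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def random_prime(bits: int, rng: random.Random) -> int:
    """Random odd prime with exactly `bits` bits (strong-prime property not checked)."""
    while True:
        candidate = rng.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_prime(candidate):
            return candidate


def keyall(bits: int = 16, seed: Optional[int] = None) -> Dict[str, Tuple[int, int]]:
    """Hypothetical reading of the Keyall steps: public key (a, M), private key (p, q)."""
    rng = random.Random(seed)
    p = random_prime(bits, rng)
    q = random_prime(bits, rng)
    while q == p:
        q = random_prime(bits, rng)
    M = p * q
    phi = (p - 1) * (q - 1)
    while True:
        a = rng.randrange(3, phi, 2)  # odd candidates 3, 5, 7, ...
        if math.gcd(a, phi) == 1:
            return {"public": (a, M), "private": (p, q)}
```

The condition $\gcd(a, \varphi(M)) = 1$ is what guarantees that $a$ has an inverse modulo $\varphi(M)$, the same requirement placed on the public exponent in RSA.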
In summary, the advantages and positive effects of the invention are as follows. The data storage module selects a storage mode according to each data file's read frequency or size, and its redundancy-first, encoding-later storage strategy improves data processing capacity, so that the system can keep up with the current explosive growth in data generation. The integrity verification module verifies whether stored data is complete, protecting it against deletion, loss, tampering and similar damage and thereby ensuring the security of the stored data.
The data encoding module calculates an encoding array from the acquired data according to the parameters in the configuration center and encodes and stores the data according to that array using a neural network, which gives good fault tolerance, self-organization and self-adaptability; during data compression, the neural network can encode and compress images autonomously according to the characteristics of the information, without relying on any predetermined data coding algorithm. In addition, deleting samples with missing data during preprocessing helps ensure the authenticity and reliability of the data.
Drawings
Fig. 1 is a schematic structural diagram of a computer big data storage control system according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a data management module according to an embodiment of the present invention.
Fig. 3 is a flowchart of a computer big data storage control method according to an embodiment of the present invention.
Fig. 4 is a schematic diagram illustrating comparison results of detection execution times of the integrity verification module according to the embodiment of the present invention.
In the figures: 1. data acquisition module; 2. data encoding module; 3. key parameter generation module; 4. data storage module; 5. integrity verification module; 6. data recovery module; 7. configuration module; 8. data management module; 9. data classification module; 10. query module; 11. central main control module; 12. wireless signal transceiving module; 13. cloud server.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, a computer big data storage control system provided by an embodiment of the present invention includes:
and the data acquisition module 1 is connected with the central main control module 11, acquires data uploaded by a user through a computer network terminal, and transmits the data to the data coding module.
And the data coding module 2 is connected with the central main control module 11, and the obtained data information calculates a coded array according to the parameters of the configuration center and performs coding storage according to the array.
And the key parameter generation module 3 is connected with the central main control module 11, generates an encryption key through a Keyall algorithm according to the input security parameters, and generates label information for the storage file.
And the data storage module 4 is connected with the central main control module 11 and selects a corresponding storage mode according to the reading frequency or the file size of the data file.
The integrity verification module 5 is connected with the central main control module 11 and used for sending a detection request to the server, and the server calculates through the label information and the request information to obtain detection information; and the server detects the storage result through the detection information and the label information.
And the data recovery module 6 is connected with the central main control module 11 and is used for recovering data according to the feedback information, the failure node information, the data recovery related parameters and the corresponding position information of the analysis failure node in the array of the integrity verification module.
And the configuration module 7 is connected with the central main control module 11 and is used for presetting and configuring various parameters in the system and extracting corresponding configuration information according to the control instruction.
And the data management module 8 is connected with the central main control module 11 and is used for adding, deleting, modifying and backing up the stored data content.
And the data classification module 9 is connected with the central main control module 11 and classifies the stored contents by using a data classification method.
And the query module 10 is connected with the central main control module 11 and searches corresponding contents through voice input or keyboard input.
The central main control module 11 is respectively connected with the data acquisition module 1, the data encoding module 2, the key parameter generation module 3, the data storage module 4, the integrity verification module 5, the data recovery module 6, the configuration module 7, the data management module 8, the data classification module 9, the query module 10, the wireless signal transceiving module 12 and the cloud server 13, and is used for coordinating normal operation of each module.
The wireless signal transceiver module 12 is connected to the central main control module 11 and communicates with the cloud server through a wireless signal transceiver to transmit data.
The cloud server 13 is connected to the central main control module 11; its host service configuration and service scale can be configured according to user needs, and it is used to share data.
As shown in fig. 2, the data management module provided in the embodiment of the present invention includes:
an add/delete module, through which the user enters a delete or add instruction as required, whereupon the data management system deletes or adds the corresponding content;
a modification module, through which the user enters a modification command as required, whereupon the data management system modifies the corresponding content;
a backup module, which monitors and tracks the data uploaded by the user and the updates to important target files designated for tracking, and transmits the update log to the backup system over the network in real time; the backup system then updates its disk according to the log.
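The log-based backup described for the backup module can be illustrated with a minimal Python sketch. The names `LogEntry`, `BackupSystem` and `TrackedStore`, the `put`/`delete` operations and the sequence-number scheme are hypothetical; the patent does not specify the log format, and the real-time network transfer is replaced here by a direct method call.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    seq: int           # monotonically increasing sequence number (assumed)
    op: str            # hypothetical operation names: "put" or "delete"
    path: str
    data: bytes = b""


@dataclass
class BackupSystem:
    """Backup side; `disk` is a hypothetical dict-backed disk image."""
    disk: dict = field(default_factory=dict)
    applied_seq: int = 0

    def apply(self, entries):
        """Replay log entries in order, updating the disk image."""
        for e in sorted(entries, key=lambda entry: entry.seq):
            if e.seq <= self.applied_seq:
                continue  # already applied, so replaying a log twice is harmless
            if e.op == "put":
                self.disk[e.path] = e.data
            elif e.op == "delete":
                self.disk.pop(e.path, None)
            else:
                raise ValueError(f"unknown op {e.op!r}")
            self.applied_seq = e.seq


class TrackedStore:
    """Primary side: every update to a tracked file becomes one log entry."""

    def __init__(self, backup: BackupSystem):
        self.files = {}
        self.backup = backup
        self._seq = 0

    def _ship(self, op, path, data=b""):
        self._seq += 1
        # Stand-in for sending the log over the network in real time.
        self.backup.apply([LogEntry(self._seq, op, path, data)])

    def write(self, path, data):
        self.files[path] = data
        self._ship("put", path, data)

    def delete(self, path):
        self.files.pop(path, None)
        self._ship("delete", path)
```

Making replay idempotent through the sequence number is one simple way to let the backup side tolerate a log being resent after a network hiccup.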
The data encoding module 2 provided by the embodiment of the present invention is connected to the central main control module 11; it computes a coding array from the acquired data information according to the parameters in the configuration center, and encodes and stores the data according to that array. The process comprises the following steps:
selecting a suitable neural network model and extracting the corresponding data features from the input data information;
building a corresponding multilayer neural network from the extracted feature information, in which each layer is trained with a corresponding training method so that the whole deep neural network is trained;
computing the parameters of the first-layer network and using the output of its hidden layer as the input of the next layer;
repeating this process, training the network parameters of each layer in turn, to encode the data.
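The layer-by-layer training described above matches the usual greedy layer-wise pre-training of a stacked autoencoder. The pure-Python sketch below assumes that reading: each `AutoencoderLayer` (a hypothetical name) uses sigmoid units with tied weights and is trained by stochastic gradient descent on its squared reconstruction error, and the hidden output of each trained layer becomes the training input of the next. The patent does not name the model, loss or optimizer, so these are illustrative choices only.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class AutoencoderLayer:
    """One layer trained to reconstruct its own input (tied weights, sigmoid units)."""

    def __init__(self, n_in, n_hidden, rng):
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden  # encoder (hidden) bias
        self.c = [0.0] * n_in      # decoder (reconstruction) bias

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def decode(self, h):
        return [sigmoid(sum(self.W[j][i] * h[j] for j in range(len(h))) + self.c[i])
                for i in range(len(self.c))]

    def error(self, data):
        """Sum of squared reconstruction errors (times 1/2) over the data set."""
        return sum(0.5 * sum((y - xi) ** 2 for y, xi in zip(self.decode(self.encode(x)), x))
                   for x in data)

    def train(self, data, epochs=300, lr=0.5):
        """Stochastic gradient descent on the reconstruction error."""
        for _ in range(epochs):
            for x in data:
                h = self.encode(x)
                y = self.decode(h)
                d_out = [(yi - xi) * yi * (1 - yi) for yi, xi in zip(y, x)]
                d_hid = [sum(self.W[j][i] * d_out[i] for i in range(len(x))) * h[j] * (1 - h[j])
                         for j in range(len(h))]
                for j in range(len(h)):
                    for i in range(len(x)):
                        # Tied weight: gradient from both the decoder and the encoder path.
                        self.W[j][i] -= lr * (d_out[i] * h[j] + d_hid[j] * x[i])
                    self.b[j] -= lr * d_hid[j]
                for i in range(len(x)):
                    self.c[i] -= lr * d_out[i]


def train_stack(data, layer_sizes, seed=0):
    """Greedy layer-wise training: each layer's hidden output feeds the next layer."""
    rng = random.Random(seed)
    layers, inputs = [], data
    for n_hidden in layer_sizes:
        layer = AutoencoderLayer(len(inputs[0]), n_hidden, rng)
        layer.train(inputs)
        layers.append(layer)
        inputs = [layer.encode(x) for x in inputs]
    return layers


def encode(layers, x):
    """Pass a sample through all trained encoders to obtain its code."""
    for layer in layers:
        x = layer.encode(x)
    return x
```

For example, `train_stack(rows, [3, 2])` trains a 3-unit layer on the raw rows and then a 2-unit layer on the 3-unit codes; `encode(layers, row)` returns the final 2-value code.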
The data acquisition module 1 provided by the embodiment of the present invention is connected to the central main control module 11, acquires data uploaded by a user through a computer network terminal, and processes the acquired data as follows:
a corresponding sample is built from the acquired data and the data in the sample are preprocessed; after preprocessing, the corresponding data are compressed and transmitted.
In the data preprocessing, missing data are handled as follows:
samples with missing values are deleted using the corresponding deletion function; after deletion, imputed values are assigned to replace the missing values; the completed data set is then randomly simulated and stored in imp, and a linear regression is performed on imp.
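The missing-data steps resemble multiple imputation followed by regression; the name `imp` appears to echo the common R idiom `imp <- mice(data)` followed by `with(imp, lm(...))`, although the patent does not say which tool is used. The sketch below is a stdlib-only illustration under stated assumptions: the deletion step drops samples whose predictor `x` is missing, each missing response `y` is filled by a random draw from the observed responses, and the regression coefficients from the `m` completed data sets are pooled by averaging. `impute_and_regress` and `fit_ols` are hypothetical names.

```python
import random


def fit_ols(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def impute_and_regress(rows, m=5, seed=0):
    """rows: list of (x, y) pairs in which either value may be None."""
    # Deletion step (assumed): drop samples whose predictor x is missing.
    rows = [(x, y) for x, y in rows if x is not None]
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    observed = [y for y in ys if y is not None]
    rng = random.Random(seed)
    # Random simulation: m completed data sets ("imp"), each missing y drawn
    # at random from the observed y values.
    imp = [[y if y is not None else rng.choice(observed) for y in ys] for _ in range(m)]
    # Linear regression on every completed data set, pooled by averaging.
    fits = [fit_ols(xs, completed) for completed in imp]
    intercept = sum(a for a, _ in fits) / m
    slope = sum(b for _, b in fits) / m
    return imp, (intercept, slope)
```

Averaging the per-imputation coefficients gives the pooled point estimate; a full multiple-imputation analysis would also combine the within- and between-imputation variances.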
The key parameter generation module 3 provided by the embodiment of the present invention specifically includes:
the user inputs security parameters to generate a key pair and an encryption key; the generated public key is published and used to generate tag information for the stored result file, and the private key is kept by the user;
when the Keyall algorithm generates the encryption key, it first generates two strong primes $p$ and $q$ and computes $M = pq$ and $f(M) = (p-1)(q-1)$; it then generates an odd positive integer $a$ such that $\gcd(a, f(M)) = 1$. $p$ and $q$ serve as the private key, and $a$ and $M$ serve as the public key.
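The Keyall step is close to RSA key generation. A minimal sketch follows, assuming Miller-Rabin probable primes of a chosen bit length; it does not enforce the strong-prime conditions the text requires, and `keyall`, `random_prime` and `is_probable_prime` are hypothetical names.

```python
import math
import secrets


def is_probable_prime(n, rounds=20):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for sp in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % sp == 0:
            return n == sp
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        w = secrets.randbelow(n - 3) + 2  # random witness in [2, n - 2]
        x = pow(w, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # w proves n composite
    return True


def random_prime(bits):
    """Random probable prime with exactly `bits` bits (not a strong prime)."""
    while True:
        cand = secrets.randbits(bits) | (1 << (bits - 1)) | 1  # top bit set, odd
        if is_probable_prime(cand):
            return cand


def keyall(bits=256):
    """Hypothetical helper mirroring the Keyall description."""
    p = random_prime(bits)
    q = random_prime(bits)
    while q == p:
        q = random_prime(bits)
    M = p * q
    f = (p - 1) * (q - 1)
    while True:
        a = (secrets.randbelow(f - 3) + 3) | 1  # odd a in [3, f - 1]; f is even
        if math.gcd(a, f) == 1:
            break
    return {"public": (a, M), "private": (p, q)}
```

Keeping $p$ and $q$ secret is what protects $f(M)$; anyone holding only $(a, M)$ cannot easily recompute it when $M$ is large.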
The data storage module 4 provided by the embodiment of the present invention stores data files with a "redundancy first, coding later" storage strategy: small files and frequently used large files are stored as redundant backups, and large files that have not been used for a long time are stored with RS (Reed-Solomon) coding.
The "redundancy first, coding later" storage strategy specifically comprises:
when a file is uploaded to the server, storing it as a redundant backup, adding a 'latest reading time' field to the file meta information, and setting it to the current timestamp;
the server checks the 'file size' and 'latest reading time' in each file's meta information and skips files already stored with RS coding and files smaller than 100 MB; for a file larger than 100 MB, if it was last read within the past 3 days it is treated as hot data and skipped; otherwise the file is judged to be unused for a long time, is stored with RS coding, and its previous redundant backups are deleted;
when a file stored as a redundant backup is read, updating its latest reading time;
when a file stored with RS coding is read, doing nothing if the file is intact; if the file is damaged, RS-decoding the remaining data blocks of the file to obtain the source data, and storing the restored source data again as a redundant backup.
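The hot/cold decision in the background check can be sketched as follows. `FileMeta`, `on_upload`, `on_read` and `scan` are hypothetical names; 100 MB is assumed to mean 100 × 1024 × 1024 bytes, a file of exactly that size is treated as a candidate (the text leaves this case open), and the actual RS encoding, decoding and replica deletion are left as comments.

```python
import time
from dataclasses import dataclass

HOT_WINDOW_S = 3 * 24 * 3600        # read within the last 3 days => hot data
SIZE_THRESHOLD = 100 * 1024 * 1024  # 100 MB (binary megabytes assumed)


@dataclass
class FileMeta:
    name: str
    size: int              # bytes
    last_read: float       # the 'latest reading time' timestamp
    mode: str = "replica"  # "replica" (redundant backup) or "rs" (RS coded)


def on_upload(name, size, now=None):
    """New files always start as redundant backups stamped with the current time."""
    return FileMeta(name, size, time.time() if now is None else now)


def on_read(meta, damaged=False, now=None):
    now = time.time() if now is None else now
    if meta.mode == "replica":
        meta.last_read = now  # refresh the 'latest reading time'
    elif damaged:
        # RS-decode from the surviving blocks (omitted), then store as replicas again.
        meta.mode = "replica"
    # An intact RS-coded file needs no action.


def scan(files, now=None):
    """One pass of the background check; returns the names converted to RS coding."""
    now = time.time() if now is None else now
    converted = []
    for meta in files:
        if meta.mode == "rs" or meta.size < SIZE_THRESHOLD:
            continue  # already RS coded, or a small file
        if now - meta.last_read <= HOT_WINDOW_S:
            continue  # hot data: keep the redundant backups
        # Cold large file: RS-encode it, then delete its replicas (storage calls omitted).
        meta.mode = "rs"
        converted.append(meta.name)
    return converted
```

Passing `now` explicitly keeps the policy deterministic and easy to test; in service it would default to the wall clock.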
The integrity verification module 5 provided by the embodiment of the present invention uses the following detection steps:
first, the user sends key information to the server to submit a detection request;
second, the server performs the computation on the stored data according to the user's detection request;
third, after receiving the information returned by the server, the user decrypts it;
fourth, the user verifies whether the returned information is complete; if it is, verification succeeds and the big data storage result is correct; otherwise the big data storage result is wrong, and the storage results must be verified one by one to locate the erroneous ones.
The algorithm is as follows. Input the data file $H$ whose integrity is to be checked and select a file block $m_i$ ($1 < i < n$) in $H$; generate a random challenge number $r$; then compute $a_r = a^r \bmod M$ and compute detection data 1:
[Equation image not reproduced]
Next, select the tag information $T_i$ corresponding to file block $m_i$ ($1 < i < n$) and compute
[Equation image not reproduced]
Finally, compute detection data 2: $R = S^r \bmod M$, and verify whether $R$ and $R'$ are equal. If they are equal, T is returned (the data are complete); otherwise F is returned (the data are incomplete). With this algorithm, the big data storage result can be checked on the basis of integrity verification.
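Because the two equation images are missing, the exact formulas cannot be recovered from the text. One plausible reading, offered only as an assumption, is an RSA-style remote check in the manner of Deswarte, Quisquater and Saïdane: the tag is $T_i = a^{m_i} \bmod M$, the server answers the challenge $a_r$ with $R' = a_r^{m_i} \bmod M$, and the verifier accepts when $T_i^r \bmod M = R'$, which holds for an intact block because $T_i^r = a^{m_i r} = (a^r)^{m_i} \pmod M$. The sketch implements that reading plus the one-by-one pass that locates failing blocks; all function names are hypothetical.

```python
import secrets


def block_to_int(block: bytes) -> int:
    """Interpret a file block as the integer m_i."""
    return int.from_bytes(block, "big")


def make_tag(block, a, M):
    """Tag T_i = a^{m_i} mod M, computed with the public key (a, M) before upload."""
    return pow(a, block_to_int(block), M)


def random_exponent(M):
    """Fresh random challenge exponent r in [1, M - 2]."""
    return secrets.randbelow(M - 2) + 1


def challenge(a, M, r):
    """Verifier's challenge a_r = a^r mod M."""
    return pow(a, r, M)


def server_respond(block, a_r, M):
    """Server side: R' = a_r^{m_i} mod M, which requires the stored block itself."""
    return pow(a_r, block_to_int(block), M)


def verify(tag, r, response, M):
    """Verifier side: compute R = T_i^r mod M and compare it with R'."""
    return pow(tag, r, M) == response


def find_bad_blocks(stored_blocks, tags, a, M, r):
    """Check the stored blocks one by one and return the indices that fail."""
    a_r = challenge(a, M, r)
    return [i for i, (blk, t) in enumerate(zip(stored_blocks, tags))
            if not verify(t, r, server_respond(blk, a_r, M), M)]
```

In use, `r = random_exponent(M)` would be drawn fresh for every detection request so that the server cannot replay an old answer.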
The data recovery process used by the data recovery module 6 provided by the embodiment of the present invention specifically comprises:
monitoring the time taken to exchange data information between the master node and the slave nodes; if no reply is received from a node within the set time, the node is judged to have failed;
reading the data recovery parameters from the configuration center, determining the position of the lost node or block in the coding array from the node failure or block failure information and those parameters, and sending the position to the decoding unit;
reading the load balancing parameters from the configuration center and selecting a new node list according to those parameters and the load state of each node;
selecting a decoding scheme according to the position of the lost node or block in the coding array, and reading the required remaining block data from the server;
performing the decoding computation on the remaining block data to obtain the data of the lost blocks, and storing the recovered blocks on the new nodes in the server according to the selected new node list.
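The recovery steps can be sketched with a small systematic Reed-Solomon erasure code over the prime field GF(257), decoded by Lagrange interpolation, together with a timeout-based failure check and a least-loaded choice of replacement nodes. The field, the one-symbol-per-block layout, and the names `rs_encode`, `rs_recover`, `failed_nodes` and `pick_new_nodes` are assumptions for illustration; production RS codes usually work over GF(2^8) and apply the code to every byte position of equal-sized blocks.

```python
import time

P = 257  # prime field; every symbol is an int in [0, 256] (assumed alphabet)


def _interp_at(points, x):
    """Value at x of the unique polynomial of degree < len(points) through `points`, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total


def rs_encode(data, n):
    """Systematic RS: data symbols sit at x = 0..k-1, parity symbols at x = k..n-1."""
    pts = list(enumerate(data))
    return list(data) + [_interp_at(pts, x) for x in range(len(data), n)]


def rs_recover(surviving, k, n):
    """Rebuild the full length-n codeword from any k surviving {position: symbol} pairs."""
    if len(surviving) < k:
        raise ValueError("not enough surviving blocks to decode")
    pts = sorted(surviving.items())[:k]
    return [_interp_at(pts, x) for x in range(n)]


def failed_nodes(last_heartbeat, timeout_s, now=None):
    """Nodes whose last reply is older than the configured timeout are judged failed."""
    now = time.time() if now is None else now
    return sorted(node for node, t in last_heartbeat.items() if now - t > timeout_s)


def pick_new_nodes(loads, count, exclude):
    """Choose the `count` least-loaded healthy nodes (a simple load-balancing stand-in)."""
    healthy = sorted((load, node) for node, load in loads.items() if node not in exclude)
    return [node for _, node in healthy[:count]]
```

With k = 4 data blocks spread over n = 6 coded positions, any two failed nodes can be tolerated, because the four surviving symbols determine the degree-3 polynomial and hence every lost symbol.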
To evaluate the performance of the integrity verification module 5, a detection-efficiency comparison experiment was run with the PBC library on a udubm4.16 system (Intel i7-4600M CPU at 2.50 GHz, 8 GB memory), programmed in C, with big data storage results of 1 to 5 GB, comparing the detection system of the integrity verification module 5 with a conventional detection system. Fig. 4 is a schematic diagram comparing the detection execution times of the integrity verification module provided by the embodiment of the present invention.
As shown in fig. 3, a method for controlling a computer big data storage device according to an embodiment of the present invention includes:
S101: the data acquisition module acquires data uploaded by a user through a computer network terminal, and the data classification module classifies the stored content using a data classification method.
S102: after classification, the data are transmitted to the data encoding module, which computes a coding array from the acquired data information according to the parameters of the configuration center and encodes the data according to the array; after encoding, the data storage module selects a storage mode according to the read frequency or file size of each data file and stores it.
S103: the integrity verification module sends a detection request to the server; the server computes detection information from the tag information and the request information and checks the storage result using the detection information and the tag information; the data recovery module recovers data according to the feedback information from the integrity verification module, the failed-node information, the data recovery parameters, and the parsed position of each failed node in the coding array.
S104: after data recovery, the key parameter generation module generates an encryption key with the Keyall algorithm from the input security parameters and generates tag information for the stored file.
S105: during storage, the configuration module presets the system parameters and extracts the corresponding configuration information according to control instructions; the data management module adds, deletes, modifies and backs up the stored data content; meanwhile, the query module searches for content via voice input or keyboard input.
S106: the wireless signal transceiver module connects to the cloud server through the wireless signal transceiver to transmit data; the host service configuration and service scale of the cloud server can be configured according to user needs and are used to share data.
The working principle of the invention is as follows: the data acquisition module 1 acquires data uploaded by a user through a computer network terminal, and the data classification module 9 classifies the stored content using a data classification method. After classification, the data are transmitted to the data encoding module 2, which computes a coding array from the acquired data information according to the parameters of the configuration center and encodes the data according to that array; after encoding, the data storage module 4 selects a storage mode according to the read frequency or file size of each data file and stores the file. The integrity verification module 5 sends a detection request to the server; the server computes detection information from the tag information and the request information and checks the storage result using the detection information and the tag information; the data recovery module 6 recovers data according to the feedback information from the integrity verification module, the failed-node information, the data recovery parameters, and the parsed position of each failed node in the coding array. After data recovery, the key parameter generation module 3 generates an encryption key with the Keyall algorithm from the input security parameters and generates tag information for the stored file.
During storage, the configuration module 7 presets the system parameters and extracts the corresponding configuration information according to control instructions; the data management module 8 adds, deletes, modifies and backs up the stored data content; meanwhile, the query module 10 searches for content via voice input or keyboard input. The wireless signal transceiver module 12 connects to the cloud server through a wireless signal transceiver to transmit data; the host service configuration and service scale of the cloud server 13 can be configured according to user needs and are used to share data.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A computer big data storage control system, comprising:
the data acquisition module is connected with the central main control module, acquires data uploaded by a user through a computer network terminal, and transmits the data to the data coding module; a corresponding sample is built from the acquired data and the data in the sample are preprocessed; after preprocessing, the corresponding data are compressed and transmitted; in the data preprocessing, missing data are handled as follows: samples with missing values are deleted using the corresponding deletion function; after deletion, imputed values are assigned to replace the missing values; the completed data set is randomly simulated and stored in imp, and a linear regression is performed on imp;
the data coding module is connected with the central main control module, computes a coding array from the acquired data information according to the parameters of the configuration center, and encodes and stores the data according to the array; a suitable neural network model is selected and the corresponding data features are extracted from the input data information; a corresponding multilayer neural network is built from the extracted feature information, each layer being trained with a corresponding training method so that the whole deep neural network is trained; the parameters of the first-layer network are computed and the output of its hidden layer is used as the input of the next layer; this process is repeated, training the network parameters of each layer in turn, to encode the data;
the key parameter generation module is connected with the central main control module, generates an encryption key through a Keyall algorithm according to the input security parameters, and generates tag information for the stored file; the user inputs security parameters to generate a key pair and an encryption key, the generated public key is published and used to generate tag information for the stored result file, and the private key is kept by the user; when the Keyall algorithm generates the encryption key, two strong primes $p$ and $q$ are first generated and $M = pq$ and $f(M) = (p-1)(q-1)$ are computed; then an odd positive integer $a$ is generated such that $\gcd(a, f(M)) = 1$; $p$ and $q$ serve as the private key, and $a$ and $M$ serve as the public key;
the data storage module is connected with the central main control module and selects a corresponding storage mode according to the reading frequency or the file size of the data file; the data storage module stores data files by adopting a storage strategy of redundancy first and coding later, small files and frequently used large files are stored in a redundancy backup mode, and large files which are not used for a long time are stored in an RS (Reed-Solomon) coding mode;
the integrity verification module is connected with the central main control module and is used to send a detection request to the server; the server computes detection information from the tag information and the request information and checks the storage result using the detection information and the tag information; the user sends key information to the server to submit a detection request; the server performs the computation on the stored data according to the user's detection request; after receiving the information returned by the server, the user decrypts it; the user verifies whether the returned information is complete; if it is, verification succeeds and the big data storage result is correct; otherwise the big data storage result is wrong, and the storage results must be verified one by one to locate the erroneous ones;
the data recovery module is connected with the central main control module and used for recovering data according to the feedback information, the failure node information, the data recovery related parameters and the corresponding position information of the analysis failure node in the array of the integrity verification module; monitoring the time for sending data information between the master node and the slave node, and if the return information of the node is not received within the set time, judging that the node is invalid; reading parameters related to data recovery in the configuration center, analyzing the corresponding position of a lost node or a block in the coding array according to the node failure information or the block failure information and the related parameters of the data recovery, and sending the position to a decoding unit; reading related load balancing parameters in the configuration center, and selecting a new node list according to the parameters and the load state of each node; selecting a decoding scheme according to the corresponding position of the lost node or the block in the coding array, and reading the residual block data required in the server; decoding calculation is carried out according to the residual block data to obtain the data of the lost blocks, and the recovered blocks are stored in the new nodes in the server according to the selected new node list;
the configuration module is connected with the central main control module and is used for presetting and configuring various parameters in the system and extracting corresponding configuration information according to the control instruction;
the data management module is connected with the central main control module and is used for adding, deleting, modifying and backing up the stored data content;
the data classification module is connected with the central main control module and classifies the stored contents by using a data classification method;
the query module is connected with the central main control module and searches corresponding contents through voice input or keyboard input;
the central main control module is respectively connected with the data acquisition module, the data coding module, the key parameter generation module, the data storage module, the integrity verification module, the data recovery module, the configuration module, the data management module, the data classification module, the query module, the wireless signal transceiving module and the cloud server and is used for coordinating the normal operation of each module;
the wireless signal transceiver module is connected with the central main control module and is connected with the cloud server through the wireless signal transceiver to realize data transmission;
and the cloud server is connected with the central main control module, and the host service configuration and the service scale can be configured according to the needs of users and are used for realizing data sharing.
2. The computer big data storage control system of claim 1, wherein the data management module comprises:
an adding and deleting module, through which a corresponding deletion or addition instruction is input according to user requirements, whereupon the data management system deletes or adds the corresponding content;
a modification module, through which a corresponding modification command is input according to user requirements, whereupon the data management system modifies the corresponding content;
and the backup module monitors and tracks the data uploaded by the user and the update of the important target file to be tracked, transmits the update log to the backup system in real time through a network, and the backup system updates the disk according to the log.
3. A computer big data storage control method of a computer big data storage control system according to claims 1-2, wherein the computer big data storage control method comprises:
step one, the data acquisition module acquires data uploaded by a user through a computer network terminal, and the data classification module classifies the stored content using a data classification method;
step two, after the data classification is finished, the data are transmitted to the data coding module; the data coding module computes a coding array from the acquired data information according to the parameters of the configuration center and encodes the data according to the array; after the coding is finished, the data storage module selects a storage mode according to the reading frequency or file size of the data file and stores it;
step three, the integrity verification module sends a detection request to the server, and the server computes detection information from the tag information and the request information; the server checks the storage result using the detection information and the tag information; the data recovery module recovers the data according to the feedback information of the integrity verification module, the failed-node information, the data recovery parameters, and the parsed position of the failed node in the coding array;
step four, after the data recovery is finished, the key parameter generation module generates an encryption key through the Keyall algorithm according to the input security parameters, and generates tag information for the stored file;
step five, in the storage process, the configuration module performs preset configuration on various parameters in the system and extracts corresponding configuration information according to the control instruction; the data management module performs addition, deletion, modification and backup on the stored data content; meanwhile, corresponding contents are searched by a query module through voice input or keyboard input;
step six, the wireless signal transceiver module is connected with the cloud server through the wireless signal transceiver to realize data transmission; the host service configuration and the business scale in the cloud server can be configured according to the needs of users and are used for realizing data sharing.
4. The computer big data storage control method according to claim 3, wherein in the first step, the data acquisition module processes the acquired data as follows:
establishing a corresponding sample for the acquired data, and preprocessing the data in the sample; after the preprocessing is finished, carrying out compression transmission on corresponding data;
in the data preprocessing process, the missing data processing process comprises the following steps:
deleting the samples with missing values using the corresponding deletion function; after the deletion is completed, assigning imputed values to replace the missing values; randomly simulating the completed data set, storing it in imp, and performing linear regression on imp.
5. The method for controlling big data storage of a computer according to claim 3, wherein in the second step, the data encoding module encodes the data by:
selecting a proper neural network model, and extracting corresponding data characteristics from the input data information;
establishing a corresponding multilayer neural network according to the extracted data feature information, each layer being trained with a corresponding training method so that the whole deep neural network is trained;
calculating parameters of the neural network of the first layer, and taking the output of the hidden layer of the neural network of the first layer as the input of the next layer;
and continuously repeating the process, and training the network parameters of each layer in sequence to realize the coding of the data.
6. The computer big data storage control method according to claim 3, wherein in the second step, the data storage module stores the data files by using a storage strategy of redundancy first and coding second, and for small files and frequently used big files, the data files are stored in a redundancy backup manner, and for big files which are not used for a long time, the data files are stored in an RS coding manner.
7. The method for controlling big data storage of a computer according to claim 6, wherein the redundancy-first and coding-second storage strategy specifically comprises:
when a certain file is uploaded to a server, storing the file according to a redundant backup mode, newly adding 'latest reading time' in file meta information and setting the latest reading time as a current timestamp;
the server checks the 'file size' and 'latest reading time' in each file's meta information and skips files already stored with RS coding and files smaller than 100 MB; for a file larger than 100 MB, if it was last read within the past 3 days, it is treated as hot data and skipped; otherwise the file is judged to be unused for a long time, is stored with RS coding, and its previous redundant backups are deleted;
when the read file is stored in a redundant backup mode, updating the latest reading time;
when the read file is stored according to the RS coding mode, if the file is intact, no operation is performed; if the file is damaged, RS decoding is carried out on the residual data blocks of the file to obtain source data, and the restored source data are stored again according to a redundant backup mode.
8. The computer big data storage control method according to claim 3, wherein in the third step, the specific detection step adopted by the integrity verification module comprises:
firstly, a user sends key information to a server to provide a detection request;
secondly, the server calculates the stored data according to the detection request of the user;
thirdly, after receiving the information returned by the server, the user decrypts the information;
fourthly, the user verifies whether the returned information is complete; if it is, the verification succeeds and the big data storage result is correct; otherwise the big data storage result is wrong, and the storage results must be verified one by one to locate the erroneous ones;
the algorithm is as follows: input the data file $H$ to be checked and select a file block $m_i$ ($1 < i < n$) in $H$; generate a random challenge number $r$; then compute $a_r = a^r \bmod M$ and compute detection data 1:
[Equation image not reproduced]
next, select the tag information $T_i$ corresponding to file block $m_i$ ($1 < i < n$) and compute
[Equation image not reproduced]
finally, compute detection data 2: $R = S^r \bmod M$, and verify whether $R$ and $R'$ are equal; if they are equal, T is returned, otherwise F is returned.
9. The method for controlling big data storage of a computer according to claim 3, wherein in the third step, the data recovery process adopted by the data recovery module specifically includes:
monitoring the time for sending data information between the master node and the slave node, and if the return information of the node is not received within the set time, judging that the node is invalid;
reading parameters related to data recovery in the configuration center, analyzing the corresponding position of a lost node or a block in the coding array according to the node failure information or the block failure information and the related parameters of the data recovery, and sending the position to a decoding unit;
reading related load balancing parameters in the configuration center, and selecting a new node list according to the parameters and the load state of each node;
selecting a decoding scheme according to the corresponding position of the lost node or the block in the coding array, and reading the residual block data required in the server;
and performing decoding calculation according to the residual block data to obtain the data of the lost blocks, and storing the recovered blocks in the new nodes in the server according to the selected new node list.
10. The method for controlling big data storage of a computer according to claim 3, wherein in the fourth step, the key parameter generating module specifically comprises:
the user inputs the security parameters to generate a key pair and an encryption key, the generated public key is public and is used for generating label information for the file storing the result, and the private key is stored by the user;
when the encryption key is generated by the Keyall algorithm, two strong primes $p$ and $q$ are first generated and $M = pq$ and $f(M) = (p-1)(q-1)$ are computed; then an odd positive integer $a$ is generated such that $\gcd(a, f(M)) = 1$; $p$ and $q$ serve as the private key, and $a$ and $M$ serve as the public key.
CN202010046920.2A 2020-01-16 2020-01-16 Computer big data storage control system and method Active CN111291046B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010046920.2A CN111291046B (en) 2020-01-16 2020-01-16 Computer big data storage control system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010046920.2A CN111291046B (en) 2020-01-16 2020-01-16 Computer big data storage control system and method

Publications (2)

Publication Number Publication Date
CN111291046A true CN111291046A (en) 2020-06-16
CN111291046B CN111291046B (en) 2023-07-14

Family

ID=71023092

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010046920.2A Active CN111291046B (en) 2020-01-16 2020-01-16 Computer big data storage control system and method

Country Status (1)

Country Link
CN (1) CN111291046B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090094250A1 (en) * 2007-10-09 2009-04-09 Greg Dhuse Ensuring data integrity on a dispersed storage grid
CN103425941A (en) * 2013-07-31 2013-12-04 广东数字证书认证中心有限公司 Cloud storage data integrity verification method, equipment and server
CN105320899A (en) * 2014-07-22 2016-02-10 北京大学 User-oriented cloud storage data integrity protection method
CN106611135A (en) * 2016-06-21 2017-05-03 四川用联信息技术有限公司 Storage data integrity verification and recovery method
RU2017115539A3 (en) * 2017-05-02 2018-11-07

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114416728A (en) * 2021-12-27 2022-04-29 炫彩互动网络科技有限公司 Server archiving and file reading method
CN114629709A (en) * 2022-03-18 2022-06-14 云南鲲之大科技有限公司 Computer network safety system based on distributed big data information interaction
CN116418580A (en) * 2023-04-10 2023-07-11 广东粤密技术服务有限公司 Data integrity protection detection method and device for local area network and electronic equipment
CN116418580B (en) * 2023-04-10 2023-11-24 广东粤密技术服务有限公司 Data integrity protection detection method and device for local area network and electronic equipment
CN117193886A (en) * 2023-11-06 2023-12-08 成都科江科技有限公司 Dynamic loading method for configuration file of industrial control system
CN117193886B (en) * 2023-11-06 2024-01-05 成都科江科技有限公司 Dynamic loading method for configuration file of industrial control system

Also Published As

Publication number Publication date
CN111291046B (en) 2023-07-14

Similar Documents

Publication Publication Date Title
CN111291046B (en) Computer big data storage control system and method
CN110162414B (en) Method and device for realizing artificial intelligent service based on micro-service architecture
CN113961759B (en) Abnormality detection method based on attribute map representation learning
CN110929840A (en) Continuous learning neural network system using rolling window
CN103548003A (en) Processes and methods for client-side fingerprint caching to improve deduplication system backup performance
CN103838847A (en) Data organization method oriented to sea-cloud collaboration network computing network
CN112380067B (en) Metadata-based big data backup system and method in Hadoop environment
CN111708794B (en) Data comparison method and device based on big data platform and computer equipment
Zhang et al. Log sequence anomaly detection based on local information extraction and globally sparse transformer model
CN107391557B (en) Block chain serial query method and system for setting out-of-chain fault table
CN113127633A (en) Intelligent conference management method and device, computer equipment and storage medium
CN116414948A (en) Abnormal data mining method and software product based on cloud data and artificial intelligence
CN107451177B (en) Query method and system for single error-surveying block chain of increased blocks
CN114564726A (en) Software vulnerability analysis method and system based on big data office
CN107463596B (en) Block chain parallel query method and system for setting out-of-chain fault table
CN115408350A (en) Log compression method, log recovery method, log compression device, log recovery device, computer equipment and storage medium
CN115796229A (en) Graph node embedding method, system, device and storage medium
CN114816468A (en) Cloud edge coordination system, data processing method, electronic device and storage medium
CN110399485B (en) Data tracing method and system based on word vector and machine learning
CN117435999A (en) Risk assessment method, apparatus, device and medium
CN104025088B (en) Data block is separated into method and system of multiple streams for compression
CN114760328A (en) Data storage method, system, electronic equipment and storage medium
CN111783133B (en) Network resource management method based on block chain technology
CN112052674A (en) Entity definition extraction method, system, storage medium and server
Liu et al. Secure and controllable data management mechanism for multi‐sensor fusion in internet of things

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant