CN112486941A - Mimicry object storage system based on multiple erasure codes - Google Patents
- Publication number
- CN112486941A (application CN202011373786.3A)
- Authority
- CN
- China
- Prior art keywords
- module
- data
- erasure
- metadata
- object storage
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/004—Error avoidance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
- G06F16/164—File meta data generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/172—Caching, prefetching or hoarding of files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1805—Append-only file systems, e.g. using logs or journals to store data
- G06F16/1815—Journaling file systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Quality & Reliability (AREA)
- Human Computer Interaction (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a mimicry object storage system based on multiple erasure codes. The system comprises an interface service module, an erasure coding module, a decision module, a data service module, a metadata module and a file output decision module. The interface service module provides the interface through which clients interact with the system. The erasure coding module embeds multiple erasure codes in the system as a pool of functionally equivalent components. It randomly selects several of these codes to encode each object, stores the resulting blocks on the underlying data block servers, and records the encoding scheme as metadata in the metadata server. The decision module dynamically and randomly determines which erasure codes, and how many, are used for each file object during storage. The data service module stores the data blocks and their check blocks. The metadata module provides metadata storage services. The file output decision module checks whether the output data meets the requirements and then outputs it. The result is an object storage system with dynamic heterogeneous redundant network defense capability.
Description
Technical Field
The invention belongs to the field of data storage technology and relates in particular to a mimicry object storage system based on multiple erasure codes.
Background
Mimicry defense
As the Internet has developed, it has become an indispensable part of daily life. Because the Internet is open and free, it is also highly vulnerable. Cyberspace security has therefore become an important direction of Internet research.
Traditional cyberspace security relies mostly on passive defense. Under passive defense, once the target system has completed its initial configuration, the attacker can keep gathering information about it at any time and strike at an opportune moment. In this mode the attacker and the defender are in unequal positions, which poses a great challenge to the security of network systems. Active defense methods such as Cyber Mimic Defense (CMD) were developed to improve the defender's disadvantaged position in this security game.
In biology, mimicry refers to the ecological phenomenon in which one species evolves characteristics similar to those of another, more successful organism, benefiting one or both of them. Classified by defensive behavior, mimicry is an active defense based on endogenous mechanisms, also known as mimicry camouflage (MG). Such camouflage may confuse natural enemies through color, texture, and shape. When it also imitates other organisms in behavior and form, it may be called "mimicry defense" (MD).
Network security experts have found that introducing this kind of active defense, common in nature, into cyberspace offers a new approach to cyberspace security problems. It is particularly effective against today's most troublesome threats, namely uncertain threats such as unknown vulnerabilities, backdoors, and Trojans.
CMD is built on the idea of biological mimicry defense. It leaves the service functions of the original system unchanged. Meanwhile, it makes strategic changes over time and space to the system's internal architecture, redundant resources, operating mechanisms, core algorithms, and other environmental factors. These changes also affect any unknown bugs, Trojans, or viruses that may be attached to those components. Facing a mimicry system whose operating scenario keeps changing, an attacker has difficulty seeing the system's full configuration. This disrupts the construction of an attack chain and raises the cost of attack.
At the technical level, CMD targets every element of active defense:
- it transforms similar, homogeneous target systems into heterogeneous, diverse ones;
- it turns static, deterministic target systems into dynamic, random ones;
- it introduces a heterogeneous redundant multimode arbitration mechanism to detect and mask unknown vulnerabilities and backdoors;
- it strengthens the system's resilience through a high-reliability architecture.
In cyberspace mimicry defense, the Dynamic Heterogeneous Redundancy (DHR) architecture is one of the principal methods for implementing mimicry defense (MD). It consists of an input agent, a heterogeneous component set, a policy scheduling algorithm, an executor set, and a multimode voter. Together, the heterogeneous component set and the policy scheduling provide multi-dimensional dynamic reconfiguration of the executor set. The architecture works as follows:
- Standardized software and hardware modules are combined into a set E of m functionally equivalent heterogeneous components.
- A specific policy scheduling algorithm dynamically selects n components from E to form the executor set (A1, A2, …, An).
- The input agent forwards each input to every executor in the current service set.
- The executors' output vectors are submitted to the voter, whose vote yields the system output.
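The DHR loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the architecture's reference implementation:
- The "heterogeneous components" are hypothetical Python functions that compute the same result in different ways.
- The scheduler is uniform random sampling.
- The voter is a plain majority vote.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Executor = Callable[[int], int]

# Hypothetical heterogeneous component set E: functionally equivalent
# implementations of one service (squaring an integer), written differently.
def square_mul(x: int) -> int:
    return x * x

def square_pow(x: int) -> int:
    return x ** 2

def square_add(x: int) -> int:
    total = 0
    for _ in range(abs(x)):
        total += abs(x)
    return total

def square_backdoored(x: int) -> int:
    # Simulates a component with a hidden flaw that corrupts its output.
    return x * x + 1

def schedule(pool: Sequence[Executor], n: int, rng: random.Random) -> List[Executor]:
    """Policy scheduling (assumed uniform random): draw n executors from the m-element pool."""
    return rng.sample(list(pool), n)

def dhr_run(x: int, pool: Sequence[Executor], n: int, rng: random.Random) -> int:
    executors = schedule(pool, n, rng)                  # executor set A1..An
    outputs = [a(x) for a in executors]                 # input agent forwards x to every executor
    value, votes = Counter(outputs).most_common(1)[0]   # multimode voter
    if votes * 2 <= n:
        raise RuntimeError("executors disagree and no majority exists")
    return value

POOL = [square_mul, square_pow, square_add, square_backdoored]
```

With m = 4 and n = 3, at most one of the three selected executors can be the flawed one. The voter therefore always masks its wrong output, and an attacker cannot know in advance which executors will be active.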
Object store
An object store (OS) is essentially a key-value store accessed through simple interfaces such as put, get, and del, plus some extended interfaces. Objects play the role that files play in a file storage system and contain the data. Unlike the tree structure of a file system, however, the namespace of an OS is flat. An OS also has no random read/write capability: put and get upload and download whole objects, and objects cannot be modified in place within the system.
Each object also carries metadata, that is, data describing the data. Metadata mainly consists of attribute information such as storage time, modification time, the data's hash value, and the data's storage location. The system uses the metadata to find where the data is stored and thus to retrieve it.
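A minimal in-memory sketch of the flat put/get/delete model and its per-object metadata follows. The class, its method names, and the `location` field are hypothetical illustrations, not the API of any particular product. The sketch shows three properties:
- keys are opaque strings, so a "/" in a key implies no directory;
- an upload always replaces the whole object;
- the stored hash lets a read detect corruption.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class ObjectMeta:
    key: str
    size: int
    sha256: str       # data hash value
    stored_at: float  # storage time
    location: str     # hypothetical identifier of the node holding the data

class FlatObjectStore:
    """Flat key-value object store: whole-object put/get/delete, no partial writes."""

    def __init__(self, node_id: str = "node-0") -> None:
        self._data: Dict[str, bytes] = {}
        self._meta: Dict[str, ObjectMeta] = {}
        self._node_id = node_id

    def put(self, key: str, body: bytes) -> ObjectMeta:
        # Uploading replaces the whole object; there is no in-place edit.
        meta = ObjectMeta(key, len(body), hashlib.sha256(body).hexdigest(),
                          time.time(), self._node_id)
        self._data[key] = bytes(body)
        self._meta[key] = meta
        return meta

    def get(self, key: str) -> bytes:
        body = self._data[key]
        # Use the recorded hash to detect silent corruption on read.
        if hashlib.sha256(body).hexdigest() != self._meta[key].sha256:
            raise IOError(f"object {key!r} failed hash check")
        return body

    def head(self, key: str) -> ObjectMeta:
        # Metadata-only lookup, without transferring the object body.
        return self._meta[key]

    def delete(self, key: str) -> None:
        del self._data[key]
        del self._meta[key]
```

To "modify" `photos/2020/cat.jpg`, a client edits the bytes locally and calls `put` again with the complete new content. This matches the limitation described in the text.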
The most notable advantage of an OS is scalability. Capacity can grow to tens or even hundreds of exabytes (EB) by scaling the whole system out horizontally, and as capacity grows, data is distributed to all storage nodes according to an algorithm. The flat design also removes the overhead that a directory hierarchy imposes on system efficiency. Access is convenient: HTTP/HTTPS is supported, and objects can be accessed directly through a RESTful interface. The drawback is that objects cannot be modified directly in the system. To change an object, the user must modify it locally and upload the whole object again.
Modern object storage systems mostly use a distributed architecture, with nodes spread across different servers. To maximize disk utilization while preventing data loss, most OS systems use erasure codes to split data into blocks and encode them, generating redundant coded blocks. If some blocks are lost or tampered with, the system can recover the data with the erasure coding algorithm. This works as long as the number of affected blocks does not exceed the number of redundant blocks.
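The simplest erasure code shows the recovery principle. It splits the data into k blocks and adds one XOR parity block, so the loss of any single block (data or parity) can be repaired. This is only a teaching sketch: the function names are hypothetical. Production systems use Reed-Solomon-style codes that tolerate several lost blocks.

```python
from typing import List, Optional

def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size data blocks (zero-padded) plus one XOR parity block."""
    size = -(-len(data) // k)              # ceiling division
    padded = data.ljust(size * k, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return blocks + [bytes(parity)]

def decode(blocks: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data when at most one block (data or parity) is None."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("single-parity code cannot recover more than one lost block")
    if missing:
        # XOR of all k+1 blocks is zero, so XOR of the survivors equals the lost block.
        size = len(next(b for b in blocks if b is not None))
        rebuilt = bytearray(size)
        for b in blocks:
            if b is not None:
                for i, byte in enumerate(b):
                    rebuilt[i] ^= byte
        blocks = list(blocks)
        blocks[missing[0]] = bytes(rebuilt)
    # Drop the parity block and the zero padding.
    return b"".join(blocks[:-1])[:length]
```

The storage overhead here is 1/k, compared with 100% or more for whole-object replication. This is the space saving the text attributes to erasure coding.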
Ceph is a distributed storage system that supports object storage. Its object store mainly consists of the following important modules:
1) Monitor cluster (Ceph Monitor): maintains the maps of the cluster state and is also responsible for authentication between daemons and clients. The Monitor cluster keeps its own data consistent using the Paxos algorithm.
2) Manager cluster (Ceph Manager Daemon): the daemon responsible for tracking runtime metrics and the current state of the Ceph cluster.
3) Object storage cluster (Ceph Object Storage Daemon, OSD): the daemons that store data and handle replication, recovery, backfilling, and rebalancing of cluster data. OSDs also exchange heartbeats with one another and provide monitoring information to the Monitors.
4) A placement group (PG) contains multiple OSDs; the PG concept is introduced to simplify data allocation and lookup.
5) Ceph stores data as objects in logical storage pools. The client uses the CRUSH algorithm to compute where each object should be stored, which lets Ceph clients communicate directly with OSDs rather than through a centralized server or proxy.
In Ceph, storing data involves three mappings:
1) a file the user operates on is split into one or more objects that RADOS can handle;
2) each object is mapped independently to a PG;
3) each PG is mapped by CRUSH to a set of OSDs.
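The three mappings can be sketched as follows. This is a deliberately simplified stand-in, not Ceph's code, and several parts are assumptions:
- The object size and the object naming scheme are illustrative.
- SHA-256 replaces Ceph's own object hash (rjenkins by default) and its stable-modulo step.
- The PG-to-OSD step uses rendezvous hashing instead of the real CRUSH algorithm, which walks a hierarchical cluster map.

The point the sketch preserves is that every step is a deterministic computation. Any client can therefore locate data without asking a central server.

```python
import hashlib
from typing import List

OBJECT_SIZE = 4 * 1024 * 1024   # assumed object size (4 MiB) for striping

def _hash64(text: str) -> int:
    # Stand-in hash; real Ceph uses its own object hash function.
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")

def file_to_objects(file_id: str, file_size: int, object_size: int = OBJECT_SIZE) -> List[str]:
    """Mapping 1: stripe a file into fixed-size objects, one name per chunk."""
    count = max(1, -(-file_size // object_size))   # ceiling division
    return [f"{file_id}.{i:08x}" for i in range(count)]

def object_to_pg(object_name: str, pg_num: int) -> int:
    """Mapping 2: each object is hashed independently into one placement group."""
    return _hash64(object_name) % pg_num

def pg_to_osds(pg_id: int, osds: List[str], size: int = 3) -> List[str]:
    """Mapping 3, simplified stand-in for CRUSH: rank OSDs by a per-PG
    pseudo-random score (rendezvous hashing) and take the top `size`."""
    ranked = sorted(osds, key=lambda osd: _hash64(f"{pg_id}/{osd}"), reverse=True)
    return ranked[:size]
```

A 10 MiB file becomes three objects. Each object lands in one of `pg_num` PGs, and each PG resolves to three distinct OSDs.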
1) Ceph protects each stored file with a single erasure code. If the file is completely tampered with or deleted, Ceph cannot restore it.
2) Ceph's object store has no metadata server or proxy server. The client computes the storage location of a data block with the CRUSH algorithm and then fetches the data directly from the data servers.
Swift, the OpenStack object store, is one of the subprojects of the OpenStack open-source cloud computing project. It provides software for storing and retrieving data over HTTP. Objects are stored in a hierarchical organizational structure that can offer anonymous read-only access. Applications store and retrieve data through the industry-standard HTTP RESTful API, and the object store's back-end components follow the same RESTful model.
The object storage mainly comprises the following modules:
1) Proxy server (Proxy Server)
The proxy server ties the Swift architecture together; users interact with Swift through a standard RESTful HTTP interface.
2) Storage server (Storage Server)
The storage server is further divided into the Object server, the Container server, and the Account server. These store binary objects, handle object listings, and handle container listings, respectively.
3) Consistency server (Consistency Server)
Swift's consistency servers find and resolve errors caused by data corruption and hardware failures.
The Ring is the most critical component of Swift. It records the mapping between stored objects and their actual physical locations, and any query for Account, Container, or Object information must consult the cluster's Ring. The Ring maintains this mapping using Zones, Devices, Partitions, and Replicas. By default, each Partition has 3 replicas in the cluster, and the Ring maintains and stores the location of each Partition in its mapping. The Ring file is created when the system is initialized. Whenever storage nodes are added or removed, the Ring's entries are rebalanced and data is migrated, and rebalancing keeps the amount of migrated data as small as possible.
Swift uses a consistent hashing algorithm to build a highly scalable distributed object storage system with redundant storage. When the number of cluster nodes changes, the mapping between keys and nodes in OpenStack Swift changes as little as possible. This property is the result of Swift's use of consistent hashing.
The idea of consistent hashing can be summarized in three steps:
1) The hash value of each node is computed and placed on a ring spanning the interval 0 to 2^32 - 1.
2) The hash value of each object is computed with the same method and also placed on the ring.
3) Starting from the object's position, the ring is searched clockwise, and the object is stored on the first node found. If no node is found before reaching 2^32 - 1, the search wraps around and the object is stored on the first node of the ring.
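The three steps above can be written as a small ring class. This sketch implements the plain algorithm as described, with hypothetical names. It is not Swift's implementation: Swift's real Ring precomputes a fixed number of partitions (set by a partition power) and assigns replicas of each partition to devices across zones. Practical rings also usually add many virtual points per node for balance.

```python
import bisect
import hashlib
from typing import List, Tuple

def ring_position(key: str) -> int:
    """Map a string to a point on the ring [0, 2**32) using the first 4 bytes of MD5."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

class ConsistentHashRing:
    def __init__(self, nodes: List[str]) -> None:
        # Sorted (position, node) pairs; sorting by position lays nodes around the ring.
        self._points: List[Tuple[int, str]] = sorted((ring_position(n), n) for n in nodes)

    def add(self, node: str) -> None:
        bisect.insort(self._points, (ring_position(node), node))

    def remove(self, node: str) -> None:
        self._points.remove((ring_position(node), node))

    def lookup(self, key: str) -> str:
        pos = ring_position(key)
        # First node whose position is >= pos, i.e. the first node clockwise.
        idx = bisect.bisect_left(self._points, (pos, ""))
        if idx == len(self._points):
            idx = 0   # passed 2**32 - 1 without finding a node: wrap to the first node
        return self._points[idx][1]
```

The property the text relies on follows directly. When a node is added, the only keys that move are those falling between the new node and its predecessor, and all of them move to the new node. Every other key keeps its placement.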
1) Like Ceph, Swift protects stored files with a single erasure code. If a file is completely tampered with or deleted, Swift cannot recover it.
2) Swift is an eventually consistent storage system: files stored in it may be inconsistent for a period of time before reaching consistency. Swift can therefore suffer from latency problems, and clients requesting data may read outdated versions.
Disclosure of Invention
The invention aims to provide a mimicry object storage system based on multiple erasure codes that solves the problems described above.
The invention is implemented as follows. The mimicry object storage system based on multiple erasure codes comprises an interface service module, an erasure coding module, a decision module, a data service module, a metadata module, and a file output decision module.
- The interface service module provides the interface through which clients interact with the system. Through a client, a user can query, create, and delete storage buckets and can upload, download, and delete objects.
- The erasure coding module embeds multiple erasure codes in the system as a pool of functionally equivalent components. It randomly selects several of them to encode each object, stores the encoded blocks on the underlying data block servers, and records the encoding scheme as metadata in the metadata server.
- The decision module dynamically and randomly determines which erasure codes, and how many, are used for each file object during storage.
- The data service module stores the data blocks and their check blocks.
- The metadata module provides metadata storage services.
- The file output decision module checks whether the output data meets the requirements and then outputs it.
A further technical solution of the invention is as follows: when a client interacts with the interface service module, it sends an accessKey along with the data, and the system randomly selects one of two different verification algorithms to verify the user's accessKey.
A further technical solution of the invention is as follows: the data service nodes in the data service module continuously send heartbeat information to the interface service node. When the interface service node sends a request to locate a data block, any data service node that holds the block sends it to the interface service node.
A further technical solution of the invention is as follows: the metadata service nodes in the metadata module provide a RESTful Web API to the interface service node. The metadata service is scalable, which improves the system's metadata read/write efficiency and guards against metadata node failures. Metadata service nodes send heartbeat packets to the interface service node through a message queue.
A further technical solution of the invention is as follows: a message queue module is placed between the interface service module and the data service module. Objects are stored and accessed through a REST interface between the two modules, while heartbeat and location messages are exchanged through the message queue.
A further technical solution of the invention is as follows: when a user requests to download an object, the erasure coding module decodes it with each of the corresponding erasure codes, producing multiple copies of the source object. The decision module performs a majority vote on the file hashes, and the data is then returned to the client.
A further technical solution of the invention is as follows: after the decision module receives a user's read request, it calls the erasure coding module to decode the object. It stores the decoded copies in a cache queue and compares their contents for consistency. If the contents are consistent, one copy is output; if not, a majority vote is performed and the inconsistent data is discarded.
A further technical solution of the invention is as follows: the mimicry object storage system based on multiple erasure codes further comprises a log module. The log module records the heartbeat information sent and received by each server, together with the necessary log information produced while users upload and download objects.
A further technical solution of the invention is as follows: during encoding, the system dynamically and randomly selects several erasure codes. Each erasure code module divides the original data into several data blocks and generates check blocks according to its own coding algorithm. The original data blocks and check blocks are then stored randomly on different data service nodes.
A further technical solution of the invention is as follows: during decoding, the erasure coding module randomly selects several erasure codes as directed by the decision module. For each selected code, it requests the metadata of the object to be decoded from the metadata service node. It then fetches the corresponding data blocks and coding blocks from the data service nodes and decodes them.
The beneficial effects of the invention are as follows. The system is divided into several modules, which simplifies subsequent maintenance, updates, and upgrades. Its user interface resembles that of a conventional object storage system, so the mimicry architecture does not make the system harder to use.
The system provides security in several respects:
- If some data nodes shut down, the system continues to operate.
- Objects are stored with erasure codes rather than multi-replica redundancy, which reduces the storage space they occupy.
- During storage, each object is split into several data blocks and check blocks, so the system can still repair the object after some blocks are lost.
- Data is encoded with several dynamically selected erasure codes, and the system can recover the data as long as any one of them decodes it correctly.
- When different erasure codes output different objects, the decision module initiates a majority vote and outputs the correct object.

This storage approach therefore retains the fragmented storage and repairability of erasure codes. It also protects files by comparing the consistency of the data recovered by the multiple codes. Because the system makes data heterogeneous at multiple levels, it greatly raises the attacker's cost and makes the system much harder to break.
All interaction between users and the system uses HTTPS.
Drawings
FIG. 1 is a diagram of the mimicry object storage architecture according to an embodiment of the present invention.
Detailed Description
As shown in FIG. 1, the mimicry object storage system based on multiple erasure codes provided by the present invention is described in detail as follows.
Existing object storage systems are widely used in scenarios such as Content Delivery Network (CDN) data delivery, data lakes for big-data computation and analysis, and data disaster recovery. Object storage has greatly benefited a large number of Internet companies. In recent years, however, network security incidents have become frequent, and technology companies need storage systems that offer extremely high security as well as massive capacity. Yet vulnerabilities are everywhere and backdoors are hard to avoid, which makes a vulnerability-free, backdoor-free cyberspace technically hard to achieve.
The invention discloses a mimicry object storage system based on multiple erasure codes. It is a distributed object storage system with dynamic heterogeneous redundant network defense capability, designed under the guidance of the CMD idea. The system allows "known unknown" risks and "unknown" threats to exist in cyberspace, much like a building on a sandy beach, while still operating securely. It retains the scalability and load balancing of traditional distributed object storage systems while adding active defense capability.
In summary, the technical problem solved by the present invention is to construct a mimicry object storage system with a dynamic heterogeneous redundant architecture and active defense capability, guided by the idea of cyberspace mimicry defense.
Overall framework
The invention uses a distributed object storage system with no directory hierarchy and no restrictions on data format.
The invention adopts an architecture that separates the interface from data storage. The two become mutually independent service nodes that cooperate to provide the object storage service. Interface service nodes or data service nodes can be added to the cluster to achieve distribution and scalability.
There are three services in the system: the interface service, the data service, and the metadata service. The interface service layer exposes a REST interface externally, and the data service layer provides data storage. The interface service processes client requests and forwards object access to the data service and metadata access to the metadata service. The data service processes requests from the interface service and accesses objects on its local disk. The metadata service processes requests from the interface service and stores resource metadata on its local disk.
There are two types of interfaces between the interface service and the data service. The first provides access to objects and is a REST interface. In other words, the data service itself also exposes REST, and the interface service node acts as an HTTP client requesting objects from the data service. The second interface uses message queues, as described under the message queue module.
Client
The system provides different types of clients for users:
1) A client that runs on Android
2) A Web client for accessing the storage system through a browser
With a unique user identifier (UID), a user can operate on all files under that identifier from any client.
Interface service module
The interface service node provides the interface through which clients interact with the system. Through a client, a user can query, create, and delete storage buckets, and can upload, download, and delete objects.
When a client interacts with the system, it sends an accessKey to the interface service module along with the data. The system randomly selects one of two different verification algorithms to verify the user's accessKey; using two algorithms improves the security of the system.
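The source does not name the two verification algorithms. The sketch below therefore assumes two heterogeneously implemented but functionally equivalent checks over the same accessKey:
- a SHA-256 digest comparison;
- a salted PBKDF2-HMAC-SHA256 comparison.

One check is chosen at random per request. The credential record, the function names, and the iteration count are all hypothetical. Because both checks accept exactly the same keys, the random choice changes the code path an attacker faces without changing the result for legitimate users.

```python
import hashlib
import hmac
import os
import secrets
from dataclasses import dataclass

PBKDF2_ITERATIONS = 100_000   # assumed work factor

@dataclass
class StoredCredential:
    # Hypothetical credential record kept by the interface service.
    sha256_digest: bytes
    pbkdf2_salt: bytes
    pbkdf2_digest: bytes

def enroll(access_key: str) -> StoredCredential:
    salt = os.urandom(16)
    return StoredCredential(
        sha256_digest=hashlib.sha256(access_key.encode()).digest(),
        pbkdf2_salt=salt,
        pbkdf2_digest=hashlib.pbkdf2_hmac("sha256", access_key.encode(), salt, PBKDF2_ITERATIONS),
    )

def verify_sha256(access_key: str, cred: StoredCredential) -> bool:
    candidate = hashlib.sha256(access_key.encode()).digest()
    return hmac.compare_digest(candidate, cred.sha256_digest)   # constant-time compare

def verify_pbkdf2(access_key: str, cred: StoredCredential) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", access_key.encode(),
                                    cred.pbkdf2_salt, PBKDF2_ITERATIONS)
    return hmac.compare_digest(candidate, cred.pbkdf2_digest)

VERIFIERS = (verify_sha256, verify_pbkdf2)

def verify_access_key(access_key: str, cred: StoredCredential) -> bool:
    # Pick one of the two heterogeneous verifiers at random for each request.
    verifier = secrets.choice(VERIFIERS)
    return verifier(access_key, cred)
```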
The system uses LVS to load-balance the interface service module, which increases system throughput. LVS (Linux Virtual Server) is a virtual server cluster system that implements IP load balancing and content-based request distribution. Its scheduler distributes requests evenly across different servers and can mask back-end servers that fail. In this way a group of servers forms a high-performance, highly available server cluster.
Clients connect to the LVS server as a single point of access. Through that single connection, they obtain the processing and storage capacity of the entire back-end server cluster. This greatly improves the system's scalability and availability. It also improves service security: compromising a single server does not damage the other services isolated from it.
The interface service nodes maintain information about the data service nodes. When heartbeats from a data service node stop arriving, that node is considered down and its information is deleted. When an interface service node receives a download request, it first requests the object's metadata from the metadata service node. Based on that metadata, it then asks the data service nodes to locate the data blocks.
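The heartbeat bookkeeping described above can be sketched as a small registry inside the interface service node. The class name and the 5-second timeout are assumptions; the source does not give a timeout value. The clock is injectable so the expiry logic can be checked without real waiting.

```python
import time
from typing import Callable, Dict, List

class DataNodeRegistry:
    """Interface-node view of the live data service nodes, driven by heartbeats."""

    def __init__(self, timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout            # assumed heartbeat timeout, in seconds
        self._clock = clock                # injectable clock makes the logic testable
        self._last_seen: Dict[str, float] = {}

    def on_heartbeat(self, node_addr: str) -> None:
        """Called for every heartbeat message received from a data service node."""
        self._last_seen[node_addr] = self._clock()

    def live_nodes(self) -> List[str]:
        """Drop nodes whose heartbeats stopped; return the remaining node addresses."""
        now = self._clock()
        for addr, seen in list(self._last_seen.items()):
            if now - seen > self._timeout:
                del self._last_seen[addr]   # node considered down, its info deleted
        return sorted(self._last_seen)
```

In the described system, `on_heartbeat` would be driven by messages arriving on the message queue. `live_nodes` would supply the candidates that receive a block-location request.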
Data service module
The data service nodes store the data blocks and their check blocks. The server cluster uses content-based object addressing: each stored object has a unique content identifier (CID), and any node can retrieve a stored object by its CID. The storage cluster can flexibly scale out and in, and it completes data migration, load balancing, and failure recovery autonomously.
The data service node continuously sends heartbeat information to the interface service node; after the interface service node sends the request for locating the data block, if the data block exists in the data service node, the data block is sent to the interface service node.
The data service module provides one layer of security for the system:
1) If the system has n data service nodes, the whole system keeps working normally as long as at least k of them work normally.
Metadata service module
Like the data service node, the metadata service is a service that provides a metadata storage function. Metadata refers to descriptive information of an object such as name, version, size, and hash value.
When a user sends a request to the interface server, the interface server first asks the metadata server for the corresponding metadata and then requests the data from the data service nodes.
The metadata service nodes provide a RESTful Web API to the interface service nodes. The metadata service is scalable, which improves the system's metadata read/write efficiency and guards against metadata node failures. Metadata service nodes send heartbeat packets to the interface service nodes through the message queue.
Message queue module
The message queue module sits between the interface service module and the data service module. There are two types of interfaces between the interface service and the data service. Objects are stored and retrieved through the REST interface, and the second interface communicates through message queues. Each interface serves a different need: the REST interface handles the large data transfers of object storage, while the message queue handles broadcast and point-to-point delivery of heartbeat and location messages.
Erasure coding module
To achieve mimicry characteristics, multiple erasure codes are embedded in the system as a pool of functionally equivalent components. For each object, several erasure codes are randomly selected to encode it, the encoded blocks are stored on the underlying data block servers, and the encoding scheme is recorded as metadata in the metadata server. When a user requests to download an object, multiple copies of the source object are obtained by decoding with the corresponding erasure codes. The decision module then performs a majority vote over the file hashes. This vote is a multimode redundant decision model that takes the output shared by the majority as the system's correct output. Finally, the data is returned to the client.
RS erasure codes, Cauchy RS erasure codes, binary Reed-Solomon codes (BRS for short), and liberty RAID-6 codes, Blaum-Roth codes, and Liber8 codes based on the original Van der Monte matrix can be used in the present invention.
During encoding, the system dynamically and randomly selects several erasure codes. Each erasure code module splits the original data into several data blocks, generates check blocks according to its own coding algorithm, and finally stores the data blocks and check blocks randomly on different data service nodes.
During decoding, the erasure code module randomly selects several erasure codes as directed by the decision module, requests from the metadata service node the metadata of the object for each selected erasure code, fetches the corresponding data blocks and check blocks from the data service nodes, and decodes. If a block has been lost or tampered with, the system repairs it during decoding; if it cannot be repaired, the next erasure code is selected for decoding.
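The encoding step can be sketched as follows. This is a simplified illustration, not the system's implementation: the real pool holds RS, Cauchy RS, BRS, Liberation, Blaum-Roth and Liber8Tion codes, whereas here every "code" is a hypothetical single-XOR-parity scheme (RAID-5 style) that differs only in its number of data blocks k, so the example stays self-contained. `CODE_POOL`, `encode_object` and the node names are assumptions.

```python
import random
from functools import reduce


def split(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-length blocks, zero-padding the tail."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def xor_parity(blocks: list[bytes]) -> bytes:
    """One XOR check block, a stand-in for RS/Cauchy/RAID-6 parity."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


# Hypothetical "functional component pool": code name -> number of data blocks k.
CODE_POOL = {"xor_k2": 2, "xor_k3": 3, "xor_k4": 4}


def encode_object(data: bytes, nodes: list[str], num_codes: int, rng: random.Random):
    """Encode `data` with `num_codes` randomly chosen codes and scatter the blocks."""
    layout = {}
    for code in rng.sample(sorted(CODE_POOL), num_codes):
        k = CODE_POOL[code]
        blocks = split(data, k)
        blocks.append(xor_parity(blocks))          # k data blocks + 1 check block
        targets = rng.sample(nodes, len(blocks))   # a distinct random node per block
        layout[code] = list(zip(targets, blocks))  # [(node, block), ...]
    return layout
```

The returned layout maps each selected code to its `(node, block)` pairs, which is exactly the information that would be written to the metadata server.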
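Decoding with in-place repair and fallback to the next code might look like the sketch below, again with the toy XOR-parity stand-in instead of the real codes. It assumes tampered blocks have already been detected (for example by per-block hashes, which the text does not specify) and are passed in as `None`; the object size from the metadata strips the padding. All names are hypothetical.

```python
from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def decode_xor(stripe: list[Optional[bytes]], size: int) -> Optional[bytes]:
    """Decode a k+1 XOR stripe whose last entry is the check block.

    None marks a lost or rejected block. One missing block is rebuilt from
    the others; with two or more missing the stripe cannot be repaired.
    """
    missing = [i for i, b in enumerate(stripe) if b is None]
    if len(missing) > 1:
        return None
    if missing:
        stripe = list(stripe)
        stripe[missing[0]] = xor_blocks([b for b in stripe if b is not None])
    return b"".join(stripe[:-1])[:size]


def decode_with_fallback(stripes: dict[str, list], size: int, order: list[str]):
    """Try each selected code in turn; return (code, data) of the first success."""
    for code in order:
        data = decode_xor(stripes[code], size)
        if data is not None:
            return code, data
    raise RuntimeError("no selected erasure code could decode the object")
```

If the first code has lost two blocks, it returns `None` and the loop moves on; a second code missing only one block still rebuilds the object, which is the "next erasure code" behaviour described above.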
The dynamic, heterogeneous and redundant characteristics of the erasure code module provide two layers of guarantee for system safety:
1) An object encoded by one of the system's erasure codes is divided into n blocks, and the system can repair the object even if up to k of these blocks are tampered with or deleted.
2) The system encodes the object with m erasure codes; even if m-1 of them fail to decode the object, the system recovers it correctly as long as one erasure code decodes successfully.
Decision module
The decision module dynamically and randomly decides which erasure codes, and how many, are used for each file object during object storage.
The design of the decision module derives from the redundancy voter in mimic security defense theory. Heterogeneous systems are very unlikely to share the same vulnerability and be attacked at the same time, so the same file will not be altered in most of the heterogeneous systems simultaneously. Therefore, when the copies of a file differ in content or state, the system takes the majority state as authoritative and repairs the minority of altered files. A file may change because of a fault in the storage system itself, a disk I/O error, or an external attack. In every case, the decision log generated by the inconsistency is kept permanently for archiving and administrator queries. Meanwhile, the anomalies and repair processes produced by the decision are shown in real time in the management and monitoring interface, so that an administrator can analyze and act on the system's decision state as it happens.
After receiving a read instruction from a user, the decision module calls the erasure coding module to decode the object, stores the decoded copies in a cache queue, and compares the contents of the copies for consistency. If the contents are consistent, one copy is selected for output; if not, multi-value (majority) voting is performed, inconsistent data is eliminated, and a consistent output is guaranteed.
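The decision step can be sketched as a hash-based majority vote. This is a minimal sketch under stated assumptions: copies are compared through SHA-256 digests (the text says file hashes but not which function), a strict majority is required, and the function name `judge` is hypothetical.

```python
import hashlib
from collections import Counter, deque


def judge(decoded_copies: list[bytes]) -> tuple[bytes, list[int]]:
    """Multi-value (majority) vote over copies decoded by different codes.

    Returns (output, indices of rejected copies). Raises ValueError when
    there is no strict majority.
    """
    if not decoded_copies:
        raise ValueError("nothing to judge")
    cache = deque(decoded_copies)                       # cache queue of decoded copies
    digests = [hashlib.sha256(c).hexdigest() for c in cache]
    if len(set(digests)) == 1:                          # all consistent: output one copy
        return cache[0], []
    winner, votes = Counter(digests).most_common(1)[0]
    if votes * 2 <= len(digests):                       # no strict majority
        raise ValueError("no majority among decoded copies")
    rejected = [i for i, d in enumerate(digests) if d != winner]
    return cache[digests.index(winner)], rejected
```

With three copies, a single tampered copy is outvoted: `judge([b"a", b"a", b"b"])` returns `(b"a", [2])`, and index 2 identifies the copy to repair and log. Three mutually different copies raise an error, because no majority exists.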
Log module
The log module records the heartbeat information sent and received by each server. It also records the log information needed while users upload and download objects, such as the requests and parameters users send to the system, the storage locations of objects and the fragment information of objects. These logs help operations and maintenance staff locate system problems promptly, and any intruder leaves corresponding traces in them.
The system uses an in-house logging module.
User registration and login module
When a user registers, the system requires a fingerprint, a face image and identity card information in addition to the usual account name and password, to enforce real-name registration. The fingerprint, face and identity card data are stored on the server as part of the account information.
The system can be configured to require facial verification at login.
The invention designs a mimicry-secured object storage system with dynamic, heterogeneous and redundant network defense capability. The system is divided into modules, which eases subsequent maintenance, updates and upgrades. Its user interface resembles that of a conventional object storage system, so the mimicry architecture adds no difficulty for users.
The system provides security guarantees on several levels: it keeps operating when some of its data nodes are shut down; it stores objects with erasure codes rather than multi-copy replication, which reduces the storage space each object occupies; during storage, each object is split into data blocks and check blocks, so the system can still repair the object after some of the blocks are lost; it dynamically selects among multiple erasure codes to encode data, so the data can be recovered as long as any one erasure code decodes it correctly; and when the erasure codes output different objects, the decision module invokes multi-value voting and outputs the correct object. This storage scheme therefore combines the fragmented storage and repairability of erasure codes with file security obtained by checking the consistency of the data recovered by multiple erasure codes. The system realizes data heterogeneity on multiple levels, which greatly raises an attacker's cost and makes the system harder to break.
All interaction between the user and the system uses the HTTPS protocol.
The invention provides a mimicry object storage system with a dynamic, heterogeneous and redundant architecture and high security: besides the passive defenses common in conventional storage systems, such as HTTPS transport and user authentication, it also has mimicry defense capability.
In a specific implementation, the system may use RS erasure codes based on the original Vandermonde matrix, Cauchy RS erasure codes, binary Reed-Solomon codes (BRS for short), and the Liberation, Blaum-Roth and Liber8Tion RAID-6 codes. When an object is stored, the system dynamically and randomly selects several of these erasure code algorithms to store the file redundantly, unlike common multi-copy replication or single-erasure-code storage. The core technical architecture of the system is shown in Fig. 1.
1. When the client interacts with the system, every message it sends carries the accessKey, and the system randomly selects one of two verification algorithms to verify the accessKey.
2. When a user uploads data, the system dynamically selects several erasure codes to encode the data and stores the encoded blocks randomly on the data nodes.
3. When a user downloads data, the system first obtains the n erasure codes used at upload, dynamically selects k of them (3 ≤ k ≤ n, k odd), and decodes the data with each.
4. When outputting data, the system compares the decoded copies for consistency. If they are consistent, one copy is selected at random for output; otherwise, the system performs multi-value (majority) voting and outputs a randomly selected copy from the majority.
5. An inconsistent decision means that data in the system has been tampered with or lost. The system first tries to recover the data with the erasure code k1 that produced it;
6. if the data cannot be recovered, the system recovers the complete file with the other erasure codes (k2, k3, etc.) and then regenerates the data blocks for erasure code k1 from the file.
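Steps 3, 5 and 6 can be sketched as two small functions: one picks an odd number k of decoders out of the n codes used at upload (an odd count rules out a two-way tie in the vote), and one restores the blocks of a damaged code k1, falling back to the other codes when k1 cannot repair itself. The `decode` and `encode` callbacks and all function names are hypothetical placeholders for the real erasure code module.

```python
import random


def choose_decoders(used_codes: list[str], rng: random.Random) -> list[str]:
    """Step 3: pick an odd number k of the n upload codes, 3 <= k <= n."""
    n = len(used_codes)
    if n < 3:
        raise ValueError("need at least 3 erasure codes for a vote")
    k = rng.choice(range(3, n + 1, 2))   # odd values of k only
    return rng.sample(used_codes, k)


def repair(target: str, codes: list[str], decode, encode) -> bytes:
    """Steps 5-6: restore the blocks of erasure code `target` (k1).

    `decode(code)` returns the file or None; `encode(code, data)` rewrites
    that code's blocks. Both are hypothetical callbacks.
    """
    data = decode(target)                                  # step 5: try k1 itself
    if data is None:
        for other in (c for c in codes if c != target):    # step 6: k2, k3, ...
            data = decode(other)
            if data is not None:
                break
        else:
            raise RuntimeError("object unrecoverable by every code")
    encode(target, data)                                   # regenerate k1's blocks
    return data
```

For instance, if code `"a"` cannot decode but `"b"` can, `repair("a", ["a", "b", "c"], decode, encode)` returns the file recovered through `"b"` and re-encodes it with `"a"`.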
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (10)
1. A mimicry object storage system based on multiple erasure codes, characterized by comprising an interface service module, an erasure code module, a decision module, a data service module, a metadata module and a file output decision module, wherein: the interface service module provides the interface through which a client interacts with the system, allowing a user to query, create and delete storage buckets and to upload, download and delete objects through the client; the erasure code module embeds a plurality of erasure codes in the system as a functional component pool, randomly selects several of them to encode each object, stores the encoded blocks on the underlying data block servers, and records the coding scheme as metadata on the metadata server; the decision module dynamically and randomly determines which erasure codes, and how many, are used for each file object during object storage; the data service module stores the data blocks and their check blocks; the metadata module provides the metadata storage service; and the file output decision module determines whether the output data meets the requirement and outputs the data.
2. The multi-erasure-code-based mimicry object storage system of claim 1, wherein when the interface service module performs interaction with the client, the client sends an accessKey to the interface service module along with the data, and the system randomly selects one of two different authentication algorithms to authenticate the accessKey of the user.
3. The multi-erasure-code-based mimicry object storage system of claim 2, wherein a data service node in the data service module continuously sends heartbeat information to the interface service node; after the interface service node sends a request to locate a data block, a data service node that holds the data block sends it to the interface service node.
4. The multi-erasure-code-based mimicry object storage system of claim 3, wherein the metadata service node in the metadata module provides a RESTful Web API to the interface service node; the metadata is extensible, which improves the system's metadata read/write efficiency and keeps the metadata node from going down; and the metadata node sends heartbeat packets to the interface service node through a message queue.
5. The multi-erasure-code-based mimicry object storage system of claim 4, wherein a message queue module is disposed between the interface service module and the data service module; object storage between the interface service module and the data service module uses a REST interface, and other communication goes through the message queue.
6. The multi-erasure-code-based mimicry object storage system of claim 5, wherein, when a user requests to download an object, the erasure coding module obtains multiple copies of the source object with the corresponding decoders, and the data is returned to the client after the decision module performs multi-value voting on the file hashes.
7. The multi-erasure-code-based mimicry object storage system of claim 6, wherein, after receiving a read instruction from a user, the decision module calls the erasure coding module to decode the object, stores it in a cache queue, and compares the contents of the multiple files for consistency; if the contents are consistent, one copy is selected for output; if not, multi-value voting is performed and the inconsistent data is eliminated.
8. The multi-erasure code-based mimicry object storage system of claim 7, wherein the multi-erasure code-based mimicry object storage system further comprises a log module, wherein the log module is configured to record heartbeat information sent and received by each server, and record necessary log information during the process of uploading and downloading objects by a user.
9. The multi-erasure-code-based mimicry object storage system of claim 8, wherein, during encoding, the erasure coding module dynamically and randomly selects a plurality of erasure codes; each erasure code splits the original data into a plurality of data blocks and generates check blocks according to its own coding algorithm, and the data blocks and check blocks are stored randomly on different data service nodes.
10. The multi-erasure-code-based mimicry object storage system of claim 9, wherein, during decoding, the erasure coding module randomly selects a plurality of erasure codes as directed by the decision module, requests from the metadata service node the metadata of the object for each selected erasure code, and obtains the corresponding data blocks and check blocks from the data service nodes for decoding.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011373786.3A CN112486941A (en) | 2020-11-30 | 2020-11-30 | Mimicry object storage system based on multiple erasure codes |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112486941A (en) | 2021-03-12 |
Family
ID=74937645
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011373786.3A Pending CN112486941A (en) | 2020-11-30 | 2020-11-30 | Mimicry object storage system based on multiple erasure codes |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112486941A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113449065A (en) * | 2021-06-29 | 2021-09-28 | Suzhou Lianyue Technology Co., Ltd. (苏州链约科技有限公司) | Data deduplication-oriented decentralized storage method and storage device |
CN114936188A (en) * | 2022-05-30 | 2022-08-23 | Chongqing Unisinsight Technology Co., Ltd. (重庆紫光华山智安科技有限公司) | Data processing method and device, electronic equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105159603A (en) * | 2015-08-18 | 2015-12-16 | Fujian Strait Information Technology Co., Ltd. (福建省海峡信息技术有限公司) | Repair method for distributed data storage system |
CN107154945A (en) * | 2017-05-31 | 2017-09-12 | Central South University (中南大学) | Multi-cloud fragmented secure storage method and system based on erasure codes |
- 2020-11-30: CN202011373786.3A filed in CN (CN112486941A), status: Pending
Non-Patent Citations (1)
Title |
---|
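Feng Xinyue (冯馨悦): "Design of a Distributed Storage System for Mimicry Enhancement" (面向拟态增强的分布式存储系统设计), China Master's Theses Full-text Database, Information Science and Technology Series (中国优秀硕士学位论文全文数据库 信息科技辑) * |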
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11500788B2 (en) | Logical address based authorization of operations with respect to a storage system | |
US10108484B2 (en) | Detecting storage errors in a dispersed storage network | |
US10416889B2 (en) | Session execution decision | |
US20210216648A1 (en) | Modify Access Restrictions in Response to a Possible Attack Against Data Stored by a Storage System | |
US9329940B2 (en) | Dispersed storage having a plurality of snapshot paths and methods for use therewith | |
US8185614B2 (en) | Systems, methods, and apparatus for identifying accessible dispersed digital storage vaults utilizing a centralized registry | |
US9542239B2 (en) | Resolving write request conflicts in a dispersed storage network | |
US9514132B2 (en) | Secure data migration in a dispersed storage network | |
US9274890B2 (en) | Distributed storage network memory access based on memory state | |
US7533291B2 (en) | System and method for storing a data file backup | |
US9665429B2 (en) | Storage of data with verification in a dispersed storage network | |
US10652350B2 (en) | Caching for unique combination reads in a dispersed storage network | |
US20140143367A1 (en) | Robustness in a scalable block storage system | |
US10437673B2 (en) | Internet based shared memory in a distributed computing system | |
US10067831B2 (en) | Slice migration in a dispersed storage network | |
CN112486941A (en) | Mimicry object storage system based on multiple erasure codes | |
Yu et al. | On distributed object storage architecture based on mimic defense | |
CN109154880B (en) | Consistent storage data in a decentralized storage network | |
US20210382992A1 (en) | Remote Analysis of Potentially Corrupt Data Written to a Storage System | |
EP4231168A1 (en) | Mimic storage system and method for data security of industrial control system | |
US20240281544A1 (en) | Multi-Party Authorization for Requests Initiated by a Storage Management System | |
US10547615B2 (en) | Security response protocol based on security alert encoded data slices of a distributed storage network | |
US20190197032A1 (en) | Preventing unnecessary modifications, work, and conflicts within a dispersed storage network | |
CN118939472A (en) | Data management method and related equipment | |
de Oliveira Libório | Privacy-Enhanced Dependable and Searchable Storage in a Cloud-of-Clouds |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210312 |