CN113688115A - File big data distributed storage system based on Hadoop - Google Patents
- Publication number
- CN113688115A
- Authority
- CN
- China
- Prior art keywords
- node
- slave
- nodes
- data
- storage system
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
All classifications fall under G (Physics) > G06 (Computing; calculating or counting) > G06F (Electric digital data processing):
- G06F16/182: Distributed file systems (G06F16/00 Information retrieval; database structures therefor; file system structures therefor > G06F16/10 File systems; file servers > G06F16/18 File system types)
- G06F11/0727: Error or fault processing not based on redundancy, the processing taking place in a storage system, e.g. in a DASD or network based storage system (G06F11/00 Error detection; error correction; monitoring > G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance > G06F11/0703 > G06F11/0706)
- G06F16/2246: Trees, e.g. B+trees (G06F16/20 Structured data, e.g. relational data > G06F16/22 Indexing; data structures therefor; storage structures > G06F16/2228 Indexing structures)
- G06F16/2255: Hash tables (G06F16/20 Structured data, e.g. relational data > G06F16/22 Indexing; data structures therefor; storage structures > G06F16/2228 Indexing structures)
- G06F9/5011: Allocation of resources to service a request, the resources being hardware resources other than CPUs, servers and terminals (G06F9/00 Arrangements for program control > G06F9/46 Multiprogramming arrangements > G06F9/50 Allocation of resources > G06F9/5005)
- G06F9/5083: Techniques for rebalancing the load in a distributed system (G06F9/00 Arrangements for program control > G06F9/46 Multiprogramming arrangements > G06F9/50 Allocation of resources)
Abstract
The invention discloses a Hadoop-based distributed storage system for file big data, comprising a client, a control node with a master-slave structure, and a plurality of slave nodes, the client being connected to the control node. The control node controls the slave nodes, the slave nodes perform data storage and data processing, and the control node and the slave nodes communicate via the TCP/IP protocol. The system uses the HDFS distributed storage system to reduce enterprise storage costs. By using a hash algorithm to improve the random block placement strategy of HDFS, it effectively solves the problem of data skew when the storage system is expanded or a node goes down, and improves the stability of the storage system.
Description
Technical field:
The invention belongs to the field of distributed storage systems, and in particular relates to a Hadoop-based distributed storage system for file big data.
Background art:
At present, many domestic companies use databases such as MySQL, SQL Server and Oracle. These databases adopt a parallel storage scheme, but the number of database servers they can support is limited: capacity can only be expanded vertically through hardware upgrades, and cannot be expanded horizontally by adding more independent databases.
Compared with a traditional single database, connecting multiple databases in parallel allows storage capacity and computing/analysis capacity to be expanded on the original architecture. However, parallel databases separate computation from storage and cannot analyze the stored data in place, and because bandwidth is limited, the parallel databases compete for it during data access, creating a bandwidth bottleneck. Moreover, capacity expansion requires a static shutdown, after which storage is added and the data is redistributed. If data volume surges during some period faster than the database hardware can be upgraded, expansion cannot absorb the surge seamlessly; database capacity becomes insufficient, and customers' requirements for service quality and timeliness are affected.
To ensure load balance and processing speed of the storage system in different scenarios, an HDFS distributed data storage strategy is generally adopted. The default data placement strategy of HDFS on the Hadoop big data platform was designed with storage efficiency, distribution as balanced as possible, and data reliability in mind, but the way HDFS randomly selects a storage node easily gives rise to two problems: the data distribution easily becomes imbalanced, and the default placement strategy does not consider hardware differences between data nodes.
When storing data, HDFS by default places data blocks randomly in order to distribute data evenly across nodes. When storage nodes are added to or removed from the system, however, the data distribution is not adjusted automatically, so the system's load-balancing requirements cannot be met.
HDFS stores data on the assumption of a homogeneous cluster, and its design does not account for hardware differences among the data nodes in a cluster. As a result, the disk space of some data nodes cannot be fully utilized, which reduces the efficiency of the whole storage system.
Disclosure of Invention
To address the imbalanced data distribution of existing distributed storage systems, and the loss of overall storage efficiency caused by hardware differences among their data nodes, the invention provides a Hadoop-based distributed storage system for file big data that uses HDFS to reduce enterprise storage costs. By using a hash algorithm to improve the random block placement strategy of HDFS, it effectively solves the problem of data skew during capacity expansion and node downtime, improves the stability of the storage system, and achieves load balance under homogeneous conditions; by optimizing the data storage strategy with a fused weighted round-robin (polling) algorithm, it effectively addresses load balance under heterogeneous conditions and makes full use of the system's hardware resources.
The technical solution adopted by the invention to solve the above technical problems is as follows:
A Hadoop-based distributed storage system for file big data comprises a client, a control node with a master-slave structure, and a plurality of slave nodes. The control node controls the slave nodes; the slave nodes perform data storage and data processing; and the control node and the slave nodes communicate via the TCP/IP protocol. The client receives a map function and a reduce function configured by the user: the map function operates on key/value pairs, decomposing a large-scale computing task into multiple subtasks that are distributed to the slave nodes, whose computing resources produce the computation results; the reduce function merges all values sharing the same key, and the output key/value pairs are the final result.
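The map/reduce contract described above can be illustrated with a minimal single-process Python sketch. It is not Hadoop's Java MapReduce API: the word-count task, the `map_fn`/`reduce_fn`/`run_job` names and the in-memory shuffle are hypothetical stand-ins for what the framework does across slave nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    # Map: turn one input record into intermediate key/value pairs.
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: merge all values that share the same key into one output pair.
    return key, sum(values)


def run_job(splits: Iterable[List[str]]) -> Dict[str, int]:
    # Each split stands in for a subtask that would run on one slave node.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for split in splits:
        for line in split:
            for key, value in map_fn(line):
                grouped[key].append(value)  # shuffle: group values by key
    return dict(reduce_fn(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    splits = [["hadoop stores files", "hdfs stores blocks"], ["Hadoop runs jobs"]]
    print(run_job(splits))
```

In a real cluster the splits would be HDFS blocks and the grouping step would happen over the network; the sketch only shows which side of the contract owns which responsibility.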
When data to be processed is stored on the slave nodes, the control node determines the designated slave node using the following hash model:
target=getHashCode(request.IPNum)&nodeNum
where getHashCode() returns the hash value of a string, request.IPNum is the IP address of the request, nodeNum is the total number of slave nodes, and the value of target identifies the number of the slave node to which the data is assigned.
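A minimal Python sketch of the hash model, under two stated assumptions. First, `get_hash_code` is a hypothetical stand-in for getHashCode(); it uses CRC32 from the standard library because that value is stable across processes, unlike Python's salted built-in `hash()` for strings. Second, `&` is read as reducing the hash to a node index. A literal bitwise AND with nodeNum does not do that (with nodeNum = 4, binary 100, the result is only ever 0 or 4, and 4 is not a valid index for nodes 0 to 3), so the sketch uses `hash % nodeNum`, which equals `hash & (nodeNum - 1)` when nodeNum is a power of two.

```python
import zlib


def get_hash_code(text: str) -> int:
    # Hypothetical stand-in for getHashCode(): a stable, non-negative 32-bit string hash.
    return zlib.crc32(text.encode("utf-8"))


def target_node(request_ip: str, node_num: int) -> int:
    # Map the request IP to a slave-node number in [0, node_num).
    if node_num <= 0:
        raise ValueError("node_num must be positive")
    return get_hash_code(request_ip) % node_num


def target_node_mask(request_ip: str, node_num: int) -> int:
    # Bitmask variant; only equivalent to the modulo form when node_num is a power of two.
    if node_num <= 0 or node_num & (node_num - 1):
        raise ValueError("node_num must be a power of two for the mask form")
    return get_hash_code(request_ip) & (node_num - 1)


if __name__ == "__main__":
    for ip in ("192.168.1.10", "192.168.1.11", "10.0.0.7"):
        print(ip, "->", target_node(ip, 4), target_node_mask(ip, 4))
```

The mask form is cheaper but forces the cluster size to a power of two; the modulo form works for any node count, at the cost of remapping most keys whenever nodeNum changes.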
Further, multiple virtual layers are placed between the slave nodes and the data to be processed: a first virtual layer is mapped to a first slave node and contains multiple virtual nodes, and the data to be processed is stored on the slave nodes via the virtual nodes.
Further, the mapping model between the virtual nodes and the slave nodes is as follows:
where P is the initial hash position of a slave node, P′ is the hash position of the mapped virtual node, N is the number of slave nodes, and k = 0, 1, 2, …, N.
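The mapping equation for P′ is not given in this text, so the following Python sketch substitutes a common consistent-hashing construction as an assumption: each slave node is hashed onto a 32-bit ring at `vnodes` positions (one per k, derived from a hypothetical label `"<node>#<k>"`), and a key is stored on the node owning the first virtual position clockwise from the key's hash. It demonstrates the property the virtual layer is meant to provide: adding or removing a node only moves the keys adjacent to that node's virtual positions.

```python
import bisect
import zlib
from typing import Dict, List


def ring_hash(text: str) -> int:
    # Stable 32-bit position on the hash ring (CRC32 is an assumed choice).
    return zlib.crc32(text.encode("utf-8"))


class VirtualNodeRing:
    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._positions: List[int] = []  # sorted virtual-node positions
        self._owner: Dict[int, str] = {}  # position -> physical slave node
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Capacity expansion: place the new node's virtual nodes on the ring.
        for k in range(self.vnodes):
            pos = ring_hash(f"{node}#{k}")
            if pos in self._owner:
                continue  # skip the rare position collision
            self._owner[pos] = node
            bisect.insort(self._positions, pos)

    def remove_node(self, node: str) -> None:
        # Node downtime: only this node's virtual positions disappear.
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key: str) -> str:
        if not self._positions:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key's hash, wrapping around.
        i = bisect.bisect_right(self._positions, ring_hash(key)) % len(self._positions)
        return self._owner[self._positions[i]]


if __name__ == "__main__":
    ring = VirtualNodeRing(["slave-1", "slave-2", "slave-3"])
    keys = [f"block-{i}" for i in range(10000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("slave-4")
    moved = sum(1 for k in keys if ring.node_for(k) != before[k])
    print(f"keys moved after adding slave-4: {moved / len(keys):.1%}")
```

More virtual nodes per slave node smooth out the share each node receives, at the cost of a larger sorted position list; every key that moves after an expansion moves onto the new node, never between old ones.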
Further, the mapping model between the virtual nodes and the slave nodes may alternatively be:
where h is the hash result of the characteristic string, l is the length of the string, and w is the weight of each character in the hash operation.
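The formula itself is not given in this text. One plausible reading, offered only as an assumption, is a polynomial (Horner-style) string hash in which each of the l characters is weighted by a power of w, the scheme Java's `String.hashCode()` uses with w = 31: h = s[0]·w^(l−1) + s[1]·w^(l−2) + … + s[l−1], truncated to 32 bits. A Python sketch of that reading:

```python
def weighted_string_hash(s: str, w: int = 31) -> int:
    """Polynomial hash h = sum(s[i] * w**(l - 1 - i)) over the l characters, as signed 32-bit."""
    h = 0
    for ch in s:
        # Horner's rule: multiply the running hash by the weight, then add the next character.
        # ord() matches Java's char value for characters in the Basic Multilingual Plane.
        h = (h * w + ord(ch)) & 0xFFFFFFFF
    # Reinterpret as a signed 32-bit integer, matching Java's int overflow behaviour.
    return h - (1 << 32) if h >= (1 << 31) else h


if __name__ == "__main__":
    print(weighted_string_hash("abc"))  # 96354, the same value as Java's "abc".hashCode()
```

Horner's rule avoids computing large powers of w explicitly; masking after every step keeps the intermediate value bounded exactly as 32-bit integer overflow would.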
Further, the control node may act as a load-balancing server: after receiving a data storage request from the client, it collects the load state reported in real time by each slave node and distributes the request among the slave nodes according to their weights. The slave nodes store data as requested by the control node and return the storage result to it, and finally the control node informs the client whether the data distribution has been completed.
Further, the load state is the disk usage rate, modeled as:
where U_total is the total disk space and U_free is the remaining free disk space.
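The usage-rate formula itself is not given here. With only the two quantities defined, the natural reading (an assumption) is the used fraction of the disk; the numbers in the worked example are purely hypothetical:

```latex
U = \frac{U_{total} - U_{free}}{U_{total}}, \qquad
\text{e.g. } U_{total} = 500\ \text{GB},\ U_{free} = 200\ \text{GB}
\;\Rightarrow\; U = \frac{500 - 200}{500} = 0.6 = 60\%.
```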
Further, after receiving a data storage request, the control node collects the current disk usage rate of each slave node and calculates the average disk usage rate across the slave nodes. If the disk usage rate of a slave node is greater than or equal to the average, its weight is set to 0; if it is below the average, its weight is calculated from its disk usage rate. The control node then performs polling (weighted round-robin) distribution according to these weights.
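The description gives neither the weight formula nor the exact polling scheme, so this Python sketch assumes both: a below-average node's weight is its free-space share in whole percent, `round(100 * (1 - usage))`, and requests are then spread with smooth weighted round-robin (the variant nginx uses). That is a swapped-in technique, not the patent's own "fusion weighted polling" algorithm; node names and usage figures are hypothetical.

```python
from typing import Dict, List


def compute_weights(usage: Dict[str, float]) -> Dict[str, int]:
    # usage maps slave node -> disk usage rate in [0, 1].
    avg = sum(usage.values()) / len(usage)
    weights: Dict[str, int] = {}
    for node, u in usage.items():
        # At or above the average: weight 0, so the node receives no new data this round.
        # Below the average: weight grows with free space (assumed formula).
        weights[node] = 0 if u >= avg else round(100 * (1 - u))
    return weights


def smooth_weighted_round_robin(weights: Dict[str, int], requests: int) -> List[str]:
    active = {n: w for n, w in weights.items() if w > 0}
    if not active:
        raise ValueError("no slave node has a positive weight")
    total = sum(active.values())
    current = {n: 0 for n in active}
    order: List[str] = []
    for _ in range(requests):
        for n, w in active.items():
            current[n] += w  # every node earns its weight
        chosen = max(current, key=current.get)  # the highest running score wins
        current[chosen] -= total  # and pays back the total
        order.append(chosen)
    return order


if __name__ == "__main__":
    w = compute_weights({"slave-1": 0.9, "slave-2": 0.6, "slave-3": 0.3, "slave-4": 0.2})
    print(w)
    print(smooth_weighted_round_robin(w, 6))
```

Smooth weighted round-robin interleaves picks instead of sending bursts to the heaviest node. Note that under the zero-weight rule, if every node sits exactly at the average (for example an evenly filled cluster), no node is eligible and the sketch raises; a real implementation would need a fallback.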
Further, the client may execute parallel query commands through the control node.
Further, the parallel query process is as follows:
(1) The client sends an SQL command to the control node through a script or command interface.
(2) The SQL interface parses the command according to its query keywords, generates a virtual root node, takes the output fields as leaf nodes, and constructs a query task tree. It creates an attribute value for each relational-table node in the query task tree to mark the query target, reads the file mapping table, adds the file information to the attribute value, and outputs the task tree to an optimizer.
(3) The optimizer looks up metadata according to the file information recorded in the attribute values, obtains the location information of the corresponding data blocks, and adds it to the attribute values. It converts the task tree into operations on data blocks, adjusts the order of operations according to the filter conditions, data volume, field sizes and other factors, merges the operation units per slave node, and pushes them onto a queue in order to generate an operation list.
(4) The operation lists of all slave nodes are combined into a query plan, which is output to a distributor; the distributor sends each operation list to the executor on the corresponding slave node according to the node's IP address.
(5) Each executor reads its operation list, executes all operation commands in order, obtains the query results from local memory, and uploads them to the distributor, which aggregates the results and returns the aggregated query results to the client.
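The shape of steps (3) to (5) can be pictured with a toy, single-process Python sketch. Everything in it is hypothetical: the block-location table stands in for HDFS metadata, each operation is just a filter over in-memory rows held by the owning slave node, and a thread pool stands in for the executors on remote slave nodes. SQL parsing and the optimizer's reordering are not modelled.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, int]

# Hypothetical metadata: data block id -> IP address of the slave node that holds it.
BLOCK_LOCATIONS = {"blk_1": "10.0.0.1", "blk_2": "10.0.0.2", "blk_3": "10.0.0.1"}

# Hypothetical local storage on each slave node: block id -> rows.
LOCAL_BLOCKS: Dict[str, Dict[str, List[Row]]] = {
    "10.0.0.1": {"blk_1": [{"id": 1, "year": 2019}, {"id": 2, "year": 2021}],
                 "blk_3": [{"id": 5, "year": 2021}]},
    "10.0.0.2": {"blk_2": [{"id": 3, "year": 2021}, {"id": 4, "year": 2018}]},
}


def build_query_plan(blocks: List[str]) -> Dict[str, List[str]]:
    # Steps (3)/(4): merge per-block operations into one operation list per slave node.
    plan: Dict[str, List[str]] = defaultdict(list)
    for blk in blocks:
        plan[BLOCK_LOCATIONS[blk]].append(blk)
    return dict(plan)


def execute(node_ip: str, ops: List[str], predicate: Callable[[Row], bool]) -> List[Row]:
    # Step (5), executor side: run this node's operations in order against local blocks.
    local = LOCAL_BLOCKS[node_ip]
    return [row for blk in ops for row in local[blk] if predicate(row)]


def distribute(plan: Dict[str, List[str]], predicate: Callable[[Row], bool]) -> List[Row]:
    # Step (5), distributor side: dispatch each list by node IP, then merge the results.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(execute, ip, ops, predicate) for ip, ops in plan.items()]
        merged = [row for f in futures for row in f.result()]
    return sorted(merged, key=lambda r: r["id"])


if __name__ == "__main__":
    plan = build_query_plan(["blk_1", "blk_2", "blk_3"])
    print(plan)
    # Roughly: SELECT * FROM archive WHERE year = 2021
    print(distribute(plan, lambda r: r["year"] == 2021))
```

Grouping by node before dispatch means each slave node receives one operation list and scans only the blocks it holds locally, which is the point of pushing computation to the data.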
The invention has the following beneficial effects:
The system uses the HDFS distributed storage system to reduce enterprise storage costs. By using a hash algorithm to improve the random block placement strategy of HDFS, it effectively solves the problem of data skew during capacity expansion and node downtime, improves the stability of the storage system, and achieves load balance under homogeneous conditions; by optimizing the data storage strategy with a fused weighted round-robin (polling) algorithm, it effectively addresses load balance under heterogeneous conditions and makes full use of the system's hardware resources.
The foregoing is only an overview of the technical solutions of the present invention. Preferred embodiments are described in detail below so that the technical means of the invention can be understood more clearly and implemented according to this description, and so that the above and other objects, features and advantages of the invention become more apparent.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a structural diagram of the Hadoop-based distributed storage system for file big data;
FIG. 2 is a schematic diagram of the virtual-node-based distributed storage system according to the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In the description of the present invention, unless otherwise expressly specified or limited, the terms "mounted", "connected", "secured" and the like are to be understood broadly: for example, a connection may be fixed, detachable or integral; mechanical or electrical; direct, or indirect through an intermediate medium; or an internal communication between two elements or an interaction between them. Those skilled in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.
As shown in FIG. 1, a Hadoop-based distributed storage system for file big data comprises a client, a control node with a master-slave structure, and a plurality of slave nodes. The control node controls the slave nodes; the slave nodes perform data storage and data processing; and the control node and the slave nodes communicate via the TCP/IP protocol. The client receives a map function and a reduce function configured by the user: the map function operates on key/value pairs, decomposing a large-scale computing task into multiple subtasks that are distributed to the slave nodes, whose computing resources produce the computation results; the reduce function merges all values sharing the same key, and the output key/value pairs are the final result.
When data to be processed is stored on the slave nodes, the control node determines the designated slave node using the following hash model:
target=getHashCode(request.IPNum)&nodeNum
where getHashCode() returns the hash value of a string, request.IPNum is the IP address of the request, nodeNum is the total number of slave nodes, and the value of target identifies the number of the slave node to which the data is assigned.
Further, multiple virtual layers are placed between the slave nodes and the data to be processed: a first virtual layer is mapped to a first slave node and contains multiple virtual nodes, and the data to be processed is stored on the slave nodes via the virtual nodes.
Further, as shown in FIG. 2, the mapping model between the virtual nodes and the slave nodes is as follows:
where P is the initial hash position of a slave node, P′ is the hash position of the mapped virtual node, N is the number of slave nodes, and k = 0, 1, 2, …, N. The hash model above can be replaced by this virtual-node-to-slave-node mapping model to achieve balanced storage.
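Why replacing the plain hash model matters on capacity expansion can be checked with a few lines of Python. Assuming the `&` in the hash model acts as a modulo, so that placement is `key % nodeNum`, growing a cluster from 4 to 5 slave nodes changes the node of every key except those where k mod 4 equals k mod 5, which holds for only 4 residues in every 20; 80% of the data would have to move. With a virtual-node ring, only the keys that land on the new node's positions move (roughly 1/5 in expectation for this case).

```python
def moved_fraction(num_keys: int, old_nodes: int, new_nodes: int) -> float:
    # Fraction of integer keys whose slave node changes when placement is key % node_count.
    moved = sum(1 for k in range(num_keys) if k % old_nodes != k % new_nodes)
    return moved / num_keys


if __name__ == "__main__":
    print(moved_fraction(100_000, 4, 5))  # 0.8
```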
Further, the mapping model between the virtual nodes and the slave nodes may alternatively be:
where h is the hash result of the characteristic string, l is the length of the string, and w is the weight of each character in the hash operation.
Further, the control node may act as a load-balancing server: after receiving a data storage request from the client, it collects the load state reported in real time by each slave node and distributes the request among the slave nodes according to their weights. The slave nodes store data as requested by the control node and return the storage result to it, and finally the control node informs the client whether the data distribution has been completed.
Further, the load state is the disk usage rate, modeled as:
where U_total is the total disk space and U_free is the remaining free disk space.
Further, after receiving a data storage request, the control node collects the current disk usage rate of each slave node and calculates the average disk usage rate across the slave nodes. If the disk usage rate of a slave node is greater than or equal to the average, its weight is set to 0; if it is below the average, its weight is calculated from its disk usage rate. The control node then performs polling (weighted round-robin) distribution according to these weights.
Further, the client may execute parallel query commands through the control node.
Further, the parallel query process is as follows:
(1) The client sends an SQL command to the control node through a script or command interface.
(2) The SQL interface parses the command according to its query keywords, generates a virtual root node, takes the output fields as leaf nodes, and constructs a query task tree. It creates an attribute value for each relational-table node in the query task tree to mark the query target, reads the file mapping table, adds the file information to the attribute value, and outputs the task tree to an optimizer.
(3) The optimizer looks up metadata according to the file information recorded in the attribute values, obtains the location information of the corresponding data blocks, and adds it to the attribute values. It converts the task tree into operations on data blocks, adjusts the order of operations according to the filter conditions, data volume, field sizes and other factors, merges the operation units per slave node, and pushes them onto a queue in order to generate an operation list.
(4) The operation lists of all slave nodes are combined into a query plan, which is output to a distributor; the distributor sends each operation list to the executor on the corresponding slave node according to the node's IP address.
(5) Each executor reads its operation list, executes all operation commands in order, obtains the query results from local memory, and uploads them to the distributor, which aggregates the results and returns the aggregated query results to the client.
The advantages of the invention are as follows: the system uses the HDFS distributed storage system to reduce enterprise storage costs. By using a hash algorithm to improve the random block placement strategy of HDFS, it effectively solves the problem of data skew during capacity expansion and node downtime, improves the stability of the storage system, and achieves load balance under homogeneous conditions; by optimizing the data storage strategy with a fused weighted round-robin (polling) algorithm, it effectively addresses load balance under heterogeneous conditions and makes full use of the system's hardware resources.
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto; any change or substitution readily conceivable by those skilled in the art within the technical scope disclosed herein shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the appended claims.
Claims (9)
1. A Hadoop-based archive big data distributed storage system, characterized in that: the system comprises a client, a control node with a master-slave structure, and a plurality of slave nodes; the control node is used for controlling the plurality of slave nodes, the slave nodes are used for data storage and data processing, and the control node and the slave nodes communicate via the TCP/IP protocol; the client is used for receiving a map function and a reduce function configured by a user, wherein the map function operates on key/value pairs, decomposing a large-scale computing task into a plurality of subtasks that are distributed to the slave nodes, whose computing resources are used to obtain computation results, and the reduce function merges all values having the same key, the output key/value pairs being the final result;
when data to be processed is stored on the slave nodes, the control node determines the designated slave node using a hash model, the hash model being:
target=getHashCode(request.IPNum)&nodeNum
wherein getHashCode() is a function computing the hash value of a string, request.IPNum is the IP address of the request, nodeNum is the total number of slave nodes, and the slave node with the corresponding number is selected according to the value of target.
2. The Hadoop-based archive big data distributed storage system according to claim 1, wherein: a plurality of virtual layers are arranged between the slave nodes and the data to be processed, a first virtual layer is mapped to a first slave node and comprises a plurality of virtual nodes, the data to be processed is stored on the slave nodes through the virtual nodes, and the hash model is replaced by a mapping model between the virtual nodes and the slave nodes.
3. The Hadoop-based archive big data distributed storage system according to claim 2, wherein: the mapping model between the virtual nodes and the slave nodes is:
wherein P is the initial hash position of a slave node, P′ is the hash position of the mapped virtual node, N is the number of slave nodes, and k = 0, 1, 2, …, N.
4. The Hadoop-based archive big data distributed storage system according to claim 2, wherein: the mapping model between the virtual nodes and the slave nodes may alternatively be:
wherein h is the hash result of the characteristic string, l is the length of the string, and w is the weight of each character in the hash operation.
5. The Hadoop-based archive big data distributed storage system according to claim 1, wherein: the control node may serve as a load-balancing server; after receiving a data storage request sent by the client, it collects the load state reported in real time by each slave node and distributes the request to the slave nodes according to their weights; the slave nodes store data as requested by the control node and send the storage result to the control node; and finally the control node informs the client whether the data distribution has been completed.
7. The Hadoop-based archive big data distributed storage system according to claim 5, wherein: after receiving a data storage request, the control node collects the current disk usage rate of each slave node and calculates the average disk usage rate of the slave nodes; if the disk usage rate of a second slave node is greater than or equal to the average, the weight of the second slave node is set to 0; if the disk usage rate of the second slave node is less than the average, the weight of the second slave node is calculated from its disk usage rate; and a polling distribution task is executed according to the weights.
8. The Hadoop-based archive big data distributed storage system according to claim 1, wherein: the client can execute parallel query commands through the control node.
9. The Hadoop-based archive big data distributed storage system according to claim 8, wherein: the parallel query process comprises the following steps:
(1) the client sends an SQL command to the control node through a script or command interface;
(2) the SQL interface parses the command according to its query keywords, generates a virtual root node, takes the output fields as leaf nodes, and constructs a query task tree; creates an attribute value for each relational-table node in the query task tree to mark the query target, reads the file mapping table, and adds the file information to the attribute value; and outputs the task tree to an optimizer;
(3) the optimizer looks up metadata according to the file information recorded in the attribute values, obtains the location information of the corresponding data blocks, and adds it to the attribute values; converts the task tree into operations on data blocks and adjusts the order of operations according to the filter conditions, data volume, field sizes and other factors; and merges the operation units per slave node, pushing them onto a queue in order to generate an operation list;
(4) the operation lists of all slave nodes are combined into a query plan, which is output to a distributor; the distributor sends each operation list to the executor on the corresponding slave node according to the node's IP address;
(5) each executor reads its operation list, executes all operation commands in order, obtains the query results from local memory, and uploads them to the distributor; the distributor aggregates the results and returns the aggregated query results to the client.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111000510.5A CN113688115B (en) | 2021-08-29 | 2021-08-29 | Archive big data distributed storage system based on Hadoop |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111000510.5A CN113688115B (en) | 2021-08-29 | 2021-08-29 | Archive big data distributed storage system based on Hadoop |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113688115A true CN113688115A (en) | 2021-11-23 |
CN113688115B CN113688115B (en) | 2024-02-20 |
Family ID: 78583716
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111000510.5A (CN113688115B), Active | 2021-08-29 | 2021-08-29 | Archive big data distributed storage system based on Hadoop |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113688115B |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116955306A * | 2023-06-21 | 2023-10-27 | 东莞市铁石文档科技有限公司 | File management system based on distributed storage |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102063486A * | 2010-12-28 | 2011-05-18 | 东北大学 | Multi-dimensional data management-oriented cloud computing query processing method |
CN103929500A * | 2014-05-06 | 2014-07-16 | 刘跃 | Method for data fragmentation of distributed storage system |
CN104065716A * | 2014-06-18 | 2014-09-24 | 江苏物联网研究发展中心 | OpenStack-based Hadoop service providing method |
CN105824618A * | 2016-03-10 | 2016-08-03 | 浪潮软件集团有限公司 | Real-time message processing method for Storm |
US20180203866A1 * | 2017-01-17 | 2018-07-19 | Cisco Technology, Inc. | Distributed object storage |
CN108449376A * | 2018-01-31 | 2018-08-24 | 合肥和钧正策信息技术有限公司 | Load-balancing method for big-data computing nodes serving enterprises |
CN109740038A * | 2019-01-02 | 2019-05-10 | 安徽芃睿科技有限公司 | Network data distributed parallel computing environment and method |
2021-08-29: CN202111000510.5A filed in CN (CN113688115B), Active
Non-Patent Citations (2)
Title |
---|
DONGZHAN ZHANG et al.: "Data Deduplication based on Hadoop", pages 1-6, published online, retrieved from the Internet: https://ieeexplore.ieee.org/abstract/document/8026928 * |
覃伟荣 (Qin Weirong): "Hadoop中改进的共享式存储设备设计" (Design of an improved shared storage device in Hadoop), 《计算机工程与设计》 (Computer Engineering and Design), vol. 39, no. 5, pages 1319-1325 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116955306A * | 2023-06-21 | 2023-10-27 | 东莞市铁石文档科技有限公司 | File management system based on distributed storage |
CN116955306B * | 2023-06-21 | 2024-04-12 | 东莞市铁石文档科技有限公司 | File management system based on distributed storage |
Also Published As
Publication number | Publication date |
---|---|
CN113688115B | 2024-02-20 |
Similar Documents
Publication | Title |
---|---|
Lakshman et al. | Cassandra: a decentralized structured storage system |
CN106066896B | Application-aware big data deduplication storage system and method |
CN102831120B | Data processing method and system |
CN104023088B | Storage server selection method for a distributed file system |
CN105005611B | File management system and file management method |
CN104184812B | Multipoint data transmission method based on a private cloud |
CN102546782A | Distribution system and data operation method thereof |
EP2419845A2 | Policy-based storage structure distribution |
CN110213352A | Decentralized autonomous storage resource aggregation with a unified namespace |
CN104077423A | Consistent-hash-based structured data storage, query and migration method |
CN101196929A | Metadata management method for splitting a namespace |
Zhang et al. | Survey of research on big data storage |
WO2004063928A1 | Database load reducing system and load reducing program |
CN106570113B | Massive vector slice data cloud storage method and system |
CN102104617A | Method for storing massive picture data by website operating system |
CN104111924A | Database system |
CN113655969B | Data balanced storage method based on streaming distributed storage system |
CN107590257A | Database management method and device |
CN106960011A | Metadata management system and method for a distributed file system |
CN111708497A | Cloud environment data storage optimization method based on HDFS |
CN109597903A | Image file processing apparatus and method, document storage system and storage medium |
CN112947860A | Hierarchical storage and scheduling method of distributed data copies |
US20080270483A1 | Storage Management System |
CN113688115B | Archive big data distributed storage system based on Hadoop |
JP6059558B2 | Load balancing judgment system |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: Room 769, building 2, East Ring Road, Yanqing Park, Zhongguancun, Yanqing District, Beijing 102101; Applicant after: ZHONGDUN innovative digital technology (Beijing) Co.,Ltd.; Address before: Room 769, building 2, East Ring Road, Yanqing Park, Zhongguancun, Yanqing District, Beijing 102101; Applicant before: ZHONGDUN innovation archives management (Beijing) Co.,Ltd. |
GR01 | Patent grant | |