CN113688115B - Archive big data distributed storage system based on Hadoop - Google Patents

Archive big data distributed storage system based on Hadoop Download PDF

Info

Publication number
CN113688115B
CN113688115B CN202111000510.5A
Authority
CN
China
Prior art keywords
node
slave
data
storage system
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111000510.5A
Other languages
Chinese (zh)
Other versions
CN113688115A (en)
Inventor
王佩
李帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongdun Innovative Digital Technology Beijing Co ltd
Original Assignee
Zhongdun Innovative Digital Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongdun Innovative Digital Technology Beijing Co ltd filed Critical Zhongdun Innovative Digital Technology Beijing Co ltd
Priority to CN202111000510.5A priority Critical patent/CN113688115B/en
Publication of CN113688115A publication Critical patent/CN113688115A/en
Application granted granted Critical
Publication of CN113688115B publication Critical patent/CN113688115B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10 - File systems; File servers
    • G06F 16/18 - File system types
    • G06F 16/182 - Distributed file systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 - Error detection; Error correction; Monitoring
    • G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/0703 - Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F 11/0706 - Error or fault processing not based on redundancy, the processing taking place on a specific hardware platform or in a specific software environment
    • G06F 11/0727 - Error or fault processing not based on redundancy, in a storage system, e.g. in a DASD or network based storage system
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval of structured data, e.g. relational data
    • G06F 16/22 - Indexing; Data structures therefor; Storage structures
    • G06F 16/2228 - Indexing structures
    • G06F 16/2246 - Trees, e.g. B+trees
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval of structured data, e.g. relational data
    • G06F 16/22 - Indexing; Data structures therefor; Storage structures
    • G06F 16/2228 - Indexing structures
    • G06F 16/2255 - Hash tables
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 - Allocation of resources to service a request
    • G06F 9/5011 - Allocation of resources to service a request, the resources being hardware resources other than CPUs, servers and terminals
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5083 - Techniques for rebalancing the load in a distributed system

Abstract

The invention discloses a Hadoop-based archive big data distributed storage system, which comprises a client, a control node in a master-slave structure, and a plurality of slave nodes. The control node controls the slave nodes, the slave nodes perform data storage and data processing, and the control node and the slave nodes exchange information over the TCP/IP protocol. The system uses the HDFS distributed storage system to reduce enterprise storage costs. By adopting a hash algorithm and improving the random data-placement strategy of HDFS block storage, it effectively solves the problem of data skew in the storage system under capacity expansion and node failure, and improves the stability of the storage system.

Description

Archive big data distributed storage system based on Hadoop
Technical field:
The invention belongs to the field of distributed storage systems, and particularly relates to a Hadoop-based archive big data distributed storage system.
Background art:
At present, many domestic companies use databases such as MySQL, SQL Server, and Oracle. These databases adopt a parallel storage scheme, but the number of database servers is limited, and database capacity can only be expanded vertically, that is, through hardware upgrades; it cannot be expanded horizontally by adding independent database instances.
Compared with a single traditional database, connecting multiple databases in parallel can expand storage capacity and computing and analysis capability on the original architecture. However, parallel databases separate computation from storage, so stored data cannot be computed and analyzed in place, and because of bandwidth limits, multiple parallel databases compete for bandwidth during data access, creating a bandwidth bottleneck. Meanwhile, capacity expansion requires a static shutdown, the addition of storage, and a redistribution of the data. If data surges in a certain period so that the growth rate of information data exceeds the speed of database hardware upgrades, expansion cannot seamlessly absorb the surging storage demand, database capacity easily runs short, and service quality and customer timeliness requirements are affected.
To guarantee the load balancing and processing speed of the storage system in different scenarios, the data storage strategy of the HDFS distributed file system is generally adopted. The default storage strategy of HDFS under the Hadoop big data platform was designed with storage efficiency, distribution that is as balanced as possible, and data reliability in mind, but the way HDFS randomly selects storage nodes easily causes two problems: the default placement strategy does not account for hardware differences between data nodes, and it easily leads to unbalanced data distribution.
When the HDFS distributed system stores data, it adopts a strategy of randomly placing data blocks by default in order to distribute data evenly across nodes. When storage nodes are added to or removed from the system, the data distribution cannot be adjusted automatically, so the system's load-balancing requirement cannot be met.
The HDFS distributed system also stores data on the premise of a homogeneous cluster; hardware differences between data nodes in the cluster were not optimized for at design time, so the disk space of some data nodes cannot be fully utilized, which affects the efficiency of the whole storage system.
Disclosure of Invention
Aiming at the problems that the existing distributed storage system distributes data unevenly and that hardware differences between its data nodes affect the efficiency of the whole storage system, the invention provides a Hadoop-based archive big data distributed storage system, which uses the HDFS distributed storage system to reduce enterprise storage costs. By adopting a hash algorithm and improving the random data-placement strategy of HDFS block storage, it effectively solves the problem of data skew in the storage system under capacity expansion and node failure, improves the stability of the storage system, and achieves load balancing under homogeneous conditions. The data storage strategy can further be optimized with a fused weighted polling algorithm, which effectively solves the load-balancing problem of the storage system under heterogeneous conditions and makes full use of the system's hardware resources.
The technical scheme adopted by the invention for solving the technical problems is as follows:
the file big data distributed storage system based on Hadoop comprises a client, a control node of a master-slave structure and a plurality of slave nodes; the control node is used for controlling the plurality of slave nodes, the slave nodes are used for data storage and data processing, and the control node carry out information transmission through a TCP/IP protocol; the client is used for receiving a map function and a reduce function configured by a user, the map function is used for operation management of key/value pairs, a large-scale calculation task is decomposed into a plurality of subtasks, the subtasks are distributed to each slave node, calculation results are obtained by using calculation resources of the slave nodes, the reduce function is used for carrying out merging processing on all value values of the same key value, and the output key value is a final result;
when data to be processed is stored on a slave node, the control node uses a hash model to determine the designated slave node; the hash model is:
target=getHashCode(request.IPNum)&nodeNum
where getHashCode () represents a string operation hash function, request.ipnum represents a request IP address, nodeNum represents the total number of slave nodes, and the corresponding numbered slave nodes can be allocated according to the value of target.
Further, a plurality of virtual layers is arranged between the slave nodes and the data to be processed; a first virtual layer maps to a first slave node, the first virtual layer comprises a plurality of virtual nodes, and the data to be processed is stored onto the slave nodes through the virtual nodes.
Further, the mapping model between the virtual nodes and the slave nodes is:
where P is the initial hash position of the slave node, P' is the hash position of the mapped virtual node, N is the number of slave nodes, and k = 0, 1, 2, ….
Further, the mapping model between the virtual nodes and the slave nodes may also be:
where h denotes the hash result of the feature string, l denotes the string length, and w denotes the weight of a character in the hash operation.
Further, the control node may serve as a load-balancing server: after processing the data storage request sent by the client, it collects the load state fed back by each slave node in real time and distributes the request to the slave nodes according to their weights; each slave node distributes data according to the control node's request and sends the storage result to the control node; finally, the control node sends the client a result indicating whether data distribution is complete.
Further, the load state is the disk usage rate, and the disk usage rate model is:
U = (U_total - U_free) / U_total
where U_total is the total disk space and U_free is the remaining disk space.
Further, after receiving a data storage request, the control node starts collecting the current disk usage rate of each slave node. The control node computes the average disk usage rate of the slave nodes; if the disk usage rate of a second slave node is greater than or equal to the average, the weight of that slave node is set to 0; if it is smaller than the average, the weight of that slave node is computed from its disk usage rate, and the polling allocation task is executed according to the weights.
Further, the client may execute a parallel query command through the control node.
Further, the parallel query flow is as follows:
Step (1), the client sends an SQL command to the control node through a script or command interface;
Step (2), the SQL interface parses the command according to the query keywords, generates a virtual root node, takes the output fields as leaf nodes, and constructs a query task tree; for each relation-table node on the query task tree it creates an attribute value used to mark the query target, reads the file mapping table, and adds the file information to the attribute value; the task tree is then output to the optimizer;
Step (3), the optimizer retrieves metadata according to the file information recorded in the attribute values, obtains the corresponding position information of the data blocks, and adds the position information to the attribute values; it converts the task tree into operations on the data blocks and adjusts the operation order according to factors such as filter conditions, data volume, and field size; it then merges the operation units per slave node, pushes them into a queue in order, and generates an operation list;
Step (4), the operation lists of all slave nodes are combined into a query plan and output to the distributor; the distributor dispatches each operation list to the executor on the corresponding slave node according to the node's IP address;
Step (5), each executor reads its operation list, executes all operation commands in order, obtains the query result from local storage, and uploads it to the distributor; the distributor gathers the computation results and returns the collected query result to the client.
The beneficial effects of the invention are as follows:
the system uses an HDFS distributed storage system to mitigate the storage costs of an enterprise. By adopting a hash algorithm, the problem of data inclination of a storage system under the conditions of capacity expansion and downtime is effectively solved by improving a random placement data strategy stored by an HDFS block, the stability of the storage system is improved, and the system can achieve load balancing under isomorphic conditions; the data storage strategy can be optimized by adopting the fusion weighted polling algorithm, so that the problem of load balancing of the storage system under heterogeneous conditions is effectively solved, and the hardware resources of the system are fully utilized.
The foregoing description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented according to the content of the specification, and that the above and other objects, features, and advantages of the present invention may become more readily apparent, preferred embodiments are described in detail below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a block diagram of the Hadoop-based archive big data distributed storage system of the present invention.
FIG. 2 is a schematic diagram of the virtual-node-based distributed storage system of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In the description of the present invention, unless explicitly stated and limited otherwise, the terms "mounted," "connected," "secured," and the like are to be construed broadly, and may mean, for example, fixedly connected, detachably connected, or integrally formed; mechanically or electrically connected; directly connected or indirectly connected through an intermediate medium; or communication between the interiors of two elements or an interaction relationship between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
As shown in FIG. 1, the system comprises a client, a control node in a master-slave structure, and a plurality of slave nodes. The control node controls the slave nodes, the slave nodes perform data storage and data processing, and the control node and the slave nodes exchange information over the TCP/IP protocol. The client receives a map function and a reduce function configured by the user; the map function manages operations on key/value pairs, decomposes a large-scale computing task into a plurality of subtasks, and distributes the subtasks to each slave node, obtaining computation results with the slave nodes' computing resources; the reduce function merges all values belonging to the same key, and the output key/value pair is the final result;
when data to be processed is stored on a slave node, the control node uses a hash model to determine the designated slave node; the hash model is:
target=getHashCode(request.IPNum)&nodeNum
where getHashCode () represents a string operation hash function, request.ipnum represents a request IP address, nodeNum represents the total number of slave nodes, and the corresponding numbered slave nodes can be allocated according to the value of target.
Further, a plurality of virtual layers is arranged between the slave nodes and the data to be processed; a first virtual layer maps to a first slave node, the first virtual layer comprises a plurality of virtual nodes, and the data to be processed is stored onto the slave nodes through the virtual nodes.
Further, as shown in FIG. 2, the mapping model between the virtual nodes and the slave nodes is:
where P is the initial hash position of the slave node, P' is the hash position of the mapped virtual node, N is the number of slave nodes, and k = 0, 1, 2, …. The preceding hash model may be replaced with this mapping model between virtual nodes and slave nodes to achieve balanced storage.
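The virtual-node mapping can be illustrated with a consistent-hash ring. This is a hedged reconstruction under assumptions (MD5 ring positions, four virtual nodes per slave, illustrative node names), not the patent's exact model:

```python
# Sketch: each physical slave node is represented by several virtual nodes
# (k = 0, 1, 2, ...) on a hash ring, so adding or removing a node only
# remaps a fraction of the stored keys.
import bisect
import hashlib

class VirtualNodeRing:
    def __init__(self, nodes, replicas=4):
        self.replicas = replicas
        self.ring = []                      # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def _pos(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str):
        for k in range(self.replicas):      # one ring position per virtual node
            self.ring.append((self._pos(f"{node}#{k}"), node))
        self.ring.sort()

    def lookup(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's position.
        pos = self._pos(key)
        i = bisect.bisect(self.ring, (pos,)) % len(self.ring)
        return self.ring[i][1]

ring = VirtualNodeRing(["slave-1", "slave-2", "slave-3"])
print(ring.lookup("archive-file-0001"))  # one of the slave nodes
```

With replicas > 1, each slave owns several scattered arcs of the ring, which is what evens out the distribution compared with one position per node.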
Further, the mapping model between the virtual nodes and the slave nodes may also be:
where h denotes the hash result of the feature string, l denotes the string length, and w denotes the weight of a character in the hash operation.
Further, the control node may serve as a load-balancing server: after processing the data storage request sent by the client, it collects the load state fed back by each slave node in real time and distributes the request to the slave nodes according to their weights; each slave node distributes data according to the control node's request and sends the storage result to the control node; finally, the control node sends the client a result indicating whether data distribution is complete.
Further, the load state is the disk usage rate, and the disk usage rate model is:
U = (U_total - U_free) / U_total
where U_total is the total disk space and U_free is the remaining disk space.
Further, after receiving a data storage request, the control node starts collecting the current disk usage rate of each slave node. The control node computes the average disk usage rate of the slave nodes; if the disk usage rate of a second slave node is greater than or equal to the average, the weight of that slave node is set to 0; if it is smaller than the average, the weight of that slave node is computed from its disk usage rate, and the polling allocation task is executed according to the weights.
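The weighting rule just described can be sketched as follows. This is an illustration under assumptions, not the claimed algorithm: the source gives neither the exact weight formula nor the polling variant, so headroom-below-average weights and a smooth weighted round-robin are assumed.

```python
# Sketch: nodes at or above the average disk usage get weight 0; the rest
# are weighted by how far below the average they sit, and storage requests
# are dealt out in proportion to those weights.

def disk_usage(u_total: float, u_free: float) -> float:
    # U = (U_total - U_free) / U_total
    return (u_total - u_free) / u_total

def compute_weights(usages):
    avg = sum(usages) / len(usages)
    # Weight 0 at or above the average; otherwise proportional to headroom.
    return [0.0 if u >= avg else (avg - u) for u in usages]

def weighted_polling(weights, n_requests):
    """Deal n_requests to node indices in proportion to their weights
    (smooth weighted round-robin)."""
    credits = [0.0] * len(weights)
    order = []
    for _ in range(n_requests):
        credits = [c + w for c, w in zip(credits, weights)]
        i = max(range(len(credits)), key=credits.__getitem__)
        credits[i] -= sum(weights)
        order.append(i)
    return order

usages = [disk_usage(100, 80), disk_usage(100, 40), disk_usage(100, 10)]
# usages = [0.2, 0.6, 0.9]; the average is about 0.57, so nodes 1 and 2
# fall at or above it and receive weight 0.
print(weighted_polling(compute_weights(usages), 4))  # [0, 0, 0, 0]
```

In the example run, every request lands on the emptiest node until its usage catches up with the average, which is the skew-avoidance behavior described above.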
Further, the client may execute a parallel query command through the control node.
Further, the parallel query flow is as follows:
Step (1), the client sends an SQL command to the control node through a script or command interface;
Step (2), the SQL interface parses the command according to the query keywords, generates a virtual root node, takes the output fields as leaf nodes, and constructs a query task tree; for each relation-table node on the query task tree it creates an attribute value used to mark the query target, reads the file mapping table, and adds the file information to the attribute value; the task tree is then output to the optimizer;
Step (3), the optimizer retrieves metadata according to the file information recorded in the attribute values, obtains the corresponding position information of the data blocks, and adds the position information to the attribute values; it converts the task tree into operations on the data blocks and adjusts the operation order according to factors such as filter conditions, data volume, and field size; it then merges the operation units per slave node, pushes them into a queue in order, and generates an operation list;
Step (4), the operation lists of all slave nodes are combined into a query plan and output to the distributor; the distributor dispatches each operation list to the executor on the corresponding slave node according to the node's IP address;
Step (5), each executor reads its operation list, executes all operation commands in order, obtains the query result from local storage, and uploads it to the distributor; the distributor gathers the computation results and returns the collected query result to the client.
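Steps (1) to (5) above can be condensed into a toy sketch. The operation format, node names, and the scan-only executor are illustrative assumptions, not the patented components:

```python
# Sketch: the "optimizer" groups block-level operations into one list per
# slave node, the "distributor" fans the lists out to per-node executors,
# and the per-node results are merged for the client.
from collections import defaultdict

def build_query_plan(block_locations, operation):
    """Group one operation per data block into a per-node operation list."""
    plan = defaultdict(list)           # node -> ordered operation list
    for block, node in block_locations:
        plan[node].append((operation, block))
    return dict(plan)

def execute_on_node(op_list, local_blocks):
    # Stand-in executor: "scan" just reads the block's rows from local storage.
    rows = []
    for op, block in op_list:
        if op == "scan":
            rows.extend(local_blocks.get(block, []))
    return rows

def run_query(block_locations, storage, operation="scan"):
    plan = build_query_plan(block_locations, operation)
    results = []                       # the distributor gathers per-node results
    for node, ops in plan.items():
        results.extend(execute_on_node(ops, storage[node]))
    return results

storage = {"slave-1": {"blk_1": ["row-a"]}, "slave-2": {"blk_2": ["row-b"]}}
locations = [("blk_1", "slave-1"), ("blk_2", "slave-2")]
print(run_query(locations, storage))  # ['row-a', 'row-b']
```

The point of the per-node grouping is that each executor touches only local blocks, so the query runs in parallel without moving raw data between slave nodes.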
The advantages of the invention are as follows: the system uses the HDFS distributed storage system to reduce enterprise storage costs. By adopting a hash algorithm and improving the random data-placement strategy of HDFS block storage, it effectively solves the problem of data skew in the storage system under capacity expansion and node failure, improves the stability of the storage system, and achieves load balancing under homogeneous conditions. The data storage strategy can further be optimized with a fused weighted polling algorithm, which effectively solves the load-balancing problem of the storage system under heterogeneous conditions and makes full use of the system's hardware resources.
The present invention is not limited to the above-mentioned embodiments; any changes or substitutions readily conceived by those skilled in the art within the technical scope disclosed by the present invention are intended to be covered by it. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A Hadoop-based archive big data distributed storage system, characterized in that: the system comprises a client, a control node in a master-slave structure, and a plurality of slave nodes; the control node controls the slave nodes, the slave nodes perform data storage and data processing, and the control node and the slave nodes exchange information over the TCP/IP protocol; the client receives a map function and a reduce function configured by the user, the map function manages operations on key/value pairs, decomposes a large-scale computing task into a plurality of subtasks, and distributes the subtasks to each slave node, obtaining computation results with the slave nodes' computing resources; the reduce function merges all values belonging to the same key, and the output key/value pair is the final result;
the control node can serve as a load-balancing server: after processing the data storage request sent by the client, it collects the load state fed back by each slave node in real time and distributes the request to the slave nodes according to their weights; each slave node distributes data according to the control node's request and sends the storage result to the control node; finally, the control node sends the client a result indicating whether data distribution is complete;
when data to be processed is stored on a slave node, the control node uses a hash model to determine the designated slave node; the hash model is:
target=getHashCode(request.IPNum)&nodeNum
wherein getHashCode () represents a string operation hash function, request.ipnum represents a request IP address, nodeNum represents the total number of slave nodes, and slave nodes of corresponding numbers are allocated according to the value of target.
2. The Hadoop-based archive big data distributed storage system as claimed in claim 1, wherein: a plurality of virtual layers is arranged between the slave nodes and the data to be processed, a first virtual layer maps to a first slave node, the first virtual layer comprises a plurality of virtual nodes, the data to be processed is stored onto the slave nodes through the virtual nodes, and the hash model is replaced with a mapping model between the virtual nodes and the slave nodes.
3. The Hadoop-based archive big data distributed storage system as claimed in claim 2, wherein the mapping model between the virtual nodes and the slave nodes is:
where P is the initial hash position of the slave node, P' is the hash position of the mapped virtual node, N is the number of slave nodes, and k = 0, 1, 2, ….
4. The Hadoop-based archive big data distributed storage system as claimed in claim 2, wherein the mapping model between the virtual nodes and the slave nodes may also be:
where h denotes the hash result of the feature string, l denotes the string length, and w denotes the weight of a character in the hash operation.
5. The Hadoop-based archive big data distributed storage system as claimed in claim 1, wherein the load state is the disk usage rate, and the disk usage rate model is:
U = (U_total - U_free) / U_total
where U_total is the total disk space and U_free is the remaining disk space.
6. The Hadoop-based archive big data distributed storage system as claimed in claim 1, wherein: after receiving a data storage request, the control node starts collecting the current disk usage rate of each slave node; the control node computes the average disk usage rate of the slave nodes; if the disk usage rate of a second slave node is greater than or equal to the average, the weight of that slave node is set to 0; if it is smaller than the average, the weight is computed from its disk usage rate, and the polling allocation task is executed according to the weights.
7. The Hadoop-based archive big data distributed storage system as claimed in claim 1, wherein: the client executes parallel query commands through the control node.
8. The Hadoop-based archive big data distributed storage system as claimed in claim 7, wherein the parallel query flow is as follows:
Step (1), the client sends an SQL command to the control node through a script or command interface;
Step (2), the SQL interface parses the command according to the query keywords, generates a virtual root node, takes the output fields as leaf nodes, and constructs a query task tree; for each relation-table node on the query task tree it creates an attribute value used to mark the query target, reads the file mapping table, and adds the file information to the attribute value; the task tree is then output to the optimizer;
Step (3), the optimizer retrieves metadata according to the file information recorded in the attribute values, obtains the corresponding position information of the data blocks, and adds the position information to the attribute values; it converts the task tree into operations on the data blocks and adjusts the operation order according to filter conditions, data volume, and field size; it then merges the operation units per slave node, pushes them into a queue in order, and generates an operation list;
Step (4), the operation lists of all slave nodes are combined into a query plan and output to the distributor; the distributor dispatches each operation list to the executor on the corresponding slave node according to the node's IP address;
Step (5), each executor reads its operation list, executes all operation commands in order, obtains the query result from local storage, and uploads it to the distributor; the distributor gathers the computation results and returns the collected query result to the client.
CN202111000510.5A 2021-08-29 2021-08-29 Archive big data distributed storage system based on Hadoop Active CN113688115B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111000510.5A CN113688115B (en) 2021-08-29 2021-08-29 Archive big data distributed storage system based on Hadoop

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111000510.5A CN113688115B (en) 2021-08-29 2021-08-29 Archive big data distributed storage system based on Hadoop

Publications (2)

Publication Number Publication Date
CN113688115A CN113688115A (en) 2021-11-23
CN113688115B true CN113688115B (en) 2024-02-20

Family

ID=78583716

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111000510.5A Active CN113688115B (en) 2021-08-29 2021-08-29 Archive big data distributed storage system based on Hadoop

Country Status (1)

Country Link
CN (1) CN113688115B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116955306B (en) * 2023-06-21 2024-04-12 东莞市铁石文档科技有限公司 File management system based on distributed storage

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102063486A (en) * 2010-12-28 2011-05-18 东北大学 Multi-dimensional data management-oriented cloud computing query processing method
CN103929500A (en) * 2014-05-06 2014-07-16 刘跃 Method for data fragmentation of distributed storage system
CN104065716A (en) * 2014-06-18 2014-09-24 江苏物联网研究发展中心 OpenStack based Hadoop service providing method
CN105824618A (en) * 2016-03-10 2016-08-03 浪潮软件集团有限公司 Real-time message processing method for Storm
CN108449376A (en) * 2018-01-31 2018-08-24 合肥和钧正策信息技术有限公司 A kind of load-balancing method of big data calculate node that serving enterprise
CN109740038A (en) * 2019-01-02 2019-05-10 安徽芃睿科技有限公司 Network data distributed parallel computing environment and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10545914B2 (en) * 2017-01-17 2020-01-28 Cisco Technology, Inc. Distributed object storage

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design of an improved shared storage device in Hadoop; Qin Weirong (覃伟荣); Computer Engineering and Design (《计算机工程与设计》); Vol. 39, No. 5, pp. 1319-1325 *

Also Published As

Publication number Publication date
CN113688115A (en) 2021-11-23

Similar Documents

Publication Publication Date Title
CN101370030B (en) Resource load stabilization method based on contents duplication
Zhu et al. Efficient, proximity-aware load balancing for DHT-based P2P systems
CN102831120B (en) A kind of data processing method and system
CN110213352B (en) Method for aggregating dispersed autonomous storage resources with uniform name space
CN104023088B (en) Storage server selection method applied to distributed file system
CN103516807A (en) Cloud computing platform server load balancing system and method
CN102609446B (en) Distributed Bloom filter system and application method thereof
CN104391930A (en) Distributed file storage device and method
CN104539730B (en) Towards the load-balancing method of video in a kind of HDFS
CN107450855B (en) Model-variable data distribution method and system for distributed storage
CN107330056A (en) Wind power plant SCADA system and its operation method based on big data cloud computing platform
CN103152393A (en) Charging method and charging system for cloud computing
WO2004063928A1 (en) Database load reducing system and load reducing program
CN104572505A (en) System and method for ensuring eventual consistency of mass data caches
CN113655969B (en) Data balanced storage method based on streaming distributed storage system
CN106960011A (en) Metadata of distributed type file system management system and method
Rajalakshmi et al. An improved dynamic data replica selection and placement in cloud
CN113688115B (en) Archive big data distributed storage system based on Hadoop
CN102331948A (en) Resource state-based virtual machine structure adjustment method and adjustment system
JP6201438B2 (en) Content distribution method, content distribution server, and thumbnail collection program
CN109597903A (en) Image file processing apparatus and method, document storage system and storage medium
CN104281980A (en) Remote diagnosis method and system for thermal generator set based on distributed calculation
CN107277144A (en) A kind of distributed high concurrent cloud storage Database Systems and its load equalization method
Zhao et al. Dynamic replica creation strategy based on file heat and node load in hybrid cloud
CN108920282A (en) A kind of copy of content generation, placement and the update method of holding load equilibrium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 769, building 2, East Ring Road, Yanqing Park, Zhongguancun, Yanqing District, Beijing 102101

Applicant after: ZHONGDUN innovative digital technology (Beijing) Co.,Ltd.

Address before: Room 769, building 2, East Ring Road, Yanqing Park, Zhongguancun, Yanqing District, Beijing 102101

Applicant before: ZHONGDUN innovation archives management (Beijing) Co.,Ltd.

GR01 Patent grant