CN110895451A - Data access performance optimization method based on distributed system - Google Patents
- Publication number
- CN110895451A (application number CN201911115587.XA)
- Authority
- CN
- China
- Prior art keywords
- node
- access
- data
- module
- performance optimization
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
Abstract
The invention belongs to the technical field of distributed systems and relates in particular to a data access performance optimization method based on a distributed system. Compared with the prior art, the method introduces the concept of 'access frequency' and combines it with the 'weight value' to calculate the storage location of data. When a storage node is selected, nodes with high weight values are still given priority, but the access frequency of each node is also considered: when a node is hit, its access frequency is increased; when a storage request does not hit the node, its access frequency is reduced; and when the access frequency of a node exceeds a set value, the node is no longer used as a candidate node for data storage. The method dynamically calculates the data storage location from the access frequency and the capacity weight of each node, which optimizes the distribution and access efficiency of the stored data. It can be applied to distributed system products to improve the access efficiency of back-end storage and the data access performance of the distributed system.
Description
Technical Field
The invention belongs to the technical field of distributed systems, and particularly relates to a data access performance optimization method based on a distributed system.
Background
Distributed systems, as one of the underlying infrastructures, face various challenges, including the growth rate of data volume, the diversity of data categories, and ever higher performance requirements. In a distributed system, the data distribution algorithm, which determines data access efficiency, affects not only the load balance of the cluster but also the reliability of data. A common data distribution algorithm builds a multi-level mapping table from the actual physical architecture of the cluster, iterates over the hierarchy, and finally returns the group of storage nodes most suitable for storing the data object.
In practice, the traditional data distribution algorithm abstracts the remaining storage capacity of each storage node into a weight value, and when data is stored, the storage location is calculated from these weight values. If the data objects to be stored are very small, they have little effect on the node weights, so a group of highly related data is easily stored on the same group of storage nodes. When this group of data is accessed, load imbalance and frequent I/O requests occur on those nodes, which degrades the performance of the entire cluster.
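The clustering effect described above can be illustrated with a minimal sketch. It assumes a hypothetical weight model in which a node's weight is simply its remaining capacity in bytes and placement always picks the highest-weight node; the function name, node names and sizes are invented for illustration and do not come from the patent.

```python
def place_by_weight(weights: dict[str, int], size: int) -> str:
    """Weight-only placement: pick the node with the most remaining capacity."""
    node = max(weights, key=weights.get)
    weights[node] -= size  # storing data lowers the chosen node's weight
    return node


# Hypothetical cluster: remaining capacity in bytes per node.
weights = {"node-a": 1_000_000, "node-b": 900_000, "node-c": 800_000}

# Ten small (1 KB) related objects barely change the weights,
# so every one of them lands on the same node.
placements = [place_by_weight(weights, 1_000) for _ in range(10)]
print(placements)
```

Because ten 1 KB writes reduce the leading node's weight by only 10 KB, its weight never drops below that of the next node, and all ten objects end up together.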
Disclosure of Invention
(I) Technical problem to be solved
The technical problem to be solved by the invention is as follows: how to provide a method for optimizing the data access performance of a distributed system.
(II) Technical solution
To solve the above technical problem, the present invention provides a data access performance optimization method based on a distributed system. The method is implemented on a data access performance optimization system that includes: a request receiving module, a node set acquisition module, a node weight acquisition module, an access frequency acquisition module, a frequency comparison module, a sorting module and a message management module;
the data access performance optimization method comprises the following steps:
step 1: the request receiving module receives request data requesting access to storage;
step 2: the node set acquisition module calculates the set of available storage nodes according to the current cluster mapping table of the distributed system;
step 3: the node weight acquisition module obtains the current weight values of all available storage nodes;
step 4: the access frequency acquisition module obtains the current access frequencies of all available storage nodes;
step 5: the frequency comparison module compares the access frequency of each available storage node with a preset frequency threshold and selects all available storage nodes whose access frequency is less than the preset frequency threshold;
step 6: the sorting module sorts the selected storage nodes by weight value to generate a sorting result, and submits the sorting result to the message management module;
step 7: the message management module stores the data object in the calculated optimal storage node.
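Steps 5 to 7 can be sketched as a filter followed by a sort. This is a minimal illustration, not the patent's implementation: the `Node` fields, the function name `select_node`, and the choice to return `None` when no node passes the threshold are all assumptions, since the text does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    weight: float     # assumed: higher means more remaining capacity
    access_freq: int  # assumed: recent access count


def select_node(available: list[Node], freq_threshold: int) -> Optional[Node]:
    """Filter by frequency threshold (step 5), sort by weight (step 6), pick the best (step 7)."""
    candidates = [n for n in available if n.access_freq < freq_threshold]
    ranked = sorted(candidates, key=lambda n: n.weight, reverse=True)
    return ranked[0] if ranked else None  # None is an assumed fallback


cluster = [Node("node-a", 0.9, 12), Node("node-b", 0.7, 3), Node("node-c", 0.8, 5)]
best = select_node(cluster, freq_threshold=10)
print(best.name if best else "no candidate")
```

Here `node-a` has the highest weight but is filtered out because its access frequency is not below the threshold, so the highest-weight remaining node, `node-c`, is chosen.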
The data access performance optimization system further comprises a weight update module;
the method further comprises step 8: the weight update module updates the weight value of the accessed node according to the data volume of the stored data object.
The data access performance optimization system further comprises an access frequency update module;
the method further comprises step 9: the access frequency update module adds one to the access count of the optimal storage node, thereby updating the access frequency of the accessed node.
The node weight acquisition module obtains the weight values of all available nodes according to the current data storage capacity of each node.
The access frequency acquisition module obtains the access frequencies of all available nodes according to the past access counts of each node.
The optimal storage node is the storage node whose access frequency is less than the preset frequency threshold and whose weight value is the highest.
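The bookkeeping in steps 8 and 9 might look like the following sketch. The patent only states that the weight is updated according to the stored data volume and that the access count is incremented by one; the concrete weight formula (fraction of capacity still free) and all names here are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    capacity: int          # total bytes (hypothetical field)
    used: int = 0          # bytes stored so far
    access_count: int = 0  # number of past accesses

    @property
    def weight(self) -> float:
        # Assumed weight model: fraction of capacity that is still free.
        return (self.capacity - self.used) / self.capacity


def record_store(node: NodeState, object_size: int) -> None:
    """Update a node after a data object has been stored on it."""
    node.used += object_size  # step 8: the weight drops with the stored volume
    node.access_count += 1    # step 9: add one to the access count


node = NodeState(capacity=1000)
record_store(node, 250)
print(node.weight, node.access_count)
```

Deriving the weight from `used` rather than storing it separately keeps the two values from drifting apart, which is one possible design choice among several.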
(III) Advantageous effects
Compared with the prior art, the distinguishing technical feature of the invention is that the concept of 'access frequency' is introduced and used together with the 'weight value' to calculate the storage location of data. When a storage node is selected, nodes with high weight values are still given priority, but the access frequency of each node is also considered: when a node is hit, its access frequency is increased; when a storage request does not hit the node, its access frequency is reduced; and when the access frequency of a node exceeds a set value, the node is no longer used as a candidate node for data storage.
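The hit and miss behaviour can be sketched as follows. This is one interpretation under stated assumptions: a "miss" is read as a request that was placed on some other node, the step size is one in both directions, frequencies are floored at zero, and the strict comparison from step 5 (frequency less than the threshold) decides candidacy. None of these details are fixed by the text, and the function name is hypothetical.

```python
def apply_request(freqs: dict[str, int], hit: str, threshold: int) -> list[str]:
    """Raise the hit node's frequency, lower the others, and return the candidates."""
    for name in freqs:
        if name == hit:
            freqs[name] += 1                       # hit: frequency increases
        else:
            freqs[name] = max(0, freqs[name] - 1)  # miss: decreases, floored at 0 (assumption)
    # Only nodes whose frequency is below the threshold remain candidates.
    return [name for name, f in freqs.items() if f < threshold]


freqs = {"node-a": 2, "node-b": 0}
after_first = apply_request(freqs, hit="node-a", threshold=3)
after_second = apply_request(freqs, hit="node-b", threshold=3)
print(after_first, after_second)
```

After the first request `node-a` reaches the threshold and drops out of the candidate list; after the second request, which hits `node-b`, its frequency decays and it becomes a candidate again.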
Therefore, compared with the prior art, the method provided by the invention takes the access frequency of a storage node as one of the constraints for calculating data distribution. The access frequency and the weight value of the storage node jointly determine the data storage location, which addresses the uneven data distribution and the frequent access to a single storage node that arise during data storage.
The technical solution optimizes the data distribution algorithm: the weight value of each node is updated in real time as users issue data storage requests, the access frequency of each node in the cluster is dynamically increased or reduced, and the current optimal storage node is calculated from the access frequency and the weight value of each node.
In summary, the data access performance optimization method based on the distributed system provided by the invention dynamically calculates the data storage location from the access frequency and the capacity weight of each node, thereby optimizing the distribution and access efficiency of the stored data. When applied to distributed system products, the method improves the access efficiency of back-end storage and the data access performance of the distributed system.
Detailed Description
In order to make the objects, contents, and advantages of the present invention clearer, the embodiments of the present invention are described in detail below with reference to examples.
To solve the above technical problem, the present invention provides a data access performance optimization method based on a distributed system. The method is implemented on a data access performance optimization system that includes: a request receiving module, a node set acquisition module, a node weight acquisition module, an access frequency acquisition module, a frequency comparison module, a sorting module and a message management module;
the data access performance optimization method comprises the following steps:
step 1: the request receiving module receives request data requesting access to storage;
step 2: the node set acquisition module calculates the set of available storage nodes according to the current cluster mapping table of the distributed system;
step 3: the node weight acquisition module obtains the current weight values of all available storage nodes;
step 4: the access frequency acquisition module obtains the current access frequencies of all available storage nodes;
step 5: the frequency comparison module compares the access frequency of each available storage node with a preset frequency threshold and selects all available storage nodes whose access frequency is less than the preset frequency threshold;
step 6: the sorting module sorts the selected storage nodes by weight value to generate a sorting result, and submits the sorting result to the message management module;
step 7: the message management module stores the data object in the calculated optimal storage node.
The data access performance optimization system further comprises a weight update module;
the method further comprises step 8: the weight update module updates the weight value of the accessed node according to the data volume of the stored data object.
The data access performance optimization system further comprises an access frequency update module;
the method further comprises step 9: the access frequency update module adds one to the access count of the optimal storage node, thereby updating the access frequency of the accessed node.
The node weight acquisition module obtains the weight values of all available nodes according to the current data storage capacity of each node.
The access frequency acquisition module obtains the access frequencies of all available nodes according to the past access counts of each node.
The optimal storage node is the storage node whose access frequency is less than the preset frequency threshold and whose weight value is the highest.
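The definition of the optimal storage node can also be written compactly. The symbols are introduced here for illustration only and do not appear in the patent: $A$ is the set of available storage nodes, $f(n)$ the access frequency of node $n$, $w(n)$ its weight value, and $T$ the preset frequency threshold.

```latex
n^{*} \;=\; \operatorname*{arg\,max}_{\,n \in A,\; f(n) < T} \; w(n)
```

If no node satisfies $f(n) < T$, the candidate set is empty and $n^{*}$ is undefined; the text does not specify how that case is handled.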
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.
Claims (6)
1. A data access performance optimization method based on a distributed system, wherein the data access performance optimization method is implemented based on a data access performance optimization system, and the data access performance optimization system comprises: a request receiving module, a node set acquisition module, a node weight acquisition module, an access frequency acquisition module, a frequency comparison module, a sorting module and a message management module;
the data access performance optimization method comprises the following steps:
step 1: the request receiving module receives request data requesting access to storage;
step 2: the node set acquisition module calculates the set of available storage nodes according to the current cluster mapping table of the distributed system;
step 3: the node weight acquisition module obtains the current weight values of all available storage nodes;
step 4: the access frequency acquisition module obtains the current access frequencies of all available storage nodes;
step 5: the frequency comparison module compares the access frequency of each available storage node with a preset frequency threshold and selects all available storage nodes whose access frequency is less than the preset frequency threshold;
step 6: the sorting module sorts the selected storage nodes by weight value to generate a sorting result, and submits the sorting result to the message management module;
step 7: the message management module stores the data object in the calculated optimal storage node.
2. The distributed system-based data access performance optimization method of claim 1, wherein the data access performance optimization system further comprises a weight update module;
the method further comprises step 8: the weight update module updates the weight value of the accessed node according to the data volume of the stored data object.
3. The distributed system-based data access performance optimization method of claim 1, wherein the data access performance optimization system further comprises an access frequency update module;
the method further comprises step 9: the access frequency update module adds one to the access count of the optimal storage node, thereby updating the access frequency of the accessed node.
4. The distributed system-based data access performance optimization method of claim 1, wherein the node weight acquisition module obtains the weight values of all available nodes according to the current data storage amount of each node.
5. The distributed system-based data access performance optimization method of claim 1, wherein the access frequency acquisition module obtains the access frequencies of all available nodes according to the past access counts of each node.
6. The distributed system-based data access performance optimization method of claim 1, wherein the optimal storage node is the storage node whose access frequency is less than the preset frequency threshold and whose weight value is the highest.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911115587.XA CN110895451A (en) | 2019-11-14 | 2019-11-14 | Data access performance optimization method based on distributed system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110895451A (en) | 2020-03-20 |
Family
ID=69788087
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911115587.XA (Pending) CN110895451A (en) | 2019-11-14 | 2019-11-14 | Data access performance optimization method based on distributed system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110895451A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150089283A1 (en) * | 2012-05-03 | 2015-03-26 | Thomson Licensing | Method of data storing and maintenance in a distributed data storage system and corresponding device |
CN103118133A (en) * | 2013-02-28 | 2013-05-22 | 浙江大学 | Mixed cloud storage method based on file access frequency |
CN106254452A (en) * | 2016-08-01 | 2016-12-21 | 成都鼎智汇科技有限公司 | The big data access method of medical treatment under cloud platform |
CN107562913A (en) * | 2017-09-12 | 2018-01-09 | 郑州云海信息技术有限公司 | The date storage method and device of a kind of distributed file system |
CN109522330A (en) * | 2018-10-18 | 2019-03-26 | 上海达家迎信息科技有限公司 | Cloud platform data processing method, device, equipment and medium based on block chain |
CN109819057A (en) * | 2019-04-08 | 2019-05-28 | 科大讯飞股份有限公司 | A kind of load-balancing method and system |
CN110147372A (en) * | 2019-05-21 | 2019-08-20 | 电子科技大学 | A kind of distributed data base Intelligent Hybrid storage method towards HTAP |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111753163A (en) * | 2020-07-08 | 2020-10-09 | 北京鼎泰智源科技有限公司 | Data acquisition method |
CN112445656A (en) * | 2020-12-14 | 2021-03-05 | 北京京航计算通讯研究所 | Method and device for repairing data in distributed storage system |
CN112445656B (en) * | 2020-12-14 | 2024-02-13 | 北京京航计算通讯研究所 | Method and device for repairing data in distributed storage system |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200320 |