CN110895451A - Data access performance optimization method based on distributed system - Google Patents

Data access performance optimization method based on distributed system

Info

Publication number
CN110895451A
Authority
CN
China
Prior art keywords
node
access
data
module
performance optimization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911115587.XA
Other languages
Chinese (zh)
Inventor
周羽
王俊
张彤
张永静
朱英泮
郑宇宁
朱盼盼
姚帅
司敬
石春刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jinghang Computing Communication Research Institute
Original Assignee
Beijing Jinghang Computing Communication Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-11-14
Filing date
2019-11-14
Publication date
2020-03-20
Application filed by Beijing Jinghang Computing Communication Research Institute
Priority to CN201911115587.XA
Publication of CN110895451A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system

Abstract

The invention belongs to the technical field of distributed systems and relates in particular to a data access performance optimization method based on a distributed system. Compared with the prior art, the method introduces the concept of "access frequency" and uses it together with the "weight value" to calculate where data is stored. When a storage node is selected, nodes with a high weight value are still given priority, but the access frequency of each node is also considered: when a storage request hits a node, the access frequency of that node is increased; when a storage request does not hit a node, the access frequency of that node is reduced; and when the access frequency of a node exceeds a set value, the node is no longer used as a candidate node for data storage. The method thus dynamically calculates the data storage position from the access frequency and the capacity weight of each node, which optimizes the distribution of stored data and the access efficiency. It can be applied to distributed system products to improve the access efficiency of back-end storage and the data access performance of the distributed system.

Description

Data access performance optimization method based on distributed system
Technical Field
The invention belongs to the technical field of distributed systems, and particularly relates to a data access performance optimization method based on a distributed system.
Background
Distributed systems, as one of the underlying infrastructures, face various challenges, including the growth rate of data volume, the diversity of data categories, and ever higher performance requirements. In a distributed system, the data distribution algorithm, which determines data access efficiency, not only affects the load balance of the cluster but also bears on the reliability of the data. A common data distribution algorithm builds a multi-level mapping table from the actual physical architecture of the cluster, iterates through the hierarchy, and finally returns the group of storage nodes best suited to storing the data object.
In practice, a traditional data distribution algorithm abstracts the remaining storage capacity of each storage node into a weight value, and when data is stored, the storage position is calculated from those weight values. If the data objects to be stored are very small, each store changes the node weights only slightly, so a group of closely related data is easily stored on the same group of storage nodes. When that group of data is later accessed, load imbalance and frequent I/O requests on those nodes occur, which degrades the performance of the entire cluster.
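The clustering effect described above can be illustrated with a minimal sketch. It assumes, for illustration only, that the weight of a node is its remaining capacity in bytes and that the node with the largest weight is always chosen; the node names, capacities, and object size are hypothetical and do not come from the application.

```python
# Baseline sketch: placement driven only by capacity-derived weights.
# Node names, capacities, and the object size are illustrative assumptions.

def pick_by_weight(weights):
    """Return the node with the largest weight (ties broken by name order)."""
    return max(sorted(weights), key=lambda name: weights[name])

# Assumed weights: remaining capacity in bytes.
weights = {"node-a": 1_000_000, "node-b": 900_000, "node-c": 800_000}

placements = []
for _ in range(5):
    target = pick_by_weight(weights)
    weights[target] -= 1_000  # a 1 KB object barely changes the weight
    placements.append(target)

print(placements)  # every object lands on node-a
```

Because each small object lowers the weight of the chosen node only marginally, the same node keeps winning, and later reads of these related objects all hit that node.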
Disclosure of Invention
(I) Technical problem to be solved
The technical problem to be solved by the invention is how to optimize the data access performance of a distributed system.
(II) Technical scheme
To solve the above technical problem, the invention provides a data access performance optimization method based on a distributed system. The method is implemented on a data access performance optimization system comprising: a request receiving module, a node set acquisition module, a node weight acquisition module, an access frequency acquisition module, a frequency comparison module, a sorting module, and a message management module.
The data access performance optimization method comprises the following steps:
Step 1: the request receiving module receives request data requesting access to storage;
Step 2: the node set acquisition module calculates the set of available storage nodes from the current cluster mapping table of the distributed system;
Step 3: the node weight acquisition module obtains the current weight values of all available storage nodes;
Step 4: the access frequency acquisition module obtains the current access frequencies of all available storage nodes;
Step 5: the frequency comparison module compares the access frequency of each available storage node with a preset frequency threshold and selects all available storage nodes whose access frequency is below the threshold;
Step 6: the sorting module sorts the selected storage nodes by weight value, generates a sorting result, and submits the sorting result to the message management module;
Step 7: the message management module stores the data object in the calculated optimal storage node.
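Steps 5 to 7 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `Node` type, its field names, the sample values, and the choice to return `None` when no node passes the threshold are all assumptions, since the application does not specify data structures or the empty-candidate case.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    weight: float     # assumed: larger means more suitable (e.g. more free capacity)
    access_freq: int  # assumed: count of recent accesses

def select_optimal_node(available, freq_threshold):
    """Filter by access frequency, rank by weight, return the best node."""
    # Step 5: keep only nodes whose access frequency is below the threshold.
    candidates = [n for n in available if n.access_freq < freq_threshold]
    # Step 6: sort the remaining nodes by weight, highest first.
    ranked = sorted(candidates, key=lambda n: n.weight, reverse=True)
    # Step 7: the first entry is the optimal node (None if nothing qualifies,
    # which is an assumption; the source does not cover this case).
    return ranked[0] if ranked else None

nodes = [Node("node-a", 0.9, 12), Node("node-b", 0.7, 3), Node("node-c", 0.5, 1)]
best = select_optimal_node(nodes, freq_threshold=10)
print(best.name)  # node-b
```

Here node-a has the highest weight but is excluded because its access frequency is not below the threshold, so the next-heaviest node is chosen.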
The data access performance optimization system further comprises a weight update module, and the method further comprises:
Step 8: the weight update module updates the weight value of the accessed node according to the data volume of the stored data object.
The data access performance optimization system further comprises an access frequency update module, and the method further comprises:
Step 9: the access frequency update module increments by one the number of accesses to the optimal storage node, thereby updating the access frequency of the accessed node.
The node weight acquisition module obtains the weight values of all available nodes from the current data storage amount of each node.
The access frequency acquisition module obtains the access frequencies of all available nodes from the past access count of each node.
The optimal storage node is the storage node whose access frequency is below the preset frequency threshold and whose weight value is the highest.
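The bookkeeping in steps 8 and 9 might look like the sketch below. The application only says that the weight is derived from the current data storage amount; the specific formula used here (fraction of capacity still free) and the `StorageNode` class with its attribute names are assumptions made for illustration.

```python
def weight_from_storage(capacity_bytes, used_bytes):
    """Assumed weight formula: fraction of capacity that is still free."""
    return (capacity_bytes - used_bytes) / capacity_bytes

class StorageNode:
    """Hypothetical node record holding the state that steps 8 and 9 update."""

    def __init__(self, name, capacity_bytes):
        self.name = name
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.access_count = 0

    @property
    def weight(self):
        # Weight is always derived from the current storage amount.
        return weight_from_storage(self.capacity_bytes, self.used_bytes)

    def record_store(self, object_bytes):
        self.used_bytes += object_bytes  # step 8: weight follows the stored volume
        self.access_count += 1           # step 9: one more access to this node

node = StorageNode("node-a", capacity_bytes=1000)
node.record_store(250)
print(node.weight, node.access_count)  # 0.75 1
```

Deriving the weight on demand from `used_bytes` keeps the two values from drifting apart; a real system could equally cache the weight and refresh it on each store.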
(III) Advantageous effects
Compared with the prior art, the distinguishing technical feature of the invention is that the concept of "access frequency" is introduced and used together with the "weight value" to calculate the storage position of data. When a storage node is selected, nodes with a high weight value are still given priority, but the access frequency of each node is also considered: when a storage request hits a node, the access frequency of that node is increased; when a storage request does not hit a node, the access frequency of that node is reduced; and when the access frequency of a node exceeds a set value, the node is no longer used as a candidate node for data storage.
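The hit and miss behaviour can be sketched as a small update rule. The application states only that the frequency is increased on a hit and reduced on a miss; the step size of one, the floor at zero, and the function and node names below are assumptions.

```python
def update_frequencies(freqs, hit_node, floor=0):
    """Raise the hit node by one and lower every other node by one.

    The step size and the floor are assumed; the source gives no magnitudes.
    """
    return {
        name: (f + 1 if name == hit_node else max(floor, f - 1))
        for name, f in freqs.items()
    }

def candidate_nodes(freqs, threshold):
    """Nodes whose access frequency is still below the threshold."""
    return [name for name, f in freqs.items() if f < threshold]

freqs = {"node-a": 2, "node-b": 0, "node-c": 1}
freqs = update_frequencies(freqs, "node-a")
print(freqs)                                # {'node-a': 3, 'node-b': 0, 'node-c': 0}
print(candidate_nodes(freqs, threshold=3))  # ['node-b', 'node-c']
```

After the hit, node-a reaches the threshold and drops out of the candidate set until enough subsequent requests miss it and its frequency decays again.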
Therefore, compared with the prior art, the method provided by the invention takes the access frequency of a storage node as one of the constraints in calculating data distribution, so that the access frequency and the weight value jointly determine the data storage position. This addresses the uneven data distribution and the frequent access to a single storage node that can arise during data storage.
The technical scheme optimizes the data distribution algorithm: the weight value of each node is updated in real time as users issue data storage requests, the access frequency of each node in the cluster is dynamically increased or reduced, and the current optimal storage node is calculated from the access frequency and the weight value of each node.
In summary, the data access performance optimization method based on a distributed system provided by the invention dynamically calculates the data storage position from the access frequency and the capacity weight of each node, thereby optimizing the distribution of stored data and the access efficiency. Applied to distributed system products, it improves the access efficiency of back-end storage and the data access performance of the distributed system.
Detailed Description
In order to make the objects, contents, and advantages of the present invention clearer, the following detailed description of the embodiments of the present invention will be given in conjunction with examples.
To solve the above technical problem, the invention provides a data access performance optimization method based on a distributed system. The method is implemented on a data access performance optimization system comprising: a request receiving module, a node set acquisition module, a node weight acquisition module, an access frequency acquisition module, a frequency comparison module, a sorting module, and a message management module.
The data access performance optimization method comprises the following steps:
Step 1: the request receiving module receives request data requesting access to storage;
Step 2: the node set acquisition module calculates the set of available storage nodes from the current cluster mapping table of the distributed system;
Step 3: the node weight acquisition module obtains the current weight values of all available storage nodes;
Step 4: the access frequency acquisition module obtains the current access frequencies of all available storage nodes;
Step 5: the frequency comparison module compares the access frequency of each available storage node with a preset frequency threshold and selects all available storage nodes whose access frequency is below the threshold;
Step 6: the sorting module sorts the selected storage nodes by weight value, generates a sorting result, and submits the sorting result to the message management module;
Step 7: the message management module stores the data object in the calculated optimal storage node.
The data access performance optimization system further comprises a weight update module, and the method further comprises:
Step 8: the weight update module updates the weight value of the accessed node according to the data volume of the stored data object.
The data access performance optimization system further comprises an access frequency update module, and the method further comprises:
Step 9: the access frequency update module increments by one the number of accesses to the optimal storage node, thereby updating the access frequency of the accessed node.
The node weight acquisition module obtains the weight values of all available nodes from the current data storage amount of each node.
The access frequency acquisition module obtains the access frequencies of all available nodes from the past access count of each node.
The optimal storage node is the storage node whose access frequency is below the preset frequency threshold and whose weight value is the highest.
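As a worked illustration of the embodiment, the sketch below runs six small store requests through one possible reading of steps 2 to 9. Everything concrete in it is assumed: the `PlacementSimulator` class, free capacity used directly as the weight, a frequency threshold of 2, a decrement of one on a miss with a floor of zero, and raising an error when no node qualifies.

```python
class PlacementSimulator:
    """Hypothetical end-to-end model of the placement loop (not the filed design)."""

    def __init__(self, capacities, freq_threshold):
        self.free = dict(capacities)                  # weight: free capacity (assumed)
        self.freq = {name: 0 for name in capacities}  # access frequency per node
        self.freq_threshold = freq_threshold

    def store(self, object_size):
        # Steps 2-5: available nodes whose frequency is below the threshold.
        candidates = [n for n in self.free if self.freq[n] < self.freq_threshold]
        if not candidates:
            # Assumed behaviour; the source does not describe this case.
            raise RuntimeError("no candidate node below the frequency threshold")
        # Steps 6-7: the highest-weight candidate receives the object.
        target = max(candidates, key=lambda n: self.free[n])
        # Step 8: the stored volume lowers the weight of the target.
        self.free[target] -= object_size
        # Step 9, plus the miss decrement described under advantageous effects.
        for n in self.freq:
            if n == target:
                self.freq[n] += 1
            else:
                self.freq[n] = max(0, self.freq[n] - 1)
        return target

sim = PlacementSimulator(
    {"node-a": 1000, "node-b": 900, "node-c": 800}, freq_threshold=2
)
placements = [sim.store(1) for _ in range(6)]
print(placements)
# ['node-a', 'node-a', 'node-b', 'node-a', 'node-b', 'node-a']
```

With these assumed parameters the heaviest node no longer receives every object: once its frequency reaches the threshold, the next-heaviest node takes a request while the first one cools down. How evenly the load spreads depends on the threshold and decay values, which the application leaves open.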
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make modifications and variations without departing from the technical principle of the present invention, and such modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims (6)

1. A data access performance optimization method based on a distributed system, wherein the data access performance optimization method is implemented based on a data access performance optimization system, and the data access performance optimization system comprises: a request receiving module, a node set acquisition module, a node weight acquisition module, an access frequency acquisition module, a frequency comparison module, a sorting module, and a message management module;
the data access performance optimization method comprises the following steps:
step 1: the request receiving module receives request data requesting access to storage;
step 2: the node set acquisition module calculates the set of available storage nodes from the current cluster mapping table of the distributed system;
step 3: the node weight acquisition module obtains the current weight values of all available storage nodes;
step 4: the access frequency acquisition module obtains the current access frequencies of all available storage nodes;
step 5: the frequency comparison module compares the access frequency of each available storage node with a preset frequency threshold and selects all available storage nodes whose access frequency is below the preset frequency threshold;
step 6: the sorting module sorts the selected storage nodes by weight value, generates a sorting result, and submits the sorting result to the message management module;
step 7: the message management module stores the data object in the calculated optimal storage node.
2. The distributed system-based data access performance optimization method of claim 1, wherein the data access performance optimization system further comprises a weight update module;
the method further comprises step 8: the weight update module updates the weight value of the accessed node according to the data volume of the stored data object.
3. The distributed system-based data access performance optimization method of claim 1, wherein the data access performance optimization system further comprises an access frequency update module;
the method further comprises step 9: the access frequency update module increments by one the number of accesses to the optimal storage node, thereby updating the access frequency of the accessed node.
4. The distributed system-based data access performance optimization method of claim 1, wherein the node weight acquisition module obtains the weight values of all available nodes from the current data storage amount of each node.
5. The distributed system-based data access performance optimization method of claim 1, wherein the access frequency acquisition module obtains the access frequencies of all available nodes from the past access count of each node.
6. The distributed system-based data access performance optimization method of claim 1, wherein the optimal storage node is the storage node whose access frequency is below the preset frequency threshold and whose weight value is the highest.
CN201911115587.XA 2019-11-14 2019-11-14 Data access performance optimization method based on distributed system Pending CN110895451A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911115587.XA CN110895451A (en) 2019-11-14 2019-11-14 Data access performance optimization method based on distributed system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911115587.XA CN110895451A (en) 2019-11-14 2019-11-14 Data access performance optimization method based on distributed system

Publications (1)

Publication Number Publication Date
CN110895451A (en) 2020-03-20

Family

ID=69788087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911115587.XA Pending CN110895451A (en) 2019-11-14 2019-11-14 Data access performance optimization method based on distributed system

Country Status (1)

Country Link
CN (1) CN110895451A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103118133A (en) * 2013-02-28 2013-05-22 浙江大学 Mixed cloud storage method based on file access frequency
US20150089283A1 (en) * 2012-05-03 2015-03-26 Thomson Licensing Method of data storing and maintenance in a distributed data storage system and corresponding device
CN106254452A (en) * 2016-08-01 2016-12-21 成都鼎智汇科技有限公司 The big data access method of medical treatment under cloud platform
CN107562913A (en) * 2017-09-12 2018-01-09 郑州云海信息技术有限公司 The date storage method and device of a kind of distributed file system
CN109522330A (en) * 2018-10-18 2019-03-26 上海达家迎信息科技有限公司 Cloud platform data processing method, device, equipment and medium based on block chain
CN109819057A (en) * 2019-04-08 2019-05-28 科大讯飞股份有限公司 A kind of load-balancing method and system
CN110147372A (en) * 2019-05-21 2019-08-20 电子科技大学 A kind of distributed data base Intelligent Hybrid storage method towards HTAP

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753163A (en) * 2020-07-08 2020-10-09 北京鼎泰智源科技有限公司 Data acquisition method
CN112445656A (en) * 2020-12-14 2021-03-05 北京京航计算通讯研究所 Method and device for repairing data in distributed storage system
CN112445656B (en) * 2020-12-14 2024-02-13 北京京航计算通讯研究所 Method and device for repairing data in distributed storage system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200320)