CN115202591A - Storage device, method and storage medium of distributed database system - Google Patents

Storage device, method and storage medium of distributed database system

Info

Publication number
CN115202591A
CN115202591A (application number CN202211127361.3A)
Authority
CN
China
Prior art keywords
storage node
node
storage
graph
written
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211127361.3A
Other languages
Chinese (zh)
Other versions
CN115202591B (en)
Inventor
雷昱
齐洁
Current Assignee
Xiamen University
Original Assignee
Xiamen University
Priority date
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202211127361.3A
Publication of CN115202591A
Application granted
Publication of CN115202591B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/061: Improving I/O performance
    • G06F3/064: Management of blocks
    • G06F3/067: Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F9/5083: Techniques for rebalancing the load in a distributed system
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a storage device, a storage method and a storage medium for a distributed database system. The device comprises: a receiving unit for receiving data to be written into the distributed database system and determining its size; a determining unit for selecting a first target storage node from the N storage nodes based on a load balancing policy; a prediction unit for predicting a second target storage node from the N storage nodes, based on the size of the data to be written, using a trained graph neural network; and a writing unit for writing the data to be written based on the first target storage node and the second target storage node. The invention uses artificial intelligence (AI) to select a suitable target write node based on the size of the data block to be written, and then applies a fixed rule to choose the more reasonable write node between the AI-selected target node and the node selected by a general load balancing policy, thereby improving the overall write performance of the distributed database system.

Description

Storage device, method and storage medium of distributed database system
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a storage device, a storage method and a storage medium of a distributed database system.
Background
Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking and planning). It mainly covers the principles by which computers realize intelligence, and the construction of computers whose intelligence resembles that of the human brain, so that computers can realize higher-level applications. Artificial intelligence involves computer science, psychology, philosophy and linguistics. At the level of technical application it is an applied branch of thinking science, and its range of applications grows ever wider as the technology develops.
In the prior art, when data is written to the storage nodes, a conventional scheduling policy (i.e., load balancing) is generally used. Due to defects in the scheduling algorithm, however, data may always be written to the same few storage nodes; in other words, existing algorithms are not intelligent enough, which leads to unbalanced utilization of the storage nodes.
Disclosure of Invention
The present invention proposes the following technical solutions to address one or more technical defects in the prior art.
A storage apparatus of a distributed database system, the distributed database system comprising N storage nodes, the apparatus comprising:
the receiving unit is used for receiving data to be written in the distributed database system and determining the size of the data to be written;
the determining unit is used for selecting one storage node from the N storage nodes as a first target storage node based on a load balancing strategy;
the prediction unit predicts one storage node from the N storage nodes as a second target storage node by using the trained graph neural network;
the writing unit is used for writing the data to be written based on the first target storage node and the second target storage node;
wherein N is greater than or equal to 2;
the distributed database system further comprises a scheduling server, the scheduling server writes the data to be written to the corresponding storage node, and the scheduling server is connected to each storage node; the graph of the graph neural network is formed in the following manner: the scheduling server and the N storage nodes serve as graph nodes, the connections between the scheduling server and the N storage nodes serve as edges of the graph, the weight of an edge is determined based on the bandwidth between the scheduling server and the corresponding storage node, and the feature value of a graph node is determined based on the processing capacity and the remaining storage space of the storage node;
the weights of the edges of the graph and the feature values of the graph nodes are determined in the following manner: the bandwidth between the scheduling server and the corresponding storage node, the processing capacity of the storage node, and the remaining storage space are normalized to obtain a normalized bandwidth value W_i between the scheduling server and the corresponding storage node, a normalized processing capacity value P_i of the storage node, and a normalized remaining storage space value M_i; the weight Weight_i of the edge of the graph is calculated as

Weight_i = α · W_i

and the feature value C_i of the graph node is calculated as

C_i = β · P_i + γ · M_i

where i denotes the ith storage node, α denotes the weight of W_i, β denotes the weight of P_i, and γ denotes the weight of M_i.
Further, the writing unit is operative to: judge whether the identifier ID1 of the first target storage node is the same as the identifier ID2 of the second target storage node; if they are the same, write the data to be written to the storage node corresponding to ID1; if they are not the same, judge whether the priority Pr1 of the first target storage node is greater than or equal to the priority Pr2 of the second target storage node; if so, write the data to be written to the storage node corresponding to ID1, and if not, write the data to be written to the storage node corresponding to ID2.
Still further, the load balancing policy is a random selection policy, a round-robin policy, or a source address hashing policy.
Further, the graph neural network is trained on historical data, which includes the size of each written data block, the identifier of the storage node written to, the bandwidth from the scheduling node to that node, and the processing capacity and remaining storage space of that node.
Furthermore, the priority of a storage node is determined based on the reliability of the storage node and the spatial position of the storage node relative to the scheduling node, and the priority is updated periodically; the priority of a storage node is calculated as

Pr_i = δ · T_i + ε · log(D_i)

where Pr_i denotes the priority of the ith storage node, T_i denotes the reliability of the ith storage node, D_i denotes the spatial position of the ith storage node relative to the scheduling node (when D_i < 1, D_i is set to 1), δ denotes the weight of the reliability of the ith storage node, and ε denotes the weight of the spatial position of the ith storage node relative to the scheduling node.
Further, the spatial location of the storage node relative to the scheduling node may be represented using the distance of the storage node relative to the scheduling node.
The invention also provides a storage method of the distributed database system, wherein the distributed database system comprises N storage nodes, and the method comprises the following steps:
a receiving step, namely receiving data to be written in the distributed database system and determining the size of the data to be written;
a determining step, namely selecting one storage node from the N storage nodes as a first target storage node based on a load balancing strategy;
predicting, namely predicting a storage node from the N storage nodes by using the trained graph neural network to serve as a second target storage node;
a writing step, writing the data to be written based on the first target storage node and the second target storage node;
wherein N is greater than or equal to 2;
the distributed database system further comprises a scheduling server, the scheduling server writes the data to be written to the corresponding storage node, and the scheduling server is connected to each storage node; the graph of the graph neural network is formed in the following manner: the scheduling server and the N storage nodes serve as graph nodes, the connections between the scheduling server and the N storage nodes serve as edges of the graph, the weight of an edge is determined based on the bandwidth between the scheduling server and the corresponding storage node, and the feature value of a graph node is determined based on the processing capacity and the remaining storage space of the storage node; the weights of the edges of the graph and the feature values of the graph nodes are determined in the following manner: the bandwidth between the scheduling server and the corresponding storage node, the processing capacity of the storage node, and the remaining storage space are normalized to obtain a normalized bandwidth value W_i between the scheduling server and the corresponding storage node, a normalized processing capacity value P_i of the storage node, and a normalized remaining storage space value M_i; the weight Weight_i of the edge of the graph is calculated as

Weight_i = α · W_i

and the feature value C_i of the graph node is calculated as

C_i = β · P_i + γ · M_i

where i denotes the ith storage node, α denotes the weight of W_i, β denotes the weight of P_i, and γ denotes the weight of M_i.
Further, the writing step operates as follows: judge whether the identifier ID1 of the first target storage node is the same as the identifier ID2 of the second target storage node; if they are the same, write the data to be written to the storage node corresponding to ID1; if they are not the same, judge whether the priority Pr1 of the first target storage node is greater than or equal to the priority Pr2 of the second target storage node; if so, write the data to be written to the storage node corresponding to ID1, and if not, write the data to be written to the storage node corresponding to ID2.
Still further, the load balancing policy is a random selection policy, a round-robin policy, or a source address hashing policy.
Further, the graph neural network is trained on historical data, which includes the size of each written data block, the identifier of the storage node written to, the bandwidth from the scheduling node to that node, and the processing capacity and remaining storage space of that node.
Furthermore, the priority of a storage node is determined based on the reliability of the storage node and the spatial position of the storage node relative to the scheduling node, and the priority is updated periodically; the priority of a storage node is calculated as

Pr_i = δ · T_i + ε · log(D_i)

where Pr_i denotes the priority of the ith storage node, T_i denotes the reliability of the ith storage node, D_i denotes the spatial position of the ith storage node relative to the scheduling node (when D_i < 1, D_i is set to 1), δ denotes the weight of the reliability of the ith storage node, and ε denotes the weight of the spatial position of the ith storage node relative to the scheduling node.
Further, the spatial location of the storage node relative to the scheduling node may be represented using the distance of the storage node relative to the scheduling node.
The present invention also proposes a computer-readable storage medium having stored thereon computer program code which, when executed by a computer, performs the method of any of the above.
The technical effects of the invention are as follows. A storage device, method and storage medium for a distributed database system are provided, the device comprising: a receiving unit for receiving data to be written into the distributed database system and determining the size of the data to be written; a determining unit for selecting one of the N storage nodes as a first target storage node based on a load balancing policy; a prediction unit for predicting one of the N storage nodes as a second target storage node using a trained graph neural network; and a writing unit for writing the data to be written based on the first target storage node and the second target storage node. The invention uses artificial intelligence (AI) to select a suitable target write node based on the size of the data block to be written, and then applies a fixed rule to choose the more reasonable write node between the AI-selected target node and the node selected by a general load balancing policy, improving the overall write performance of the distributed database system and solving the technical problem of unbalanced writes across the nodes of the system. The invention compares whether the node predicted by artificial intelligence and the node selected by the load balancing policy are the same node; if they are not, their priorities are compared and the data is written to the node with the higher priority. That is, the invention provides a node-selection scheme based on artificial intelligence plus priority judgment, further improving data write performance. Based on the actual conditions of the distributed database system, a graph neural network is constructed to predict the target write node from the size of the data block to be written, and an initial method for calculating the edge weights and graph node feature values of the graph neural network is given; simulation shows that the constructed graph neural network predicts target write nodes well, so that the data written to the distributed database system is balanced across nodes. Finally, the AI technique and the conventional selection technique are combined through a priority calculation: the priority of a storage node is determined from its reliability and its spatial position relative to the scheduling node, mainly from the reliability, with the spatial position as a secondary influencing factor whose effect is reduced by taking its logarithm.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings.
Fig. 1 is a flow chart of a storage method of a distributed database system according to an embodiment of the present invention.
Fig. 2 is a structural diagram of a storage apparatus of a distributed database system according to an embodiment of the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates a storage method of a distributed database system according to the present invention, where the distributed database system includes N storage nodes, that is, the distributed database system is composed of a plurality of storage nodes, and the method includes:
a receiving step S101, receiving data to be written in the distributed database system, and determining the size of the data to be written;
A determining step S102: selecting one storage node from the N storage nodes as a first target storage node based on a load balancing policy. In the present invention, the load balancing policy is a random selection policy, a round-robin policy, or a source address hashing policy; these are relatively mature load balancing policies and are not described further.
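As an illustration (this sketch is not part of the patent text, and the class and method names are invented for the example), the three conventional load balancing policies named above can be written as:

```python
import hashlib
import random

class LoadBalancer:
    """Sketch of the three conventional policies: random selection,
    round-robin, and source-address hashing (illustrative only)."""

    def __init__(self, node_ids):
        self.node_ids = list(node_ids)  # identifiers of the N storage nodes
        self._rr_index = 0              # cursor for round-robin selection

    def pick_random(self):
        # Random selection: choose any node uniformly at random.
        return random.choice(self.node_ids)

    def pick_round_robin(self):
        # Round-robin: cycle through the nodes in order.
        node = self.node_ids[self._rr_index % len(self.node_ids)]
        self._rr_index += 1
        return node

    def pick_source_hash(self, source_address):
        # Source-address hashing: the same client address always
        # maps to the same storage node.
        digest = hashlib.md5(source_address.encode()).hexdigest()
        return self.node_ids[int(digest, 16) % len(self.node_ids)]
```

For example, with nodes `["n1", "n2", "n3"]`, round-robin returns them in cyclic order, and source-address hashing returns the same node every time for a given client address.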
A predicting step S103: predicting one storage node from the N storage nodes as a second target storage node by using the trained graph neural network.
A writing step S104: writing the data to be written based on the first target storage node and the second target storage node; wherein N is greater than or equal to 2.
Generally speaking, the method of the invention can run on a scheduling server or a load balancing server. An important inventive concept is to use artificial intelligence (AI) to select a suitable target write node based on the size of the data block to be written, and then to apply a fixed rule to choose the more reasonable write node between the AI-selected target node and the node selected by a general load balancing policy, thereby improving the overall write performance of the distributed database system and solving the technical problem of unbalanced node writes. This is one of the important inventive points of the invention.
In one embodiment, the writing step S104 is performed as follows: judge whether the identifier ID1 of the first target storage node is the same as the identifier ID2 of the second target storage node; if they are the same, write the data to be written to the storage node corresponding to ID1; if they are not the same, judge whether the priority Pr1 of the first target storage node is greater than or equal to the priority Pr2 of the second target storage node; if so, write the data to be written to the storage node corresponding to ID1, and if not, write the data to be written to the storage node corresponding to ID2.
The invention compares whether the target node predicted by artificial intelligence and the target write node selected by the load balancing policy are the same node. If they are not, their priorities are compared and the data is written to the node with the higher priority. That is, the invention provides a node-selection scheme based on artificial intelligence plus priority judgment, further improving data write performance; this is another important inventive point of the invention.
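The comparison rule of step S104 (same node: write there; different nodes: write to the higher-priority one, with the load-balancing choice winning ties) can be sketched as a small function. The function name and the dict fields `id` and `priority` are assumptions for illustration; the patent only specifies the rule itself.

```python
def choose_write_node(lb_node, ai_node):
    """Combine the load-balancing choice and the GNN prediction.

    lb_node / ai_node: dicts with 'id' and 'priority' fields
    (illustrative structure). Returns the id of the node to write to.
    """
    # Same node chosen by both strategies: write to it directly.
    if lb_node["id"] == ai_node["id"]:
        return lb_node["id"]
    # Different nodes: Pr1 >= Pr2 keeps the load-balancing choice,
    # otherwise the AI-predicted node wins.
    if lb_node["priority"] >= ai_node["priority"]:
        return lb_node["id"]
    return ai_node["id"]
```

Note that the tie (Pr1 equal to Pr2) falls to the load-balancing choice, matching the "greater than or equal to" wording in the text.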
In one embodiment, the distributed database system of the present invention further includes a scheduling server that writes the data to be written to the corresponding storage node; the scheduling server is connected to each storage node, and the graph of the graph neural network is formed as follows: the scheduling server and the N storage nodes serve as graph nodes, the connections between the scheduling server and the N storage nodes serve as edges of the graph, the weight of an edge is determined based on the bandwidth between the scheduling server and the corresponding storage node, and the feature value of a graph node is determined based on the processing capacity and the remaining storage space of the storage node.
In one embodiment, the weights of the edges of the graph and the feature values of the graph nodes are determined as follows: the bandwidth between the scheduling server and the corresponding storage node, the processing capacity of the storage node, and the remaining storage space are normalized to obtain a normalized bandwidth value W_i between the scheduling server and the corresponding storage node, a normalized processing capacity value P_i of the storage node, and a normalized remaining storage space value M_i. The weight Weight_i of the edge of the graph is calculated as

Weight_i = α · W_i

and the feature value C_i of the graph node is calculated as

C_i = β · P_i + γ · M_i

where i denotes the ith storage node, α denotes the weight of W_i, β denotes the weight of P_i, and γ denotes the weight of M_i.
The weights may be obtained from historical data or may be adjusted based on experience in actual work.
Based on the actual conditions of the distributed database system, the invention constructs a graph neural network that predicts the target write node from the size of the data block to be written, and provides an initial method for calculating the edge weights and graph node feature values of the graph neural network. Simulation shows that the constructed graph neural network predicts target write nodes well, so that the data written to the distributed database system is balanced across nodes.
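A minimal sketch of the edge-weight and node-feature computation reconstructed above. It assumes min-max normalization and example values for the weights α, β and γ; the patent fixes none of these (it says the weights may come from historical data or be tuned empirically).

```python
def min_max(values):
    # Min-max normalization to [0, 1]. If all values are equal we
    # return 1.0 for each entry (an assumption; the patent does not
    # specify which normalization is used).
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def build_graph(bandwidths, capacities, free_spaces,
                alpha=1.0, beta=0.6, gamma=0.4):
    """Edge weights and node feature values as reconstructed from the
    patent text: Weight_i = alpha * W_i and C_i = beta * P_i + gamma * M_i.
    Inputs are per-storage-node raw metrics; alpha/beta/gamma defaults
    are illustrative. Returns (edge_weights, node_features)."""
    W = min_max(bandwidths)   # normalized scheduler-to-node bandwidth
    P = min_max(capacities)   # normalized processing capacity
    M = min_max(free_spaces)  # normalized remaining storage space
    weights = [alpha * w for w in W]
    features = [beta * p + gamma * m for p, m in zip(P, M)]
    return weights, features
```

The returned lists could then be fed to any GNN library as the edge weights of the star graph (scheduler to each node) and the per-node feature values.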
In one embodiment, the graph neural network is trained on historical data, which includes the size of each written data block, the identifier of the storage node written to, the bandwidth from the scheduling node to that node, and the processing capacity and remaining storage space of that node.
In one embodiment, the priority of a storage node is determined based on the reliability of the storage node and the spatial position of the storage node relative to the scheduling node, and the priority is updated periodically; the priority of a storage node is calculated as

Pr_i = δ · T_i + ε · log(D_i)

where Pr_i denotes the priority of the ith storage node, T_i denotes the reliability of the ith storage node, D_i denotes the spatial position of the ith storage node relative to the scheduling node (when D_i < 1, D_i is set to 1), δ denotes the weight of the reliability of the ith storage node, and ε denotes the weight of the spatial position of the ith storage node relative to the scheduling node.
The invention combines the AI technique with the conventional selection technique through a priority calculation. It therefore provides a way of calculating storage node priority that is determined from the reliability of the storage node and its spatial position relative to the scheduling node, mainly from the reliability; the spatial position is only an influencing factor, so its logarithm is used in the calculation to reduce its effect. This is another important inventive point of the invention.
In one embodiment, the spatial location of the storage node relative to the scheduling node may be represented using the distance of the storage node relative to the scheduling node.
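The priority calculation described above, with distance standing in for spatial position, can be sketched as follows. The natural logarithm and the default values of δ and ε are assumptions for illustration; the patent specifies neither.

```python
import math

def node_priority(reliability, distance, delta=0.8, epsilon=0.2):
    """Priority of a storage node as reconstructed from the patent:
    Pr_i = delta * T_i + epsilon * log(D_i), dominated by reliability,
    with the distance to the scheduling node logarithmically damped.
    delta/epsilon defaults are illustrative, not from the patent."""
    # When D_i < 1, set D_i = 1, so the logarithm is never negative.
    d = max(distance, 1.0)
    return delta * reliability + epsilon * math.log(d)
```

With these defaults, two nodes of equal reliability differ only slightly in priority even when their distances differ by an order of magnitude, which matches the stated intent of reducing the influence of spatial position.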
Fig. 2 shows a storage apparatus of a distributed database system of the present invention, the distributed database system including N storage nodes, that is, the distributed database is composed of a plurality of storage nodes, the apparatus includes:
a receiving unit 201, configured to receive data to be written into the distributed database system, and determine the size of the data to be written;
a determining unit 202, configured to select one storage node from the N storage nodes as a first target storage node based on a load balancing policy; in the present invention, the load balancing policy is a random selection policy, a round-robin policy, or a source address hashing policy, which are relatively mature load balancing policies and are not described further.
The predicting unit 203 predicts one storage node from the N storage nodes as a second target storage node by using the trained graph neural network;
a writing unit 204, configured to write the data to be written based on the first target storage node and the second target storage node; wherein N is greater than or equal to 2.
Generally speaking, the apparatus of the invention can run on a scheduling server or a load balancing server. An important inventive concept is to use artificial intelligence (AI) to select a suitable target write node based on the size of the data block to be written, and then to apply a fixed rule to choose the more reasonable write node between the AI-selected target node and the node selected by a general load balancing policy, thereby improving the overall write performance of the distributed database system and solving the technical problem of unbalanced node writes. This is one of the important inventive points of the invention.
In one embodiment, the write unit 204 operates as follows: it judges whether the ID1 of the first target storage node is the same as the ID2 of the second target storage node; if they are the same, the data to be written is written into the storage node corresponding to ID1; if not, it judges whether the priority Pr1 of the first target storage node is greater than or equal to the priority Pr2 of the second target storage node; if so, the data to be written is written into the storage node corresponding to ID1, and otherwise it is written into the storage node corresponding to ID2.
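The decision rule of the write unit 204 can be sketched as follows; the dictionary fields `id` and `priority` are illustrative assumptions, not names from the patent:

```python
def choose_write_node(first, second):
    """Combine the load balancer's choice (first target node) with the
    graph-neural-network prediction (second target node): if both name
    the same node, write there; otherwise the node with the higher
    priority wins, with ties going to the load balancer's choice."""
    if first["id"] == second["id"]:
        return first["id"]
    if first["priority"] >= second["priority"]:
        return first["id"]
    return second["id"]
```

Ties favor the load balancer's choice because the rule is "greater than or equal to", so the AI prediction only overrides when it is strictly better.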
In the invention, the target node predicted by artificial intelligence is compared with the target write node selected by the load balancing policy; if they are not the same node, their priorities are compared, and the data is written into the node with the higher priority. That is, the invention provides a node writing method based on artificial intelligence and priority judgment, thereby further improving data writing performance, which is another important inventive point of the invention.
In one embodiment, the distributed database system of the present invention further includes a scheduling server, which writes the data to be written into the corresponding storage node and is connected to each storage node. The graph of the graph neural network is formed as follows: the scheduling server and the N storage nodes serve as graph nodes; the connections between the scheduling server and the N storage nodes serve as the edges of the graph; the weight of an edge is determined based on the bandwidth between the scheduling server and the corresponding storage node; and the characteristic value of a graph node is determined based on the processing capacity and the remaining storage space of the storage node.
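As a sketch under stated assumptions, the star-shaped graph described above (the scheduling server at the center, one edge per storage node) could be assembled as follows. The linear combinations and the coefficients `alpha`, `beta`, `gamma` are assumptions consistent with the description, since the patent's exact formulas are rendered as images:

```python
def build_graph(bandwidth, capacity, remaining, alpha=1.0, beta=0.5, gamma=0.5):
    """Build the star graph: node 0 is the scheduling server and nodes
    1..N are the storage nodes. Inputs are already-normalized values,
    one per storage node; coefficients are illustrative."""
    n = len(bandwidth)
    nodes = list(range(n + 1))        # 0 = scheduler, 1..n = storage nodes
    edges = {}                        # (scheduler, storage node) -> edge weight
    features = {0: 0.0}               # the scheduler carries no storage feature
    for i in range(n):
        edges[(0, i + 1)] = alpha * bandwidth[i]
        features[i + 1] = beta * capacity[i] + gamma * remaining[i]
    return nodes, edges, features
```

In a real system the edge list and feature dictionary would feed a graph-learning library, but the star topology itself is this simple.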
In one embodiment, the weights of the edges of the graph and the eigenvalues of the graph nodes are determined as follows: the bandwidth between the scheduling server and the corresponding storage node, the processing capacity of the storage node, and the remaining storage space of the storage node are normalized to obtain the bandwidth normalization value W_i between the scheduling server and the corresponding storage node, the processing capacity normalization value P_i of the storage node, and the remaining-space normalization value M_i of the storage node. The weight of the edge of the graph is calculated as

Weight_i = α·W_i

and the characteristic value of the graph node is calculated as

C_i = β·P_i + γ·M_i

wherein i denotes the i-th storage node, α denotes the weight of W_i, β denotes the weight of P_i, and γ denotes the weight of M_i.
The weights may be obtained from historical data or may be adjusted based on experience in actual work.
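One common way to obtain the normalized values W_i, P_i and M_i is min-max normalization. The sketch below uses it, and takes the edge weight as α·W_i and the node feature as β·P_i + γ·M_i; this combination and the values of α, β, γ are assumptions consistent with the description, since the patent's exact formulas are rendered as images:

```python
def normalize(values):
    # Min-max normalize raw measurements into [0, 1].
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def edge_weights_and_features(bandwidths, capacities, spaces,
                              alpha=1.0, beta=0.6, gamma=0.4):
    """Compute per-node edge weights and node feature values from raw
    bandwidth, processing capacity, and remaining-space measurements."""
    W = normalize(bandwidths)
    P = normalize(capacities)
    M = normalize(spaces)
    weights = [alpha * w for w in W]                      # Weight_i = alpha * W_i
    feats = [beta * p + gamma * m for p, m in zip(P, M)]  # C_i = beta*P_i + gamma*M_i
    return weights, feats
```

Normalizing first keeps bandwidth (say, MB/s) and storage space (say, GB) on a common scale, so the coefficients express relative importance rather than unit conversion.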
In the invention, based on the actual situation of the distributed database system, a graph neural network is constructed to predict the target write node from the size of the data block to be written, and an initial method for calculating the edge weights and graph node characteristic values of the graph neural network is given. Simulation shows that the constructed graph neural network predicts the target write node well, so that write data is balanced across the distributed database system.
In one embodiment, the graph neural network is trained on historical data, where the historical data includes the size of the written data block, the identifier of the storage node written to, the bandwidth from the scheduling node to that storage node, the processing capacity of the node, the remaining storage space of the node, and the like.
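A training sample assembled from the historical records listed above might look like the following; the field names are illustrative assumptions, and the label is the storage node that was actually written:

```python
def make_sample(record):
    """Turn one historical write record into a (feature vector, label)
    pair for supervised training of the prediction model."""
    features = [
        record["block_size"],       # size of the written data block
        record["bandwidth"],        # scheduling node -> storage node bandwidth
        record["capacity"],         # processing capacity of the node
        record["remaining_space"],  # remaining storage space of the node
    ]
    return features, record["written_node_id"]
```

In practice each record would be expanded to one feature set per candidate node, but the per-record shape is as shown.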
In one embodiment, the priority of a storage node is determined based on the reliability of the storage node and the spatial position of the storage node relative to the scheduling node, and the priority is updated periodically. The priority of a storage node is calculated as

Pr_i = δ·T_i - ε·log(D_i)

wherein Pr_i denotes the priority of the i-th storage node, T_i denotes the reliability of the i-th storage node, and D_i denotes the spatial position of the i-th storage node relative to the scheduling node; when D_i < 1, D_i is set to 1. δ denotes the weight of the reliability of the i-th storage node, and ε denotes the weight of the spatial position of the i-th storage node relative to the scheduling node.
In the invention, the priority calculation combines the AI technique with the traditional selection technique. The invention thus provides a way of calculating storage node priority that is determined by the reliability of the storage node and by its spatial position relative to the scheduling node: the priority is determined mainly by reliability, with the spatial position acting as an influence factor, and the spatial position enters through a logarithm to reduce its influence. This is another important inventive point of the invention.
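The priority computation described above can be sketched as follows. The exact formula is rendered as an image in the source, so the weighted combination below, in which the log-damped distance reduces priority and δ, ε take illustrative values, is an assumption consistent with the surrounding description:

```python
import math

def priority(reliability, distance, delta=0.8, epsilon=0.2):
    """Priority of a storage node: dominated by reliability, with the
    distance to the scheduling node damped through a logarithm.
    Distances below 1 are clamped to 1 so the log term is never
    negative (matching the D_i < 1 => D_i = 1 rule)."""
    d = max(distance, 1.0)
    return delta * reliability - epsilon * math.log(d)
```

With these values a node at distance 1 keeps its full reliability-based priority, while a tenfold increase in distance lowers the priority only modestly, reflecting the "logarithm to reduce the influence" design.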
In one embodiment, the spatial location of the storage node relative to the scheduling node may be represented using the distance of the storage node relative to the scheduling node.
For convenience of description, the above apparatus is described as being functionally divided into various units, which are described separately. Of course, when implementing the present application, the functionality of the units may be implemented in one or more pieces of software and/or hardware.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus a necessary general hardware platform. Based on such understanding, the technical solutions of the present application, in essence or in the portions that contribute to the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, etc.) to perform the methods described in the embodiments, or in some portions of the embodiments, of the present application.
Finally, it should be noted that although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications may still be made to the technical solutions described in the above embodiments, or equivalent substitutions may be made to some technical features thereof, without departing from the spirit and scope of the invention, and such modifications and substitutions are intended to be covered by the appended claims.

Claims (7)

1. A storage apparatus of a distributed database system, the distributed database system comprising N storage nodes, the apparatus comprising:
the receiving unit is used for receiving data to be written in the distributed database system and determining the size of the data to be written;
the determining unit is used for selecting one storage node from the N storage nodes as a first target storage node based on a load balancing strategy;
the prediction unit predicts one storage node from the N storage nodes as a second target storage node by using the trained graph neural network;
the writing unit writes the data to be written based on the first target storage node and the second target storage node;
wherein N is more than or equal to 2;
the distributed database system further comprises a scheduling server, the scheduling server writes the data to be written into corresponding storage nodes, the scheduling server is connected with each storage node, and the graph of the graph neural network is formed in the following mode: the scheduling server and the N storage nodes are used as graph nodes, the connection between the scheduling server and the N storage nodes is used as an edge of a graph, the weight of the edge is determined based on the bandwidth between the scheduling server and the corresponding storage node, and the characteristic value of the graph node is determined based on the processing capacity and the storage residual space of the storage node;
the way of determining the weights of the edges of the graph and the eigenvalues of the graph nodes is: normalizing the bandwidth between the scheduling server and the corresponding storage node, the processing capacity of the storage node and the storage residual space to obtain a bandwidth normalization value W between the scheduling server and the corresponding storage node i And the processing capacity normalization value P of the storage node i And storing the remainder space normalization value M i Calculating the Weight of the edge of the graph i
Figure 259106DEST_PATH_IMAGE002
Computing characteristic values C of graph nodes i
Figure 478735DEST_PATH_IMAGE004
Wherein i represents the ith storage node and alpha represents
Figure 69116DEST_PATH_IMAGE005
Beta represents P i Weight of (a), gamma denotes
Figure 783126DEST_PATH_IMAGE006
The weight of (c).
2. The apparatus of claim 1, wherein the write unit operates as follows: judging whether the ID1 of the first target storage node is the same as the ID2 of the second target storage node; if they are the same, writing the data to be written into the storage node corresponding to ID1; if not, judging whether the priority Pr1 of the first target storage node is greater than or equal to the priority Pr2 of the second target storage node; if so, writing the data to be written into the storage node corresponding to ID1, and if not, writing the data to be written into the storage node corresponding to ID2.
3. The apparatus of claim 2, wherein the load balancing policy is a random selection policy, a round robin policy, or a source address hashing policy.
4. A storage method of a distributed database system, wherein the distributed database system includes N storage nodes, the method comprising:
a receiving step, namely receiving data to be written into the distributed database and determining the size of the data to be written;
determining, namely selecting one storage node from the N storage nodes as a first target storage node based on a load balancing strategy;
predicting, namely predicting one storage node from the N storage nodes by using the trained graph neural network to serve as a second target storage node;
a writing step, namely writing the data to be written based on the first target storage node and the second target storage node;
wherein N is more than or equal to 2;
the distributed database system further comprises a scheduling server, the scheduling server writes the data to be written into the corresponding storage node, and the scheduling server is connected to each storage node; the graph of the graph neural network is formed as follows: the scheduling server and the N storage nodes serve as graph nodes, the connections between the scheduling server and the N storage nodes serve as the edges of the graph, the weight of an edge is determined based on the bandwidth between the scheduling server and the corresponding storage node, and the characteristic value of a graph node is determined based on the processing capacity and the remaining storage space of the storage node; the weights of the edges of the graph and the eigenvalues of the graph nodes are determined as follows: the bandwidth between the scheduling server and the corresponding storage node, the processing capacity of the storage node, and the remaining storage space are normalized to obtain the bandwidth normalization value W_i, the processing capacity normalization value P_i, and the remaining-space normalization value M_i; the weight of the edge of the graph is calculated as

Weight_i = α·W_i

and the characteristic value of the graph node is calculated as

C_i = β·P_i + γ·M_i

wherein i denotes the i-th storage node, α denotes the weight of W_i, β denotes the weight of P_i, and γ denotes the weight of M_i.
5. The method of claim 4, wherein the writing step operates as follows: judging whether the ID1 of the first target storage node is the same as the ID2 of the second target storage node; if they are the same, writing the data to be written into the storage node corresponding to ID1; if not, judging whether the priority Pr1 of the first target storage node is greater than or equal to the priority Pr2 of the second target storage node; if so, writing the data to be written into the storage node corresponding to ID1, and if not, writing the data to be written into the storage node corresponding to ID2.
6. The method of claim 5, wherein the load balancing policy is a random selection policy, a round robin policy, or a source address hashing policy.
7. A computer-readable storage medium having computer program code stored thereon which, when executed by a computer, performs the method of any of the preceding claims 4-6.
CN202211127361.3A 2022-09-16 2022-09-16 Storage device, method and storage medium of distributed database system Active CN115202591B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211127361.3A CN115202591B (en) 2022-09-16 2022-09-16 Storage device, method and storage medium of distributed database system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211127361.3A CN115202591B (en) 2022-09-16 2022-09-16 Storage device, method and storage medium of distributed database system

Publications (2)

Publication Number Publication Date
CN115202591A true CN115202591A (en) 2022-10-18
CN115202591B CN115202591B (en) 2022-11-18

Family

ID=83572498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211127361.3A Active CN115202591B (en) 2022-09-16 2022-09-16 Storage device, method and storage medium of distributed database system

Country Status (1)

Country Link
CN (1) CN115202591B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110650208A (en) * 2019-09-29 2020-01-03 北京浪潮数据技术有限公司 Distributed cluster storage method, system, device and computer readable storage medium
CN110781006A (en) * 2019-10-28 2020-02-11 重庆紫光华山智安科技有限公司 Load balancing method, device, node and computer readable storage medium
CN111064808A (en) * 2019-12-30 2020-04-24 北京天融信网络安全技术有限公司 Load balancing method and device based on distributed storage system
US20200293838A1 (en) * 2019-03-13 2020-09-17 Deepmind Technologies Limited Scheduling computation graphs using neural networks
CN112486641A (en) * 2020-11-18 2021-03-12 鹏城实验室 Task scheduling method based on graph neural network
US20210312280A1 (en) * 2020-04-03 2021-10-07 Robert Bosch Gmbh Device and method for scheduling a set of jobs for a plurality of machines
WO2022116142A1 (en) * 2020-12-04 2022-06-09 深圳大学 Resource scheduling method based on graph neural network


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI, Qiang et al.: "Load balancing strategy for cloud storage based on Hopfield neural network", Journal of Computer Applications (《计算机应用》) *

Also Published As

Publication number Publication date
CN115202591B (en) 2022-11-18

Similar Documents

Publication Publication Date Title
EP4235514A2 (en) Methods, systems, articles of manufacture and apparatus to map workloads
CN110673951B (en) Mimicry scheduling method, system and medium for general operation environment
CN113326126A (en) Task processing method, task scheduling device and computer equipment
WO2020155300A1 (en) Model prediction method and device
CN112764893B (en) Data processing method and data processing system
CN112633567A (en) Method and device for predicting waiting duration and storage medium
KR20220013896A (en) Method and apparatus for determining the neural network architecture of a processor
CN116467082A (en) Big data-based resource allocation method and system
CN115994611A (en) Training method, prediction method, device and storage medium for category prediction model
CN115202591B (en) Storage device, method and storage medium of distributed database system
CN107038244A (en) A kind of data digging method and device, a kind of computer-readable recording medium and storage control
CN116360921A (en) Cloud platform resource optimal scheduling method and system for electric power Internet of things
CN114154252B (en) Risk assessment method and device for failure mode of power battery system of new energy automobile
CN114742644A (en) Method and device for training multi-scene wind control system and predicting business object risk
US20230064834A1 (en) Discrete optimisation
CN114444654A (en) NAS-oriented training-free neural network performance evaluation method, device and equipment
CN113392100A (en) System intelligent verification method, device and system based on particle swarm optimization neural network
CN113407192B (en) Model deployment method and device
KR102579116B1 (en) Apparatus and method for automatically learning and distributing artificial intelligence based on the cloud
CN116980423B (en) Model scheduling method, device, computing system, equipment and readable storage medium
KR102441442B1 (en) Method and apparatus for learning graph convolutional network
US20230351146A1 (en) Device and computer-implemented method for a neural architecture search
US20220318634A1 (en) Method and apparatus for retraining compressed model using variance equalization
CN117195036A (en) Intelligent processing method and system for power plant data
CN113723593A (en) Load shedding prediction method and system based on neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant