WO2023104192A1 - Management Method and Device for a Cluster System - Google Patents


Info

Publication number
WO2023104192A1
Authority
WO
WIPO (PCT)
Prior art keywords
sub
grid
business data
data
spatial
Application number
PCT/CN2022/138000
Inventor
王昊
黄骞
关雪峰
向隆刚
庞兆星
叶鹏辉
张军
原朝
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023104192A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources

Definitions

  • the present application relates to the field of computer technology, in particular to a management method and device for a cluster system.
  • current distributed big data systems are managed in master-slave mode.
  • the master node is responsible for task distribution, data collection, resource scheduling, etc., and the slave nodes mainly perform data processing and calculation; any slave node may manage or store data of any area.
  • Figure 1 shows the processing architecture of a telecommunications business system. From data collection to storage in the data warehouse, the data must be processed by multiple functional clusters, such as a collection cluster, an association cluster, and a positioning cluster. Because the data distribution strategies are inconsistent (for example, the upstream stage of the system distributes data by network element while the downstream stage distributes it by user), the nodes of different processing clusters interact in a mesh pattern, a single node failure has a cascading amplification effect, and system reliability is low.
  • the public data used during processing (such as maps and fingerprint libraries) must be cached in full on each node or broadcast in the cluster as a large configuration table.
  • as a result, the processing capacity of the system decreases and the system has to be degraded.
  • the present application provides a management method and device for a cluster system, which realizes space division management of the cluster system and improves the reliability and scale expansion capability of the cluster system.
  • the present application provides a cluster system management method applied to the management node. The cluster system includes the management node and N computing nodes, the cluster system is responsible for processing business data in a preset area, and N is an integer greater than 1. The management method includes: obtaining historical business data, which is business data in several periods before the current period; determining the spatio-temporal distribution characteristics of the historical business data; dividing, based on these spatio-temporal distribution characteristics, the preset area into M sub-areas, such that the historical business data is evenly distributed among the M sub-areas; and determining the correspondence between the M sub-areas and the N computing nodes, and routing the business data of each of the M sub-areas to the corresponding computing node for processing.
  • in the management method of the cluster system provided in the first aspect, the spatio-temporal distribution of the historical business data is analyzed to obtain its spatio-temporal distribution characteristics, and, based on these characteristics, the preset area is divided into M sub-areas with the goal of balancing the data distribution. Each of the computing nodes is responsible for processing the business data of its corresponding sub-area, which realizes space division management of the cluster system and improves its reliability and scale expansion capability.
  • the preset threshold can be 1 mb
  • determining the spatio-temporal distribution characteristics of the historical business data includes: determining, based on the spatial distribution characteristics of the business data in multiple time periods of the historical business data, the spatial distribution matrices of the business data corresponding respectively to multiple time slices, where adjacent time periods among the multiple time periods are continuous; and determining, based on these spatial distribution matrices, the spatio-temporal distribution characteristics of the historical business data.
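As an illustration of the spatial distribution matrix step, the minimal Python sketch below bins records into a regular rows × cols grid over the preset area and produces one count matrix per time slice. The record format `(timestamp_s, x, y)`, the bounding box, the grid size, the slice length and the function name `spatial_distribution_matrices` are all hypothetical; the source does not specify how the matrices are built.

```python
from collections import defaultdict


def spatial_distribution_matrices(records, bbox, rows, cols, slice_seconds):
    """Count records per grid cell for each time slice (hypothetical helper).

    records: iterable of (timestamp_s, x, y) tuples (assumed format).
    bbox: (min_x, min_y, max_x, max_y) of the preset area.
    Returns {slice_index: rows x cols list of counts}.
    """
    min_x, min_y, max_x, max_y = bbox
    cell_w = (max_x - min_x) / cols
    cell_h = (max_y - min_y) / rows
    matrices = defaultdict(lambda: [[0] * cols for _ in range(rows)])
    for ts, x, y in records:
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue  # record lies outside the preset area
        # Clamp so that points on the max edge fall into the last cell.
        c = min(int((x - min_x) / cell_w), cols - 1)
        r = min(int((y - min_y) / cell_h), rows - 1)
        matrices[int(ts // slice_seconds)][r][c] += 1
    return dict(matrices)
```

Consecutive slices produced this way correspond to continuous time periods, which is what the grouping step that follows relies on.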
  • determining the spatio-temporal distribution characteristics of the historical business data based on the spatial distribution matrices corresponding to the multiple time slices includes: normalizing the spatial distribution matrix of each time slice; dividing the multiple time slices into multiple groups based on the similarity between the normalized spatial distribution matrices; and aggregating the normalized spatial distribution matrices to determine the aggregated distribution matrices of the multiple time slice groups, where the aggregated distribution matrices characterize the spatio-temporal distribution characteristics of the historical business data.
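The normalize, group and aggregate steps could be sketched as follows. This is an assumption-laden illustration rather than the claimed procedure: it normalizes each matrix so its cells sum to 1, uses cosine similarity with a hypothetical threshold of 0.9, groups only consecutive time slices, and takes the element-wise mean as the aggregation operator; the source names none of these choices.

```python
import math


def normalize(matrix):
    """Scale a count matrix so that its cells sum to 1 (an all-zero matrix stays zero)."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return [[0.0] * len(row) for row in matrix]
    return [[v / total for v in row] for row in matrix]


def cosine_similarity(a, b):
    """Cosine similarity of two matrices viewed as flat vectors."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(fa, fb))
    na = math.sqrt(sum(x * x for x in fa))
    nb = math.sqrt(sum(y * y for y in fb))
    return dot / (na * nb) if na and nb else 0.0


def group_and_aggregate(matrices, threshold=0.9):
    """Group consecutive similar time slices and average each group.

    matrices: list of count matrices ordered by time slice.
    Returns a list of (slice_indices, aggregated_matrix) pairs.
    """
    normed = [normalize(m) for m in matrices]
    groups = []
    for i, m in enumerate(normed):
        # Extend the current group while the slice resembles the previous slice.
        if groups and cosine_similarity(normed[groups[-1][-1]], m) >= threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    result = []
    for idx in groups:
        rows, cols = len(normed[idx[0]]), len(normed[idx[0]][0])
        avg = [[sum(normed[i][r][c] for i in idx) / len(idx) for c in range(cols)]
               for r in range(rows)]
        result.append((idx, avg))
    return result
```

For example, a morning-heavy and an evening-heavy pattern would end up in separate groups, each with its own aggregated matrix, which is one way to capture the tidal behaviour mentioned later in the document.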
  • dividing the preset area into M sub-areas includes: dividing the preset area into M sub-areas based on the spatio-temporal distribution characteristics and spatial proximity of the historical business data, such that the historical business data is evenly distributed among the M sub-areas.
  • dividing the preset area into M sub-areas based on the spatio-temporal distribution characteristics and spatial proximity of the historical business data includes: determining an adaptive division threshold; performing initial spatial grid division on the historical business data based on the adaptive division threshold; determining, based on the spatio-temporal distribution characteristics of the historical business data, the amount of data in each sub-grid of the initial spatial grid; traversing each sub-grid of the initial spatial grid and dividing or aggregating it based on its amount of data and the adaptive division threshold, to determine the adaptive spatial grid division of the historical business data; and dividing the preset area into M sub-areas based on the adjacency relationships of the sub-grids in the adaptive spatial grid and the amount of data in each of them, such that the data volume is balanced across the M sub-areas.
  • performing initial spatial grid division on the historical business data based on the adaptive division threshold includes: determining an initial division level based on the adaptive division threshold and the location information of the preset area; and performing initial spatial grid division on the historical business data based on the initial division level and a hierarchical calculation model of the spatial distribution.
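One possible realization of the initial and adaptive grid division is a quadtree over a fine 2^K × 2^K data-volume matrix (for example, an aggregated distribution matrix). In this sketch, which is an assumption rather than the method described above, the initial level is the shallowest level whose average cell volume does not exceed the threshold; cells above the threshold are split, and four sibling leaves whose combined volume stays within the threshold are merged back into their parent. The location information of the preset area is abstracted away into the fine matrix, and all function names are hypothetical.

```python
def cell_sum(fine, level, r, c, max_level):
    """Total data volume of quadtree cell (level, r, c) over the fine matrix."""
    size = 2 ** (max_level - level)  # fine cells per side of this cell
    return sum(fine[i][j]
               for i in range(r * size, (r + 1) * size)
               for j in range(c * size, (c + 1) * size))


def initial_level(total, threshold, max_level):
    """Shallowest level whose average cell volume is within the threshold."""
    level = 0
    while level < max_level and total / 4 ** level > threshold:
        level += 1
    return level


def adaptive_grid(fine, threshold):
    """Return {(level, row, col): volume} for the leaves of an adaptive quadtree grid.

    fine: square matrix of side 2**max_level (data volume per finest cell).
    """
    max_level = len(fine).bit_length() - 1
    total = sum(map(sum, fine))
    level0 = initial_level(total, threshold, max_level)
    leaves = {}

    def split(level, r, c):
        v = cell_sum(fine, level, r, c, max_level)
        if v > threshold and level < max_level:
            for dr in (0, 1):
                for dc in (0, 1):
                    split(level + 1, 2 * r + dr, 2 * c + dc)
        else:
            leaves[(level, r, c)] = v

    # Division: start from the initial grid and split overloaded cells.
    for r in range(2 ** level0):
        for c in range(2 ** level0):
            split(level0, r, c)

    # Aggregation: merge four sibling leaves whose combined volume fits the threshold.
    merged = True
    while merged:
        merged = False
        for level, r, c in list(leaves):
            if level == 0 or (level, r, c) not in leaves:
                continue
            pr, pc = r // 2, c // 2
            sibs = [(level, 2 * pr + dr, 2 * pc + dc) for dr in (0, 1) for dc in (0, 1)]
            if all(s in leaves for s in sibs):
                v = sum(leaves[s] for s in sibs)
                if v <= threshold:
                    for s in sibs:
                        del leaves[s]
                    leaves[(level - 1, pr, pc)] = v
                    merged = True
    return leaves
```

Dense areas thus end up covered by small sub-grids and sparse areas by large ones, so every leaf carries at most about one threshold's worth of data unless it is already at the finest level.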
  • dividing the preset area into M sub-areas based on the adjacency relationships of the sub-grids in the adaptive spatial grid and the amount of data in each sub-grid includes:
  • determining, based on the adaptive spatial grid division of the historical business data, the undirected graph corresponding to the adaptive spatial grid.
  • each node of the undirected graph corresponds to a sub-grid of the adaptive spatial grid, and the weight of each node is determined by the amount of data in that sub-grid; the edges connecting adjacent nodes are determined by the adjacency relationships of the sub-grids, and the weight of each edge is determined by the distance between the adjacent sub-grids.
  • determining the adjacency matrix of the undirected graph; and dividing the preset area into M sub-areas based on the adjacency matrix, such that the data volume is balanced across the M sub-areas.
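The graph construction and the balanced division can be sketched as follows, assuming that the sub-grids are quadtree cells keyed by hypothetical `(level, row, col)` tuples. Node weights are data volumes and edge weights are centre-to-centre distances, as described above. The source does not say which partitioning algorithm is applied to the adjacency matrix, so a simple greedy region-growing heuristic is swapped in here purely as a stand-in; a production system might use a dedicated graph partitioner instead.

```python
import math


def build_graph(leaves, max_level):
    """Build the weighted undirected graph of an adaptive grid.

    leaves: {(level, row, col): data_volume}, quadtree cells (hypothetical format).
    Returns (keys, node_weights, adjacency); adjacency[i][j] is the
    centre-to-centre distance if cells i and j share a border, else 0.0.
    """
    keys = sorted(leaves)
    rects = []
    for level, r, c in keys:
        size = 2 ** (max_level - level)
        rects.append((r * size, c * size, size))  # top row, left col, side (fine cells)
    n = len(keys)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        r1, c1, s1 = rects[i]
        for j in range(i + 1, n):
            r2, c2, s2 = rects[j]
            row_overlap = min(r1 + s1, r2 + s2) - max(r1, r2)
            col_overlap = min(c1 + s1, c2 + s2) - max(c1, c2)
            # Shared border of positive length; touching corners do not count.
            stacked = (r1 + s1 == r2 or r2 + s2 == r1) and col_overlap > 0
            side_by_side = (c1 + s1 == c2 or c2 + s2 == c1) and row_overlap > 0
            if stacked or side_by_side:
                d = math.dist((r1 + s1 / 2, c1 + s1 / 2), (r2 + s2 / 2, c2 + s2 / 2))
                adj[i][j] = adj[j][i] = d
    return keys, [leaves[k] for k in keys], adj


def grow_regions(weights, adj, m):
    """Greedy region growing into m parts of roughly equal total weight.

    Each part starts at the first unassigned node and absorbs the nearest
    unassigned neighbour until it reaches its share of the remaining weight;
    the last part takes whatever is left.
    """
    n = len(weights)
    label = [-1] * n
    remaining = sum(weights)
    for part in range(m):
        free = [i for i in range(n) if label[i] == -1]
        if not free:
            break
        if part == m - 1:
            for i in free:
                label[i] = part
            break
        target = remaining / (m - part)
        label[free[0]] = part
        acc = weights[free[0]]
        while acc < target:
            frontier = [(adj[i][j], j) for i in range(n) if label[i] == part
                        for j in range(n) if label[j] == -1 and adj[i][j] > 0]
            if not frontier:
                break  # no unassigned neighbour left to absorb
            _, j = min(frontier)
            label[j] = part
            acc += weights[j]
        remaining -= acc
    return label
```

Growing each region only through adjacency edges keeps spatially neighbouring sub-grids together, which is the property the routing step relies on; the heuristic can overshoot its target, so its balance is only approximate.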
  • determining the correspondence between the M sub-areas and the N computing nodes includes: determining that N is greater than or equal to M; and dividing the N computing nodes into M sub-clusters, where the M sub-clusters correspond one-to-one to the M sub-areas.
  • determining the correspondence between the M sub-areas and the N computing nodes includes: determining that N is less than M; counting the average business data volume of each of the M sub-areas over multiple time periods; dividing the M sub-areas into N groups with the goal of minimizing the difference in average business data volume among the groups; and mapping the N groups of sub-areas one-to-one to the N computing nodes.
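Both cases can be sketched in one hypothetical helper. For N ≥ M, each sub-area gets one node and, as an assumption not stated in the source, the surplus nodes are handed to the most heavily loaded sub-areas first. For N < M, the grouping goal of minimizing load differences is approximated with the greedy longest-processing-time heuristic (heaviest sub-area to the currently lightest group), a swapped-in technique rather than the claimed algorithm.

```python
def assign_nodes(region_loads, nodes):
    """Map sub-areas to computing nodes (hypothetical helper).

    region_loads: average business-data volume per sub-area (length M).
    nodes: computing-node identifiers (length N).
    Returns {region_index: [node, ...]}.
    """
    m, n = len(region_loads), len(nodes)
    by_load = sorted(range(m), key=lambda r: region_loads[r], reverse=True)
    if n >= m:
        # N >= M: M sub-clusters, one per sub-area; extra nodes go to busy areas first.
        mapping = {r: [nodes[r]] for r in range(m)}
        for k, node in enumerate(nodes[m:]):
            mapping[by_load[k % m]].append(node)
        return mapping
    # N < M: greedy LPT grouping of sub-areas, one group per node.
    group_load = [0.0] * n
    mapping = {}
    for r in by_load:
        g = min(range(n), key=lambda i: group_load[i])
        group_load[g] += region_loads[r]
        mapping[r] = [nodes[g]]
    return mapping
```

In the N < M case several sub-areas share one node, matching the one-to-many relationship described for FIG. 2.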
  • the management method further includes: determining, based on the spatio-temporal distribution characteristics of the historical business data of the current period and those of the previous period, whether the division of the preset area needs to be updated.
  • each of the computing nodes caches the service-related data of the sub-area corresponding to it.
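A minimal sketch of such an update check, assuming the spatio-temporal characteristics of each period are summarized as aggregated distribution matrices of the same shape. It uses cosine similarity and a hypothetical threshold of 0.9, and treats a similarity at or below the threshold as the trigger for re-partitioning, which is one reading of the threshold comparison performed by the update module; the source does not fix the similarity measure.

```python
import math


def needs_repartition(current, previous, threshold=0.9):
    """Return True when the current and previous distributions have drifted apart.

    current, previous: aggregated distribution matrices of the same shape.
    threshold: hypothetical similarity threshold; the existing division is kept
    while the similarity exceeds it.
    """
    a = [v for row in current for v in row]
    b = [v for row in previous for v in row]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    similarity = dot / norm if norm else 0.0
    return similarity <= threshold
```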
  • the present application provides a management device for a cluster system. The management device is applied to a cluster system that includes the management node and N computing nodes and is responsible for processing business data in a preset area, N being an integer greater than 1, and the device includes:
  • an acquisition module configured to acquire historical business data, where the historical business data is business data in several periods before the current period;
  • a determining module configured to determine the spatio-temporal distribution characteristics of the historical business data;
  • a division module configured to divide the preset area into M sub-areas based on the spatio-temporal distribution characteristics of the historical business data, such that the historical business data is evenly distributed among the M sub-areas;
  • a routing module configured to determine the correspondence between the M sub-areas and the N computing nodes, and to route the service data of each of the M sub-areas to the corresponding computing node for processing.
  • the determining module is specifically configured to determine, based on the spatial distribution characteristics of the business data in multiple time periods of the historical business data, the spatial distribution matrices of the business data corresponding respectively to multiple time slices, where adjacent time periods among the multiple time periods are continuous; and to determine, based on these spatial distribution matrices, the spatio-temporal distribution characteristics of the historical service data.
  • the determining the temporal and spatial distribution characteristics of the historical business data based on the spatial distribution matrices of the business data respectively corresponding to the multiple time slices includes:
  • the aggregated distribution matrix of the multiple time slices characterizes the spatio-temporal distribution characteristics of the historical business data.
  • the division module is specifically configured to divide the preset area into M sub-areas based on the spatio-temporal distribution characteristics and spatial proximity of the historical business data, such that the historical business data is evenly distributed among the M sub-areas.
  • the division of the preset area into M sub-areas based on the spatio-temporal distribution characteristics and spatial proximity relationship of the historical service data includes:
  • based on the adjacency relationships of the sub-grids in the adaptive spatial grid and the amount of data in each sub-grid, dividing the preset area into M sub-areas, such that the data volume is balanced across the M sub-areas.
  • the initial spatial grid division of the historical service data based on the adaptive division threshold includes:
  • an initial spatial grid division is performed on the historical service data.
  • the preset area is divided into M sub-regions, including:
  • each node of the undirected graph corresponds to a sub-grid of the adaptive spatial grid, the weight of each node is determined by the amount of data in that sub-grid, the edges connecting adjacent nodes are determined by the adjacency relationships of the sub-grids in the adaptive spatial grid, and the weight of each edge is determined by the distance between the adjacent sub-grids;
  • the preset area is divided into M sub-areas, and the distribution of data volume among the M sub-areas is balanced.
  • the routing module is specifically configured to determine that N is greater than or equal to M, and to divide the N computing nodes into M sub-clusters that correspond one-to-one to the M sub-areas.
  • the routing module is specifically configured to determine that the N is smaller than the M
  • the N groups of sub-areas are in one-to-one correspondence with the N computing nodes.
  • the device further includes an update module configured to determine, based on the spatio-temporal distribution characteristics of the historical business data of the current period and those of the previous period, whether the division of the preset area needs to be updated.
  • the update module is specifically configured to determine whether the similarity between the spatio-temporal distribution characteristics of the historical business data of the current period and those of the previous period is greater than a preset threshold;
  • the present application provides a cluster system including a management node and a plurality of computing nodes; the cluster system is responsible for processing business data in a preset area, the management node includes a memory and a processor, and the memory stores instructions that, when executed by the processor, implement the method described in the first aspect.
  • the present application provides a computing device, including a memory and a processor, where instructions are stored in the memory, and when the instructions are executed by the processor, the method described in the first aspect is implemented.
  • the present application provides a computer storage medium including computer instructions; when the computer instructions are executed by a processor, the method described in the first aspect is implemented.
  • the present application provides a computer program or a computer program product, the computer program or computer program product includes instructions, and when the instructions are executed, the computer is instructed to execute the method described in the first aspect.
  • Fig. 1 is a diagram of the processing architecture of an existing telecommunication service system;
  • FIG. 2 is a schematic diagram of the architecture of a cluster system provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a telecom service processing architecture after applying the cluster system provided by the embodiment of the present application;
  • FIG. 5 is a schematic diagram of the architecture of the positioning system after applying the cluster system provided by the embodiment of the present application;
  • FIG. 6 is a schematic structural diagram of a management device for a cluster system provided by an embodiment of the present application.
  • Fig. 8 is a schematic diagram of adaptive division to obtain an adaptive spatial grid based on an adaptive division threshold and an initial spatial grid;
  • Fig. 9 is a schematic diagram of dividing historical business data by an adaptive spatial grid
  • Figure 10 is an undirected graph based on the proximity relationship
  • Figure 11 is a schematic diagram of routing results
  • FIG. 12 is a schematic flowchart of a management method for a cluster system provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a cluster system management device provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • FIG. 2 shows an architecture diagram of a cluster system provided by an embodiment of the present application.
  • the cluster system 100 includes a management node 10 and a plurality of computing nodes (such as computing node 21, computing node 22 and computing node 23 shown in Figure 2), where the management node 10 has a wireless or wired communication connection with each of the computing nodes; the management node 10 is mainly used for task distribution, data collection, resource scheduling and the like, and computing nodes 21, 22 and 23 process and compute business data. That is to say, the cluster system may be a distributed system.
  • the cluster system 100 is responsible for processing business data in a preset area and divides the preset area into multiple sub-areas, which include at least area 1, area 2 and area 3.
  • the management node 10 receives the business data in the preset area and distributes the business data of each sub-area to the corresponding computing node for processing: for example, the business data in area 1 is distributed to computing node 21, the business data in area 2 to computing node 22, and the business data in area 3 to computing node 23. It is easy to understand that the ellipsis after computing node 23 in FIG. 2 indicates that, in addition to the computing nodes shown, the plurality of computing nodes may include one or more further computing nodes, and the ellipsis after area 3 indicates that, in addition to area 1, area 2 and area 3, the plurality of sub-areas may include one or more further sub-areas.
  • the number of computing nodes may be smaller than the number of sub-areas, in which case one computing node can be responsible for processing the business data of several sub-areas; the number of computing nodes may equal the number of sub-areas, in which case computing nodes and sub-areas correspond one-to-one, that is, each computing node is responsible for processing the business data of one sub-area; or the number of computing nodes may be larger than the number of sub-areas, in which case the computing nodes are divided into multiple sub-clusters and each sub-cluster is responsible for processing the business data of one sub-area, that is, several computing nodes can jointly process the business data of one sub-area.
  • a cluster system, such as a distributed system, includes multiple computing nodes, and the number of computing nodes can be set according to the distributed system's demand for computing resources; this embodiment of the present application places no particular restriction on it.
  • computing nodes are usually deployed in a cluster, that is, several computing nodes form a cluster node.
  • a cluster management program can be deployed on one of the computing nodes or cluster nodes, which then serves as the management node 10, with the other cluster nodes serving as the plurality of computing nodes described above.
  • a cluster management program can also be deployed on the master node, with the master node serving as the management node and the slave nodes serving as the plurality of computing nodes.
  • a cluster node may include only one computing node or may include multiple computing nodes.
  • Computing nodes include but are not limited to personal computers, workstations, servers, other types of physical machines, and virtual machines deployed on physical machines.
  • the management node analyzes the historical business data in its managed area in the spatial and temporal dimensions to determine the spatio-temporal distribution characteristics of the historical business data, and partitions the data according to these characteristics so that spatially adjacent data is gathered on one computing node while the data volume of each computing node remains balanced; in addition, the cluster is divided into multiple sub-clusters, each of which may include one or more nodes. Each sub-cluster manages or processes the data or requests of a certain geographical area, and the load among the sub-clusters is kept relatively balanced, forming a spatially partitioned, federated management strategy.
  • FIG. 3 shows a telecommunication service processing architecture after applying the cluster system provided by the embodiment of the present application.
  • by analyzing historical data within the scope of system management, this processing architecture divides the managed area into multiple partitions (such as area 1 and area 2) based on the spatio-temporal distribution and tidal characteristics of the data, and, based on the partitions, divides the computing nodes of the cluster system into multiple sub-clusters so that the loads of different sub-clusters remain relatively balanced across different time periods.
  • each sub-cluster manages the data and computing tasks in the corresponding geographical area, and the management scope of the sub-clusters does not overlap; in this way, different sub-clusters do not affect each other, which can improve the horizontal expansion capability of the system and reduce the cascading amplification effect of failures.
  • since each sub-cluster manages only part of the area, public data such as maps and fingerprint databases can also be partitioned, with each sub-cluster caching only the public data belonging to its own management area; this can greatly improve the processing capacity of the system and support larger-scale system delivery.
  • the cluster system provided in the embodiment of the present application can also be applied to a positioning scenario.
  • the solution of applying the cluster system provided in the embodiment of the present application to a positioning scenario is introduced below.
  • a general positioning service performs WiFi positioning or fingerprint positioning based on the positioning request sent by the user; underneath, it queries the fingerprint database using the WiFi ID or base station cell ID list submitted by the user.
  • FIG. 4 shows the architecture of an existing positioning system.
  • the positioning service request submitted by the user is sent to the positioning server through the elastic load balancing module.
  • based on the resource overhead of the positioning servers in the current cluster, the elastic load balancing module sends each task to a server with lower resource utilization for execution, which amounts to random routing. This routing method leads to an extremely low cache hit rate on the positioning servers, so a user's positioning service request is very likely to be converted into a query on the underlying database.
  • the positioning system has a very high degree of concurrency, currently reaching 100,000 requests per second, which puts great pressure on the underlying database and creates an access bottleneck.
  • Figure 5 shows the architecture of the positioning system after applying the cluster system provided by the embodiment of the present application. Based on the historical spatio-temporal distribution of positioning requests, the management area is divided into multiple partitions such that the positioning service request load of each partition remains relatively balanced across different time periods. Based on the partitions, the computing nodes of the cluster system are divided into multiple sub-clusters.
  • Each sub-cluster only processes positioning service requests submitted in a certain spatial area.
  • Each sub-cluster can include one or more positioning servers, and each positioning server is equipped with an in-memory database; the spatial areas processed by each sub-cluster do not overlap.
  • the original random routing based on resource overhead is replaced by adaptive routing based on spatial area, and the hot request data of each area is cached in the in-memory database of the corresponding sub-cluster, which greatly improves the cache hit rate.
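To make the cache-hit argument concrete, the toy simulation below compares random routing with area-based routing over small LRU caches standing in for the positioning servers' in-memory databases. Everything here is made up for illustration: the server count, key space, cache capacity, request count, uniform request distribution, and the rule that key k belongs to area k % servers. It is not a benchmark of the real system.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache standing in for a positioning server's in-memory database."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = self.misses = 0

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1  # would fall through to the underlying database
            self.data[key] = True
            if len(self.data) > self.capacity:
                self.data.popitem(last=False)  # evict least recently used


def random_routing(key, rng, servers):
    """Resource-based routing modelled as a uniformly random server choice."""
    return rng.randrange(servers)


def region_routing(key, rng, servers):
    """Area-based routing: assume key k belongs to area k % servers."""
    return key % servers


def simulate(route, servers=4, keys=400, capacity=100, requests=20000, seed=0):
    """Return the overall cache hit rate for a routing function."""
    rng = random.Random(seed)
    caches = [LRUCache(capacity) for _ in range(servers)]
    for _ in range(requests):
        key = rng.randrange(keys)  # e.g. a cell ID or WiFi ID being looked up
        caches[route(key, rng, servers)].get(key)
    return sum(c.hits for c in caches) / requests
```

With these parameters, area-based routing keeps each server's working set within its cache, so only first-time lookups miss, whereas random routing exposes every server to the whole key space and most lookups miss.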
  • the space division management function of the management node of the cluster system provided in the embodiment of the present application is realized by the management device of the cluster system deployed on the node.
  • the management device of the cluster system may be implemented by program codes in the memory.
  • the program is deployed, as a program package, on the master node of the cluster system to perform data partitioning, distribution and routing.
  • Fig. 6 shows a schematic structural diagram of a management device of a cluster system.
  • the management device of the cluster system may include five modules, which are data pre-analysis module, time slice aggregation module, adaptive partition module, data routing module considering spatial proximity and load, and dynamic update module.
  • the data pre-analysis module is used to pre-analyze historical business data: it analyzes the distribution pattern of the historical business data in the spatial and temporal dimensions, identifies time slices with significant differences in spatio-temporal distribution, and classifies the time slices; it also calculates the spatial adaptive division threshold and the initial division level according to the spatial distribution characteristics of the data.
  • the time slice aggregation module is used to construct the distribution matrix, aggregate the distribution matrices corresponding to different time slices, and obtain the optimal average distribution result taking into account the data characteristics of each time period.
  • the adaptive division module is used to divide the spatial area managed by the system into grids of different sizes based on the average distribution matrix output by the time slice aggregation module, performing the grid division operation to obtain the adaptive division result set.
  • the data routing module that takes into account spatial proximity and load is used to calculate the spatial routing strategy of data or services, distribute spatially adjacent data or services to the same server node, and ensure load balancing among nodes.
  • the dynamic update module is an optional module. Since the spatio-temporal distribution characteristics of data will change over time, the load balancing effect of the original routing or system partitioning strategy may be reduced. This module can regularly check whether the routing or division strategy needs to be updated based on the update period set by the user.
  • the spatial division management function of the cluster system management device is implemented as follows: the input is historical data or spatio-temporal data collected by the system in real time, which is processed in turn by the data pre-analysis module, the adaptive division module, the division aggregation module, the data routing module that takes spatial proximity and load into account, and the dynamic update module; the output is a partitioning strategy or routing result. The system then partitions the data according to the result or performs spatial division and management on the system. The specific steps are as follows:
  • pre-analyze the spatio-temporal data in the management area, identify the tidal heterogeneity time slices of the data, calculate the adaptive division threshold and the initial division level, and perform adaptive grid division on the different time slices according to the calculation results to obtain the adaptive division result set.
  • the business data mentioned in this article can be the data of various businesses, such as the data of service business, including but not limited to positioning service requests and search service requests, etc.; the data of storage business, including but not limited to telecommunications business data, etc.
  • the historical business data mentioned in this article means the business data of several cycles before the current cycle, for example, the historical business data is the business data of the previous cycle, or the business data of the previous two cycles.
  • The following embodiment uses a positioning scenario, specifically a positioning cloud service, as an example. It describes in detail how the spatial division management function of the cluster system management device provided by the embodiments of this application is implemented.
  • The positioning cloud service is deployed in a public cloud environment. With the user's authorization, when software on a terminal device that needs positioning is invoked, the system provides the user with outdoor or indoor positioning services.
  • In the original processing flow, the cloud server first receives a WiFi positioning or fingerprint positioning request sent by the user. The load balancing module then forwards the request by random routing to a positioning server with relatively low resource overhead.
  • The cluster system management device provided by the embodiments of this application is located in the load balancing module. It changes the load-balancing method from random routing to regional routing while keeping service requests balanced. This addresses two problems: the low cache hit rate of the current positioning servers and the access bottleneck of the online database.
  • Historical user positioning service requests are used as input. Modules 1 to 5 of the cluster system management device process them to generate a regional routing scheme for user positioning requests.
  • The generated routing policy divides the management area of the system into n areas.
  • Each positioning server handles the positioning service requests of one area.
  • The system master node determines which spatial area a user request belongs to and sends the request to the corresponding positioning server for processing.
  • An in-memory database is deployed on each positioning server. Based on historical user positioning service requests, hotspot data is stored in it as a positioning-service cache.
  • When a request arrives, the local in-memory database is queried first. If it returns a result, that result is returned directly without querying the underlying online database. Each positioning server caches only the hotspot data of its own management area, so routing requests by region greatly improves the cache hit rate and greatly reduces the access pressure on the underlying database. In addition, an in-memory database answers queries much faster than a disk-based online database. With this scheme, system capacity and processing capability are greatly improved without enlarging the system, and user requests can be handled with a higher degree of parallelism.
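The request path above is a cache-aside lookup. Below is a minimal Python sketch of that path. It assumes a hypothetical `PositioningServer` class whose cache is a plain dict preloaded with its region's hotspot data, and a `query_db` callable that stands in for the underlying online database. Neither name comes from the source.

```python
from typing import Callable, Dict, Hashable, Optional, Tuple

Location = Tuple[float, float]


class PositioningServer:
    """Hypothetical server that caches only the hotspot data of its own region."""

    def __init__(self, region_id: str,
                 hot_data: Dict[Hashable, Location],
                 query_db: Callable[[Hashable], Optional[Location]]):
        self.region_id = region_id
        self.cache = dict(hot_data)  # stands in for the in-memory database
        self.query_db = query_db     # stands in for the disk-based online database
        self.hits = 0
        self.misses = 0

    def locate(self, fingerprint: Hashable) -> Optional[Location]:
        # Query the local in-memory cache first.
        result = self.cache.get(fingerprint)
        if result is not None:
            self.hits += 1
            return result
        # Cache miss: fall back to the underlying online database.
        self.misses += 1
        return self.query_db(fingerprint)


# Usage: one server for region "A", preloaded with one hotspot fingerprint.
online_db = {"fp1": (1.0, 2.0), "fp2": (3.0, 4.0)}
server = PositioningServer("A", {"fp1": online_db["fp1"]}, online_db.get)
server.locate("fp1")  # served from the cache
server.locate("fp2")  # falls through to the online database
```

Regional routing matters here because each server only ever sees requests from its own area, so its small hotspot cache covers most of what it is asked for.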
  • The input of the data pre-analysis module is statistics of business data over a period of time, such as one week or one month. These statistics include the time and location at which each piece of business data was sent.
  • The historical data is continuously updated with business data generated in real time.
  • Step 1.1: After the historical business data is sampled, the spatial distribution of the data is counted over fixed time intervals, for example at a granularity of 1 hour.
  • The space is first divided into a uniform grid. The amount of data falling into each grid cell in the corresponding time period is counted, and a distribution matrix is built from these counts.
  • The rows and columns of the distribution matrix correspond to the rows and columns of the grid. Each matrix value is the number of requests counted in that grid cell.
  • Distribution matrices may also be generated with other time granularities or other space division methods; this application does not limit this.
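Step 1.1 can be sketched in Python as follows. The sketch assumes that records arrive as `(hour, x, y)` tuples and that the grid is a plain `rows x cols` uniform grid over the bounding box. The function names are illustrative, not taken from the source.

```python
from typing import Dict, Iterable, List, Tuple


def distribution_matrix(points: Iterable[Tuple[float, float]],
                        bbox: Tuple[float, float, float, float],
                        rows: int, cols: int) -> List[List[int]]:
    """Count points per cell of a uniform rows x cols grid over bbox."""
    x_min, y_min, x_max, y_max = bbox
    cell_w = (x_max - x_min) / cols
    cell_h = (y_max - y_min) / rows
    matrix = [[0] * cols for _ in range(rows)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # ignore requests outside the management area
        # Clamp so points on the max edge fall into the last row/column.
        col = min(int((x - x_min) / cell_w), cols - 1)
        row = min(int((y - y_min) / cell_h), rows - 1)
        matrix[row][col] += 1
    return matrix


def hourly_matrices(records: Iterable[Tuple[int, float, float]],
                    bbox: Tuple[float, float, float, float],
                    rows: int, cols: int) -> Dict[int, List[List[int]]]:
    """Group (hour, x, y) records by hour and build one matrix per hour."""
    by_hour: Dict[int, List[Tuple[float, float]]] = {}
    for hour, x, y in records:
        by_hour.setdefault(hour, []).append((x, y))
    return {h: distribution_matrix(pts, bbox, rows, cols)
            for h, pts in sorted(by_hour.items())}
```

Row indices follow the y axis and column indices follow the x axis, matching the "rows and columns of the grid" correspondence described above.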
  • Step 1.2: Identify time slices with tidal heterogeneity.
  • The distribution matrix of each time period represents the spatial distribution of the data in that interval.
  • The distribution matrices of all time periods are standardized. The similarity of the spatial distributions of adjacent time periods is then evaluated with a matrix similarity measure.
  • Possible measures include the Jaccard coefficient, cosine similarity, the Pearson correlation coefficient, and Euclidean distance.
  • Suppose the difference between the distribution matrices of two time periods (for example, one minus their cosine similarity) is greater than the difference threshold. Tidal heterogeneity is then considered to exist in the time dimension, and the two periods are assigned to different time slices. Otherwise there is no tidal heterogeneity, and both periods belong to the same time slice. The average of the distribution matrices belonging to the same time slice is used as the distribution matrix of that time slice.
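Step 1.2 can be sketched in Python under two assumptions. First, standardization here means scaling each matrix so its entries sum to 1. Second, cosine similarity is the chosen measure, with "difference" taken as one minus the cosine similarity. The threshold value and function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def normalize(m: Matrix) -> Matrix:
    """Scale a matrix so its entries sum to 1 (one possible standardization)."""
    total = sum(sum(row) for row in m)
    return [[v / total if total else 0.0 for v in row] for row in m]


def cosine_similarity(a: Matrix, b: Matrix) -> float:
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(fa, fb))
    na = math.sqrt(sum(x * x for x in fa))
    nb = math.sqrt(sum(y * y for y in fb))
    return dot / (na * nb) if na and nb else 0.0


def time_slices(matrices: List[Matrix], diff_threshold: float) -> List[List[int]]:
    """Group consecutive periods; open a new slice when 1 - cosine exceeds the threshold."""
    if not matrices:
        return []
    norm = [normalize(m) for m in matrices]
    slices = [[0]]
    for i in range(1, len(norm)):
        difference = 1.0 - cosine_similarity(norm[i - 1], norm[i])
        if difference > diff_threshold:
            slices.append([i])    # tidal heterogeneity: start a new slice
        else:
            slices[-1].append(i)  # same distribution pattern: same slice
    return slices


def slice_matrix(matrices: List[Matrix], members: List[int]) -> Matrix:
    """Average of the normalized matrices of the periods in one slice."""
    norm = [normalize(matrices[i]) for i in members]
    rows, cols = len(norm[0]), len(norm[0][0])
    return [[sum(m[r][c] for m in norm) / len(norm) for c in range(cols)]
            for r in range(rows)]
```

This version only compares neighbouring periods, as the step describes. Grouping non-adjacent periods into one slice (such as a morning and an evening peak) would need an extra pass that compares each new slice against the averages of existing slices.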
  • Step 1.3: Calculate the adaptive division threshold.
  • The adaptive division threshold can be determined in several ways:
    • from the percentage of the total data volume contained in a standardized grid cell;
    • from the desired total number of generated grid cells;
    • from human experience;
    • through multiple rounds of controlled-variable experiments that look for the best trade-off between load-balancing quality and algorithm efficiency.
  • For data storage load balancing, this embodiment provides a feasible calculation method. The threshold is determined from the underlying storage model of the distributed database system, so that the data stored in one grid cell does not exceed the minimum storage unit read by one query I/O scheduling of the system.
  • $T_{obj} = Size_{block} / S_{obj}$, where:
  • $Size_{block}$ is the minimum storage unit size for one I/O scheduling of the distributed database system;
  • $S_{obj}$ is the storage space occupied by each point record after serialization;
  • $T_{obj}$ is the adaptive division threshold.
  • Step 1.4: Calculate the initial division level.
  • The global initial division level is calculated with the following model.
  • Let IBox be the bounding rectangle of the data set, that is, the location and extent of the system management area. Its latitude and longitude bounds are $(x_{min}, y_{min}, x_{max}, y_{max})$.
  • Let C be the total number of points in the data set.
  • Let $T_{obj}$ be the point threshold calculated by the model above.
  • Let n be the initial division level to be determined.
  • The length L and width W of a grid cell at level n are:
  • The initial division level n can be calculated from the above formula. It is used as the starting level of adaptive grid division.
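The formula for L and W is not reproduced in this text. The Python sketch below therefore assumes a quadtree-style hierarchy, in which each level halves both sides of a cell: $L = (x_{max} - x_{min}) / 2^n$ and $W = (y_{max} - y_{min}) / 2^n$. It also assumes that n is the smallest level at which the average number of points per cell, $C / 4^n$, does not exceed $T_{obj}$. The block and record sizes in the usage lines are hypothetical.

```python
from typing import Tuple


def adaptive_threshold(block_size_bytes: int, point_size_bytes: int) -> int:
    """T_obj = Size_block / S_obj, rounded down to whole points (rounding is an assumption)."""
    return block_size_bytes // point_size_bytes


def initial_level(total_points: int, t_obj: int) -> int:
    """Smallest n with total_points / 4**n <= t_obj, assuming a quadtree hierarchy."""
    if t_obj <= 0:
        raise ValueError("t_obj must be positive")
    n = 0
    while total_points / 4 ** n > t_obj:
        n += 1
    return n


def cell_size(ibox: Tuple[float, float, float, float], n: int) -> Tuple[float, float]:
    """Length L and width W of a level-n cell, assuming each level halves both sides."""
    x_min, y_min, x_max, y_max = ibox
    return (x_max - x_min) / 2 ** n, (y_max - y_min) / 2 ** n


# Hypothetical numbers: 64 KiB I/O block, 32-byte serialized point, 1,000,000 points.
t_obj = adaptive_threshold(64 * 1024, 32)  # 2048 points per cell
n = initial_level(1_000_000, t_obj)        # 5, since 4**5 = 1024 cells
L, W = cell_size((0.0, 0.0, 32.0, 16.0), n)
```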
  • Distribution matrix aggregation module: the distribution matrices of the different time slices are aggregated into an average representation of the data distribution over the whole time span of the data set. This ensures that the resulting division balances the data load reasonably well in every time interval.
  • An accuracy measure expresses the total difference between the aggregation matrix C and the distribution matrix of each time slice. The higher the accuracy (that is, the smaller this total difference), the closer C is to optimal.
  • nor(A) denotes the normalization of matrix A and is calculated as follows:
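The nor(A) and accuracy formulas are not reproduced in this text, so the Python sketch below makes two assumptions. It uses min-max normalization for nor(A), and it takes the total difference to be the sum of element-wise absolute differences between C and each normalized slice matrix. A smaller total difference corresponds to a higher accuracy. C is computed as the element-wise mean of the normalized slice matrices.

```python
from typing import List

Matrix = List[List[float]]


def min_max_normalize(a: Matrix) -> Matrix:
    """Assumed form of nor(A): rescale entries to [0, 1]."""
    flat = [v for row in a for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    return [[(v - lo) / span if span else 0.0 for v in row] for row in a]


def aggregate(slices: List[Matrix]) -> Matrix:
    """Aggregation matrix C: element-wise mean of the normalized slice matrices."""
    norm = [min_max_normalize(s) for s in slices]
    rows, cols = len(norm[0]), len(norm[0][0])
    return [[sum(m[r][c] for m in norm) / len(norm) for c in range(cols)]
            for r in range(rows)]


def total_difference(c: Matrix, slices: List[Matrix]) -> float:
    """Sum over slices of |C - nor(S)|; smaller means C represents every slice better."""
    total = 0.0
    for s in slices:
        m = min_max_normalize(s)
        total += sum(abs(c[r][k] - m[r][k])
                     for r in range(len(c)) for k in range(len(c[0])))
    return total


# Two hypothetical time slices with opposite hot spots.
slices = [[[0, 2], [4, 4]], [[4, 4], [2, 0]]]
C = aggregate(slices)
```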
  • Step 3.1: Divide the data into initial spatial grid cells according to the initial division level, and count the amount of data in each initial cell.
  • Step 3.2: Traverse each initial grid cell. Using the adaptive division threshold obtained by the data pre-analysis module, decide whether the current cell should be subdivided to the next (finer) level or aggregated to the level above. Stop when the amount of data in every cell satisfies the adaptive division threshold condition or a preset limit is reached. The space is finally divided into grid regions of different scales, which is the adaptive spatial grid division (see FIG. 8 and FIG. 9).
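The subdivision half of step 3.2 can be sketched in Python as a recursive quadtree split. The sketch assumes that a cell is split into four equal children while it holds more than `t_obj` points, and that a hypothetical `max_level` plays the role of the preset stopping limit. The upward aggregation of sparse sibling cells is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Cell:
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    level: int
    points: List[Point] = field(default_factory=list)


def split(cell: Cell) -> List[Cell]:
    """Quadtree split: four children at level + 1, with the points reassigned."""
    xm = (cell.x_min + cell.x_max) / 2
    ym = (cell.y_min + cell.y_max) / 2
    lvl = cell.level + 1
    children = [
        Cell(cell.x_min, cell.y_min, xm, ym, lvl),  # bottom-left
        Cell(xm, cell.y_min, cell.x_max, ym, lvl),  # bottom-right
        Cell(cell.x_min, ym, xm, cell.y_max, lvl),  # top-left
        Cell(xm, ym, cell.x_max, cell.y_max, lvl),  # top-right
    ]
    for x, y in cell.points:
        idx = (1 if x >= xm else 0) + (2 if y >= ym else 0)
        children[idx].points.append((x, y))
    return children


def adaptive_grid(cell: Cell, t_obj: int, max_level: int) -> List[Cell]:
    """Split cells holding more than t_obj points until they fit or max_level is hit."""
    if len(cell.points) <= t_obj or cell.level >= max_level:
        return [cell]
    leaves: List[Cell] = []
    for child in split(cell):
        leaves.extend(adaptive_grid(child, t_obj, max_level))
    return leaves


# Usage: a dense corner is refined, sparse areas stay coarse.
root = Cell(0.0, 0.0, 4.0, 4.0, 0, [(0.5, 0.5), (0.6, 0.6), (0.7, 0.7), (3.0, 3.0)])
leaves = adaptive_grid(root, t_obj=2, max_level=3)
```

In the usage example, the three clustered points stay together down to level 3, where the depth limit stops the recursion. This shows why a stopping limit is needed in addition to the threshold.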
  • Step 4.1: Build an undirected graph based on neighbor relations.
  • The multi-scale adaptive grid division produced by the adaptive division module is abstracted into an undirected graph (see FIG. 10):
    • each grid cell is a node, whose weight is the equivalent amount of data in the cell;
    • each neighbor relation between cells is an edge, whose weight is the distance between the two cells.
  • Step 4.2: Build the adjacency matrix of the undirected graph from the neighbor relations.
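Steps 4.1 and 4.2 can be sketched in Python with grid cells as axis-aligned rectangles. The sketch makes three assumptions. Two cells are neighbors when they share a boundary segment of positive length, so corner contact does not count. The edge weight is the Euclidean distance between cell centers. Node weights (data counts per cell) are kept in a separate list.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
EPS = 1e-9


def shares_edge(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned cells share a boundary segment of positive length."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    touch_x = abs(ax1 - bx0) < EPS or abs(bx1 - ax0) < EPS
    touch_y = abs(ay1 - by0) < EPS or abs(by1 - ay0) < EPS
    overlap_x = min(ax1, bx1) - max(ax0, bx0) > EPS
    overlap_y = min(ay1, by1) - max(ay0, by0) > EPS
    return (touch_x and overlap_y) or (touch_y and overlap_x)


def center(r: Rect) -> Tuple[float, float]:
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def adjacency_matrix(cells: List[Rect]) -> List[List[float]]:
    """Symmetric matrix: center distance for neighbouring cells, 0.0 otherwise."""
    n = len(cells)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if shares_edge(cells[i], cells[j]):
                d = math.dist(center(cells[i]), center(cells[j]))
                adj[i][j] = adj[j][i] = d
    return adj


# Four multi-scale cells: one large cell, two small cells to its right, one above it.
cells = [(0, 0, 2, 2), (2, 0, 3, 1), (2, 1, 3, 2), (0, 2, 2, 4)]
node_weights = [8, 2, 3, 5]  # hypothetical data counts per cell
adj = adjacency_matrix(cells)
```

A zero entry can safely mean "no edge" because distinct, non-overlapping cells never have coincident centers.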
  • Step 4.3: The adaptive grid cells are grouped with the goal of balancing the amount of data per group, which realizes data routing.
  • The number of groups is set according to user needs. It can equal the number of sub-clusters in the current cluster, in which case each sub-cluster handles the computing tasks of one spatial area. It can also be smaller than the number of sub-clusters, in which case several sub-clusters jointly handle the computing tasks of one spatial area.
  • The output is the node ID assigned to each adaptive grid cell. Each node ID corresponds to one cluster node.
  • For example, the adaptive grid is divided into seven sub-areas based on proximity and data-volume balance. Each sub-area is labeled with the ID of its cluster node, such as 1101 to 1107 in FIG. 11, so that the management node can route the business data collected in each sub-area. If the business data received by the management node comes from area 1, and area 1 is labeled 1106, that data is routed to the cluster node with ID 1106 for processing.
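The source does not name the graph partitioning algorithm used in step 4.3. The Python sketch below uses a deliberately simple stand-in, not the author's method, and is not a substitute for a production partitioner such as METIS. It walks the graph breadth-first, so that consecutive nodes tend to be spatial neighbours, and cuts the visit order into k runs of roughly equal total weight. A routing table then maps each grid cell to a cluster node ID. The function names and node IDs are illustrative.

```python
from collections import deque
from typing import Dict, List, Sequence


def balanced_partition(adj: List[List[float]], weights: Sequence[float],
                       k: int) -> List[List[int]]:
    """Stand-in partitioner: BFS visit order cut into k runs of similar weight."""
    n = len(weights)
    order: List[int] = []
    seen = [False] * n
    for start in range(n):  # also covers disconnected components
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in range(n):
                if adj[u][v] and not seen[v]:
                    seen[v] = True
                    queue.append(v)
    target = sum(weights) / k
    groups: List[List[int]] = [[] for _ in range(k)]
    g, acc = 0, 0.0
    for u in order:
        # Move on to the next group once the cumulative weight reaches its share.
        if acc >= target * (g + 1) and g < k - 1:
            g += 1
        groups[g].append(u)
        acc += weights[u]
    return groups


def build_routing_table(groups: List[List[int]], node_ids: Sequence[int]) -> Dict[int, int]:
    """Map each grid cell index to the ID of the cluster node serving its group."""
    return {cell: node_ids[g] for g, cells in enumerate(groups) for cell in cells}


# Usage: a path of four equally loaded cells split across two nodes.
path_adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
groups = balanced_partition(path_adj, [3, 3, 3, 3], k=2)
table = build_routing_table(groups, [1101, 1102])
```

With the table built, routing a request reduces to finding the grid cell containing the request's location and looking up `table[cell]`.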
  • Step 4.4 is optional. It is used only in scenarios where two-stage aggregation is needed to further improve the balance of data or service distribution.
  • It applies when there is no strict aggregation requirement on the spatial area managed by each cluster node. In other words, a computing node is not required to manage a single sub-area and may manage several disjoint sub-areas.
  • In this case, step 4.3 generates more groups than there are computing nodes, and the groups from step 4.3 are aggregated again by the two-stage aggregation method.
  • Two-stage aggregation applies a set partitioning algorithm to the groups produced in step 4.3, so that the difference in data or service volume between the aggregated groups is minimized.
  • Specifically, the data or service volume of each group generated in step 4.3 is counted in every time slice. The slice granularity can be chosen as required, for example one day or one hour per slice. The average volume of each group forms a set, and this set is partitioned with the Karmarkar-Karp algorithm.
  • For example, suppose the cluster system has 3 computing nodes.
  • In step 4.3, the graph partitioning algorithm generates 9 sub-areas. Their data volumes in the different time slices are counted and averaged, giving the set {N1, ..., N9}, which is then partitioned into 3 groups, one per computing node.
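The second-stage grouping can be sketched in Python with the multi-way form of the Karmarkar-Karp (largest differencing) heuristic. Each partial solution keeps k group sums sorted in descending order. The two partial solutions with the largest spread are repeatedly merged by pairing the largest group of one with the smallest group of the other. The nine averages in the usage line are hypothetical stand-ins for {N1, ..., N9}.

```python
import heapq
from typing import List, Sequence, Tuple


def karmarkar_karp(values: Sequence[float], k: int) -> Tuple[List[float], List[List[int]]]:
    """Multi-way Karmarkar-Karp partition; returns (group_sums, group_member_indices)."""
    if not values:
        return [0.0] * k, [[] for _ in range(k)]
    heap = []
    for i, v in enumerate(values):
        # Each partial solution holds k group sums in descending order.
        sums = [float(v)] + [0.0] * (k - 1)
        groups = [[i]] + [[] for _ in range(k - 1)]
        heapq.heappush(heap, (-float(v), i, sums, groups))
    counter = len(values)  # unique tie-breaker so lists are never compared
    while len(heap) > 1:
        _, _, s1, g1 = heapq.heappop(heap)  # largest spread
        _, _, s2, g2 = heapq.heappop(heap)  # second largest spread
        # Pair the largest group of one solution with the smallest of the other.
        sums = [s1[j] + s2[k - 1 - j] for j in range(k)]
        groups = [g1[j] + g2[k - 1 - j] for j in range(k)]
        order = sorted(range(k), key=lambda j: sums[j], reverse=True)
        sums = [sums[j] for j in order]
        groups = [groups[j] for j in order]
        heapq.heappush(heap, (-(sums[0] - sums[-1]), counter, sums, groups))
        counter += 1
    _, _, sums, groups = heap[0]
    return sums, groups


# Hypothetical average volumes of the 9 sub-areas, grouped onto 3 computing nodes.
averages = [120, 95, 90, 70, 60, 55, 40, 30, 20]
node_sums, node_groups = karmarkar_karp(averages, k=3)
```

Karmarkar-Karp is a heuristic. It usually gives small spreads but does not guarantee the optimal partition, which is acceptable here because the goal is near-balance rather than exact balance.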
  • The load-balancing routing result can be checked regularly to decide whether the routing strategy needs to be updated.
  • The dynamic update module performs the following steps:
  • Step 5.1: Assume the user sets the update period to one month.
  • A new aggregated distribution matrix $C_{new}$ is computed from the historical data of the previous month.
  • Step 5.2: Calculate the similarity.
  • Let $C_{old}$ be the original aggregated distribution matrix.
  • The similarity of $C_{old}$ and $C_{new}$ is measured with the Frobenius norm of their difference, $\lVert C_{new} - C_{old} \rVert_F$.
  • Step 5.3: Threshold judgment. If this difference exceeds the preset threshold, the data distribution has changed significantly. In that case, the adaptive division and the routing partition are recalculated from the new data distribution, and the routing strategy is updated. Otherwise, the previous strategy is kept.
  • The threshold is set according to actual application requirements.
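Steps 5.2 and 5.3 can be sketched in Python, assuming matrices are lists of lists of equal shape. The function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def frobenius_distance(a: Matrix, b: Matrix) -> float:
    """Frobenius norm of the difference a - b."""
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def needs_update(c_old: Matrix, c_new: Matrix, threshold: float) -> bool:
    """True when the distribution has drifted enough to recompute the partition."""
    return frobenius_distance(c_old, c_new) > threshold


c_old = [[0.0, 0.0], [0.0, 0.0]]
c_new = [[3.0, 0.0], [0.0, 4.0]]
```

Here the distance between `c_old` and `c_new` is 5.0. A threshold of 4.0 would therefore trigger re-partitioning, while a threshold of 5.0 would keep the current strategy.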
  • FIG. 12 is a flow chart of a method for managing a cluster system provided by an embodiment of the present application. This method can be applied to the management node 10 in the cluster system shown in FIG. 2 , or the management device of the cluster system shown in FIG. 6 to realize space division management of the cluster system. As shown in FIG. 12, the method at least includes steps S1201 to S1204.
  • Step S1201: Acquire historical business data.
  • The historical business data is the business data of several cycles before the current cycle.
  • For example, it may be the business data of the previous month or of the previous three months. The number of previous cycles can be chosen as required and is not limited in this application.
  • The historical data is stored in the underlying database.
  • The management node obtains the historical business data by retrieving it from the underlying database.
  • Specifically, the management node sends a retrieval request for historical business data to the underlying database, and the underlying database responds by sending the historical business data to the management node.
  • Business data has spatial and temporal attributes, that is, it carries location and time information. For example, the business data may include positioning service requests, each of which carries the location and time of the request.
  • Step S1202: Determine the spatio-temporal distribution characteristics of the historical business data.
  • The acquired historical business data is randomly sampled, and the sampled data is counted over fixed time intervals. The spatial distribution characteristics of the historical business data in each time interval are obtained from the spatial attributes of the business data.
  • These characteristics are expressed as spatial distribution matrices of the historical business data:
    • the rows and columns of a distribution matrix correspond to the rows and columns of the grid;
    • each matrix value is the number of requests counted in that grid cell;
    • the matrix of each time period represents the spatial distribution of the historical business data in that period.
  • Each time period may be called a time slice. The spatial distribution matrix of the historical business data of each time period is then called the spatial distribution matrix of that time slice.
  • The distribution matrix of each time slice is first standardized (normalized), yielding the normalized distribution matrix of the historical business data for each time slice.
  • Based on the similarity between the normalized matrices, the time slices are grouped into several kinds, and the distribution matrix of each kind of time slice is determined.
  • The average of the distribution matrices of the time slices belonging to the same kind is used as the distribution matrix of that kind. For example, suppose the 6:00-7:00 time slice and the 17:00-18:00 time slice belong to the same kind. The average of their two distribution matrices is then used as the distribution matrix of that kind of time slice.
  • The aggregated distribution matrix of these kinds of time slices represents the spatio-temporal distribution characteristics of the historical data.
  • Step S1203: Divide the preset area into M sub-areas based on the spatio-temporal distribution characteristics of the historical business data.
  • The preset area is divided into M sub-areas so that the historical business data is evenly distributed among them, where M is an integer greater than 1, for example M = 3.
  • The adaptive threshold is determined first.
  • For how to determine the adaptive threshold, refer to the description of step 1.3 in the data pre-analysis module above; details are not repeated here.
  • As in step 1.4, the initial division level is determined from the adaptive division threshold and the location information of the preset area.
  • The spatial grid division at the layer of the spatial distribution hierarchical model that matches the initial division level is determined. This grid division is the initial spatial grid division.
  • The historical business data is divided using the initial spatial grid, and the amount of data in each initial grid cell is counted.
  • The preset area is then divided into multiple sub-areas.
  • For details, refer to steps 4.1 to 4.3, in particular step 4.3, in the data routing module that considers spatial proximity and load.
  • Step S1204: Determine the correspondence between the M sub-areas and the N computing nodes, and route the business data of each of the M sub-areas to the corresponding computing node for processing.
  • M may equal N, that is, there are as many sub-areas as computing nodes. The computing nodes and sub-areas then correspond one-to-one, and each computing node is responsible for processing the business data of one sub-area. The management node distributes the business data of a sub-area to the computing node in charge of that sub-area for processing.
  • M may be smaller than N, that is, there are fewer sub-areas than computing nodes. The N computing nodes are then divided into M sub-clusters that correspond one-to-one to the M sub-areas, and each sub-cluster processes the business data of one sub-area.
  • M may be greater than N, that is, there are more sub-areas than computing nodes. The M sub-areas are then divided into N groups that correspond one-to-one to the N computing nodes, and each computing node processes the business data of one group. As a result, one computing node may process the business data of two or more sub-areas.
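The three cases of step S1204 can be sketched in Python with one function. It makes two assumptions. When M <= N, nodes are dealt round-robin into M sub-clusters. When M > N, sub-areas are grouped with a greedy largest-first heuristic that always places the next sub-area on the currently least-loaded node; this is a simpler swap-in for the Karmarkar-Karp grouping described in step 4.4. The function name and node labels are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence


def assign(subarea_loads: Sequence[float], nodes: Sequence[str]) -> Dict[int, List[str]]:
    """Map each sub-area index to the list of computing nodes that serve it."""
    m, n = len(subarea_loads), len(nodes)
    if m <= n:
        # M == N gives one node per sub-area; M < N gives sub-clusters of several nodes.
        clusters: List[List[str]] = [[] for _ in range(m)]
        for i, node in enumerate(nodes):
            clusters[i % m].append(node)
        return {area: clusters[area] for area in range(m)}
    # M > N: heaviest sub-area first, always onto the least-loaded node.
    heap = [(0.0, i) for i in range(n)]
    mapping: Dict[int, List[str]] = {}
    for area in sorted(range(m), key=lambda a: subarea_loads[a], reverse=True):
        load, i = heapq.heappop(heap)
        mapping[area] = [nodes[i]]
        heapq.heappush(heap, (load + subarea_loads[area], i))
    return mapping
```

For example, `assign([4, 3, 3, 2], ["a", "b"])` gives each node a total load of 6. A production system could swap the greedy step for the Karmarkar-Karp grouping when tighter balance is needed.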
  • In a possible implementation, the method further includes step S1205. Based on the spatio-temporal distribution characteristics of the historical business data of the current cycle and those of the previous cycle, it determines whether the division of the preset area needs to be updated.
  • For the specific method of step S1205, refer to the description of steps 5.1 to 5.3 in the dynamic update module above; details are not repeated here.
  • The cluster system provided by the embodiments of this application analyzes the distribution of historical business data in both the temporal and the spatial dimension. From this analysis it derives a single partitioning and routing scheme that is relatively optimal globally over a long period.
  • An application system can adjust its architecture based on this result. This forms a new architecture of spatial division management, which improves the scalability of the system and multiplies its performance.
  • On the one hand, the cluster system management method provided by the embodiments of this application considers load balancing in both the spatial and the temporal dimension of the data. It can therefore obtain a globally relatively optimal business division and routing scheme across time intervals and regions with markedly different human activity patterns.
  • On the other hand, it forms a partitioning and routing scheme that stays relatively stable over a long period, rather than balancing only a single batch of data. This avoids the extra computation and frequent architecture adjustments that re-partitioning would require. In addition, it avoids the spatial jumps of partitioning based on space-filling curves and further strengthens the spatial proximity of data on the same node.
  • An embodiment of this application further provides a cluster system management device 1300. It includes units or means for performing the steps of the cluster system management method shown in FIG. 12.
  • FIG. 13 is a schematic structural diagram of the cluster system management device provided by an embodiment of this application. As shown in FIG. 13, the cluster system management device 1300 includes at least:
  • an obtaining module 1301, configured to obtain historical business data, the historical business data being the business data of several cycles before the current cycle;
  • a division module 1303, configured to divide the preset area into M sub-areas based on the spatio-temporal distribution characteristics of the historical business data, and the historical business data is evenly distributed among the sub-areas;
  • the routing module 1304 is configured to determine the corresponding relationship between the M sub-areas and the N computing nodes, and route the service data of each sub-area in the M sub-areas to the corresponding computing nodes for processing.
  • The determination module 1302 is specifically configured to determine the spatial distribution matrices of the business data corresponding to multiple time slices. It does so based on the spatial distribution characteristics of the business data in multiple time periods of the historical business data, where adjacent time periods among the multiple time periods are continuous.
  • Based on these spatial distribution matrices, it then determines the spatio-temporal distribution characteristics of the historical business data.
  • Determining the spatio-temporal distribution characteristics of the historical business data based on the spatial distribution matrices of the business data corresponding to the multiple time slices includes:
  • determining an aggregated distribution matrix of the multiple time slices, which characterizes the spatio-temporal distribution characteristics of the historical business data.
  • The division module 1303 is specifically configured to divide the preset area into multiple sub-areas based on the spatio-temporal distribution characteristics and spatial proximity of the historical business data, so that the historical business data is evenly distributed among the multiple sub-areas.
  • Dividing the preset area into multiple sub-areas based on the spatio-temporal distribution characteristics and spatial proximity relations of the historical business data includes:
  • dividing the preset area into M sub-areas based on the adjacency relations of the sub-grids in the adaptive spatial grid and the amount of data in each sub-grid, so that the data volume is evenly distributed among the M sub-areas.
  • Performing the initial spatial grid division on the historical business data based on the adaptive division threshold includes:
  • performing an initial spatial grid division on the historical business data based on the initial division level and the hierarchical calculation model.
  • Dividing the preset area into M sub-areas includes:
  • determining the undirected graph corresponding to the adaptive spatial grid, in which:
    • each node is determined based on a sub-grid of the adaptive spatial grid;
    • the weight of each node is determined based on the amount of data in that sub-grid;
    • the edges connecting adjacent nodes are determined based on the adjacency relations of the sub-grids;
    • the weight of each edge is determined based on the distance between adjacent sub-grids;
  • and, based on the undirected graph and its adjacency matrix, dividing the preset area into M sub-areas so that the data volume is evenly distributed among the M sub-areas.
  • The routing module 1304 is specifically configured to determine that N is greater than or equal to M. It then divides the N computing nodes into M sub-clusters that correspond one-to-one to the M sub-areas.
  • Alternatively, the routing module 1304 is specifically configured to:
  • determine that N is smaller than M, and divide the M sub-areas into N groups so that the difference in average business data volume between groups is minimized. The N groups of sub-areas correspond one-to-one to the N computing nodes.
  • The device further includes an updating module 1305. It is configured to determine whether the division of the preset area needs to be updated, based on the spatio-temporal distribution characteristics of the historical business data of the current cycle and of the previous cycle.
  • The updating module 1305 is specifically configured to check whether the similarity between the spatio-temporal distribution characteristics of the historical business data of the current cycle and those of the previous cycle is greater than a preset threshold.
  • The cluster system management device 1300 may correspond to performing the method described in the embodiments of this application. The above and other operations and/or functions of the modules in the device 1300 are respectively intended to implement the corresponding flows of the methods in FIG. 12; details are not repeated here.
  • FIG. 14 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • the computing device 1400 includes a processor 1401 , a memory 1402 and a communication interface 1403 .
  • the processor 1401, the memory 1402, and the communication interface 1403 are connected through a bus for communication, and communication may also be realized through other means such as wireless transmission.
  • the communication interface 1403 is used to communicate with other communication devices, such as receiving load requests sent by terminals in the management area; the memory 1402 stores executable program codes, and the processor 1401 can call the program codes stored in the memory 1402 to execute The management method of the cluster system in the foregoing method embodiments.
  • The processor 1401 may be a central processing unit (CPU). It may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general purpose processor may be a microprocessor or any conventional processor or the like.
  • the memory 1402 may include read-only memory and random-access memory, and provides instructions and data to the processor 1401 .
  • Memory 1402 may also include non-volatile random access memory.
  • The memory 1402 may also store a database containing historical business data.
  • the memory 1402 can be volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory.
  • The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • The volatile memory may be random access memory (RAM), used as an external cache. Many forms of RAM may be used, for example:
    • static RAM (SRAM);
    • dynamic RAM (DRAM);
    • synchronous DRAM (SDRAM);
    • double data rate SDRAM (DDR SDRAM);
    • enhanced SDRAM (ESDRAM);
    • synchlink DRAM (SLDRAM);
    • direct rambus RAM (DR RAM).
  • The computing device 1400 may correspond to the cluster system management device in the embodiments of this application, and to the entity that performs the method shown in FIG. 12. The above and other operations and/or functions of the components in the computing device 1400 are respectively intended to implement the corresponding flows of the methods in FIG. 12; details are not repeated here.
  • An embodiment of the present application provides a computer storage medium, including computer instructions.
  • When the computer instructions are executed by a processor, any one of the above methods is implemented.
  • An embodiment of the present application provides a computer program product, which enables any one of the above methods to be implemented when the computer program product runs on a processor.
  • The storage medium may be random access memory (RAM), read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other storage medium known in the art.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

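A cluster system management method is applied to a management node. The cluster system includes the management node and multiple computing nodes and is responsible for processing business data within a preset area. The method includes: acquiring historical business data, the historical business data being the business data of several cycles before the current cycle; determining the spatio-temporal distribution characteristics of the historical business data; dividing the preset area into multiple sub-areas based on these characteristics, so that the historical business data is evenly distributed among the sub-areas; and determining the correspondence between the sub-areas and the computing nodes, and routing the business data of each sub-area to the corresponding computing node for processing. The cluster system management method provided by this application realizes spatial division management of the cluster system and improves the reliability and scalability of the cluster system.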

Description

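Management Method and Apparatus for a Cluster System
This application claims priority to Chinese Patent Application No. 202111508888.6, filed on December 10, 2021 and entitled "Management Method and Apparatus for a Cluster System", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of computer technologies, and in particular, to a management method and apparatus for a cluster system.
Background
Current distributed big-data systems are managed in a Master-Slave mode: the master node handles task distribution, data collection, resource scheduling, and similar work, while slave nodes mainly perform data processing and computation. Any slave node may manage or store data from any region. As systems keep growing and business functions become more complex, this approach faces many problems and challenges.
For example, FIG. 1 shows the processing architecture of a telecom business system. Between data collection and storage in the data warehouse, data must pass through several functional clusters, such as a collection cluster, an association cluster, and a positioning cluster. Because the data distribution policies are not unified (for example, one stage distributes by network element while the next distributes by user), nodes of different processing clusters interact in a mesh pattern, a single-node failure can cascade and amplify, and system reliability is low.
In addition, because a single node may process data from any region, the common data needed during processing (such as maps and fingerprint libraries) must either be fully cached on each node or broadcast across the cluster as large configuration tables. As the processing scale keeps growing, the system's processing capability drops and it has to run in a degraded mode.
Summary
This application provides a management method and apparatus for a cluster system that achieve spatial divide-and-conquer management of the cluster system and improve its reliability and scalability.
According to a first aspect, this application provides a management method for a cluster system. The method is applied to a management node. The cluster system includes the management node and N compute nodes and is responsible for processing business data within a preset region, where N is an integer greater than 1. The method includes: obtaining historical business data, the historical business data being business data from several periods before the current period; determining spatio-temporal distribution characteristics of the historical business data; based on those characteristics, dividing the preset region into M sub-regions, the historical business data being evenly distributed among the M sub-regions; and determining a correspondence between the M sub-regions and the compute nodes, and routing the business data of each of the M sub-regions to the corresponding compute node for processing.
In the management method of the first aspect, the spatio-temporal distribution of the historical business data is analyzed to obtain its spatio-temporal distribution characteristics. Based on these characteristics, the preset region is divided into M sub-regions with the goal of balanced data distribution, and each compute node handles the business data of its corresponding sub-region. This achieves spatial divide-and-conquer management of the cluster system and improves its reliability and scalability.
A person skilled in the art will understand that when the difference in the distribution of the historical business data across the M sub-regions is less than a preset threshold (for example, 1 MB), the historical business data can be regarded as evenly distributed among the M sub-regions.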
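In a possible implementation, determining the spatio-temporal distribution characteristics of the historical business data includes: based on the spatial distribution characteristics of the business data in multiple time periods of the historical business data, determining a spatial distribution matrix of the business data for each of multiple time slices, where adjacent time periods are contiguous; and determining the spatio-temporal distribution characteristics of the historical business data based on the spatial distribution matrices of the time slices.
In a possible implementation, determining the spatio-temporal distribution characteristics based on the spatial distribution matrices of the time slices includes: normalizing the spatial distribution matrix of the business data of each time slice to obtain a normalized spatial distribution matrix for each time slice; classifying the time slices into multiple types based on the similarity between the normalized spatial distribution matrices; and aggregating the normalized spatial distribution matrices of the multiple types of time slices to obtain an aggregated distribution matrix of the time-slice types, which represents the spatio-temporal distribution characteristics of the historical business data.
In another possible implementation, dividing the preset region into M sub-regions based on the spatio-temporal distribution characteristics includes: dividing the preset region into M sub-regions based on the spatio-temporal distribution characteristics and spatial proximity, such that the historical business data is evenly distributed among the M sub-regions.
In another possible implementation, dividing the preset region into M sub-regions based on the spatio-temporal distribution characteristics and spatial proximity includes: determining an adaptive partition threshold; performing an initial spatial grid partition of the historical business data based on the adaptive partition threshold; determining the data volume in each sub-grid of the initial spatial grid based on the spatio-temporal distribution characteristics; traversing the sub-grids of the initial spatial grid and, based on each sub-grid's data volume and the adaptive partition threshold, splitting or merging it, to obtain an adaptive spatial grid partition of the historical business data; and dividing the preset region into M sub-regions with balanced data volumes based on the adjacency of the sub-grids in the adaptive spatial grid and the data volume in each sub-grid.
In another possible implementation, performing the initial spatial grid partition of the historical business data based on the adaptive partition threshold includes: determining an initial partition level based on the adaptive partition threshold and location information of the preset region; and performing the initial spatial grid partition of the historical business data based on the initial partition level and a spatial-distribution-based level computation model.
In another possible implementation, dividing the preset region into M sub-regions based on the adjacency of the sub-grids in the adaptive grid and the data volume in each sub-grid includes: determining an undirected graph corresponding to the adaptive spatial grid partition of the historical business data, where each node of the graph corresponds to a sub-grid of the adaptive spatial grid, each node's weight is determined by the data volume in that sub-grid, edges connecting nodes are determined by the adjacency of the sub-grids, and each edge's weight is determined by the distance between the adjacent sub-grids; determining the adjacency matrix of the undirected graph; and dividing the preset region into M sub-regions with balanced data volumes based on the undirected graph and its adjacency matrix.
In another possible implementation, determining the correspondence between the M sub-regions and the N compute nodes includes: determining that N is greater than or equal to M; and dividing the N compute nodes into M sub-clusters in one-to-one correspondence with the M sub-regions.
In another possible implementation, determining the correspondence between the M sub-regions and the N compute nodes includes: determining that N is less than M; computing, for each of the M sub-regions, its average business data volume over multiple time periods; dividing the M sub-regions into N groups with the objective of minimizing the difference in average business data volume between groups; and mapping the N groups one-to-one to the N compute nodes.
In another possible implementation, the management method further includes: determining, based on the spatio-temporal distribution characteristics of the historical business data of the current period and those of the previous period, whether the division of the preset region needs to be updated.
In another possible implementation, determining whether the division of the preset region needs to be updated includes:
comparing whether the similarity measure between the spatio-temporal distribution characteristics of the current period's historical business data and those of the previous period's is greater than a preset threshold;
if so, determining that the division of the preset region needs to be updated; if not, determining that it does not.
In another possible implementation, each of the compute nodes caches the business-related data within the sub-region corresponding to that compute node.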
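According to a second aspect, this application provides a management apparatus for a cluster system. The management apparatus is applied to the cluster system, which includes the management node and N compute nodes and is responsible for processing load requests within a preset region, N being an integer greater than 1. The apparatus includes:
an obtaining module, configured to obtain historical business data, the historical business data being business data from several periods before the current period;
a determining module, configured to determine spatio-temporal distribution characteristics of the historical business data;
a partitioning module, configured to divide the preset region into M sub-regions based on the spatio-temporal distribution characteristics, the historical business data being evenly distributed among the M sub-regions;
a routing module, configured to determine a correspondence between the M sub-regions and the N compute nodes, and route the business data of each of the M sub-regions to the corresponding compute node for processing.
In a possible implementation, the determining module is specifically configured to determine, based on the spatial distribution characteristics of the business data in multiple time periods of the historical business data, a spatial distribution matrix of the business data for each of multiple time slices, where adjacent time periods are contiguous;
and determine the spatio-temporal distribution characteristics of the historical business data based on the spatial distribution matrices of the time slices.
In another possible implementation, determining the spatio-temporal distribution characteristics of the historical business data based on the spatial distribution matrices of the time slices includes:
normalizing the spatial distribution matrix of the business data of each time slice to obtain a normalized spatial distribution matrix for each time slice;
classifying the time slices into multiple types based on the similarity between the normalized spatial distribution matrices;
aggregating the normalized spatial distribution matrices of the multiple types of time slices to obtain an aggregated distribution matrix of the time-slice types, which represents the spatio-temporal distribution characteristics of the historical business data.
In another possible implementation, the partitioning module is specifically configured to divide the preset region into M sub-regions based on the spatio-temporal distribution characteristics and spatial proximity, the historical business data being evenly distributed among the M sub-regions.
In another possible implementation, dividing the preset region into M sub-regions based on the spatio-temporal distribution characteristics and spatial proximity includes:
determining an adaptive partition threshold;
performing an initial spatial grid partition of the historical business data based on the adaptive partition threshold;
determining the data volume in each sub-grid of the initial spatial grid based on the spatio-temporal distribution characteristics of the historical business data;
traversing the sub-grids of the initial spatial grid and splitting or merging each of them based on its data volume and the adaptive partition threshold, to obtain an adaptive spatial grid partition of the historical business data;
dividing the preset region into M sub-regions based on the adjacency of the sub-grids in the adaptive spatial grid and the data volume in each sub-grid, the data volumes of the M sub-regions being balanced.
In another possible implementation, performing the initial spatial grid partition of the historical business data based on the adaptive partition threshold includes:
determining an initial partition level based on the adaptive partition threshold and location information of the preset region;
performing the initial spatial grid partition of the historical business data based on the initial partition level and a spatial-distribution-based level computation model.
In another possible implementation, dividing the preset region into M sub-regions based on the adjacency of the sub-grids in the adaptive grid and the data volume in each sub-grid includes:
determining an undirected graph corresponding to the adaptive spatial grid partition of the historical business data, where each node corresponds to a sub-grid of the adaptive spatial grid, each node's weight is determined by the data volume in that sub-grid, the edges connecting adjacent nodes are determined by the adjacency of the sub-grids, and each edge's weight is determined by the distance between the adjacent sub-grids;
determining the adjacency matrix of the undirected graph corresponding to the adaptive spatial grid;
dividing the preset region into M sub-regions based on the adaptive grid and the adjacency matrix of the undirected graph, the data volumes of the M sub-regions being balanced.
In another possible implementation, the routing module is specifically configured to determine that N is greater than or equal to M, and divide the N compute nodes into M sub-clusters in one-to-one correspondence with the M sub-regions.
In another possible implementation, the routing module is specifically configured to determine that N is less than M;
compute, for each of the M sub-regions, its average business data volume over multiple time periods;
divide the M sub-regions into N groups with the objective of minimizing the difference in average business data volume between groups;
map the N groups one-to-one to the N compute nodes.
In another possible implementation, the apparatus further includes an update module, configured to determine, based on the spatio-temporal distribution characteristics of the historical business data of the current period and those of the previous period, whether the division of the preset region needs to be updated.
In another possible implementation, the update module is specifically configured to compare whether the similarity measure between the spatio-temporal distribution characteristics of the current period's historical business data and those of the previous period's is greater than a preset threshold;
if so, determine that the division of the preset region needs to be updated; if not, determine that it does not.
According to a third aspect, this application provides a cluster system, including a management node and multiple compute nodes. The cluster system is responsible for processing business data within a preset region. The management node includes a memory and a processor; the memory stores instructions that, when executed by the processor, implement the method of the first aspect.
According to a fourth aspect, this application provides a computing device, including a memory and a processor; the memory stores instructions that, when executed by the processor, implement the method of the first aspect.
According to a fifth aspect, this application provides a computer storage medium, including computer instructions that, when executed by a processor, cause the method of the first aspect to be implemented.
According to a sixth aspect, this application provides a computer program or computer program product including instructions that, when executed, cause a computer to perform the method of the first aspect.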
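Brief Description of Drawings
FIG. 1 is a diagram of an existing processing architecture of a telecom business system;
FIG. 2 is a schematic architecture diagram of a cluster system according to an embodiment of this application;
FIG. 3 is a schematic diagram of a telecom business processing architecture using the cluster system according to an embodiment of this application;
FIG. 4 is a schematic architecture diagram of an existing positioning system;
FIG. 5 is a schematic architecture diagram of a positioning system using the cluster system according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of a cluster system management apparatus according to an embodiment of this application;
FIG. 7 is a schematic diagram of the routing process of positioning requests in a positioning system using the cluster system management apparatus;
FIG. 8 is a schematic diagram of adaptive partitioning, based on the adaptive partition threshold and the initial spatial grid, that yields the adaptive spatial grid;
FIG. 9 is a schematic diagram of partitioning historical business data with the adaptive spatial grid;
FIG. 10 shows an undirected graph based on adjacency relationships;
FIG. 11 is a schematic diagram of a routing result;
FIG. 12 is a schematic flowchart of a cluster system management method according to an embodiment of this application;
FIG. 13 is a schematic structural diagram of a cluster system management apparatus according to an embodiment of this application;
FIG. 14 is a schematic structural diagram of a computing device according to an embodiment of this application.
Detailed Description
The technical solutions of this application are further described in detail below with reference to the accompanying drawings and embodiments.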
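FIG. 2 shows an architecture diagram of a cluster system according to an embodiment of this application. As shown in FIG. 2, the cluster system 100 includes a management node 10 and multiple compute nodes (for example, compute nodes 21, 22, and 23 shown in FIG. 2). The management node 10 is communicatively connected to the compute nodes through a wireless or wired connection. The management node 10 is mainly responsible for task distribution, data collection, resource scheduling, and the like, while compute nodes 21, 22, and 23 process and compute business data. In other words, the cluster system may be a distributed system.
The cluster system 100 is responsible for processing business data within a preset region, and the preset region is divided into multiple sub-regions, including at least region 1, region 2, and region 3 shown in FIG. 2. The management node 10 receives the business data within the preset region and distributes the business data of each sub-region to the corresponding compute node: for example, data from region 1 goes to compute node 21, data from region 2 to compute node 22, and data from region 3 to compute node 23. The ellipsis after compute node 23 in FIG. 2 indicates that the compute nodes may include one or more nodes besides compute nodes 21, 22, and 23, and the ellipsis within the preset region indicates that the sub-regions may include one or more sub-regions besides regions 1, 2, and 3.
Note that the number of compute nodes and the number of sub-regions shown in FIG. 2 are only examples and do not limit the embodiments of this application; how the compute nodes are grouped and how the preset region is divided can be determined as needed. For example, if there are fewer compute nodes than sub-regions, one compute node may process the business data of several sub-regions; if the numbers are equal, compute nodes and sub-regions correspond one-to-one, with each compute node processing the business data of one sub-region; if there are more compute nodes than sub-regions, the compute nodes are divided into multiple sub-clusters, each of which processes the business data of one sub-region, so that several compute nodes jointly process one sub-region.
A person skilled in the art should know that a cluster system, such as a distributed system, includes multiple compute nodes, whose number can be set according to the system's demand for computing resources; this is not specifically limited in the embodiments of this application. In a cluster system, compute nodes are usually deployed in clusters, that is, several compute nodes form one cluster node. A cluster management program may be deployed on one of the compute nodes or cluster nodes to act as the management node 10, with the other cluster nodes acting as the compute nodes described above. For example, in a distributed system, the cluster management program may be deployed on the master node, which serves as the management node, while the slave nodes serve as the compute nodes. A cluster node may include one compute node or multiple compute nodes. Compute nodes include, but are not limited to, personal computers, workstations, servers, other types of physical machines, and virtual machines deployed on physical machines.
In the cluster system provided in the embodiments of this application, the management node analyzes the historical business data within its managed region in both the spatial and temporal dimensions to determine the data's spatio-temporal distribution characteristics, and partitions the data accordingly, so that spatially adjacent data is gathered on the same compute node while the data volume across compute nodes stays balanced. In addition, the cluster is divided into multiple sub-clusters, each containing one or more nodes; each sub-cluster manages or processes the data or requests of one geographic area, and the loads across sub-clusters are kept relatively balanced, forming a spatially divided, federated management strategy.
FIG. 3 shows a telecom business processing architecture that uses the cluster system provided in the embodiments of this application. As shown in FIG. 3, this architecture analyzes the historical data within the system's managed scope and, based on the spatio-temporal distribution pattern and tidal characteristics of the data, divides the managed region into multiple partitions (for example, region 1 and region 2). Based on these partitions, the compute nodes of the cluster system are divided into multiple sub-clusters so that the loads of different sub-clusters remain relatively balanced across time periods. Each sub-cluster manages the data and computing tasks of its own geographic area, and the managed scopes of the sub-clusters do not overlap. As a result, sub-clusters do not affect one another, which improves the horizontal scalability of the system and reduces cascading amplification of failures.
In addition, because each sub-cluster manages only part of the region, common data such as maps and fingerprint libraries can also be partitioned, with each sub-cluster caching only the common data of its own managed area. This can greatly increase the processing capability of the system and support the delivery of larger-scale systems.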
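The cluster system provided in the embodiments of this application can also be applied to positioning scenarios, as described below.
A typical positioning service performs WiFi positioning or fingerprint positioning in response to a positioning request sent by a user; under the hood, it queries a fingerprint database using the WiFi IDs or the list of base-station cell IDs submitted by the user.
FIG. 4 shows the architecture of an existing positioning system. A positioning service request submitted by a user is sent to a positioning server through an elastic load balancing module, which, based on the current resource usage of the positioning servers in the cluster, sends the task to a server with lower resource utilization; this amounts to random routing. With this routing method, the cache hit rate of the positioning servers is extremely low, so a user's positioning request is very likely to turn into a query against the underlying database. The system's concurrency is extremely high (currently up to 100,000 requests per second), which puts great pressure on the underlying database and creates an access bottleneck.
FIG. 5 shows the architecture of a positioning system that uses the cluster system provided in the embodiments of this application. Based on the historical spatio-temporal distribution of positioning requests, the managed region is divided into multiple partitions so that each partition's positioning-request load remains relatively balanced across time periods. Based on these partitions, the compute nodes of the cluster system are divided into multiple sub-clusters.
Each sub-cluster processes only the positioning requests submitted from one spatial area and may include one or more positioning servers, each equipped with an in-memory database; the spatial areas handled by different sub-clusters do not overlap.
The cluster system provided in the embodiments of this application replaces the original resource-usage-based random routing with adaptive routing based on spatial areas, and caches the hot request data of each area in the in-memory database of the corresponding sub-cluster. This greatly increases the cache hit rate and reduces the access pressure on the underlying database, so the system's processing capability can be greatly increased without enlarging the cluster.
The spatial divide-and-conquer management function of the management node is implemented by a cluster system management apparatus deployed on that node. In an example, the management apparatus may be implemented by program code in memory; for example, the program is deployed as a package on the master node of the cluster for data partitioning, distribution, and routing.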
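FIG. 6 is a schematic structural diagram of a cluster system management apparatus. As shown in FIG. 6, the apparatus may include five modules: a data pre-analysis module, a time-slice aggregation module, an adaptive partitioning module, a data routing module that accounts for spatial proximity and load, and a dynamic update module.
The data pre-analysis module pre-analyzes historical business data: it analyzes the distribution pattern of the historical business data in the spatial and temporal dimensions, identifies time slices whose spatio-temporal distributions differ significantly, and classifies the time slices. It also computes the adaptive spatial partition threshold and the initial partition level from the spatial distribution characteristics of the data.
The time-slice aggregation module builds distribution matrices and aggregates the matrices of different time slices to obtain an optimal average distribution that accounts for the data characteristics of each time period.
The adaptive partitioning module, based on the average distribution matrix output by the time-slice aggregation module, divides the spatial region managed by the system into grid cells of different sizes, where each cell's size is determined by the amount and density of the data in it, thereby performing adaptive grid partitioning and obtaining an adaptive partition result set.
The data routing module that accounts for spatial proximity and load computes a spatial routing policy for data or services, dispatching spatially adjacent data or services to the same server node while keeping the load balanced across nodes.
The dynamic update module is optional. Because the spatio-temporal distribution of the data changes over time, the load-balancing effect of the original routing or partitioning strategy may degrade. Based on an update period set by the user, this module periodically checks whether the routing or partitioning strategy needs to be updated.
The spatial divide-and-conquer management function of the cluster system management apparatus works as follows. The input is historical data or spatio-temporal data collected in real time by the system; after processing by the data pre-analysis module, the time-slice aggregation module, the adaptive partitioning module, the data routing module that accounts for spatial proximity and load, and the dynamic update module, a partitioning strategy or routing result is output. The system then stores data by partition, or manages itself by spatial divide-and-conquer, according to this result. The specific steps are as follows:
Pre-analyze the spatio-temporal data within the managed region, identify the time slices with tidal heterogeneity, compute the adaptive partition threshold and the initial partition level, and perform adaptive grid partitioning for the different time slices based on the results to obtain the adaptive partition result set.
Then route data based on spatial proximity and load to generate a routing policy for data or services.
In addition, based on an update period set by the user, a time-decay model can be used to determine whether the routing policy needs to be adjusted dynamically.
It is easy to understand that the business data mentioned herein may be data of various businesses, for example, data of service-type businesses, including but not limited to positioning service requests and search service requests, and data of storage-type businesses, including but not limited to telecom business data. Historical business data herein means business data from several periods before the current period, for example, business data from the previous one or two periods.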
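The implementation of the spatial divide-and-conquer management function of the cluster system management apparatus is described in detail below, using a positioning scenario as an example.
This embodiment uses a cloud positioning service as an example. The service is deployed in a public cloud environment. With the user's authorization, when software on a terminal device that needs positioning is invoked, the system can provide the user with outdoor or indoor positioning. The flow is as follows: the cloud server first receives a WiFi positioning or fingerprint positioning request from the user, and the load balancing module sends it, via random routing, to a positioning server with relatively low resource usage. The cache of that positioning server is queried first; on a hit, the result is returned to the positioning algorithm module for position computation; on a miss, the online fingerprint database is queried and the result is returned to the positioning algorithm module for position computation. In this embodiment, the cluster system management apparatus resides in the load balancing module and changes the load-balancing mode from random routing to region-based routing while keeping the request load balanced, which addresses the low cache hit rate of the positioning servers and the access bottleneck of the online database.
As shown in FIG. 7, the historical data of users' positioning requests is processed as input by modules 1 to 5 of the cluster system management apparatus to generate a region-based routing scheme for positioning requests. Assuming the system has n positioning servers, the generated routing policy divides the managed area into n regions, and each positioning server handles the positioning requests of one region. The master node determines which spatial region a user request belongs to and sends the request to the corresponding positioning server. Each positioning server runs an in-memory database in which hot data, selected based on historical positioning requests, is stored as a positioning cache. During positioning, the server first queries its local in-memory database and, if a result is found, returns it directly without querying the underlying online database. Because each positioning server caches only the hot data of its own managed region, routing requests by region greatly increases the cache hit rate and greatly reduces the access pressure on the underlying database. Moreover, queries against an in-memory database are much faster than against a disk-based online database, so with the system scale unchanged, this scheme greatly increases system capacity and processing capability and allows the system to handle user requests with higher concurrency.
The implementation of each module of the cluster system management apparatus is described in detail below.
The input of the data pre-analysis module is statistics of business data over a historical period, for example one week or one month, including the time and location at which each piece of business data was sent.
In an example, business data generated in real time continuously updates the historical data.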
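The data pre-analysis module performs the following steps. Step 1.1: sample the historical business data and compute the spatial distribution of the data per time interval, for example at a granularity of 1 hour. In this embodiment, the space is first divided into a uniform grid, the number of data items falling into each grid cell during the corresponding time period is counted, and a distribution matrix is built from these counts: the matrix's rows and columns correspond to the grid's rows and columns, and each entry is the request count of the corresponding cell. Other embodiments may generate distribution matrices at different time granularities or with different spatial partitioning methods; this application does not limit this.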
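Step 1.1 can be sketched as follows. This is a minimal illustration, assuming point records arrive as longitude, latitude, and Unix-second timestamps and that the uniform grid is `rows` × `cols` cells over a bounding box; the helper name `hourly_distribution_matrices` and its parameters are hypothetical, not taken from the text.

```python
import numpy as np

def hourly_distribution_matrices(lons, lats, timestamps, bbox, rows, cols):
    """Return {hour_index: rows x cols count matrix} (hypothetical helper).

    Points are binned into a uniform grid over bbox = (x_min, y_min, x_max, y_max)
    separately for each 1-hour window, where hour_index = timestamp // 3600.
    Row 0 is the southernmost band, column 0 the westernmost.
    """
    x_min, y_min, x_max, y_max = bbox
    lons = np.asarray(lons, dtype=float)
    lats = np.asarray(lats, dtype=float)
    hours = np.asarray(timestamps) // 3600
    matrices = {}
    for h in np.unique(hours):
        mask = hours == h
        # histogram2d maps its first argument to rows (latitude here)
        # and its second to columns (longitude); points outside bbox are dropped
        counts, _, _ = np.histogram2d(
            lats[mask], lons[mask],
            bins=[rows, cols],
            range=[[y_min, y_max], [x_min, x_max]],
        )
        matrices[int(h)] = counts.astype(int)
    return matrices
```

Sampling, which the text applies before counting, is omitted here; it can be done by passing a random subset of the records.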
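Step 1.2: identify time slices with tidal heterogeneity. The distribution matrix of each time period represents the spatial distribution of the data in that interval. Normalize the distribution matrices of all periods, then evaluate the similarity of the spatial distributions of adjacent periods with a matrix similarity measure, for example the Jaccard coefficient, cosine similarity, the Pearson correlation coefficient, or the Euclidean distance. If the cosine similarity between the distribution matrices of two periods is below the difference threshold, the two periods are considered to exhibit tidal heterogeneity in the time dimension and are assigned to different time slices; otherwise, no tidal heterogeneity is assumed and the two periods are placed in the same time slice. The mean of the distribution matrices within a time slice is used as that slice's distribution matrix.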
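A minimal sketch of step 1.2, assuming cosine similarity as the measure and an ordered list of per-period matrices (for example, the hourly matrices from step 1.1). The function names and the `threshold` parameter are illustrative. Note that this sketch only merges *adjacent* periods; grouping non-adjacent periods into the same type (such as 6:00-7:00 with 17:00-18:00) would additionally require comparing the resulting slices with each other.

```python
import numpy as np

def normalize(a):
    # nor(A) = A / sum_ij A_ij; assumes the matrix has a positive total
    a = np.asarray(a, dtype=float)
    return a / a.sum()

def cosine_similarity(a, b):
    va, vb = a.ravel(), b.ravel()
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

def group_time_slices(matrices, threshold):
    """Split an ordered list of per-period matrices into time slices.

    A new slice starts when the cosine similarity between two adjacent
    normalized matrices falls below `threshold`.
    Returns (list of period-index lists, list of mean matrices per slice).
    """
    normed = [normalize(m) for m in matrices]
    slices = [[0]]
    for i in range(1, len(normed)):
        if cosine_similarity(normed[i - 1], normed[i]) < threshold:
            slices.append([i])       # heterogeneous: open a new slice
        else:
            slices[-1].append(i)     # similar: extend the current slice
    means = [np.mean([normed[i] for i in s], axis=0) for s in slices]
    return slices, means
```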
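Step 1.3: compute the adaptive partition threshold. The adaptive partition threshold can be determined in several ways: from the percentage of the total data volume held in a grid cell after normalization, from the total number of grid cells finally generated, from manual experience, or through multiple rounds of controlled experiments that find the best trade-off between load-balancing quality and algorithm efficiency. For load balancing of data storage, this embodiment gives one feasible way to compute the threshold. Here the threshold must take the underlying storage model of the distributed database into account: the data stored in one grid cell must not exceed the minimum storage unit read by one I/O operation of the system. Therefore $T_{\mathrm{obj}} = \mathrm{Size}_{\mathrm{block}} / S_{\mathrm{obj}}$, where $\mathrm{Size}_{\mathrm{block}}$ is the size of the minimum storage unit for one I/O operation in the distributed database, $S_{\mathrm{obj}}$ is the storage space occupied by one serialized point record, and $T_{\mathrm{obj}}$ is the adaptive partition threshold.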
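The storage-based threshold of step 1.3 is a single division. The sketch below rounds down, on the reading that a cell holding $T_{\mathrm{obj}}$ points must still fit in one I/O block; the block and record sizes in the example are illustrative values, not figures from the text.

```python
def adaptive_threshold(block_size_bytes: int, record_size_bytes: int) -> int:
    """T_obj = Size_block / S_obj, rounded down so one cell fits in one I/O block."""
    if record_size_bytes <= 0:
        raise ValueError("record size must be positive")
    return block_size_bytes // record_size_bytes

# Illustrative sizes: a 64 KiB block and 128-byte serialized points
print(adaptive_threshold(64 * 1024, 128))  # 512
```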
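Step 1.4: compute the initial partition level. Using the adaptive partition threshold from step 1.3, the global initial partition level is computed with a spatial-distribution-based level computation model. Let IBox be the bounding rectangle of the data set, i.e., the location and extent of the region managed by the system, with longitude/latitude bounds $(x_{\min}, y_{\min}, x_{\max}, y_{\max})$. Let $C$ be the total number of points in the data set, $T_{\mathrm{obj}}$ the point threshold computed above, and $n$ the initial partition level to be determined.
The level at which the data would be spread evenly over the grid cells is taken as the initial partition level, that is:
$$f(n) = \frac{C}{T_{\mathrm{obj}}}$$
where $f(n)$ is the number of level-$n$ grid cells that intersect IBox.
With the globe divided into a uniform grid, the length $L$ and width $W$ of a level-$n$ cell are:
$$L = \frac{180-(-180)}{2^{n}} = \frac{360}{2^{n}}, \qquad W = \frac{90-(-90)}{2^{n}} = \frac{180}{2^{n}}$$
The number $f(n)$ of level-$n$ cells intersecting IBox is then:
$$f(n) = \left[\frac{x_{\max}-x_{\min}}{L}\right] \cdot \left[\frac{y_{\max}-y_{\min}}{W}\right]$$
It follows that:
$$\frac{C}{T_{\mathrm{obj}}} = \left[\frac{2^{n}(x_{\max}-x_{\min})}{360}\right] \cdot \left[\frac{2^{n}(y_{\max}-y_{\min})}{180}\right]$$
Solving this equation yields the initial partition level $n$, which is used as the starting level of the adaptive grid partition.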
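Treating the brackets as plain grouping, the last equation rearranges to $4^{n} = \frac{C}{T_{\mathrm{obj}}} \cdot \frac{360 \cdot 180}{(x_{\max}-x_{\min})(y_{\max}-y_{\min})}$, so $n = \tfrac{1}{2}\log_2(\cdot)$. The sketch below evaluates this; rounding up to an integer level (so the average cell holds at most $T_{\mathrm{obj}}$ points) and clamping at 0 are assumptions, since the text only says the equation is solved for $n$.

```python
import math

def initial_level(total_points, t_obj, bbox):
    """Solve C/T_obj = (2^n dx / 360) * (2^n dy / 180) for an integer level n.

    bbox = (x_min, y_min, x_max, y_max) in degrees. Rounding up and clamping
    at level 0 are assumptions made for this sketch.
    """
    x_min, y_min, x_max, y_max = bbox
    dx, dy = x_max - x_min, y_max - y_min
    four_pow_n = (total_points / t_obj) * 360 * 180 / (dx * dy)
    n = 0.5 * math.log2(four_pow_n)
    return max(0, math.ceil(n))
```

For the whole globe, $dx \cdot dy = 360 \cdot 180$, so $4^{n} = C / T_{\mathrm{obj}}$; with $C/T_{\mathrm{obj}} = 1024$ this gives level 5.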
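In the time-slice aggregation module, the distribution matrices of the different time slices are aggregated into an average representation of the data distribution over the whole time span of the data set, so that load balancing achieves relatively good results in every time interval. To obtain the theoretically optimal aggregation model, an accuracy measure is used to express the sum of the differences between the aggregated matrix C and the distribution matrices of the time slices; the higher the accuracy, the closer C is to optimal. Suppose there are $n$ time slices, the distribution matrix of the $n$-th time slice is denoted $A_n$, and $w_n$ is the weight of the $n$-th time slice's data. The accuracy is then expressed as:
[Equation image PCTCN2022138000-appb-000001: expression for the accuracy, not reproduced in the text]
where $\mathrm{nor}(A)$ denotes normalization of a matrix:
$$\mathrm{nor}(A) = \frac{A}{\sum_{i,j} A_{i,j}}$$
To obtain the initial distribution matrix C with the highest theoretical accuracy, the maximum of the accuracy is sought. Mathematical derivation shows that the accuracy reaches its maximum if and only if C satisfies the following, i.e., at the maximum the aggregated matrix $\mathrm{nor}(C)$ is:
[Equation image PCTCN2022138000-appb-000002: expression for nor(C), not reproduced in the text]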
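The closed-form expression for $\mathrm{nor}(C)$ survives only as an image in this copy, so the sketch below does not reproduce it. As an explicitly assumed stand-in, it uses the weighted mean of the normalized slice matrices, $\sum_n w_n\,\mathrm{nor}(A_n) / \sum_n w_n$, which is the matrix that minimizes $\sum_n w_n \lVert \mathrm{nor}(C) - \mathrm{nor}(A_n) \rVert_F^2$. This is one plausible reading of "accuracy", not the patent's formula.

```python
import numpy as np

def normalize(a):
    # nor(A) = A / sum_ij A_ij
    a = np.asarray(a, dtype=float)
    return a / a.sum()

def aggregate(slice_matrices, weights):
    """ASSUMED aggregation: weighted mean of the normalized slice matrices.

    It minimizes sum_n w_n * ||nor(C) - nor(A_n)||_F^2; the patent's own
    nor(C) expression is not reproduced in the text.
    """
    w = np.asarray(weights, dtype=float)
    stacked = np.stack([normalize(a) for a in slice_matrices])  # shape (k, rows, cols)
    agg = np.tensordot(w, stacked, axes=1) / w.sum()            # shape (rows, cols)
    return normalize(agg)  # already sums to 1; re-normalized to guard rounding
```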
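The adaptive partitioning module performs the following steps:
Step 3.1: perform the initial spatial grid partition of the data at the initial partition level, and count the data volume in each initial grid cell;
Step 3.2: traverse each initial grid cell and, using the adaptive partition threshold obtained in the data pre-analysis module, decide whether the cell should be split to the next level or merged to the previous level, stopping when the data volume in each cell satisfies the threshold condition or the number of cells exceeds a set limit. The space is finally divided into grid regions of different scales, yielding the adaptive spatial grid partition (see FIG. 8 and FIG. 9).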
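The splitting half of step 3.2 behaves like a point quadtree. The sketch below splits any cell holding more than `t_obj` points into four children at the next level; the `Cell` class, the function names, and `max_level` (a hypothetical cap standing in for the text's "set limit" stop condition) are illustrative, and merging sparse siblings back to the parent level is omitted for brevity.

```python
from dataclasses import dataclass, field

@dataclass
class Cell:
    level: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    points: list = field(default_factory=list)

def split(cell):
    """Split a cell into its four children at level + 1."""
    xm = (cell.x_min + cell.x_max) / 2
    ym = (cell.y_min + cell.y_max) / 2
    children = [
        Cell(cell.level + 1, cell.x_min, cell.y_min, xm, ym),          # lower-left
        Cell(cell.level + 1, xm, cell.y_min, cell.x_max, ym),          # lower-right
        Cell(cell.level + 1, cell.x_min, ym, xm, cell.y_max),          # upper-left
        Cell(cell.level + 1, xm, ym, cell.x_max, cell.y_max),          # upper-right
    ]
    for (x, y) in cell.points:
        # points on a midline go to the right/upper child
        idx = (1 if x >= xm else 0) + (2 if y >= ym else 0)
        children[idx].points.append((x, y))
    return children

def adaptive_partition(initial_cells, t_obj, max_level):
    """Split cells holding more than t_obj points until they fit or hit max_level."""
    result, stack = [], list(initial_cells)
    while stack:
        cell = stack.pop()
        if len(cell.points) > t_obj and cell.level < max_level:
            stack.extend(split(cell))
        else:
            result.append(cell)
    return result
```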
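The data routing module that accounts for spatial proximity and load performs the following steps:
Step 4.1: build an undirected graph based on adjacency. The multi-scale adaptive grid partition from the adaptive partitioning module is abstracted as an undirected graph: each grid cell is a node whose weight is the equivalent data volume in the cell, adjacent cells are joined by edges, and each edge's weight corresponds to the distance between the cells (see FIG. 10).
Step 4.2: build the adjacency matrix of the undirected graph from the adjacency relationships.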
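Steps 4.1 and 4.2 can be sketched for axis-aligned cells of different sizes. Assumptions: two cells are adjacent only if they share a border segment of positive length (corner contact does not count), and the edge weight is the Euclidean distance between cell centres in degrees. The function names are hypothetical, and the pairwise loop is O(n²); a spatial index would be preferable for large grids.

```python
import math
import numpy as np

def shares_edge(a, b, eps=1e-12):
    """True if rectangles a, b = (x_min, y_min, x_max, y_max) share a border segment."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    overlap_x = min(ax1, bx1) - max(ax0, bx0)
    overlap_y = min(ay1, by1) - max(ay0, by0)
    touch_x = abs(ax1 - bx0) < eps or abs(bx1 - ax0) < eps   # vertical borders meet
    touch_y = abs(ay1 - by0) < eps or abs(by1 - ay0) < eps   # horizontal borders meet
    return (touch_x and overlap_y > eps) or (touch_y and overlap_x > eps)

def center(r):
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)

def adjacency_matrix(cells):
    """Weighted adjacency matrix: centre distance if two cells share an edge, else 0."""
    n = len(cells)
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if shares_edge(cells[i], cells[j]):
                (xi, yi), (xj, yj) = center(cells[i]), center(cells[j])
                adj[i, j] = adj[j, i] = math.hypot(xi - xj, yi - yj)
    return adj
```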
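Step 4.3: using a graph partitioning method, group the adaptive grid cells with the objective of balancing the data volume of each group, thereby achieving data routing. The number of groups depends on user requirements: it can equal the number of sub-clusters in the current cluster (each sub-cluster handles the computing tasks of one spatial area) or be smaller than the number of sub-clusters (several sub-clusters jointly handle the computing tasks of one spatial area). The output is the node ID assigned to each adaptive grid cell, with each node ID corresponding to one cluster node.
For example, in FIG. 11 the adaptive grid is divided into 7 sub-regions based on adjacency and data-volume balance, and each sub-region is labeled with the ID of its corresponding cluster node (1101 to 1107 in FIG. 11). When the management node receives business data, it routes the data of a sub-region to the corresponding cluster node. For example, if the business data received by the management node originates from region 1, which is labeled with ID 1106, the data is routed to the cluster node with ID 1106.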
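Step 4.3 and the FIG. 11 lookup can be illustrated with a small sketch. The text does not name a specific graph-partitioning method; the greedy region growing below is a simple stand-in (production systems typically use a dedicated multilevel partitioner such as METIS), and `grow_partition`, `route`, and the node IDs are hypothetical. Each group grows from an unassigned seed, always absorbing the nearest adjacent cell, until it reaches the average load; the last group takes the remainder.

```python
import heapq
import numpy as np

def grow_partition(adj, weights, k):
    """Greedy region growing into k groups (a stand-in for a graph partitioner).

    adj: weighted adjacency matrix (0 = not adjacent); weights: data volume per cell.
    Returns a list mapping each cell index to a group id in [0, k).
    """
    n = len(weights)
    target = sum(weights) / k
    group = [-1] * n
    for g in range(k):
        unassigned = [i for i in range(n) if group[i] == -1]
        if not unassigned:
            break
        if g == k - 1:                    # last group takes the remainder
            for i in unassigned:
                group[i] = g
            break
        load, frontier = 0.0, [(0.0, unassigned[0])]
        while frontier and load < target:
            _, c = heapq.heappop(frontier)
            if group[c] != -1:
                continue
            group[c] = g
            load += weights[c]
            for nb in np.nonzero(adj[c])[0]:
                if group[nb] == -1:
                    heapq.heappush(frontier, (adj[c, nb], int(nb)))
    return group

def route(point, cells, group, node_ids):
    """Return the node ID of the cell containing point (x, y), or None."""
    x, y = point
    for i, (x0, y0, x1, y1) in enumerate(cells):
        if x0 <= x < x1 and y0 <= y < y1:
            return node_ids[group[i]]
    return None
```

Greedy growing keeps each group spatially contiguous but can leave the last group noticeably over- or under-loaded; that imbalance is what a multilevel partitioner, or the two-stage aggregation of step 4.4, is meant to reduce.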
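Optionally, step 4.4 is also included; it is used only in scenarios where two-stage aggregation is needed to further improve the balance of data/service allocation. In some scenarios there is no strong clustering requirement on the spatial areas managed by each cluster node, that is, a compute node is not required to manage only one sub-region and may manage several disjoint sub-regions. In such scenarios, step 4.3 generates more groups than there are compute nodes, and the partition from step 4.3 is then aggregated a second time by a two-stage aggregation method. The two-stage aggregation uses a set-partitioning algorithm to search for the best grouping of the step 4.3 results, so that after aggregation the difference in data/service volume between groups is minimized. In a possible embodiment, the average data/service volume of each group generated in step 4.3 is computed over the time slices (the slice granularity can be chosen as needed, for example one day or one hour per slice), the averages of all groups are assembled into a set, and the set is partitioned using the Karmarkar-Karp algorithm. For example, if the cluster system has 3 compute nodes and the graph partitioning algorithm in step 4.3 generates 9 sub-regions, the data volume contained in each of the 9 sub-regions in the different time slices is counted and averaged to build the set {N1, …, N9}. The Karmarkar-Karp algorithm partitions {N1, …, N9} into 3 groups so as to minimize the difference in data volume between groups; if the output is {{N1, N4}, {N2, N5, N6, N9}, {N3, N7, N8}}, the data/services of partitions 1 and 4 are sent to compute node 21, those of partitions 2, 5, 6, and 9 to compute node 22, and those of partitions 3, 7, and 8 to compute node 23.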
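Step 4.4 names the Karmarkar-Karp algorithm. Below is a sketch of its multi-way variant (the largest differencing method) applied to per-partition average volumes; the function name and the dict input format are assumptions. Karmarkar-Karp is a heuristic, so it reduces the between-group difference approximately rather than guaranteeing the optimum. Each item starts as a k-way partial partition; the two partial partitions with the largest spread are repeatedly merged by pairing the heaviest subset of one with the lightest subset of the other.

```python
import heapq
import itertools

def karmarkar_karp(values, k):
    """Multi-way Karmarkar-Karp (largest differencing method).

    values: non-empty dict {label: average volume}. Returns k lists of labels.
    """
    counter = itertools.count()   # tie-breaker so heapq never compares lists
    heap = []
    for label, v in values.items():
        part = [(v, [label])] + [(0, []) for _ in range(k - 1)]
        # key: negative spread, so the partition with the largest spread pops first
        heapq.heappush(heap, (-v, next(counter), part))
    while len(heap) > 1:
        _, _, p = heapq.heappop(heap)
        _, _, q = heapq.heappop(heap)
        # pair p's heaviest subset with q's lightest, and so on
        merged = [(p[i][0] + q[k - 1 - i][0], p[i][1] + q[k - 1 - i][1])
                  for i in range(k)]
        merged.sort(key=lambda s: s[0], reverse=True)
        spread = merged[0][0] - merged[-1][0]
        heapq.heappush(heap, (-spread, next(counter), merged))
    _, _, final = heap[0]
    return [labels for _, labels in final]
```

For instance, with averages {N1: 8, N2: 7, N3: 6, N4: 5, N5: 4} and two compute nodes, the sketch returns two groups with totals 16 and 14.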
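In the dynamic update module, the load-balancing routing result can be checked periodically, according to an update period set by the user or a default one, to determine whether the routing policy needs to be updated. For example, the dynamic update module performs the following steps:
Step 5.1: assume the user sets the update period to one month. Recompute a new aggregated distribution matrix $C_{\mathrm{new}}$ from the previous month's historical data.
Step 5.2: compute the similarity. Let the original aggregated distribution matrix be $C_{\mathrm{old}}$. Normalize $C_{\mathrm{old}}$ and $C_{\mathrm{new}}$, and use the Frobenius norm of their difference as the similarity measure; this value grows as the two distributions diverge.
Step 5.3: threshold check. If the measure exceeds the set threshold ε, the data distribution has changed significantly, so the adaptive partition and the routing partitions must be recomputed from the new distribution and the routing policy is updated; otherwise, the previous policy continues to be used. The threshold ε is set according to the actual application requirements.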
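Steps 5.2 and 5.3 reduce to one comparison. A minimal sketch, assuming the measure is the Frobenius norm of the difference between the two normalized matrices; the function name and `eps` (standing for ε) are illustrative.

```python
import numpy as np

def needs_update(c_old, c_new, eps):
    """True if ||nor(C_new) - nor(C_old)||_F exceeds eps (distribution has shifted)."""
    a = np.asarray(c_old, dtype=float)
    b = np.asarray(c_new, dtype=float)
    diff = b / b.sum() - a / a.sum()
    return bool(np.linalg.norm(diff, ord="fro") > eps)
```

Because both matrices are normalized first, a uniform growth in total volume (for example, twice as many requests everywhere) does not trigger a re-partition; only a change in the spatial shape of the load does.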
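FIG. 12 is a flowchart of the cluster system management method according to an embodiment of this application. The method can be applied to the management node 10 in the cluster system shown in FIG. 2, or to the cluster system management apparatus shown in FIG. 6, to implement spatial divide-and-conquer management of the cluster system. As shown in FIG. 12, the method includes at least steps S1201 to S1204.
In step S1201, obtain historical business data.
The historical business data is business data from several periods before the current period. For example, if the period is one month, the historical business data may be the business data of the previous month or of the previous three months; the number of previous periods can be chosen as needed and is not limited in this application.
In an example, the historical data is stored in an underlying database, and the management node obtains it by retrieving it from that database: for example, the management node sends a request for historical business data to the underlying database, which responds by sending the historical business data to the management node.
It is easy to understand that all business data has spatial and temporal attributes, that is, it carries the location and time of its origin; for example, the business data may include positioning service requests, each carrying the location and time at which it was sent.
In step S1202, determine the spatio-temporal distribution characteristics of the historical business data.
For example, to reduce the data-processing load of the management node, the obtained historical business data is randomly sampled, the sampled data is aggregated per time interval, and the spatial distribution of the historical business data in each interval is obtained from the spatial attributes of the data. For example, at a granularity of 1 hour, the area managed by the cluster system is first divided into a uniform grid, the number of data items in each cell during the corresponding period is counted, and a spatial distribution matrix of the historical business data is built for each interval. The rows and columns of the matrix correspond to those of the grid, and each entry is the request count of the corresponding cell; the spatial distribution matrix of each period thus characterizes the spatial distribution of that period's historical business data.
Each time period may be called a time slice, and the spatial distribution matrix of the historical business data of each period is accordingly called the spatial distribution matrix of that time slice.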
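Next, identify heterogeneity among the time slices. Specifically, for ease of comparison, the distribution matrix of each time slice is first standardized, that is, normalized, to obtain the normalized distribution matrix of each time slice. The similarity between the normalized matrices is then computed (for example, cosine similarity). When the cosine similarity between the distribution matrices of two time slices is less than a preset threshold, the two slices belong to different types; when it is greater than or equal to the threshold, they belong to the same type.
The distribution matrix of a time-slice type is then determined from the distribution matrices of the slices belonging to it. In an example, the mean of those matrices is used as the type's distribution matrix. For example, if the 6:00-7:00 slice and the 17:00-18:00 slice belong to the same type, the mean of their two distribution matrices is used as the distribution matrix of that type.
The distribution matrices of the heterogeneous time-slice types are then aggregated to obtain an aggregated distribution matrix, which represents the spatio-temporal distribution characteristics of the historical data.
For the method of aggregating the distribution matrices of the time-slice types, see the determination of nor(C) in the time-slice aggregation module above; details are not repeated here.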
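In step S1203, divide the preset region into M sub-regions based on the spatio-temporal distribution characteristics of the historical business data.
Based on the spatio-temporal distribution characteristics and spatial proximity, the preset region is divided into M sub-regions so that the historical business data is evenly distributed among them, where M is an integer greater than 1, for example 3.
Specifically, the adaptive partition threshold is determined first; for the method, see step 1.3 of the data pre-analysis module above, which is not repeated here.
The initial partition level is then determined from the adaptive partition threshold and the location information of the preset region; for the method, see step 1.4 of the data pre-analysis module above, which is not repeated here.
From the initial partition level and the spatial-distribution-based level computation model, the spatial grid at the level corresponding to the initial partition level is determined; this grid is the initial spatial grid.
The historical business data is partitioned with the initial spatial grid, and the data volume in each initial cell is counted.
Each initial cell is traversed and, using the adaptive partition threshold from the data pre-analysis module, it is decided whether the cell should be split to the next level or merged to the previous level, stopping when the data volume in each cell satisfies the threshold condition or the number of cells exceeds a set limit. The space is finally divided into grid regions of different scales, yielding the adaptive spatial grid partition.
Based on the adjacency of the sub-grids in the adaptive spatial grid and the data volume in each sub-grid, the preset region is divided into multiple sub-regions. For the specific method, see steps 4.1 to 4.3 of the data routing module that accounts for spatial proximity and load above, which are not repeated here.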
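In step S1204, determine the correspondence between the M sub-regions and the N compute nodes, and route the business data of each of the M sub-regions to the corresponding compute node for processing.
For example, M may equal N, that is, there may be as many sub-regions as compute nodes; compute nodes and sub-regions then correspond one-to-one, each compute node processing the business data of one sub-region: the management node distributes the business data of a sub-region to the corresponding compute node, i.e., routes it to the compute node responsible for that sub-region.
M may be less than N, that is, there are fewer sub-regions than compute nodes; the N compute nodes are then divided into M sub-clusters in one-to-one correspondence with the M sub-regions, each sub-cluster processing the business data of one sub-region.
M may be greater than N, that is, there are more sub-regions than compute nodes; the M sub-regions are then divided into N groups in one-to-one correspondence with the N compute nodes, each compute node processing the business data of one group, so that a compute node may process the business data of two or more sub-regions.
For the specific method of dividing the M sub-regions into N groups, see the description of step 4.4 above, which is not repeated here.
In another example, the method further includes step S1205: determining, based on the spatio-temporal distribution characteristics of the historical business data of the current period and those of the previous period, whether the division of the preset region needs to be updated.
For the specific method of step S1205, see steps 5.1 to 5.3 of the dynamic update module above, which are not repeated here.
The cluster system provided in the embodiments of this application analyzes the distribution pattern of historical business data in both time and space to obtain a single partitioning and routing scheme that is relatively optimal globally over a long period. Application systems can adjust their architecture based on this result to form a new architecture managed by spatial divide-and-conquer, improving scalability and substantially increasing performance.
On the one hand, the management method provided in the embodiments of this application balances load in both the spatial and temporal dimensions of the data, so it can produce a globally relatively optimal business partitioning and routing scheme even across time intervals and areas where human activity patterns differ markedly. It outputs an optimal strategy for federated management divided by geographic area, splitting a complex system into multiple independently managed units by geographic region while keeping the storage/compute load of each unit relatively balanced, which addresses distributed load balancing for time-varying urban flows. On the other hand, it can produce a partitioning and routing scheme that remains relatively stable over a long period according to user requirements, rather than balancing only a single batch of data, which avoids the extra computation of re-partitioning and frequent architecture changes. In addition, it avoids the spatial jumps of space-filling-curve-based partitioning, further strengthening the spatial proximity of the data on each node.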
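Based on the same concept as the foregoing method embodiments, an embodiment of this application further provides a cluster system management apparatus 1300, which includes units or means for implementing the steps of the cluster system management method shown in FIG. 12.
FIG. 13 is a schematic structural diagram of a cluster system management apparatus according to an embodiment of this application. As shown in FIG. 13, the cluster system management apparatus 1300 includes at least:
an obtaining module 1301, configured to obtain historical business data, the historical business data being business data from several periods before the current period;
a determining module 1302, configured to determine spatio-temporal distribution characteristics of the historical business data;
a partitioning module 1303, configured to divide the preset region into M sub-regions based on the spatio-temporal distribution characteristics of the historical business data, the historical business data being evenly distributed among the M sub-regions;
a routing module 1304, configured to determine the correspondence between the M sub-regions and the N compute nodes, and route the business data of each of the M sub-regions to the corresponding compute node for processing.
In a possible implementation, the determining module 1302 is specifically configured to determine, based on the spatial distribution characteristics of the business data in multiple time periods of the historical business data, a spatial distribution matrix of the business data for each of multiple time slices, where adjacent time periods are contiguous;
and determine the spatio-temporal distribution characteristics of the historical business data based on the spatial distribution matrices of the time slices.
In another possible implementation, determining the spatio-temporal distribution characteristics of the historical business data based on the spatial distribution matrices of the time slices includes:
normalizing the spatial distribution matrix of the business data of each time slice to obtain a normalized spatial distribution matrix for each time slice;
classifying the time slices into multiple types based on the similarity between the normalized spatial distribution matrices;
aggregating the normalized spatial distribution matrices of the multiple types of time slices to obtain an aggregated distribution matrix of the time-slice types, which represents the spatio-temporal distribution characteristics of the historical business data.
In another possible implementation, the partitioning module 1303 is specifically configured to divide the preset region into multiple sub-regions based on the spatio-temporal distribution characteristics and spatial proximity, the historical business data being evenly distributed among the sub-regions.
In another possible implementation, dividing the preset region into multiple sub-regions based on the spatio-temporal distribution characteristics and spatial proximity includes:
determining an adaptive partition threshold;
performing an initial spatial grid partition of the historical business data based on the adaptive partition threshold;
determining the data volume in each sub-grid of the initial spatial grid based on the spatio-temporal distribution characteristics of the historical business data;
traversing the sub-grids of the initial spatial grid and splitting or merging each of them based on its data volume and the adaptive partition threshold, to obtain an adaptive spatial grid partition of the historical business data;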
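dividing the preset region into M sub-regions based on the adjacency of the sub-grids in the adaptive spatial grid and the data volume in each sub-grid, the data volumes of the M sub-regions being balanced.
In another possible implementation, performing the initial spatial grid partition of the historical business data based on the adaptive partition threshold includes:
determining an initial partition level based on the adaptive partition threshold and location information of the preset region;
performing the initial spatial grid partition of the historical business data based on the initial partition level and a spatial-distribution-based level computation model.
In another possible implementation, dividing the preset region into M sub-regions based on the adjacency of the sub-grids in the adaptive grid and the data volume in each sub-grid includes:
determining an undirected graph corresponding to the adaptive spatial grid partition of the historical business data, where each node corresponds to a sub-grid of the adaptive spatial grid, each node's weight is determined by the data volume in that sub-grid, the edges connecting adjacent nodes are determined by the adjacency of the sub-grids, and each edge's weight is determined by the distance between the adjacent sub-grids;
determining the adjacency matrix of the undirected graph corresponding to the adaptive spatial grid;
dividing the preset region into M sub-regions based on the adaptive grid and the adjacency matrix of the undirected graph, the data volumes of the M sub-regions being balanced.
In another possible implementation, the routing module 1304 is specifically configured to determine that N is greater than or equal to M, and divide the N compute nodes into M sub-clusters in one-to-one correspondence with the M sub-regions.
In another possible implementation, the routing module 1304 is specifically configured to:
determine that N is less than M;
compute, for each of the M sub-regions, its average business data volume over multiple time periods;
divide the M sub-regions into N groups with the objective of minimizing the difference in average business data volume between groups;
map the N groups one-to-one to the N compute nodes.
In another possible implementation, the apparatus further includes an update module 1305, configured to determine, based on the spatio-temporal distribution characteristics of the historical business data of the current period and those of the previous period, whether the division of the preset region needs to be updated.
In another possible implementation, the update module 1305 is specifically configured to compare whether the similarity measure between the spatio-temporal distribution characteristics of the current period's historical business data and those of the previous period's is greater than a preset threshold;
if so, determine that the division of the preset region needs to be updated; if not, determine that it does not.
The cluster system management apparatus 1300 according to the embodiments of this application may correspondingly perform the methods described in the embodiments of this application, and the above and other operations and/or functions of the modules of the apparatus 1300 respectively implement the corresponding procedures of the method in FIG. 12; for brevity, details are not repeated here.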
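FIG. 14 is a schematic structural diagram of a computing device according to an embodiment of this application.
As shown in FIG. 14, the computing device 1400 includes a processor 1401, a memory 1402, and a communication interface 1403, which communicate over a bus or by other means such as wireless transmission. The communication interface 1403 is used to communicate with other devices, for example, to receive load requests sent by terminals within the managed region; the memory 1402 stores executable program code, and the processor 1401 can invoke the program code stored in the memory 1402 to perform the cluster system management method in the foregoing method embodiments.
It should be understood that in the embodiments of this application, the processor 1401 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 1402 may include a read-only memory and a random access memory and provides instructions and data to the processor 1401. The memory 1402 may further include a non-volatile random access memory. For example, the memory 1402 may also store a database containing the historical business data.
The memory 1402 may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example rather than limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
It should be understood that the computing device 1400 according to the embodiments of this application may correspond to the cluster system management apparatus in the embodiments of this application and to the entity that performs the method shown in FIG. 12, and the above and other operations and/or functions of the components of the computing device 1400 respectively implement the corresponding procedures of the method in FIG. 12; for brevity, details are not repeated here.
An embodiment of this application provides a computer storage medium, including computer instructions that, when executed by a processor, cause any one of the foregoing methods to be implemented.
An embodiment of this application provides a computer program product that, when run on a processor, causes any one of the foregoing methods to be implemented.
A person of ordinary skill in the art may further realize that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the compositions and steps of each example have been described above generally in terms of function. Whether these functions are performed by hardware or software depends on the specific application and the design constraints of the technical solution. A person of ordinary skill in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
The steps of the methods or algorithms described with reference to the embodiments disclosed herein may be implemented by hardware, by software modules executed by a processor, or by a combination of the two. A software module may reside in a random access memory (RAM), memory, a read-only memory (ROM), an electrically programmable ROM, an electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The specific implementations described above further explain the objectives, technical solutions, and beneficial effects of this application in detail. It should be understood that the foregoing descriptions are merely specific implementations of this application and are not intended to limit its protection scope; any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within its protection scope.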

Claims (26)

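  1. A management method for a cluster system, wherein the management method is applied to a management node, the cluster system includes the management node and N compute nodes, the cluster system is responsible for processing business data within a preset region, and N is an integer greater than 1; the management method includes:
    obtaining historical business data, the historical business data being business data from several periods before the current period;
    determining spatio-temporal distribution characteristics of the historical business data;
    dividing the preset region into M sub-regions based on the spatio-temporal distribution characteristics of the historical business data, wherein a difference value in the distribution of the historical business data among the M sub-regions is less than a preset threshold, and M is an integer greater than 1;
    determining a correspondence between the M sub-regions and the N compute nodes, and routing the business data of each of the M sub-regions to the corresponding compute node for processing.
  2. The management method according to claim 1, wherein determining the spatio-temporal distribution characteristics of the historical business data includes:
    determining, based on spatial distribution characteristics of business data in multiple time periods of the historical business data, a spatial distribution matrix of the business data for each of multiple time slices, wherein adjacent time periods among the multiple time periods are contiguous;
    determining the spatio-temporal distribution characteristics of the historical business data based on the spatial distribution matrices of the multiple time slices.
  3. The management method according to claim 2, wherein determining the spatio-temporal distribution characteristics of the historical business data based on the spatial distribution matrices of the multiple time slices includes:
    normalizing the spatial distribution matrix of the business data of each of the multiple time slices to obtain a normalized spatial distribution matrix for each time slice;
    classifying the multiple time slices into multiple types based on the similarity between the normalized spatial distribution matrices;
    aggregating the normalized spatial distribution matrices of the multiple types of time slices to obtain an aggregated distribution matrix, the aggregated distribution matrix representing the spatio-temporal distribution characteristics of the historical business data.
  4. The management method according to any one of claims 1 to 3, wherein dividing the preset region into M sub-regions based on the spatio-temporal distribution characteristics of the historical business data includes:
    dividing the preset region into M sub-regions based on the spatio-temporal distribution characteristics and spatial proximity, wherein a difference value in the distribution of the historical business data among the M sub-regions is less than a preset threshold.
  5. The management method according to claim 4, wherein dividing the preset region into M sub-regions based on the spatio-temporal distribution characteristics and spatial proximity includes:
    determining an adaptive partition threshold;
    performing an initial spatial grid partition of the historical business data based on the adaptive partition threshold;
    determining a data volume in each sub-grid of the initial spatial grid based on the spatio-temporal distribution characteristics of the historical business data;
    traversing the sub-grids of the initial spatial grid and splitting or merging each sub-grid based on its data volume and the adaptive partition threshold, to determine an adaptive spatial grid partition of the historical business data;
    dividing the preset region into M sub-regions based on the adjacency of the sub-grids in the adaptive spatial grid and the data volume in each sub-grid, wherein the data volumes of the M sub-regions are balanced.
  6. The management method according to claim 5, wherein performing the initial spatial grid partition of the historical business data based on the adaptive partition threshold includes:
    determining an initial partition level based on the adaptive partition threshold and location information of the preset region;
    performing the initial spatial grid partition of the historical business data based on the initial partition level and a spatial-distribution-based level computation model.
  7. The management method according to claim 5 or 6, wherein dividing the preset region into M sub-regions based on the adjacency of the sub-grids in the adaptive grid and the data volume in each sub-grid includes:
    determining an undirected graph corresponding to the adaptive spatial grid based on the adaptive spatial grid partition of the historical business data, wherein each node of the undirected graph is determined based on a sub-grid of the adaptive spatial grid, the weight of each node is determined based on the data volume in that sub-grid, edges connecting adjacent nodes are determined based on the adjacency of the sub-grids, and the weight of each edge is determined based on the distance between the adjacent sub-grids;
    determining an adjacency matrix of the undirected graph corresponding to the adaptive spatial grid;
    dividing the preset region into M sub-regions based on the adaptive grid and the adjacency matrix of the undirected graph, wherein a difference value in the distribution of data volume among the M sub-regions is less than a preset threshold.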
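  8. The management method according to any one of claims 1 to 7, wherein determining the correspondence between the M sub-regions and the N compute nodes includes:
    determining that N is greater than or equal to M;
    dividing the N compute nodes into M sub-clusters, the M sub-clusters being in one-to-one correspondence with the M sub-regions.
  9. The management method according to any one of claims 1 to 7, wherein determining the correspondence between the M sub-regions and the N compute nodes includes:
    determining that N is less than M;
    computing, for each of the M sub-regions, its average business data volume over multiple time periods;
    dividing the M sub-regions into N groups of sub-regions with the objective of minimizing the difference in average business data volume between groups;
    the N groups of sub-regions being in one-to-one correspondence with the N compute nodes.
  10. The management method according to any one of claims 1 to 9, wherein the management method further includes:
    determining, based on the spatio-temporal distribution characteristics of the historical business data of the current period and those of the previous period, whether the division of the preset region needs to be updated.
  11. The management method according to claim 10, wherein determining, based on the spatio-temporal distribution characteristics of the historical business data of the current period and those of the previous period, whether the division of the preset region needs to be updated includes:
    comparing whether the similarity between the spatio-temporal distribution characteristics of the historical business data of the current period and those of the previous period is greater than a preset threshold;
    if so, determining that the division of the preset region needs to be updated; if not, determining that the division of the preset region does not need to be updated.
  12. The management method according to any one of claims 1 to 11, wherein each of the compute nodes caches the business-related data within the sub-region corresponding to that compute node.
  13. A management apparatus for a cluster system, wherein the management apparatus is applied to the cluster system, the cluster system includes the management node and N compute nodes, the cluster system is responsible for processing load requests within a preset region, and N is an integer greater than 1; the apparatus includes:
    an obtaining module, configured to obtain historical business data, the historical business data being business data from several periods before the current period;
    a determining module, configured to determine spatio-temporal distribution characteristics of the historical business data;
    a partitioning module, configured to divide the preset region into M sub-regions based on the spatio-temporal distribution characteristics of the historical business data, wherein a difference value in the distribution of the historical business data among the M sub-regions is less than a preset threshold, and M is an integer greater than 1;
    a routing module, configured to determine a correspondence between the M sub-regions and the N compute nodes, and route the business data of each of the M sub-regions to the corresponding compute node for processing.
  14. The apparatus according to claim 13, wherein the determining module is specifically configured to determine, based on spatial distribution characteristics of business data in multiple time periods of the historical business data, a spatial distribution matrix of the business data for each of multiple time slices, wherein adjacent time periods among the multiple time periods are contiguous;
    and determine the spatio-temporal distribution characteristics of the historical business data based on the spatial distribution matrices of the multiple time slices.
  15. The apparatus according to claim 14, wherein determining the spatio-temporal distribution characteristics of the historical business data based on the spatial distribution matrices of the multiple time slices includes:
    normalizing the spatial distribution matrix of the business data of each of the multiple time slices to obtain a normalized spatial distribution matrix for each time slice;
    classifying the multiple time slices into multiple types based on the similarity between the normalized spatial distribution matrices;
    aggregating the normalized spatial distribution matrices of the multiple types of time slices to obtain an aggregated distribution matrix, the aggregated distribution matrix representing the spatio-temporal distribution characteristics of the historical business data.
  16. The apparatus according to any one of claims 13 to 15, wherein the partitioning module is specifically configured to divide the preset region into M sub-regions based on the spatio-temporal distribution characteristics and spatial proximity, wherein a difference value in the distribution of the historical business data among the M sub-regions is less than a preset threshold.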
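  17. The apparatus according to claim 16, wherein dividing the preset region into M sub-regions based on the spatio-temporal distribution characteristics and spatial proximity includes:
    determining an adaptive partition threshold;
    performing an initial spatial grid partition of the historical business data based on the adaptive partition threshold;
    determining a data volume in each sub-grid of the initial spatial grid based on the spatio-temporal distribution characteristics of the historical business data;
    traversing the sub-grids of the initial spatial grid and splitting or merging each sub-grid based on its data volume and the adaptive partition threshold, to determine an adaptive spatial grid partition of the historical business data;
    dividing the preset region into M sub-regions based on the adjacency of the sub-grids in the adaptive spatial grid and the data volume in each sub-grid, wherein the data volumes of the M sub-regions are balanced.
  18. The apparatus according to claim 17, wherein performing the initial spatial grid partition of the historical business data based on the adaptive partition threshold includes:
    determining an initial partition level based on the adaptive partition threshold and location information of the preset region;
    performing the initial spatial grid partition of the historical business data based on the initial partition level and a spatial-distribution-based level computation model.
  19. The apparatus according to claim 17 or 18, wherein dividing the preset region into M sub-regions based on the adjacency of the sub-grids in the adaptive grid and the data volume in each sub-grid includes:
    determining an undirected graph corresponding to the adaptive spatial grid based on the adaptive spatial grid partition of the historical business data, wherein each node of the undirected graph is determined based on a sub-grid of the adaptive spatial grid, the weight of each node is determined based on the data volume in that sub-grid, edges connecting adjacent nodes are determined based on the adjacency of the sub-grids, and the weight of each edge is determined based on the distance between the adjacent sub-grids;
    determining an adjacency matrix of the undirected graph corresponding to the adaptive spatial grid;
    dividing the preset region into M sub-regions based on the adaptive grid and the adjacency matrix of the undirected graph, wherein a difference value in the distribution of data volume among the M sub-regions is less than a preset threshold.
  20. The apparatus according to any one of claims 13 to 19, wherein the routing module is specifically configured to:
    determine that N is greater than or equal to M;
    divide the N compute nodes into M sub-clusters, the M sub-clusters being in one-to-one correspondence with the M sub-regions.
  21. The apparatus according to any one of claims 13 to 19, wherein the routing module is specifically configured to:
    determine that N is less than M;
    compute, for each of the M sub-regions, its average business data volume over multiple time periods;
    divide the M sub-regions into N groups of sub-regions with the objective of minimizing the difference in average business data volume between groups;
    the N groups of sub-regions being in one-to-one correspondence with the N compute nodes.
  22. The apparatus according to any one of claims 13 to 21, wherein the apparatus further includes:
    an update module, configured to determine, based on the spatio-temporal distribution characteristics of the historical business data of the current period and those of the previous period, whether the division of the preset region needs to be updated.
  23. The apparatus according to claim 22, wherein the update module is specifically configured to compare whether the similarity between the spatio-temporal distribution characteristics of the historical business data of the current period and those of the previous period is greater than a preset threshold;
    if so, determine that the division of the preset region needs to be updated; if not, determine that the division of the preset region does not need to be updated.
  24. A cluster system, including a management node and multiple compute nodes, wherein the cluster system is responsible for processing business data within a preset region, the management node includes a memory and a processor, and the memory stores instructions that, when executed by the processor, cause the method according to any one of claims 1 to 12 to be implemented.
  25. A computing device, including a memory and a processor, wherein the memory stores instructions that, when executed by the processor, cause the method according to any one of claims 1 to 12 to be implemented.
  26. A computer-readable storage medium storing computer instructions, wherein when the computer instructions are executed by a processor, the method according to any one of claims 1 to 12 is implemented.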
PCT/CN2022/138000 2021-12-10 2022-12-09 Management method and apparatus for cluster system WO2023104192A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111508888.6 2021-12-10
CN202111508888.6A CN116257349A (zh) 2021-12-10 2021-12-10 Management method and apparatus for cluster system

Publications (1)

Publication Number Publication Date
WO2023104192A1 true WO2023104192A1 (zh) 2023-06-15

Family

ID=86684912

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/138000 WO2023104192A1 (zh) 2021-12-10 2022-12-09 Management method and apparatus for cluster system

Country Status (2)

Country Link
CN (1) CN116257349A (zh)
WO (1) WO2023104192A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
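CN116489709A (zh) * 2023-06-20 2023-07-25 中电科新型智慧城市研究院有限公司 Method for determining node scheduling policy, terminal device, and storage medium
CN117193989A (zh) * 2023-11-07 2023-12-08 广东云下汇金科技有限公司 Centralized data scheduling method for partitioned data center and related device
CN117293843A (zh) * 2023-10-08 2023-12-26 广东正力通用电气有限公司 Centralized management platform and management method for urban lighting system
CN117312800A (zh) * 2023-11-07 2023-12-29 广东省科学院广州地理研究所 Geographic spatio-temporal data analysis method and system based on tidal analysis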

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116737392B (zh) * 2023-08-11 2023-11-10 北京智网易联科技有限公司 Method, apparatus, and computing device for processing non-vector data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160203416A1 (en) * 2013-08-23 2016-07-14 Telefonaktiebolaget L M Ericsson (Publ) A method and system for analyzing accesses to a data storage type and recommending a change of storage type
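CN106844585A (zh) * 2017-01-10 2017-06-13 广东精规划信息科技股份有限公司 Spatio-temporal relationship analysis system based on multi-source IoT location sensing
CN109241161A (zh) * 2018-08-09 2019-01-18 深圳市雅码科技有限公司 Meteorological data management method
CN111159107A (zh) * 2019-12-30 2020-05-15 北京明略软件系统有限公司 Data processing method and server cluster
CN113569937A (zh) * 2021-01-20 2021-10-29 廖彩红 Big-data-based artificial intelligence model machine learning method and server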


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
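CN116489709A (zh) * 2023-06-20 2023-07-25 中电科新型智慧城市研究院有限公司 Method for determining node scheduling policy, terminal device, and storage medium
CN116489709B (zh) * 2023-06-20 2023-11-17 中电科新型智慧城市研究院有限公司 Method for determining node scheduling policy, terminal device, and storage medium
CN117293843A (zh) * 2023-10-08 2023-12-26 广东正力通用电气有限公司 Centralized management platform and management method for urban lighting system
CN117293843B (zh) * 2023-10-08 2024-03-22 广东正力通用电气有限公司 Centralized management platform and management method for urban lighting system
CN117193989A (zh) * 2023-11-07 2023-12-08 广东云下汇金科技有限公司 Centralized data scheduling method for partitioned data center and related device
CN117312800A (zh) * 2023-11-07 2023-12-29 广东省科学院广州地理研究所 Geographic spatio-temporal data analysis method and system based on tidal analysis
CN117312800B (zh) * 2023-11-07 2024-03-08 广东省科学院广州地理研究所 Geographic spatio-temporal data analysis method and system based on tidal analysis
CN117193989B (zh) * 2023-11-07 2024-03-15 广东云下汇金科技有限公司 Centralized data scheduling method for partitioned data center and related device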

Also Published As

Publication number Publication date
CN116257349A (zh) 2023-06-13


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22903609

Country of ref document: EP

Kind code of ref document: A1