WO2022121519A1 - A distributed data stream resource elastic scaling enhancement plug-in and enhancement method - Google Patents

A distributed data stream resource elastic scaling enhancement plug-in and enhancement method Download PDF

Info

Publication number
WO2022121519A1
WO2022121519A1 (PCT/CN2021/124859, CN2021124859W)
Authority
WO
WIPO (PCT)
Prior art keywords
decision
scaling
recommendation
plug-in
data stream
Prior art date
Application number
PCT/CN2021/124859
Other languages
English (en)
French (fr)
Inventor
闻立杰
宗瓒
Original Assignee
清华大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 清华大学
Priority to US17/791,484 (US11853801B2)
Publication of WO2022121519A1

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/445Program loading or initiating
    • G06F9/44521Dynamic linking or loading; Link editing at or after load time, e.g. Java class loading
    • G06F9/44526Plug-ins; Add-ons
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks

Definitions

  • the invention relates to the field of data stream resource allocation, in particular to a distributed data stream resource elastic scaling enhancement plug-in and an enhancement method.
  • Distributed data streaming applications usually provide long-lived real-time data processing services. Typical stream processing scenarios are often accompanied by fluctuations in data stream load. For example, the data volume of social networking website sentiment analysis services will be greatly reduced at night, and the flow of sensor data is usually related to the frequency of device usage. A sudden increase or decrease in the data flow load will have an impact on the distributed data flow that performs real-time data processing operations. When the load suddenly increases, the resources allocated for the distributed data stream may not be able to meet the computing requirements, resulting in that the processing rate cannot be consistent with the data inflow rate; when the load suddenly decreases, the distributed data stream may occupy too many resources, resulting in resource of waste. Therefore, the data flow needs an elastic scaling controller to complete the elastic scaling of resources with the load. Dataflow applications usually abstract resources into instances, and each instance contains a certain number of CPU cores and memory. The elastic scaling controller performs resource scaling operations by automatically controlling the number of instances used by the data flow.
  • existing resource elastic scaling controllers adjust resources with a responsive strategy, so that the amount of data stream resources can cope with the current data generation rate.
  • a data flow application consists of multiple computing nodes, and the minimum unit of resource allocation for each computing node is an "instance". By increasing or decreasing the number of instances, computing resources can be dynamically increased or decreased for data flow.
  • assuming the data inflow rate of a computing node is λ, the data processing capability of the node can be measured as λ_p by observing its current computing state.
  • in theory, allocating λ/λ_p instances to the computing node can cope with the current data inflow rate. Since the data flow may have "one-to-one" or "many-to-one" node connection relationships, the λ of each node can be calculated from the output rates of its upstream nodes. Starting from the data source node and traversing the computing nodes in topological order, the number of instances that should be allocated to each node can be calculated.
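The traversal just described can be sketched as follows; this is a minimal illustration under the stated model (one "instance" processes λ_p records per second), and the graph representation and function names are assumptions, not the patent's implementation:

```python
import math
from collections import defaultdict

def required_instances(edges, source_rates, per_instance_rate):
    """Compute ceil(lambda / lambda_p) instances per node, deriving each
    node's inflow rate lambda from the output rates of its upstream nodes
    while walking the dataflow graph in topological order."""
    nodes = set(edges) | {d for ds in edges.values() for d in ds}
    indegree = defaultdict(int)
    for ds in edges.values():
        for d in ds:
            indegree[d] += 1
    inflow = defaultdict(float, source_rates)
    order = [n for n in nodes if indegree[n] == 0]  # source nodes first
    result = {}
    i = 0
    while i < len(order):
        n = order[i]
        i += 1
        result[n] = max(1, math.ceil(inflow[n] / per_instance_rate[n]))
        # Sketch assumption: a node forwards its full inflow downstream
        # ("one-to-one"); a "many-to-one" node simply sums the rates it
        # receives from all of its upstream nodes.
        for d in edges.get(n, []):
            inflow[d] += inflow[n]
            indegree[d] -= 1
            if indegree[d] == 0:
                order.append(d)
    return result
```

For a chain source → a → b with an inflow of 100 records/s and per-instance rates of 100, 30 and 50, this yields 1, ⌈100/30⌉ = 4, and 2 instances respectively.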
  • the above process can quickly compute the optimal number of instances by monitoring the traffic of each node of the data stream.
  • in practice, however, increasing the number of instances often does not bring a linear performance improvement, so the allocation of instances cannot be completed in one step.
  • owing to factors such as the network transmission overhead of distributed programs or the differing computing power of heterogeneous machines, this method needs to iterate the process of "calculate the instance count - verify whether it is optimal" multiple times, until the number of instances calculated from the current data load no longer changes.
  • compared with rule-based elastic scaling controllers, this computation-based controller can already complete elastic resource scaling faster.
  • however, experiments show that this method still requires multiple attempts to complete one resource elastic scaling.
  • the purpose of the present invention is to provide a distributed data stream resource elastic scaling enhancement plug-in and an enhancement method, so as to improve the accuracy and efficiency of resource elastic scaling.
  • the present invention provides the following scheme:
  • An enhanced plug-in for elastic scaling of distributed data stream resources is connected with a scaling controller for elastic scaling of distributed data stream resources;
  • the plug-in includes: a decider, a decision model, and a scaling operation sample library;
  • the scaling controller registers the data stream with the plug-in through the first interface; the scaling controller sends the optimal decision for resource scaling in each state to the plug-in through the second interface, where the optimal decision is the resource allocation decision that makes the amount of data stream resources in the current state fit the current amount of input data;
  • the scaling operation sample library is used to record the optimal decision for scaling each state resource;
  • the decision model is used to predict the received data stream according to the optimal decision recorded in the scaling operation sample library, and generate a prediction decision;
  • the decision model is a machine learning model;
  • the decider is configured to determine a recommendation decision according to the prediction decision, and the recommendation decision is the prediction decision or a decision generated by the current scaling controller;
  • the decider returns the recommendation decision to the scaling controller through the second interface;
  • the scaling controller performs scaling operations on the current data stream according to the recommendation decision.
  • the plug-in is connected to the scaling controller through an HTTP interface.
  • the scaling controller is further configured to, after completing the scaling operation, determine the decision quality of the recommendation decision corresponding to the scaling operation and feed the decision quality back to the plug-in through a third interface; the decision quality of the recommendation decision is whether the recommendation decision is optimal, and when the recommendation decision is optimal, the plug-in stores the recommendation decision in the scaling operation sample library as an optimal decision;
  • the first interface, the second interface and the third interface are all HTTP interfaces.
  • the decider is configured to determine the recommendation decision according to the uncertainty of the prediction decision; when the uncertainty of the prediction decision is greater than a threshold, the decision generated by the scaling controller is determined as the recommendation decision; when the uncertainty of the prediction decision is not greater than the threshold, the prediction decision is determined as the recommendation decision.
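The decider's selection rule described above can be sketched as follows; the data shapes and names are illustrative assumptions, and only the threshold comparison itself comes from the text:

```python
def decide(predictions, controller_decision, eta):
    """Per-node recommendation: keep the controller's instance count unless
    the model's prediction for that node has uncertainty <= eta.
    `predictions` maps node -> (predicted_count, uncertainty);
    `controller_decision` maps node -> instance count from the controller."""
    recommendation = dict(controller_decision)
    for node, (count, uncertainty) in predictions.items():
        if uncertainty <= eta:
            recommendation[node] = count
    return recommendation
```

When no node's prediction is confident enough, the result is exactly the controller's own decision, which is what keeps the plug-in from degrading an already-working controller.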
  • the present invention also provides a method for enhancing an enhanced plug-in for elastic scaling of distributed data stream resources.
  • the enhancement method of the distributed data stream resource elastic scaling enhancement plug-in includes:
  • the decision model is used to generate prediction decisions based on the scaling operation sample library
  • a recommendation decision is determined based on the decider; the recommendation decision is the prediction decision or a decision generated by the current scaling controller;
  • the scaling controller is used to perform scaling operations on the current data stream based on the recommendation decision.
  • the decision model is used to generate the prediction decision based on the scaling operation sample library, which specifically includes:
  • the decision model is trained based on the scaling operation sample library to obtain a trained decision model
  • the trained decision model is used to make predictions on the current data flow to generate prediction decisions.
  • the recommendation decision is determined based on the decider, specifically including:
  • when the uncertainty of the prediction decision is greater than the threshold, the decision generated by the scaling controller is determined as the recommendation decision;
  • when the uncertainty of the prediction decision is not greater than the threshold, the prediction decision is determined as the recommendation decision.
  • after adopting the scaling controller to perform the scaling operation on the current data stream based on the recommendation decision, the method further includes:
  • the decision quality of the recommendation decision corresponding to the scaling operation is determined; the decision quality of the recommendation decision is whether the recommendation decision is optimal;
  • when the recommendation decision is optimal, it is stored in the scaling operation sample library as an optimal decision.
  • determining the decision quality of the recommendation decision corresponding to the scaling operation specifically includes: judging whether the recommendation decision satisfies a convergence condition; when it does, the recommendation decision corresponding to the scaling operation is determined to be optimal, and otherwise it is determined not to be optimal.
  • the present invention discloses the following technical effects:
  • the scaling operation sampling process used by the plug-in of the present invention can gradually collect learning samples for model training without interfering with the work of the existing scaling controller.
  • This sample collection process has no additional overhead, allowing the plugin to work "out of the box".
  • the final decision is made only after comprehensively weighing the model's prediction quality against the decision given by the current scaling controller. This helps ensure that the plug-in has no negative impact on the scaling controller, enhances the decision accuracy of the existing elastic scaling controller, and allows an elastic resource scaling operation to be completed with a single decision. Such rapid resource scaling quickly improves data processing capability when distributed data stream resources are under-allocated, and reduces waste when resources are over-allocated.
  • Fig. 1 is the architecture diagram of the distributed data stream resource elastic scaling enhancement plug-in according to the present invention
  • Fig. 2 is the schematic flow chart that the decision-making model of the present invention generates prediction decision
  • FIG. 3 is a schematic flow chart of the decision maker of the present invention determining a recommendation decision
  • FIG. 4 is a schematic flowchart of the feedback decision quality of the telescopic controller according to the present invention.
  • FIG. 5 is a schematic flow chart of an enhancement method for a distributed data stream resource elastic scaling enhancement plug-in according to the present invention.
  • FIG. 1 is an architectural diagram of a distributed data stream resource elastic scaling enhancement plug-in according to the present invention.
  • the distributed data stream resource elastic scaling enhancement plug-in of the present invention includes a decider 1 , a decision model 2 and a scaling operation sample library 3 .
  • the plug-in of the present invention is connected with the scaling controller for elastic scaling of distributed data stream resources through the HTTP interface, and the integration can be completed through a simple interface.
  • the plug-in of the present invention connects the three parts of the decision maker 1, the decision model 2 and the scaling operation sample library 3 with the existing scaling controller through the HTTP back end to complete data transmission and function calling.
  • after being integrated, the plugin does not affect the way the existing elastic scaling controller works; it only provides recommended scaling decisions (the required number of instances for each computing node) in subsequent resource scaling operations. After using a recommendation decision, the elastic scaling controller needs to feed back to the plug-in whether that decision completed the scaling. By continuously learning from the controller's decisions, the plug-in's recommended decisions become more and more accurate.
  • the plug-in of the present invention includes three HTTP interfaces, namely: a registration data stream interface, an update decision interface and a recommendation decision quality feedback interface.
  • the register dataflow interface is used to register new dataflows to the plugin. For multi-tenant distributed dataflow frameworks, it is often necessary to run numerous dataflow jobs.
  • the plugin also supports multi-tenancy. Call the register dataflow interface and register the dataflow topology as a parameter to the plugin.
  • the topology of the data flow is expressed in JSON format, and the name of each computing node, the initial number of instances, and the node connection relationship are recorded. This JSON-formatted information will be sent as a parameter to the plugin via an HTTP request.
  • the plug-in will return an ID that can uniquely identify the data stream, which is convenient for subsequent interfaces to make decisions, update and recommend the data stream.
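A registration payload along these lines might look as follows; the concrete field names are assumptions, since the text only fixes that node names, initial instance counts, and node connections are recorded in JSON:

```python
import json

# Hypothetical three-node topology: source -> parse -> sink.
topology = {
    "nodes": [
        {"name": "source", "initial_instances": 1},
        {"name": "parse", "initial_instances": 2},
        {"name": "sink", "initial_instances": 1},
    ],
    "edges": [["source", "parse"], ["parse", "sink"]],
}
# This JSON string would be sent to the plug-in as an HTTP request parameter;
# the plug-in would answer with an ID uniquely identifying the data stream.
payload = json.dumps(topology)
```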
  • the update decision interface is used to send the corresponding operation in the current data flow state to the plug-in when the elastic scaling controller performs the resource scaling operation, so that the plug-in can learn the decision that should be made in this state.
  • the method uses two metrics to represent the data stream state. The first is the current throughput of the data stream, which reflects the current data load; the second is the length of the queue of data waiting to be processed in each node's input queue, which reflects the "pressure" on each node of the data stream under the current resource configuration.
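Flattened into a feature vector for the per-node regression models, the two metrics could look like this (a sketch; the fixed node ordering and function name are assumptions):

```python
def state_features(throughput, queue_lengths):
    """Encode the data stream state as [throughput, queue_len_1, ...],
    with node queue lengths in a fixed (sorted-by-name) order so that
    samples collected at different times line up feature-by-feature."""
    return [float(throughput)] + [float(queue_lengths[n]) for n in sorted(queue_lengths)]
```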
  • the elastic scaling controller usually needs to use multiple decisions to complete one resource scaling.
  • the present invention takes the result of the last decision as the optimal decision, because that decision makes the amount of data stream resources exactly fit the current amount of input data.
  • the scaling controller sends the optimal decision corresponding to each state to the plug-in, so that the plug-in collects the decisions that need to be made in different states and saves them to the scaling operation sample library. This is a prerequisite for the plug-in to learn the optimal decision.
  • when the plug-in has collected only a few decision records, they are not sufficient to train an accurate decision model, so the plug-in directly returns the decision of the scaling controller; when the decision model can make predictions with low uncertainty, the plug-in returns the decision predicted by the model as the recommendation decision.
  • the recommendation decision quality feedback interface is used to feed back the decision quality recommended by the plug-in, so as to judge whether the recommendation decision is optimal.
  • the present invention defines the problem of predicting the number of computing node instances as a regression problem. Fitting a machine learning model is easier because the sample dimension is low and the predicted values are integers. At the same time, since the present invention needs to judge the quality of the prediction result, the distribution estimation is used instead of the value estimation, and the Bayesian linear regression is used to learn the samples. As shown in FIG. 2 , the decision model of the present invention builds a model independently for each computing node in the data stream.
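As a rough illustration of why Bayesian linear regression supplies the uncertainty the decider needs, the one-feature sketch below returns a predictive distribution (mean and variance) instead of a point estimate; the prior/noise hyperparameters alpha and beta are illustrative assumptions, not values from the patent:

```python
def bayes_linreg_1d(xs, ys, x_new, alpha=1.0, beta=4.0):
    """Conjugate Bayesian linear regression with one feature and no bias:
    prior w ~ N(0, 1/alpha), Gaussian noise with precision beta.
    Returns (predictive_mean, predictive_variance) at x_new; the variance
    plays the role of the uncertainty compared against the threshold."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    s_post = 1.0 / (alpha + beta * sxx)        # posterior variance of w
    m_post = beta * s_post * sxy               # posterior mean of w
    mean = m_post * x_new
    var = 1.0 / beta + x_new * x_new * s_post  # predictive variance
    return mean, var
```

The predictive variance shrinks as samples accumulate around a region of the state space, which is exactly what lets the decider tell well-covered states apart from unfamiliar ones.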
  • the update decision interface of the plug-in of the present invention supports sample updates to a single computing node or multiple computing nodes.
  • a certain node in the data flow may be more sensitive to resources (eg, the node with the most intensive computing operations in the data flow topology), so it is elastically scaled more frequently.
  • the number of samples of different computing nodes may be different.
  • the sample contains the data flow state and the corresponding optimal number of instances in this state.
  • the optimal number of instances corresponding to different states is predicted, so as to achieve the purpose of resource allocation recommendation. This operation will continue to run on the plugin backend to continuously learn new samples.
  • the present invention adopts a threshold-based judgment method to enhance the stability of the plug-in.
  • when the elastic scaling controller calls the update decision interface, the continuously trained decision model makes a decision according to the current state of the data stream, that is, it predicts the number of instances required by each computing node. Since the prediction produced by the Bayesian linear regression adopted by this plugin is a distribution, the uncertainty can be calculated from the output distribution, and the uncertainty of the prediction result is used to judge how confident the model is in making an accurate prediction.
  • when the uncertainty of the model output is less than or equal to the threshold η, the decider considers the prediction accurate and uses the predicted instance count to replace the instance count of the corresponding node in the decision given by the scaling controller; when the output uncertainty is greater than the threshold η, the decider considers the prediction unreliable, ignores it, and directly uses the scaling controller's instance count. The set of instance counts for all nodes is called a decision. After the above steps, the decider generates the final recommendation decision and returns it to the scaling controller.
  • this decider strategy temporarily uses the decision given by the scaling controller while samples are insufficient, and directly gives accurate resource scaling decisions once the model gradually becomes accurate.
  • after integrating the plug-in, the existing scaling controller obtains gradually more accurate decisions from it, finally completing a resource scaling operation with a single decision.
  • the scaling operation sample library records the optimal decision corresponding to each state of the data flow, that is, the number of instances required by each computing node of the data flow under different data loads.
  • the plug-in of the present invention firstly defines how to represent the data flow state. When a node of the data flow has insufficient computing resources, a back pressure phenomenon will occur, and the data will accumulate in the output queue of the upstream computing node. By monitoring the output queue of each computing node of the data flow and the throughput of the current data flow, the load status of the processing data of the current data flow is measured.
  • the elastic scaling controller usually needs multiple decisions to complete the scaling of resources to reduce the pressure on the data flow.
  • the scaling operation will be stored in the plug-in's scaling operation sample library through the interface to learn the optimal decision.
  • before a controller-generated decision is executed, the update decision interface is called and the decision is sent to the plug-in so that the plug-in can generate a recommendation decision.
  • the elastic scaling controller will execute the recommendation decision to complete the resource scaling of the current step and observe whether the decision converges.
  • the convergence result, that is, the quality of the recommendation decision, is sent to the plug-in by calling the recommendation decision quality feedback interface.
  • when the recommendation decision converges, it is determined to be the optimal decision in the current state.
  • how convergence is judged depends on the implementation of the scaling controller: for example, the gap between throughput and incoming data traffic falls below a certain threshold, or the controller's decision remains unchanged over several consecutive rounds.
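Combining the two example criteria just given, a convergence check could be sketched like this (the tolerance and round count are illustrative assumptions; real controllers define their own):

```python
def converged(throughput, inflow_rate, recent_decisions, rel_tol=0.05, stable_rounds=3):
    """Declare convergence when throughput is within rel_tol of the input
    traffic, or when the controller's last `stable_rounds` decisions
    (each a node -> instance-count dict) are identical."""
    rate_ok = abs(throughput - inflow_rate) <= rel_tol * inflow_rate
    tail = recent_decisions[-stable_rounds:]
    stable = len(tail) == stable_rounds and all(d == tail[0] for d in tail)
    return rate_ok or stable
```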
  • the optimal decisions executed by the elastic scaling controller will gradually be stored in the scaling operation sample library.
  • the sample contains the optimal number of instances corresponding to the computing node under a specific data flow state.
  • the present invention also provides a method for enhancing an enhanced plug-in for elastic scaling of distributed data flow resources.
  • the enhancement method of the distributed data stream resource elastic scaling enhancement plug-in of the present invention includes the following steps:
  • Step 100 Obtain the current data stream.
  • Step 200 According to the current data flow, a decision model is used to generate a prediction decision based on the scaling operation sample library. First, the decision model is trained on the scaling operation sample library to obtain a trained decision model; then, the trained decision model makes a prediction on the current data flow to generate the prediction decision. As the data flow load changes, the optimal decisions of the elastic scaling controller are gradually stored in the scaling operation sample library, whose samples contain the optimal number of instances for a computing node in a specific data flow state. When the plug-in has collected only a few decision records, they are not sufficient to train an accurate decision model; once sufficient decision information has been collected, the plug-in uses a machine learning method to generate a decision model that can predict the optimal number of instances.
  • Step 300 Obtain the decision generated by the current scaling controller.
  • Step 400 According to the predicted decision, determine a recommendation decision based on the decider.
  • Recommended decisions are either prediction decisions or decisions generated by the current scaling controller.
  • when the decision model can predict with low uncertainty, that is, when the uncertainty of the prediction decision generated by the decision model is less than or equal to the threshold, the plugin returns the prediction decision generated by the decision model as the recommendation decision; when the uncertainty of the prediction decision is greater than the threshold, the plugin returns the decision generated by the scaling controller as the recommendation decision.
  • Step 500 Adopt a scaling controller to perform scaling operations on the current data stream based on the recommendation decision. After obtaining the recommendation decision obtained from the decision model of the plug-in and combining with the decider policy, the scaling controller will execute the recommendation decision to complete the resource scaling of the current step.
  • after the scaling controller completes the scaling operation, whether the recommendation decision corresponding to the scaling operation is optimal is determined by judging whether the recommendation decision satisfies the convergence condition, that is, the decision quality of the recommendation decision corresponding to the scaling operation is determined;
  • when the recommendation decision satisfies the convergence condition, the recommendation decision corresponding to the scaling operation is determined to be optimal, and the recommendation decision is stored in the scaling operation sample library as the optimal decision.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A distributed data stream resource elastic scaling enhancement plug-in and enhancement method. The plug-in is connected with a scaling controller used for elastic scaling of distributed data stream resources; the plug-in includes: a decider (1), a decision model (2), and a scaling operation sample library (3); the scaling controller registers the data stream with the plug-in through a first interface; the scaling controller sends the optimal decision for resource scaling in each state to the plug-in through a second interface; the scaling operation sample library (3) is used to record the optimal decision for resource scaling in each state; the decision model (2) is used to make predictions on the received data stream according to the optimal decisions recorded in the scaling operation sample library (3) and generate a prediction decision; the decider (1) is used to determine a recommendation decision according to the prediction decision, and the decider (1) returns the recommendation decision to the scaling controller through the second interface; the scaling controller performs a scaling operation on the current data stream according to the recommendation decision. The enhancement plug-in and enhancement method can improve the accuracy and efficiency of elastic resource scaling.

Description

A distributed data stream resource elastic scaling enhancement plug-in and enhancement method
This application claims priority to Chinese patent application No. 202011434620.8, filed with the Chinese Patent Office on December 10, 2020 and entitled "A distributed data stream resource elastic scaling enhancement plug-in and enhancement method", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of data stream resource allocation, and in particular to a distributed data stream resource elastic scaling enhancement plug-in and enhancement method.
Background
Distributed data stream applications usually provide long-lived real-time data processing services. Typical stream processing scenarios are often accompanied by fluctuations in data stream load. For example, the data volume of a social-network sentiment analysis service drops sharply at night, and sensor data traffic is usually related to how frequently devices are used. A sudden rise or fall in data stream load affects the distributed data stream performing real-time processing. When the load rises suddenly, the resources allocated to the distributed data stream may not meet the computing demand, so the processing rate cannot keep up with the data inflow rate; when the load drops suddenly, the distributed data stream may occupy too many resources, causing waste. Therefore, the data stream needs an elastic scaling controller to scale resources with the load. Data stream applications usually abstract resources into instances, each containing a certain number of CPU cores and a certain amount of memory. The elastic scaling controller performs resource scaling by automatically controlling the number of instances used by the data stream.
Existing resource elastic scaling controllers use a responsive adjustment strategy so that the amount of data stream resources can cope with the current data generation rate. In general, a data stream application consists of multiple computing nodes, and the minimum unit of resource allocation for each computing node is an "instance". By increasing or decreasing the number of instances, computing resources can be dynamically added to or removed from the data stream.
Suppose the data inflow rate of a computing node is λ. By observing the node's current computing state, its data processing capability can be measured as λ_p. In theory, allocating λ/λ_p instances to the node can cope with the current inflow rate. Since the data stream may contain "one-to-one" or "many-to-one" node connections, the λ of each node can be computed from the output rates of its upstream nodes. Starting from the data source node and traversing the computing nodes in topological order, the number of instances to allocate to each node can be calculated.
This calculation can be completed quickly by monitoring the traffic of each node of the data stream. In practice, however, increasing the number of instances often does not bring a linear performance improvement, so instance allocation cannot be finished in one step. Owing to factors such as the network transmission overhead of distributed programs and the differing computing power of heterogeneous machines, the method must iterate the process of "calculate the instance count - verify whether it is optimal" until the instance count computed from the current data load no longer changes. Compared with rule-based elastic scaling controllers, this computation-based controller can already complete elastic resource scaling faster, but experiments show that it still requires multiple attempts to complete one resource scaling.
Summary of the Invention
The purpose of the present invention is to provide a distributed data stream resource elastic scaling enhancement plug-in and enhancement method, so as to improve the accuracy and efficiency of elastic resource scaling.
To achieve the above purpose, the present invention provides the following scheme:
A distributed data stream resource elastic scaling enhancement plug-in, wherein the plug-in is connected with a scaling controller used for elastic scaling of distributed data stream resources; the plug-in includes: a decider, a decision model, and a scaling operation sample library;
the scaling controller registers the data stream with the plug-in through a first interface; the scaling controller sends the optimal decision for resource scaling in each state to the plug-in through a second interface, the optimal decision being the resource allocation decision that makes the amount of data stream resources in the current state fit the current amount of input data;
the scaling operation sample library is used to record the optimal decision for resource scaling in each state; the decision model is used to make predictions on the received data stream according to the optimal decisions recorded in the scaling operation sample library and generate a prediction decision; the decision model is a machine learning model; the decider is used to determine a recommendation decision according to the prediction decision, the recommendation decision being the prediction decision or the decision generated by the current scaling controller; the decider returns the recommendation decision to the scaling controller through the second interface;
the scaling controller performs a scaling operation on the current data stream according to the recommendation decision.
Optionally, the plug-in is connected with the scaling controller through HTTP interfaces.
Optionally, the scaling controller is further used to, after completing the scaling operation, determine the decision quality of the recommendation decision corresponding to the scaling operation and feed the decision quality back to the plug-in through a third interface; the decision quality of the recommendation decision is whether the recommendation decision is optimal, and when the recommendation decision is optimal, the plug-in stores the recommendation decision in the scaling operation sample library as an optimal decision;
the first interface, the second interface, and the third interface are all HTTP interfaces.
Optionally, the decider is used to determine the recommendation decision according to the uncertainty of the prediction decision; when the uncertainty of the prediction decision is greater than a threshold, the decision generated by the scaling controller is determined as the recommendation decision; when the uncertainty of the prediction decision is not greater than the threshold, the prediction decision is determined as the recommendation decision.
The present invention also provides an enhancement method for a distributed data stream resource elastic scaling enhancement plug-in, applied to the above distributed data stream resource elastic scaling enhancement plug-in, the enhancement method including:
obtaining the current data stream;
according to the current data stream, using the decision model to generate a prediction decision based on the scaling operation sample library;
obtaining the decision generated by the current scaling controller;
according to the prediction decision, determining a recommendation decision based on the decider, the recommendation decision being the prediction decision or the decision generated by the current scaling controller;
using the scaling controller to perform a scaling operation on the current data stream based on the recommendation decision.
Optionally, using the decision model to generate a prediction decision based on the scaling operation sample library according to the current data stream specifically includes:
training the decision model based on the scaling operation sample library to obtain a trained decision model;
using the trained decision model to make a prediction on the current data stream and generate the prediction decision.
Optionally, determining the recommendation decision based on the decider according to the prediction decision specifically includes:
judging, based on the decider, whether the uncertainty of the prediction decision is greater than a threshold;
when the uncertainty of the prediction decision is greater than the threshold, determining the decision generated by the scaling controller as the recommendation decision;
when the uncertainty of the prediction decision is not greater than the threshold, determining the prediction decision as the recommendation decision.
Optionally, after using the scaling controller to perform the scaling operation on the current data stream based on the recommendation decision, the method further includes:
after the scaling controller completes the scaling operation, determining the decision quality of the recommendation decision corresponding to the scaling operation, the decision quality being whether the recommendation decision is optimal;
when the recommendation decision is optimal, storing the recommendation decision in the scaling operation sample library as an optimal decision.
Optionally, determining the decision quality of the recommendation decision corresponding to the scaling operation after the scaling controller completes the scaling operation specifically includes:
determining whether the recommendation decision corresponding to the scaling operation is optimal by judging whether the recommendation decision satisfies a convergence condition; when the recommendation decision satisfies the convergence condition, determining that the recommendation decision corresponding to the scaling operation is optimal; when it does not, determining that it is not optimal.
According to the specific embodiments provided by the present invention, the present invention discloses the following technical effects:
The scaling operation sampling process used by the plug-in of the present invention can gradually collect learning samples for model training without interfering with the work of the existing scaling controller. This sample collection process incurs no additional overhead, so the plug-in works "out of the box". Moreover, after fitting the samples with a machine learning model, the model's prediction is not used directly as the final result; instead, the final decision is made after comprehensively weighing the model's prediction quality against the decision given by the current scaling controller. This helps ensure that the plug-in has no negative impact on the scaling controller, enhances the decision accuracy of the existing elastic scaling controller, and allows an elastic resource scaling operation to be completed with just one decision. Such rapid resource scaling quickly improves data processing capability when distributed data stream resources are under-allocated, and reduces waste when resources are over-allocated.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is an architecture diagram of the distributed data stream resource elastic scaling enhancement plug-in of the present invention;
Fig. 2 is a schematic flowchart of the decision model of the present invention generating a prediction decision;
Fig. 3 is a schematic flowchart of the decider of the present invention determining a recommendation decision;
Fig. 4 is a schematic flowchart of the scaling controller of the present invention feeding back decision quality;
Fig. 5 is a schematic flowchart of the enhancement method for the distributed data stream resource elastic scaling enhancement plug-in of the present invention.
Description of reference signs:
decider - 1, decision model - 2, scaling operation sample library - 3.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
To make the above purpose, features, and advantages of the present invention clearer and easier to understand, the present invention is further described in detail below with reference to the drawings and specific embodiments.
Fig. 1 is an architecture diagram of the distributed data stream resource elastic scaling enhancement plug-in of the present invention. As shown in Fig. 1, the plug-in includes a decider 1, a decision model 2, and a scaling operation sample library 3. The plug-in is connected through HTTP interfaces with the scaling controller used for elastic scaling of distributed data stream resources, so integration can be completed through simple interfaces. The plug-in connects the decider 1, the decision model 2, and the scaling operation sample library 3 with the existing scaling controller through an HTTP back end to complete data transmission and function calls. After being integrated, the plug-in does not affect the way the existing elastic scaling controller works; it only provides recommended scaling decisions (the required number of instances for each computing node) in subsequent resource scaling operations. After using a recommendation decision, the elastic scaling controller needs to feed back to the plug-in whether that decision completed the scaling. By continuously learning from the controller's decisions, the plug-in's recommended decisions become more and more accurate.
The plug-in includes three HTTP interfaces: a register-data-stream interface, an update-decision interface, and a recommendation-decision-quality feedback interface. The register-data-stream interface is used to register new data streams with the plug-in. A multi-tenant distributed data stream framework usually needs to run numerous data stream jobs, and the plug-in likewise supports multi-tenancy. The register-data-stream interface is called with the data stream topology as a parameter. The topology is expressed in JSON format and records the name of each computing node, the initial number of instances, and the node connections. This JSON information is sent to the plug-in as a parameter of an HTTP request. The plug-in returns an ID that uniquely identifies the data stream, so that subsequent interface calls can update and recommend decisions for that stream.
An existing elastic scaling controller adjusts resources when data stream resources are insufficient or excessive, for example by raising or lowering the parallelism of computing nodes. The update-decision interface is used to send the operation corresponding to the current data stream state to the plug-in when the elastic scaling controller performs a resource scaling operation, so that the plug-in can learn the decision that should be made in that state. The method uses two metrics to represent the data stream state. The first is the current throughput of the data stream, reflecting the current data load; the second is the length of the queue of data waiting to be processed in each node's input queue, reflecting the "pressure" on each node under the current resource configuration. An elastic scaling controller usually needs multiple decisions to complete one resource scaling; the present invention takes the result of the last decision as the optimal decision, because that decision makes the amount of data stream resources exactly fit the current amount of input data. By sending the optimal decision corresponding to each state to the plug-in, the scaling controller lets the plug-in collect the decisions to be made in different states and save them to the scaling operation sample library. This is the prerequisite for the plug-in to learn optimal decisions. When the plug-in has collected only a few decision records, they are insufficient to train an accurate decision model, so the plug-in directly returns the scaling controller's decision; once the decision model can predict with low uncertainty, the plug-in returns the model's predicted decision as the recommendation decision.
The recommendation-decision-quality feedback interface is used to feed back the quality of the decision recommended by the plug-in, so as to judge whether the recommendation decision is optimal.
本发明将预测计算节点实例数量的问题定义为回归问题。由于样本维度较低,且预测值为整数,因此较为容易使用机器学习模型进行拟合。同时由于本发明需要对预测结果的质量加以判断,因此使用分布估计来代替值估计,使用贝叶斯线性回归进行样本的学习。如图2所示,本发明的决策模型针对数据流中的每个计算节点单独构建模型。本发明插件的更新决策接口支持对单个计算节点或多个计算节点的样本更新。在实际场景中, 可能数据流中的某个节点对资源更敏感(如数据流拓扑结构中计算操作最为密集的节点),因此被更频繁的进行弹性伸缩。这导致伸缩操作样本库中,不同计算节点的样本数量可能不同。样本包含数据流状态及该状态下对应的最优实例数量。通过单独对每个计算节点训练模型,预测不同状态下对应的最优实例数量,达到资源配置推荐的目的。该操作将在插件后端持续运行,对新的样本进行持续的学习。
由于决策模型本身无法保证预测结果的准确性,因此,本发明采用一种基于阈值的判断方法来增强该插件的稳定性。如图3所示,当弹性伸缩控制器调用更新决策接口时,不断被训练的决策模型将会根据当前数据流的状态做出决策,即预测计算节点所需的实例数量。由于该插件采用的贝叶斯线性回归的预测结果为分布,依据输出的分布信息即可以计算不确定度,并使用预测结果的不确定度来判断模型对准确预测的把握大小。当模型输出的不确定度小于或等于阈值η时,决策器将认为该预测是准确的,进而使用该预测实例数量代替伸缩控制器给出的决策中对应节点的实例数量;当输出的不确定度大于阈值η时,决策器将认为准确预测的把握不大,因此将忽略该预测结果,直接使用伸缩控制器的实例数量。将各节点所对应的实例数量称为一个决策。经过以上步骤,决策器可以生成最终的推荐决策,并返回给伸缩控制器。
Under this decision-maker strategy, the plug-in temporarily passes through the scaling controller's decisions while samples are scarce, and, as the model becomes accurate, directly produces precise resource scaling decisions. After integrating the plug-in, an existing scaling controller obtains increasingly accurate decisions from it and can eventually complete a resource scaling operation with a single decision.
The scaling operation sample library records the optimal decision for each data stream state, i.e. the number of instances each computing node needs under different data loads. To learn the optimal decision for each state, the plug-in first defines how a data stream state is represented. When some node of the data stream lacks computing resources, back pressure arises and data accumulates in the output queues of upstream computing nodes. The processing load of the data stream is therefore measured by monitoring the output queues of its computing nodes together with its current throughput. An elastic scaling controller usually needs several decisions to complete a resource scaling and relieve the pressure on the data stream; each such scaling operation is stored through the interface in the plug-in's scaling operation sample library and used to learn the optimal decision.
When to perform a resource scaling operation depends on the implementation of the elastic scaling controller. Common criteria include comparing the input data rate with the application's throughput, or monitoring changes in processing latency. When the elastic scaling controller decides that scaling is needed, it derives a decision D from its scaling policy. Common ways of generating decisions include rule-based resource reallocation strategies and queueing-theoretic modeling of resource usage. The plug-in treats the scaling control policy as a black box and does not care how decision D is obtained. Because a scaling controller usually needs several decisions to converge, the present invention focuses on reaching the optimal decision accurately in one step (i.e. a decision satisfying the convergence condition, such as the data stream throughput matching the input rate). As shown in FIG. 4, before a controller-generated decision is executed, the decision update interface is called to send it to the plug-in, which produces a recommended decision. After obtaining the recommendation produced by the plug-in's decision model combined with the decision-maker strategy, the elastic scaling controller executes it to complete the current scaling step and observes whether the decision converges. The convergence result, i.e. the quality of the recommendation, is reported back through the recommendation-quality feedback interface; when the recommended decision converges, it is confirmed as the optimal decision for the current state. How convergence is judged depends on the controller's implementation, for example the gap between throughput and input rate falling below some threshold, or the controller producing the same decision several times in a row.
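One of the convergence tests named above (throughput matching the input rate within a threshold) can be sketched as follows; the function name and the 5% tolerance are illustrative assumptions, since the actual criterion is controller-specific:

```python
def is_converged(throughput, input_rate, tolerance=0.05):
    """Judge convergence as the relative gap between the data stream's
    throughput and its input rate falling within a tolerance."""
    if input_rate == 0:
        return True   # no input load: any configuration keeps up
    return abs(throughput - input_rate) / input_rate <= tolerance
```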
As the data stream load changes, the optimal decisions executed by the elastic scaling controller are gradually stored in the scaling operation sample library. Each sample contains the optimal instance count of a computing node in a particular data stream state.
Based on the above architecture, the present invention further provides an enhancement method for the plug-in for enhancing resource elastic scaling of a distributed data flow. FIG. 5 is a schematic flowchart of the method. As shown in FIG. 5, the method comprises the following steps:
Step 100: obtain the current data stream.
Step 200: generate a predicted decision for the current data stream with the decision model, based on the scaling operation sample library. First, the decision model is trained on the scaling operation sample library to obtain a trained decision model; the trained model then predicts a decision for the current data stream. As the data stream load changes, the elastic scaling controller's optimal decisions are gradually stored in the sample library, each sample containing the optimal instance count of a computing node in a particular data stream state. While the collected decisions are too few to train an accurate decision model, prediction is not yet possible; once enough decisions have been collected, the plug-in trains the decision model with machine learning and can predict the optimal instance counts.
Step 300: obtain the decision generated by the current scaling controller.
Step 400: determine the recommended decision with the decision maker, based on the predicted decision. The recommended decision is either the predicted decision or the decision generated by the current scaling controller. When the decision model predicts with low uncertainty, i.e. the uncertainty of its prediction is less than or equal to the threshold, the plug-in returns the model's prediction as the recommended decision; when the uncertainty exceeds the threshold, the plug-in returns the scaling controller's decision instead.
Step 500: perform the scaling operation on the current data stream with the scaling controller, based on the recommended decision. After obtaining the recommendation produced by the plug-in's decision model combined with the decision-maker strategy, the scaling controller executes it to complete the current resource scaling step.
After the scaling controller completes the scaling operation, whether the recommended decision corresponding to the scaling operation is optimal, i.e. the decision quality of the recommendation, is determined by checking whether the recommended decision satisfies the convergence condition;
when the recommended decision satisfies the convergence condition, the recommended decision corresponding to the scaling operation is determined to be optimal and is stored in the scaling operation sample library as the optimal decision.
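Steps 100 through 500 and the feedback path can be tied together in a schematic harness. All callables here are stand-ins passed in by the caller, not the patented control loop itself:

```python
def enhancement_step(state, controller, model, apply_scaling, sample_store, eta=0.5):
    """One pass of the method: predict (200), get the controller's decision
    (300), choose the recommendation by the uncertainty threshold (400),
    scale (500), and store the sample on convergence (feedback)."""
    predicted, uncertainty = model(state)        # step 200: (decision, std)
    controller_decision = controller(state)      # step 300
    # Step 400: trust the model only when it is confident enough.
    recommendation = predicted if uncertainty <= eta else controller_decision
    converged = apply_scaling(recommendation)    # step 500: True on convergence
    if converged:
        # Feedback: a converged recommendation is the optimum for this state.
        sample_store.append((state, recommendation))
    return recommendation
```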
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and for the identical or similar parts the embodiments may be referred to one another.
Specific examples are used herein to explain the principles and implementation of the present invention. The above description of the embodiments is only intended to help understand the method of the present invention and its core idea; meanwhile, those of ordinary skill in the art may, based on the idea of the present invention, make changes to the specific implementation and scope of application. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (9)

  1. A plug-in for enhancing resource elastic scaling of a distributed data flow, wherein the plug-in is connected to a scaling controller used for elastic scaling of distributed data flow resources; the plug-in comprises: a decision maker, a decision model, and a scaling operation sample library;
    the scaling controller registers a data stream with the plug-in through a first interface; the scaling controller sends the optimal resource scaling decision for each state to the plug-in through a second interface, the optimal decision being the resource allocation decision that makes the resource amount of the data stream in the current state fit the current input data volume;
    the scaling operation sample library is configured to record the optimal scaling decision for each state; the decision model is configured to predict, for a received data stream, a predicted decision according to the optimal decisions recorded in the scaling operation sample library, the decision model being a machine learning model; the decision maker is configured to determine a recommended decision according to the predicted decision, the recommended decision being the predicted decision or the decision generated by the current scaling controller; the decision maker returns the recommended decision to the scaling controller through the second interface;
    the scaling controller performs a scaling operation on the current data stream according to the recommended decision.
  2. The plug-in for enhancing resource elastic scaling of a distributed data flow according to claim 1, wherein the plug-in is connected to the scaling controller through HTTP interfaces.
  3. The plug-in for enhancing resource elastic scaling of a distributed data flow according to claim 2, wherein the scaling controller is further configured to, after completing a scaling operation, determine the decision quality of the recommended decision corresponding to the scaling operation and feed the decision quality back to the plug-in through a third interface; the decision quality of the recommended decision is whether the recommended decision is optimal, and when the recommended decision is optimal, the plug-in stores the recommended decision in the scaling operation sample library as the optimal decision;
    the first interface, the second interface, and the third interface are all HTTP interfaces.
  4. The plug-in for enhancing resource elastic scaling of a distributed data flow according to claim 1, wherein the decision maker is configured to determine the recommended decision according to the uncertainty of the predicted decision; when the uncertainty of the predicted decision is greater than a threshold, the decision generated by the scaling controller is determined as the recommended decision; when the uncertainty of the predicted decision is not greater than the threshold, the predicted decision is determined as the recommended decision.
  5. An enhancement method for a plug-in for enhancing resource elastic scaling of a distributed data flow, wherein the method is applied to the plug-in for enhancing resource elastic scaling of a distributed data flow according to any one of claims 1 to 4, and the method comprises:
    obtaining a current data stream;
    generating a predicted decision for the current data stream with a decision model, based on a scaling operation sample library;
    obtaining a decision generated by a current scaling controller;
    determining a recommended decision with a decision maker according to the predicted decision, the recommended decision being the predicted decision or the decision generated by the current scaling controller; and
    performing a scaling operation on the current data stream with the scaling controller, based on the recommended decision.
  6. The enhancement method according to claim 5, wherein generating the predicted decision for the current data stream with the decision model, based on the scaling operation sample library, specifically comprises:
    training the decision model on the scaling operation sample library to obtain a trained decision model; and
    predicting a decision for the current data stream with the trained decision model, to generate the predicted decision.
  7. The enhancement method according to claim 5, wherein determining the recommended decision with the decision maker according to the predicted decision specifically comprises:
    judging, with the decision maker, whether the uncertainty of the predicted decision is greater than a threshold;
    when the uncertainty of the predicted decision is greater than the threshold, determining the decision generated by the scaling controller as the recommended decision; and
    when the uncertainty of the predicted decision is not greater than the threshold, determining the predicted decision as the recommended decision.
  8. The enhancement method according to claim 5, further comprising, after performing the scaling operation on the current data stream with the scaling controller based on the recommended decision:
    after the scaling controller completes the scaling operation, determining the decision quality of the recommended decision corresponding to the scaling operation, the decision quality of the recommended decision being whether the recommended decision is optimal; and
    when the recommended decision is optimal, storing the recommended decision in the scaling operation sample library as the optimal decision.
  9. The enhancement method according to claim 5, wherein determining, after the scaling controller completes the scaling operation, the decision quality of the recommended decision corresponding to the scaling operation specifically comprises:
    determining whether the recommended decision corresponding to the scaling operation is optimal by judging whether the recommended decision satisfies a convergence condition; when the recommended decision satisfies the convergence condition, determining that the recommended decision corresponding to the scaling operation is optimal; and when the recommended decision does not satisfy the convergence condition, determining that the recommended decision corresponding to the scaling operation is not optimal.
PCT/CN2021/124859 2020-12-10 2021-10-20 Plug-in for enhancing resource elastic scaling of distributed data flow and enhancing method WO2022121519A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/791,484 US11853801B2 (en) 2020-12-10 2021-10-20 Plug-in for enhancing resource elastic scaling of distributed data flow and method for enhancing plug-in for enhancing resource elastic scaling of distributed data flow

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011434620.8A CN112416602B (zh) 2020-12-10 2020-12-10 Plug-in for enhancing resource elastic scaling of distributed data flow and enhancing method
CN202011434620.8 2020-12-10

Publications (1)

Publication Number Publication Date
WO2022121519A1 (zh)





Also Published As

Publication number Publication date
US20230129969A1 (en) 2023-04-27
US11853801B2 (en) 2023-12-26
CN112416602A (zh) 2021-02-26
CN112416602B (zh) 2022-09-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21902226; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21902226; Country of ref document: EP; Kind code of ref document: A1)