WO2021253239A1 - Method and apparatus for determining resource configuration of a cloud service system - Google Patents
Method and apparatus for determining resource configuration of a cloud service system
- Publication number
- WO2021253239A1 (PCT/CN2020/096398)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- model
- service
- resource configuration
- performance
- cloud
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/50—Network service management, e.g. ensuring proper service fulfilment according to agreements
- H04L41/5003—Managing SLA; Interaction between SLA and QoS
- H04L41/5009—Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0803—Configuration setting
- H04L41/0806—Configuration setting for initial configuration or provisioning, e.g. plug-and-play
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0803—Configuration setting
- H04L41/0823—Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/50—Network service management, e.g. ensuring proper service fulfilment according to agreements
- H04L41/5003—Managing SLA; Interaction between SLA and QoS
- H04L41/5019—Ensuring fulfilment of SLA
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0896—Bandwidth or capacity management, i.e. automatically increasing or decreasing capacities
- H04L41/0897—Bandwidth or capacity management, i.e. automatically increasing or decreasing capacities by horizontal or vertical scaling of resources, or by migrating entities, e.g. virtual resources or entities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/22—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks comprising specially adapted graphical user interfaces [GUI]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/50—Testing arrangements
Definitions
- the present disclosure generally relates to the field of cloud services, and particularly relates to a method and device for determining resource configuration of a cloud service system.
- cloud providers offer a wealth of cloud services that help business providers or service providers run their business systems or service systems on the cloud, covering the entire life cycle of the business system or service system.
- business providers or service providers need to know how to make full use of cloud services with better resource allocation, especially for business systems or service systems that require a large amount of cloud services and infrastructure resources.
- the performance modeling method is a method widely used in system evaluation in different fields, and has been applied to the optimization of resource allocation of cloud service systems.
- performance modeling and simulation require in-depth understanding of computer systems and mathematical or statistical skills.
- the creation of the performance model is time-consuming and error-prone, especially for complex systems with interactive services.
- most performance modeling methods can only be applied to the resource allocation optimization of steady-state systems, but not to dynamic or elastic systems.
- the present disclosure provides a method and device for determining the resource configuration of a cloud service system using a system simulation tool.
- the process of determining the resource configuration of the cloud service system can be simplified and more convenient.
- a method for determining the resource configuration of a cloud service system using a system simulation tool is provided. The system simulation tool includes a model library, and the model library includes a basic hardware model set, a cloud infrastructure model set, and a basic software model set. The method includes: obtaining the required models of each service of the cloud service system from the model library; obtaining the resource limit setting of each required model; creating a service workflow between the obtained models to obtain a system simulation model of the cloud service system, where the service workflow indicates the interactive workflow between the models and the execution sequence of the models within a single service; using training data to train a service performance model of each service in the system simulation model, where the training data includes part or all of the test data collected by the cloud service system in the test environment, and the service performance model defines the resource consumption of a single request running on the service in the system simulation model; using the service performance model and the system simulation model to perform system simulation under a given resource configuration set and the resource limit settings to obtain system performance KPIs; and determining the resource configuration with the best system performance KPI in the resource configuration set as the resource configuration of the cloud service system (30).
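The final step of the method (simulate every candidate in the resource configuration set, then keep the one with the best KPI) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `ResourceConfig` as a (VM count, vCPUs per VM) pair, `toy_simulate` as a stand-in for the simulation engine based on a simple utilization formula rather than a discrete event simulation, and response time as the KPI (lower is better).

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical representation of one resource configuration:
# (number of virtual machines, vCPUs per virtual machine).
ResourceConfig = Tuple[int, int]


def toy_simulate(config: ResourceConfig, load_qps: float = 400.0,
                 cpu_seconds_per_request: float = 0.01) -> float:
    """Stand-in for the simulation engine; returns a response-time KPI in seconds.

    Uses a utilization-based approximation, not a discrete event simulation.
    """
    vms, vcpus = config
    capacity_qps = vms * vcpus / cpu_seconds_per_request
    utilization = load_qps / capacity_qps
    if utilization >= 1.0:
        return float("inf")  # overloaded: the queue grows without bound
    return cpu_seconds_per_request / (1.0 - utilization)


def best_config(configs: Iterable[ResourceConfig],
                simulate: Callable[[ResourceConfig], float]
                ) -> Tuple[ResourceConfig, Dict[ResourceConfig, float]]:
    """Simulate every configuration and return the one with the lowest KPI."""
    kpis = {config: simulate(config) for config in configs}
    best = min(kpis, key=kpis.get)
    return best, kpis


best, kpis = best_config([(1, 2), (2, 2), (4, 2)], toy_simulate)
print(best, kpis[best])
```

In a real tool the `simulate` callable would wrap the system simulation model and the trained service performance models; the selection logic itself stays the same.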
- with a system simulation tool that includes a model library, a service performance model training module, and a discrete event simulation engine, the standardized models in the model library are used to create a system simulation model, the service performance model training module automatically trains the model parameters of the system simulation model using the test data collected by the cloud service system in the test environment, and the system simulation model obtained in this way is used to perform model simulation to determine the optimal resource configuration, which makes the process of determining the resource configuration of the cloud service system simpler and more convenient.
- the cloud service system includes an interactive elastic cloud service system
- the cloud infrastructure model set may include an auto-scaling service model.
- the method further includes: obtaining the target performance KPI of the system simulation model; and removing, from the resource configuration set, the resource configurations whose system performance KPI exceeds the target performance KPI. Determining the resource configuration with the best system performance KPI as the resource configuration of the cloud service system includes: determining the resource configuration with the best system performance KPI in the resource configuration set after removal processing as the resource configuration of the cloud service system.
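A minimal Python sketch of this filtering step follows. It assumes, purely for illustration, that the KPI is a response time in seconds (so a value above the target is a violation and lower is better); the function name and the configuration labels are hypothetical.

```python
from typing import Dict, Optional


def select_within_target(kpi_by_config: Dict[str, float],
                         target_kpi: float) -> Optional[str]:
    """Drop configurations whose KPI exceeds the target, then pick the best one.

    Assumption: the KPI is a response time, so lower values are better.
    Returns None when no configuration satisfies the target.
    """
    remaining = {name: kpi for name, kpi in kpi_by_config.items()
                 if kpi <= target_kpi}
    if not remaining:
        return None
    return min(remaining, key=remaining.get)


# Illustrative values only, not measurements from the disclosure.
simulated = {"2 VMs": 0.90, "4 VMs": 0.35, "8 VMs": 0.20}
print(select_within_target(simulated, target_kpi=0.50))
```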
- the determined resource configuration can be adapted to various application scenarios or application requirements of the cloud service system.
- the system simulation tool may have a visual operation interface, and the visual operation interface includes a model display area.
- Obtaining the required model of each service of the cloud service system from the model library may include: obtaining the selected model in response to a model selection operation for the model library in the model display area.
- the required models of each service of the cloud service system can be obtained through model selection operations, which can make the system simulation process more intuitive and concise.
- the system simulation tool has a visual operation interface, and the visual operation interface includes a model acquisition area.
- Obtaining the required model of each service of the cloud service system from the model library may include: in response to service identification information or service configuration information of each service of the cloud service system being input in the model acquisition area, obtaining the required models for each service from the model library.
- the workload and error rate of the model acquisition process can be greatly reduced, thereby improving the accuracy and efficiency of system simulation.
- the visual operation interface may further include a model editing area.
- the method may further include: presenting each acquired model in the model editing area.
- Creating a service workflow between the acquired models to obtain the system simulation model of the cloud service system may include: in response to link operations between corresponding endpoints of the acquired models in the model editing area, creating a service workflow between the models.
- the system simulation process can be more intuitive and simplified.
- creating a service workflow between the acquired models to obtain a system simulation model of the cloud service system may include: creating the service workflow between the acquired models according to the operation process information of each service, so as to obtain the system simulation model of the cloud service system.
- the background module of the system simulation tool automatically creates the service workflow between the models according to the operation process information of each service, without manual link operations by the user, which can greatly reduce the workload and errors of the service workflow creation process. Therefore, the accuracy and efficiency of system simulation can be improved.
- the model library further includes a service model set, and the service model set includes a standard service workflow model, a standard service performance model, and/or a standard cloud service performance model. The method further includes: in response to the services of the cloud service system including a standard service, obtaining the corresponding standard service workflow model, standard service performance model, and/or standard cloud service performance model from the service model set.
- for a standard service, the workflow model and service performance model can be taken directly from the service model set without modeling the service again, which can simplify the system simulation process and improve the accuracy and efficiency of the system simulation.
- the test data includes verification data, and before using the service performance model and the system simulation model to perform system simulation under a given resource configuration set to obtain the system performance KPI, the method further includes: using the verification data to perform model verification, wherein, when the model verification fails, the process of creating the system simulation model and the service performance model is executed again.
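One simple way to express such a verification check is to compare KPIs predicted by the simulation with KPIs measured in the verification data. The Python sketch below is only an illustration: the disclosure does not specify the error metric, so the use of mean absolute percentage error and the 10% tolerance are assumptions, as are all names and sample values.

```python
from typing import Sequence, Tuple


def verify_model(predicted: Sequence[float], measured: Sequence[float],
                 tolerance: float = 0.10) -> Tuple[bool, float]:
    """Return (passed, error) for simulated vs. measured KPI values.

    Assumption: the error metric is the mean absolute percentage error and
    the model passes when that error is within `tolerance`.
    """
    if len(predicted) != len(measured) or not measured:
        raise ValueError("predicted and measured must be non-empty and equal in length")
    error = sum(abs(p - m) / abs(m) for p, m in zip(predicted, measured)) / len(measured)
    return error <= tolerance, error


# Illustrative response times in seconds at three load levels.
passed, error = verify_model([0.21, 0.33, 0.52], [0.20, 0.30, 0.50])
if not passed:
    # In the described method, a failed verification triggers re-creation of
    # the system simulation model and the service performance model.
    print("verification failed, rebuild the models")
```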
- the system simulation model verification can be performed by using test data to ensure the model simulation accuracy of the system simulation model.
- the method may further include: performing a performance test on the determined resource configuration. Using this method, it is possible to ensure that the determined resource configuration has the desired performance, thereby preventing inappropriate resource configuration from being provided to the cloud service system in the production environment to execute the production plan.
- an apparatus for determining the resource configuration of a cloud service system is applied to a system simulation tool. The system simulation tool includes a model library, and the model library includes a basic hardware model set, a cloud infrastructure model set, and a basic software model set. The device includes: a model acquisition unit configured to acquire the required models of each service of the cloud service system from the model library; a resource limit setting acquisition unit configured to obtain the resource limit settings of each required model; a service workflow creation unit configured to create a service workflow between the acquired models to obtain a system simulation model of the cloud service system, where the service workflow indicates the interactive workflow between the models and the execution order of the models within a single service; a service performance model training unit configured to use training data to train the service performance model of each service in the system simulation model, where the training data includes some or all of the test data collected by the cloud service system in the test environment, and the service performance model defines the resource consumption of running a single request on the service in the system simulation model; a system simulation unit configured to use the service performance model and the system simulation model to perform system simulation under a given resource configuration set and the resource limit settings to obtain system performance KPIs; and a resource configuration determining unit configured to determine the resource configuration with the best system performance KPI in the resource configuration set as the resource configuration of the cloud service system.
- the device may further include: a target performance KPI acquisition unit configured to acquire the target performance KPI of the system simulation model; and a resource configuration removal unit configured to remove, from the resource configuration set, resource configurations whose system performance KPI exceeds the target performance KPI, wherein the resource configuration determining unit is configured to determine the resource configuration with the best system performance KPI in the resource configuration set after removal processing as the resource configuration of the cloud service system.
- the system simulation tool has a visual operation interface, and the visual operation interface includes a model display area.
- the model acquisition unit is configured to acquire the selected model in response to a model selection operation for the model library in the model display area.
- the system simulation tool has a visual operation interface, and the visual operation interface includes a model acquisition area.
- the model obtaining unit is configured to obtain the required model of each service from the model library in response to inputting service identification information or service configuration information of each service of the cloud service system in the model obtaining area.
- the visual operation interface includes a model editing area, and the device further includes: a model presentation unit configured to present the acquired models in the model editing area. The service workflow creation unit is configured to create a service workflow between the models in response to a link operation between corresponding endpoints of the acquired models in the model editing area.
- the service workflow creation unit is configured to create the service workflow between the acquired models according to the operation process information of each service.
- the model library further includes a service model set, and the service model set includes a standard service workflow model, a standard service performance model, and/or a standard cloud service performance model.
- the model acquisition unit is further configured to, in response to the services of the cloud service system including a standard service, acquire the corresponding standard service workflow model, standard service performance model, and/or standard cloud service performance model from the service model set.
- the test data includes verification data, and the device may further include: a model verification unit configured to use the verification data to perform model verification before the service performance model and the system simulation model are used to perform system simulation under a given resource configuration set to obtain system performance KPIs. When the model verification fails, the operations of the resource limit setting acquisition unit, the service workflow creation unit, and the service performance model training unit are executed again.
- a computing device including: at least one processor; and a memory coupled with the at least one processor and configured to store instructions that, when executed by the at least one processor, cause the at least one processor to execute the method for determining resource configuration as described above.
- a machine-readable storage medium that stores executable instructions that, when executed, cause the machine to execute the method for determining resource configuration as described above.
- a computer program product that is tangibly stored on a computer-readable medium and includes computer-executable instructions that, when executed, cause at least one processor to execute the resource configuration determination method as described above.
- Figure 1 shows an example schematic diagram of an interactive elastic cloud service system.
- Fig. 2 shows an exemplary schematic diagram of a resource configuration determination architecture for implementing resource configuration determination of an interactive elastic cloud service system according to an embodiment of the present disclosure.
- Fig. 3 shows an example schematic diagram of a model library of a system simulation tool according to an embodiment of the present disclosure.
- Fig. 4 shows an example schematic diagram of a processing mechanism of an automatic scaling service according to an embodiment of the present disclosure.
- Fig. 5 shows an example flowchart of a method for determining the resource configuration of an interactive elastic cloud service system according to an embodiment of the present disclosure.
- Fig. 6 shows a schematic diagram of an example of a visual operation interface of a system simulation tool according to an embodiment of the present disclosure.
- Fig. 7 shows a schematic diagram of another example of a visual operation interface of a system simulation tool according to an embodiment of the present disclosure.
- Fig. 8 shows a schematic diagram of another example of a visual operation interface of a system simulation tool according to an embodiment of the present disclosure.
- FIG. 9 shows an example schematic diagram of resource restriction setting according to an embodiment of the present disclosure.
- Fig. 10 shows a schematic diagram of a result after service workflow creation according to an embodiment of the present disclosure.
- FIG. 11 shows an example schematic diagram of simulation results under different resource configurations according to an embodiment of the present disclosure.
- Fig. 12 shows a block diagram of an apparatus for determining a resource configuration according to an embodiment of the present disclosure.
- FIG. 13 shows a schematic diagram of a computing device for implementing a process of determining a resource configuration of a cloud service system according to an embodiment of the present disclosure.
- the term “including” and its variations mean open terms, meaning “including but not limited to”.
- the term “based on” means “based at least in part on.”
- the terms “one embodiment” and “an embodiment” mean “at least one embodiment.”
- the term “another embodiment” means “at least one other embodiment.”
- the terms “first”, “second”, etc. may refer to different or the same objects. Other definitions can be included below, whether explicit or implicit. Unless clearly indicated in the context, the definition of a term is consistent throughout the specification.
- Cloud service refers to obtaining required services through the network in an on-demand and easily scalable manner. This kind of service can be IT, software and Internet-related services, but also other services. Cloud services can put the software, hardware, and data needed by the enterprise on the network, and use different IT equipment to connect to each other at any time and place to achieve data access, computing and other purposes. Examples of cloud services may include public cloud (Public Cloud) and private cloud (Private Cloud).
- FIG. 1 shows an example schematic diagram of an interactive elastic cloud service system 100.
- the interactive elastic cloud system 100 includes an interactive cloud service A 110-1, an interactive cloud service B 110-2, an interactive cloud service C 110-3, and an automatic scaling service 120.
- FIG. 1 is only an illustrative example of an interactive elastic cloud service system. In other examples, more or fewer interactive cloud services may be included.
- Interactive cloud services A 110-1, B 110-2, and C 110-3 provide different service capabilities, and each of them interacts with the others by calling the service interfaces of other cloud services or by sending packets to other cloud services over the network.
- the auto-scaling service (Auto-scaling Service) 120 may be provided by cloud providers such as AWS and Alibaba Cloud, and is used to implement the elastic characteristics of each interactive service A 110-1, B 110-2, and C 110-3. With the auto-scaling service 120, the deployment of each interactive service A 110-1, B 110-2, and C 110-3 can be changed when certain conditions are satisfied. For example, when the CPU utilization is high, the availability and performance of the service can be improved by increasing the number of virtual machines.
- the cloud service system needs to run on the cloud throughout its life cycle, including the development and operation of the cloud service system. In this case, it is necessary to know how to make full use of cloud services with better resource allocation, especially for cloud service systems that require a large amount of cloud services and infrastructure resources.
- the performance modeling method is applied to the resource allocation optimization of cloud service system.
- performance modeling and simulation require in-depth understanding of computer systems and mathematical or statistical skills.
- the creation of the performance model is time-consuming and error-prone, especially for complex systems with interactive services.
- most performance modeling methods can only be applied to the resource allocation optimization of steady-state systems, but not to dynamic or elastic systems.
- the present disclosure provides a solution for using a system simulation tool to determine the resource configuration of a cloud service system.
- the system simulation tool includes a model library, which includes a basic hardware model set, a cloud infrastructure model set, and a basic software model set.
- the standardized model in the model library is used to create a system simulation model
- the service performance model training module is used to automatically train the model parameters of the system simulation model using the test data collected by the cloud service system in the test environment.
- after the system simulation model is obtained as described above, it is used to perform model simulation to determine the optimal resource configuration. In this way, the process of determining the resource configuration of the cloud service system becomes simpler and more convenient.
- Fig. 2 shows a schematic diagram of an example environment 1 for realizing resource configuration determination of an interactive elastic cloud service system according to an embodiment of the present disclosure.
- the example environment 1 includes an interactive elastic cloud system 10 in a test environment, a system simulation tool 20, and an interactive elastic cloud system 30 in a production environment.
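- in addition to the interactive cloud service A 101-1, the interactive cloud service B 101-2, the interactive cloud service C 101-3, and the auto-scaling service 102, the interactive elastic cloud system 10 also includes a load generator 103.
- the structures and functions of the interactive cloud service A 101-1, the interactive cloud service B 101-2, the interactive cloud service C 101-3, and the auto-scaling service 102 are the same as those of the corresponding components shown in FIG. 1.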
- the load generator 103 is configured to simulate concurrent users in a production environment. During the test, the load generator 103 may, for example, generate different numbers of concurrent requests or different request content, and apply a ramp-up/ramp-down strategy.
- Benchmark tests can be performed on interactive service A 101-1, interactive cloud service B 101-2, and interactive cloud service C 101-3 under different loads, thereby obtaining test data 40.
- for example, benchmark tests can be performed with 500, 1000, or 1500 concurrent users or threads.
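A ramp-up/ramp-down load profile for such a benchmark can be sketched in Python as follows. This is an illustrative schedule generator only; the function name, the linear ramp shape, and the durations are assumptions and do not describe the load generator 103 itself.

```python
from typing import List, Tuple


def ramp_schedule(peak_users: int, ramp_up_s: int, hold_s: int,
                  ramp_down_s: int, step_s: int = 10) -> List[Tuple[int, int]]:
    """Return (time in seconds, concurrent users) pairs for a linear ramp profile.

    Assumes ramp_up_s and ramp_down_s are both greater than zero.
    """
    total = ramp_up_s + hold_s + ramp_down_s
    schedule = []
    for t in range(0, total + 1, step_s):
        if t < ramp_up_s:
            users = peak_users * t / ramp_up_s            # ramp-up phase
        elif t <= ramp_up_s + hold_s:
            users = peak_users                            # steady phase at peak load
        else:
            users = peak_users * (total - t) / ramp_down_s  # ramp-down phase
        schedule.append((t, round(users)))
    return schedule


# One profile per benchmark level mentioned in the text.
for peak in (500, 1000, 1500):
    profile = ramp_schedule(peak, ramp_up_s=60, hold_s=120, ramp_down_s=60, step_s=30)
    print(peak, profile)
```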
- the test data 40 may include system resource status and service performance data.
- system resource status may include, but are not limited to, CPU utilization and network input/output per second.
- service performance data may include, but are not limited to, QPS, response time, etc.
- the test data 40 can be collected using some available system monitoring tools or services.
- system monitoring tools or services may include, but are not limited to, cloud monitoring services provided by cloud providers, such as CloudWatch of AWS and CloudMonitor of Alibaba Cloud, and open-source service monitoring tools, such as SkyWalking and Zipkin.
- cloud monitoring tools or services collect and store test data, and provide API interfaces through which other components (for example, the system simulation tool 20) can query and read the data.
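To illustrate how collected test data could be turned into a service performance model parameter, the Python sketch below fits the CPU time consumed per request from (QPS, busy CPU cores) samples with ordinary least squares. This is an assumed, simplified training approach rather than the training module of the disclosure; the function name and the sample values are illustrative and not measured.

```python
from typing import Sequence, Tuple


def fit_cpu_per_request(qps: Sequence[float],
                        busy_cores: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of busy_cores = slope * qps + intercept.

    slope     ~ CPU seconds consumed by a single request
    intercept ~ baseline CPU usage that does not depend on load
    """
    n = len(qps)
    if n < 2 or n != len(busy_cores):
        raise ValueError("need at least two paired samples")
    mean_x = sum(qps) / n
    mean_y = sum(busy_cores) / n
    sxx = sum((x - mean_x) ** 2 for x in qps)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(qps, busy_cores))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic samples standing in for benchmark results at three load levels.
cpu_per_request, baseline = fit_cpu_per_request([100, 200, 300], [1.2, 2.2, 3.2])
print(cpu_per_request, baseline)
```

A linear model is the simplest possible choice; a real service may need separate parameters per request type or per resource (CPU, memory, disk, bandwidth).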
- the system simulation tool 20 includes a model library 201 and a resource configuration determining device 202.
- the model library 201 includes a plurality of model sets, and the models in each model set may be standardized models. The models in each model set can be combined to get more complex models. Each model can be reused for different simulation schemes, or configured for different simulation schemes. In addition, each model can be replaced with a more suitable model.
- FIG. 3 shows an example schematic diagram of a model library 300 of a system simulation tool according to an embodiment of the present disclosure.
- the model library 300 may include a basic hardware model set 310, a cloud infrastructure model set 320, a basic software model set 330, and a service model set 340.
- the basic hardware model set 310, the cloud infrastructure model set 320, the basic software model set 330, and the service model set 340 may be arranged in the model library 300 in a layered manner.
- the basic hardware model set 310 includes a resource model of computer hardware.
- the computer hardware may include, for example, a CPU 311, a bandwidth (Bandwidth) 312, a disk (Disk) 313, and a memory (Memory) 314.
- the basic hardware model can be used to create more complex models and can be extended. For example, when different CPUs have different scheduling strategies, multiple CPU scheduling models can be extended to support different hardware provided by cloud providers.
- the basic hardware model is the basis of each service model.
- the basic hardware model can simulate the behavior of the hardware, including how resources are consumed in a concurrent state, how long it will take to use certain resources, and so on. Therefore, the model accuracy of the basic hardware model is a key factor that affects the final system simulation results.
- the cloud infrastructure model set 320 may include models of the basic services provided by a cloud provider; these services are the resources on which interactive services run and the target of resource optimization. Since interactive elastic services run on the resources of the cloud infrastructure, resource constraints are important for modeling and evaluation. Examples of cloud infrastructure may include, but are not limited to, virtual machines (VM), containers (Container), and auto-scaling services (Auto-scaling Service). Each cloud infrastructure can be modeled and configured.
- the cloud infrastructure model set 320 may include a virtual machine (VM) model 321, a container (Container) model 322, and an auto-scaling service (Auto-scaling Service) model 323. Each cloud infrastructure model can be integrated with other models for use.
- Cloud providers provide automatic scaling services for virtual machine users, so that virtual machine users can make full use of their virtual machines. In some situations, some virtual machines are idle and unused for a long time. If the user releases these virtual machines, the service may be unable to sustain the load when the load suddenly rises. Auto-scaling services can be used to achieve the elastic characteristics of user services.
- the auto-scaling model simulates the behavior of the auto-scaling service and provides the following functions: (1) Auto-scaling strategy configuration function, for example, configure KPI and threshold; (2) Automatic check function, for example, automatically detect whether the configured KPI reaches the specified value Threshold; (3) Scaling behavior simulation function; (4) Resource parameter modification, for example, modifying the number of virtual machines served.
- Auto-scaling strategy configuration function for example, configure KPI and threshold
- Automatic check function for example, automatically detect whether the configured KPI reaches the specified value Threshold
- Scaling behavior simulation function for example, modifying the number of virtual machines served.
- Fig. 4 shows an example schematic diagram of a processing mechanism of an automatic scaling service according to an embodiment of the present disclosure.
- the user can define the conditions under which the virtual machine should be increased or decreased. For example, if the CPU utilization reaches 80%, one virtual machine should be added to improve the availability and performance of the service.
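The auto-scaling rule described above can be sketched as a small policy object plus a check function. The 80% scale-out threshold comes from the example in the text; the scale-in threshold, the VM bounds, and all names (`ScalingPolicy`, `apply_policy`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical auto-scaling strategy configuration (KPI and thresholds)."""
    kpi_name: str = "cpu_utilization"
    scale_out_threshold: float = 0.80  # from the text: add a VM at 80% CPU
    scale_in_threshold: float = 0.20   # assumed value, not from the source
    step: int = 1                      # VMs added or removed per decision
    min_vms: int = 1                   # assumed bounds
    max_vms: int = 10


def apply_policy(policy: ScalingPolicy, kpi_value: float, vm_count: int) -> int:
    """Automatic check plus resource parameter modification:
    return the new VM count for the observed KPI value."""
    if kpi_value >= policy.scale_out_threshold:
        return min(vm_count + policy.step, policy.max_vms)
    if kpi_value <= policy.scale_in_threshold:
        return max(vm_count - policy.step, policy.min_vms)
    return vm_count


policy = ScalingPolicy()
scaled_out = apply_policy(policy, kpi_value=0.85, vm_count=2)
```

In a simulation, `apply_policy` would be called at each check interval with the simulated KPI value, and the returned VM count would be written back to the VM model.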
- the basic software model set 330 may include an event generator model 331, an event processor model 332, an event transmitter model 333, and a resource pool model 334.
- Event is the basic concept of discrete event simulation.
- An event is an object that includes attributes describing the event, and can be sent from one service to another service.
- the event generator 331 may be the counterpart of the load generator 103 in the test environment.
- the event processor 332 can configure the logic and delay of the service. With the event transmitter 333, events can be sent from one service to another service.
- the resource pool model 334 is used to simulate the resource consumption of the software and the resource restrictions in the software, such as latches, connection restrictions, token restrictions, and so on.
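To illustrate how the event generator, event processor, and event transmitter cooperate in a discrete event simulation, here is a minimal sketch built on a priority queue. It assumes a fixed processing delay per service and ignores resource contention (no resource pool); the function and field names are hypothetical and this is not the engine described in the source.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    """An event carries attributes and travels from one service to another."""
    time: float
    seq: int  # tie-breaker so events at equal times keep insertion order
    service: str = field(compare=False)
    request_id: int = field(compare=False)


def simulate(arrivals, first_service, next_service, delay):
    """arrivals: list of (time, request_id); next_service: service -> successor
    or None; delay: service -> processing delay in seconds.
    Returns request_id -> completion time."""
    queue, seq, finished = [], 0, {}
    for t, rid in arrivals:  # event generator: inject the load
        heapq.heappush(queue, Event(t, seq, first_service, rid))
        seq += 1
    while queue:
        ev = heapq.heappop(queue)  # always handle the earliest event
        done_at = ev.time + delay[ev.service]  # event processor: apply delay
        target = next_service.get(ev.service)
        if target is None:
            finished[ev.request_id] = done_at
        else:  # event transmitter: forward the event to the next service
            heapq.heappush(queue, Event(done_at, seq, target, ev.request_id))
            seq += 1
    return finished


result = simulate(
    arrivals=[(0.0, 1), (0.5, 2)],
    first_service="A",
    next_service={"A": "B", "B": None},
    delay={"A": 0.1, "B": 0.2},
)
```

A resource pool model would replace the fixed `delay` lookup with a wait for a free connection, token, or latch before processing starts.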
- the service model set 340 may include a standard service workflow model 341, a standard service performance model 342, and a standard cloud service performance model 343.
- the service workflow model is a locally created service workflow model.
- the service performance model is a service performance model established for locally created services and is used for service performance analysis.
- the service performance model may, for example, output the resource consumption of running a single request on the service under different input loads.
- the cloud service performance model is a service performance model established for the cloud service provided by the cloud provider, and is provided to perform service performance analysis on the cloud service provided by the cloud provider.
- the cloud service performance model may, for example, output the response time of the service under different input loads.
- the term "standard model” means that because the model is a common model and adapts to many application scenarios, it is modularized into a standard model for modular use.
- the resource configuration determining device 202 is configured to use the model library to create a system simulation model, and to use the test data collected by the cloud service system in the test environment to automatically train the service performance model of the system simulation model; the service performance model is used together with the system simulation model. After obtaining the system simulation model as described above, the resource configuration determining device 202 uses it to perform model simulation, and determines the optimal resource configuration 50 according to the model simulation result.
- After the resource configuration is determined, the resource configuration determining device 202 provides the determined resource configuration 50 to the interactive elastic cloud service system 30 in the production environment to execute the production plan.
- FIG. 5 shows an example flowchart of a method 500 for determining the resource configuration of an interactive elastic cloud service system according to an embodiment of the present disclosure. The method is executed by the resource configuration determining device 202.
- the required models of each service of the interactive elastic cloud service system are obtained from the model library.
- the model acquisition process may be initiated in response to a resource configuration determination request, and the resource configuration determination request may include service configuration information of the interactive elastic cloud service system or service identification information of each service.
- the resource configuration determining device 202 can automatically obtain the required model of each service from the model library according to the service configuration information of the interactive elastic cloud service system or the service identification information of each service.
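As a rough illustration of automatic model acquisition, the sketch below resolves the models named in a service configuration against a dictionary-based model library. The library layout, the model names, and the shape of `service_config` are assumptions; the source does not define the format of the service configuration information.

```python
# Hypothetical model library: model set name -> model names it contains.
MODEL_LIBRARY = {
    "basic_hardware": ["CPU", "Bandwidth", "Disk", "Memory"],
    "cloud_infrastructure": ["VM", "Container", "Auto-scaling"],
    "basic_software": ["EventGenerator", "EventProcessor",
                       "EventTransmitter", "ResourcePool"],
}

# Reverse index: model name -> the model set it belongs to.
MODEL_INDEX = {
    name: set_name
    for set_name, names in MODEL_LIBRARY.items()
    for name in names
}


def get_required_models(service_config):
    """service_config: service id -> list of model names the service needs.
    Returns service id -> list of (model set, model name) pairs."""
    resolved = {}
    for service_id, model_names in service_config.items():
        unknown = [n for n in model_names if n not in MODEL_INDEX]
        if unknown:
            raise KeyError(f"{service_id}: models not in library: {unknown}")
        resolved[service_id] = [(MODEL_INDEX[n], n) for n in model_names]
    return resolved


models = get_required_models({"gateway": ["CPU", "VM"]})
```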
- system simulation tool 20 may have a visual operation interface, such as the visual operation interfaces 600, 700, and 800 shown in FIGS. 6, 7 and 8.
- the visual operation interface 600 shown in FIG. 6 includes a model display area 620 and a model editing area 630.
- the basic hardware model set 611, the cloud infrastructure model set 613, the basic software model set 615, and the service model set 617 are displayed in the model display area 620 in a layered manner.
- the model acquisition process can be completed by performing a selection operation on the models in the model library, and the selection operation may be performed by a user or automatically performed by a machine, for example.
- the selection operation may be, for example, clicking or dragging, as shown in FIG. 6.
- the resource configuration determining device 202 obtains the selected model.
- the visual operation interface 700 shown in FIG. 7 includes a model display area 720 and a model editing area 730.
- the basic hardware model set 711, the cloud infrastructure model set 713, the basic software model set 715, and the service model set 717 are displayed in the model display area 720 in the form of a menu.
- the models can be selected by means of a drop-down menu.
- the visual operation interface 800 shown in FIG. 8 includes a model acquisition area 820 and a model editing area 830.
- in the model acquisition area 820, an information input field is provided for each of the basic hardware model set 811, the cloud infrastructure model set 813, the basic software model set 815, and the service model set 817, for entering the service identification information or service configuration information of each service of the cloud service system.
- the resource configuration determining device 202 obtains the required model of each service from the model library.
- an information input column is set for each model set.
- only one information input column may be provided.
- the obtained various models may be presented in the model editing area 630/730/830.
- FIG. 9 shows an example schematic diagram of resource restriction setting according to an embodiment of the present disclosure.
- after a model in the service is clicked, for example the model "GW-CPU-Process", a dialog box pops up for entering resource limit settings; for example, the number of VMs is limited to 4 and the number of CPU cores is limited to 2.
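One way to turn resource limit settings into a set of candidate resource configurations is to enumerate every combination within the limits. The source does not say how the resource configuration set is built, so this exhaustive enumeration and the names `candidate_configs`, `vm_count`, and `cpu_cores` are assumptions; the limits of 4 VMs and 2 CPU cores are taken from the example above.

```python
from itertools import product


def candidate_configs(limits):
    """limits: resource name -> maximum value (inclusive, minimum is 1).
    Returns every configuration that respects the limits."""
    names = sorted(limits)  # fixed order keeps the output deterministic
    ranges = [range(1, limits[name] + 1) for name in names]
    return [dict(zip(names, combo)) for combo in product(*ranges)]


configs = candidate_configs({"vm_count": 4, "cpu_cores": 2})
```

For larger limit spaces, a sampled or search-based configuration set would likely be more practical than full enumeration.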
- a service workflow between the obtained models is created to obtain a system simulation model of the cloud service system.
- the service workflow is used to indicate the interaction workflow between various models and the execution sequence of the models within a single service.
- events will be sent from one service to another, and the entire simulation model will be triggered to run the event flow.
- Fig. 10 shows a schematic diagram of a result after service workflow creation according to an embodiment of the present disclosure.
- the resource configuration determining device 202 may create a service workflow between the acquired models according to the operation process information of each service.
- a link operation can be performed on the corresponding endpoint of each acquired model in the model editing area.
- the resource configuration determining device 202 creates a service workflow between the models, thereby obtaining a system simulation model.
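A service workflow can be represented as a directed graph whose edges are the links drawn between model endpoints. The sketch below, which assumes an acyclic workflow, derives an execution order with the standard-library `graphlib` module (Python 3.9+). Apart from "GW-CPU-Process", which appears in the text, the model names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_order(links):
    """links: iterable of (source_model, target_model) pairs, one per link
    operation. Returns the models in an order where every source precedes
    its targets. Raises graphlib.CycleError if the links form a cycle."""
    sorter = TopologicalSorter()
    for source, target in links:
        sorter.add(target, source)  # target depends on source
    return list(sorter.static_order())


order = execution_order([
    ("EventGenerator", "GW-CPU-Process"),
    ("GW-CPU-Process", "ServiceA-CPU-Process"),
])
```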
- the training data is used to train the service performance model of the system simulation model, which defines the resource consumption of running a single request on the service in the system simulation model.
- the training data may include part or all of the test data collected by the cloud service system in the test environment.
- the input of the service performance model may include the input parameters of the load test, for example, the load of the API request.
- the output of the service performance model can include the resource consumption of a single request, for example, MI (million instructions) on the CPU, memory increase, and packet size on the network.
- the output of the service performance model can be characterized by the response time of the service.
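A minimal sketch of training a service performance model, assuming the per-request CPU consumption (in MI) depends linearly on the input load and fitting it by ordinary least squares. The linear form is an assumption, the source does not name a model family, and the numbers are synthetic rather than measurements.

```python
def fit_linear(loads, consumption):
    """Ordinary least squares fit of consumption = slope * load + intercept."""
    n = len(loads)
    mean_x = sum(loads) / n
    mean_y = sum(consumption) / n
    sxx = sum((x - mean_x) ** 2 for x in loads)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(loads, consumption))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, load):
    """Predicted per-request resource consumption at the given load."""
    slope, intercept = model
    return slope * load + intercept


# Synthetic training data: load (concurrent API requests) -> MI per request.
loads = [100, 200, 300, 400]
mi_per_request = [12.0, 14.0, 16.0, 18.0]
model = fit_linear(loads, mi_per_request)
mi_at_500 = predict(model, 500)
```

The same fit could be repeated per resource (memory increase, packet size) or replaced by a response-time model when only response time is observable.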
- the service performance model and the system simulation model are used to perform system simulation under a given resource configuration set and resource limit setting to obtain system performance KPIs.
- given a service performance model that describes the resource consumption of a single request or the response time of a service, system behavior and system performance KPIs under different loads can be calculated through low-level models such as the hardware models and software models.
- the resource configuration with the best system performance KPI in the resource configuration set is determined as the resource configuration of the cloud service system.
- FIG. 11 shows an example schematic diagram of simulation results under different public bandwidths according to an embodiment of the present disclosure.
- when the public bandwidth is set to 300 Mbps, the QPS and response time of the service reach their best performance.
- the optimal public bandwidth is therefore 300 Mbps, and a public bandwidth of 300 Mbps is determined as the resource configuration of the cloud service system.
- the target performance KPI of the system simulation model can also be set.
- the resource configuration with the best system performance KPI in the resource configuration set after the removal processing is determined as the resource configuration of the cloud service system.
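The selection step can be sketched as a filter followed by a maximum. This sketch assumes two KPIs per simulated configuration: response time, which is compared against the target performance KPI, and QPS, which is used to rank the remaining configurations. That split, the function name, and the KPI values are illustrative assumptions, not simulation results from the source.

```python
def select_config(results, target_response_ms=None):
    """results: list of dicts with keys 'config', 'qps', 'response_ms'.
    Removes configurations whose response time exceeds the target, then
    returns the configuration with the highest QPS among the rest."""
    candidates = results
    if target_response_ms is not None:
        candidates = [r for r in results
                      if r["response_ms"] <= target_response_ms]
    if not candidates:
        raise ValueError("no resource configuration meets the target KPI")
    return max(candidates, key=lambda r: r["qps"])["config"]


# Illustrative simulation results for three public bandwidth settings.
results = [
    {"config": {"bandwidth_mbps": 100}, "qps": 800, "response_ms": 250},
    {"config": {"bandwidth_mbps": 200}, "qps": 1500, "response_ms": 120},
    {"config": {"bandwidth_mbps": 300}, "qps": 2100, "response_ms": 60},
]
best = select_config(results, target_response_ms=150)
```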
- the test data includes verification data.
- the method may further include: using the verification data to perform model verification, where, when the model verification fails, the process of creating the system simulation model and the service performance model is executed again.
- for example, benchmark tests are performed at different concurrency levels, such as 200/500/1000/1500.
- data selected from the 200/500-concurrency test data are used to build the system simulation model.
- the 1000/1500-concurrency simulation results are then compared with the test data to check whether the system simulation model predicts the system behavior well under different loads.
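The verification step can be sketched as a relative-error check between simulated and measured KPIs at the held-out concurrency levels. The 10% tolerance and the KPI values are assumptions chosen for illustration; the source does not state an acceptance criterion.

```python
def verify_model(simulated, measured, tolerance=0.10):
    """simulated, measured: concurrency level -> KPI value (for example,
    response time in ms; measured values must be non-zero).
    Returns True when every relative error is within `tolerance`."""
    for level, actual in measured.items():
        relative_error = abs(simulated[level] - actual) / abs(actual)
        if relative_error > tolerance:
            return False
    return True


# Illustrative values for the held-out 1000/1500 concurrency levels.
measured = {1000: 100.0, 1500: 150.0}
passed = verify_model({1000: 105.0, 1500: 160.0}, measured)
failed = verify_model({1000: 130.0, 1500: 160.0}, measured)
```

When the check returns `False`, the model creation and training steps would be repeated, as described above.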
- the method may further include: performing a performance test on the determined resource configuration. For example, NFR testing is performed on the determined resource configuration.
- the method may further include: in response to the services of the cloud service system including a standard service, obtaining the corresponding standard service workflow model, standard service performance model and/or standard cloud service performance model from the service model set.
- an interactive elastic cloud service system is used as an example for description.
- other types of cloud service systems may also be used.
- model library may not include the service model set.
- FIG. 12 shows a block diagram of an apparatus 1200 for determining a resource configuration according to an embodiment of the present disclosure.
- the resource configuration determining device 1200 includes a model obtaining unit 1210, a resource restriction setting obtaining unit 1220, a service workflow creation unit 1230, a service performance model training unit 1240, a system simulation unit 1250, and a resource configuration determining unit 1260.
- the model acquiring unit 1210 is configured to acquire the required models of each service of the cloud service system from the model library.
- the operation of the model acquisition unit 1210 may refer to the operation of the block 510 described above with reference to FIG. 5.
- the resource limit setting obtaining unit 1220 is configured to obtain the resource limit setting of each required model.
- the operation of the resource limit setting acquisition unit 1220 may refer to the operation of block 520 described above with reference to FIG. 5.
- the service workflow creation unit 1230 is configured to create a service workflow between the acquired models to obtain a system simulation model of the cloud service system.
- the service workflow indicates the interaction workflow between the various models and the execution order of the models within a single service.
- the operation of the service workflow creation unit 1230 may refer to the operation of the block 530 described above with reference to FIG. 5.
- the service performance model training unit 1240 is configured to use training data to train the service performance model of the system simulation model.
- the training data includes part or all of the test data collected by the cloud service system in a test environment.
- the service performance model is used to define the resource consumption of running a single request on the service in the system simulation model.
- the operation of the service performance model training unit 1240 may refer to the operation of block 540 described above with reference to FIG. 5.
- the system simulation unit 1250 is configured to use the service performance model and the system simulation model to perform system simulation under a given resource configuration set and resource limit setting to obtain system performance KPIs.
- the operation of the system simulation unit 1250 may refer to the operation of the block 550 described above with reference to FIG. 5.
- the resource configuration determining unit 1260 is configured to determine the resource configuration with the best system performance KPI in the resource configuration set as the resource configuration of the cloud service system.
- the operation of the resource configuration determining unit 1260 may refer to the operation of the block 560 described above with reference to FIG. 5.
- the resource configuration determining apparatus 1200 may further include a target performance KPI acquisition unit (not shown) and a resource configuration removal unit (not shown).
- the target performance KPI obtaining unit is configured to obtain the target performance KPI of the system simulation model.
- the resource configuration removing unit is configured to remove resource configurations whose system performance KPI exceeds the target performance KPI from the resource configuration set.
- the resource configuration determining unit 1260 determines the resource configuration with the best system performance KPI in the resource configuration set after removal processing as the resource configuration of the cloud service system.
- system simulation tool may have a visual operation interface, and the visual operation interface includes a model display area.
- the model acquisition unit 1210 acquires the selected model.
- the system simulation tool has a visual operation interface, and the visual operation interface includes a model acquisition area.
- the model obtaining unit 1210 obtains the required model of each service from the model library.
- model library further includes a service model set
- the model obtaining unit 1210 is further configured to, in response to the services of the cloud service system including a standard service, obtain the corresponding standard service workflow model, standard service performance model and/or standard cloud service performance model from the service model set.
- the visual operation interface may also include a model editing area.
- the resource configuration determining apparatus 1200 further includes a model presentation unit (not shown).
- the model presentation unit is configured to present the acquired model in the model editing area.
- the service workflow creation unit 1230 creates a service workflow between the various models.
- the service workflow creation unit 1230 may be configured to create a service workflow between the acquired models according to the operation process information of each service.
- the resource configuration determining apparatus 1200 may further include a model verification unit (not shown).
- the model verification unit is configured to use the verification data to perform model verification before using the service performance model and the system simulation model to perform system simulation under a given resource configuration set and resource limit settings to obtain system performance KPIs.
- when the model verification fails, the operations of the model acquisition unit, the resource limit setting acquisition unit, the service workflow creation unit, and the service performance model training unit are performed again.
- the above device for determining resource configuration can be implemented by hardware, or by software or a combination of hardware and software.
- FIG. 13 shows a schematic diagram of a computing device for implementing a process of determining a resource configuration of a cloud service system according to an embodiment of the present disclosure.
- the computing device 1300 may include at least one processor 1310, a memory (for example, non-volatile memory) 1320, an internal memory 1330, and a communication interface 1340, and the at least one processor 1310, the memory 1320, the internal memory 1330, and the communication interface 1340 are connected together via a bus 1360.
- At least one processor 1310 executes at least one computer-readable instruction (that is, the above-mentioned element implemented in the form of software) stored or encoded in the memory.
- computer-executable instructions are stored in the memory, which, when executed, cause the at least one processor 1310 to: obtain the models required by each service of the cloud service system from the model library; obtain the resource limit setting of each required model; create a service workflow between the acquired models to obtain a system simulation model of the cloud service system, the service workflow indicating the interaction workflow between the various models and the execution order of the models within a single service; use training data, via the service performance model training module, to train the service performance model of the system simulation model, where the training data includes some or all of the test data collected by the cloud service system in the test environment, and the service performance model defines the resource consumption of running a single request on each service in the system simulation model; use the service performance model and the system simulation model, via the discrete event simulation engine, to perform system simulation under a given resource configuration set and resource limit settings to obtain system performance KPIs; and determine the resource configuration with the best system performance KPI in the resource configuration set as the resource configuration of the cloud service system.
- a program product such as a machine-readable medium (for example, a non-transitory machine-readable medium) is provided.
- the machine-readable medium may have instructions (i.e., the above-mentioned elements implemented in the form of software), which, when executed by a machine, cause the machine to perform the various operations and functions described above in conjunction with FIGS. 1-12 in the various embodiments of this specification.
- a system or device equipped with a readable storage medium may be provided, where software program code for realizing the function of any one of the above-mentioned embodiments is stored on the readable storage medium, and a computer or processor of the system or device reads out and executes the instructions stored in the readable storage medium.
- the program code itself read from the readable medium can implement the function of any one of the above embodiments, so the machine-readable code and the readable storage medium storing the machine-readable code constitute a part of the present invention.
- Examples of readable storage media include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW), magnetic tape, volatile memory cards, and ROM.
- the program code can be downloaded from a server computer or cloud via a communication network.
- the device structure described in the foregoing embodiments may be a physical structure or a logical structure. That is, some units may be realized by the same physical entity, some units may be realized by multiple physical entities, or some units may be implemented jointly by components in multiple independent devices.
- the hardware unit or module can be implemented mechanically or electrically.
- a hardware unit, module, or processor may include a permanent dedicated circuit or logic (such as a dedicated processor, FPGA or ASIC) to complete the corresponding operation.
- the hardware unit or processor may also include programmable logic or circuits (such as general-purpose processors or other programmable processors), which may be temporarily set by software to complete corresponding operations.
- the specific implementation (mechanical, a dedicated permanent circuit, or a temporarily set circuit)
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Stored Programmes (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Debugging And Monitoring (AREA)
Abstract
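The present disclosure provides a method and an apparatus for determining a resource configuration of a cloud service system using a system simulation tool. The system simulation tool includes a model library. When the resource configuration is determined, the models required by each service of the cloud service system are obtained from the model library, and a service workflow between the obtained models is created to obtain a system simulation model of the cloud service system. In addition, training data is used to train a service performance model of the system simulation model. The service performance model and the system simulation model are then used to perform system simulation under a given resource configuration set and resource limit settings to obtain system performance KPIs. Finally, the resource configuration with the best system performance KPI in the resource configuration set is determined as the resource configuration of the cloud service system. With this method, determining the resource configuration of a cloud service system becomes simpler and more convenient.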
Description
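The present disclosure generally relates to the field of cloud services, and in particular to a method and an apparatus for determining a resource configuration of a cloud service system.
With the development of big data and Internet of Things technologies, providing services on the cloud has become a trend. Cloud providers such as AWS and Alibaba Cloud already offer a rich set of cloud services, which help business providers or service providers run their business systems or service systems on the cloud throughout the entire life cycle of those systems, including development and operation. In this situation, a business provider or service provider needs to know how to make full use of cloud services with a good resource configuration, especially for business systems or service systems that require large amounts of cloud services and infrastructure resources. Performance modeling is a method widely used for system evaluation in different fields, and it has been applied to resource configuration optimization of cloud service systems.
However, performance modeling and simulation require in-depth knowledge of computer systems as well as mathematical or statistical skills. In addition, because a performance model must take many parameters and conditions into account, creating one is time-consuming and error-prone, especially for complex systems with interacting services. Moreover, most performance modeling methods are only applicable to resource configuration optimization of steady-state systems, and not to dynamic or elastic systems.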
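Summary of the Invention
In view of the above, the present disclosure provides a method and an apparatus for determining a resource configuration of a cloud service system using a system simulation tool. With the method and apparatus, the process of determining the resource configuration of a cloud service system becomes simpler and more convenient.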
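According to one aspect of the present disclosure, a method for determining a resource configuration of a cloud service system using a system simulation tool is provided, where the system simulation tool includes a model library, and the model library includes a basic hardware model set, a cloud infrastructure model set, and a basic software model set. The method includes: obtaining, from the model library, the models required by each service of the cloud service system; obtaining a resource limit setting of each required model; creating a service workflow between the obtained models to obtain a system simulation model of the cloud service system, the service workflow indicating the interaction workflow between the models and the execution order of the models within a single service; using training data to train a service performance model of each service in the system simulation model, the training data including part or all of the test data collected by the cloud service system in a test environment, and the service performance model defining the resource consumption of running a single request on a service in the system simulation model; using the service performance model and the system simulation model to perform system simulation under a given resource configuration set and the resource limit settings to obtain system performance KPIs; and determining the resource configuration with the best system performance KPI in the resource configuration set as the resource configuration of the cloud service system (30).
With this method, a system simulation tool that includes a model library, a service performance model training module, and a discrete event simulation engine is used: the system simulation model is created from the standardized models in the model library, the service performance model training module automatically trains the model parameters of the system simulation model from the test data collected by the cloud service system in the test environment, and the resulting system simulation model is used for model simulation to determine the optimal resource configuration. This makes the process of determining the resource configuration of a cloud service system simpler and more convenient.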
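Optionally, in an example of the above aspect, the cloud service system includes an interactive elastic cloud service system, and the cloud infrastructure model set may include an auto-scaling service model. With this method, the resource configuration of an interactive elastic cloud service system can be determined.
Optionally, in an example of the above aspect, the method further includes: obtaining a target performance KPI of the system simulation model; and removing, from the resource configuration set, resource configurations whose system performance KPI exceeds the target performance KPI. Determining the resource configuration with the best system performance KPI in the resource configuration set as the resource configuration of the cloud service system then includes: determining the resource configuration with the best system performance KPI in the resource configuration set after the removal processing as the resource configuration of the cloud service system. With this method, by using target performance KPIs for different application scenarios or application requirements, the determined resource configuration can be adapted to the various application scenarios or application requirements of the cloud service system.
Optionally, in an example of the above aspect, the system simulation tool may have a visual operation interface, and the visual operation interface includes a model display area. Obtaining the models required by each service of the cloud service system from the model library may include: obtaining the selected models in response to a model selection operation on the model library in the model display area. With this method, obtaining the required models of each service through a model selection operation makes the system simulation process more intuitive and concise.
Optionally, in an example of the above aspect, the system simulation tool has a visual operation interface, and the visual operation interface includes a model acquisition area. Obtaining the models required by each service of the cloud service system from the model library may include: in response to service identification information or service configuration information of each service of the cloud service system being entered in the model acquisition area, obtaining the models required by each service from the model library. With this method, the service identification information or service configuration information of each service is entered and a background module of the system simulation tool automatically obtains the required models, without manual selection by the user. This greatly reduces the workload and error rate of the model acquisition process, thereby improving the accuracy and efficiency of system simulation.
Optionally, in an example of the above aspect, the visual operation interface may further include a model editing area. The method may further include: presenting the obtained models in the model editing area. Creating a service workflow between the obtained models to obtain the system simulation model of the cloud service system may include: creating the service workflow between the models in response to link operations between corresponding endpoints of the obtained models in the model editing area. With this method, link operations between model endpoints of the services are performed in the model editing area and a background module automatically creates the service workflow between the models, which makes the system simulation process more intuitive and simpler.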
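Optionally, in an example of the above aspect, creating a service workflow between the obtained models to obtain the system simulation model of the cloud service system may include: creating the service workflow between the obtained models according to operation process information of each service, to obtain the system simulation model of the cloud service system. With this method, a background module of the system simulation tool automatically creates the service workflow between the models according to the operation process information of each service, without manual linking by the user. This greatly reduces the workload and error rate of service workflow creation, thereby improving the accuracy and efficiency of system simulation.
Optionally, in an example of the above aspect, the model library further includes a service model set, and the service model set includes a standard service workflow model, a standard service performance model, and/or a standard cloud service performance model. The method further includes: in response to the services of the cloud service system including a standard service, obtaining the corresponding standard service workflow model, standard service performance model, and/or standard cloud service performance model from the service model set. With this method, service workflow models and service performance models of commonly used services are stored in the model library, so that when the cloud service system to be simulated includes a commonly used service, the corresponding standard service workflow model and service performance model can be obtained directly from the service model set without performing model simulation for that service again. This simplifies the system simulation process and improves the accuracy and efficiency of system simulation.
Optionally, in an example of the above aspect, the test data includes verification data, and before the service performance model and the system simulation model are used to perform system simulation under a given resource configuration set to obtain system performance KPIs, the method further includes: using the verification data to perform model verification, where, when the model verification fails, the process of creating the system simulation model and the service performance model is executed again. With this method, verifying the system simulation model with test data ensures the simulation accuracy of the system simulation model.
Optionally, in an example of the above aspect, the method may further include: performing a performance test on the determined resource configuration. With this method, it can be ensured that the determined resource configuration has the expected performance, which prevents an unsuitable resource configuration from being provided to the cloud service system in the production environment to execute the production plan.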
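According to another aspect of the present disclosure, an apparatus for determining a resource configuration of a cloud service system is provided. The apparatus is applied to a system simulation tool, the system simulation tool includes a model library, and the model library includes a basic hardware model set, a cloud infrastructure model set, and a basic software model set. The apparatus includes: a model acquisition unit configured to obtain, from the model library, the models required by each service of the cloud service system; a resource limit setting acquisition unit configured to obtain a resource limit setting of each required model; a service workflow creation unit configured to create a service workflow between the obtained models to obtain a system simulation model of the cloud service system, the service workflow indicating the interaction workflow between the models and the execution order of the models within a single service; a service performance model training unit configured to use training data to train a service performance model of each service in the system simulation model, the training data including part or all of the test data collected by the cloud service system in a test environment, and the service performance model defining the resource consumption of running a single request on a service in the system simulation model; a system simulation unit configured to use the service performance model and the system simulation model to perform system simulation under a given resource configuration set and the resource limit settings to obtain system performance KPIs; and a resource configuration determining unit configured to determine the resource configuration with the best system performance KPI in the resource configuration set as the resource configuration of the cloud service system.
Optionally, in an example of the above aspect, the apparatus may further include: a target performance KPI acquisition unit configured to obtain a target performance KPI of the system simulation model; and a resource configuration removal unit configured to remove, from the resource configuration set, resource configurations whose system performance KPI exceeds the target performance KPI, where the resource configuration determining unit is configured to determine the resource configuration with the best system performance KPI in the resource configuration set after the removal processing as the resource configuration of the cloud service system.
Optionally, in an example of the above aspect, the system simulation tool has a visual operation interface, and the visual operation interface includes a model display area. The model acquisition unit is configured to obtain the selected models in response to a model selection operation on the model library in the model display area.
Optionally, in an example of the above aspect, the system simulation tool has a visual operation interface, and the visual operation interface includes a model acquisition area. The model acquisition unit is configured to, in response to service identification information or service configuration information of each service of the cloud service system being entered in the model acquisition area, obtain the models required by each service from the model library.
Optionally, in an example of the above aspect, the visual operation interface includes a model editing area, and the apparatus further includes: a model presentation unit configured to present the obtained models in the model editing area, where the service workflow creation unit is configured to create the service workflow between the models in response to link operations between corresponding endpoints of the obtained models in the model editing area.
Optionally, in an example of the above aspect, the service workflow creation unit is configured to create the service workflow between the obtained models according to operation process information of each service.
Optionally, in an example of the above aspect, the model library further includes a service model set, and the service model set includes a standard service workflow model, a standard service performance model, and/or a standard cloud service performance model. The model acquisition unit is further configured to, in response to the services of the cloud service system including a standard service, obtain the corresponding standard service workflow model, standard service performance model, and/or standard cloud service performance model from the service model set.
Optionally, in an example of the above aspect, the test data includes verification data, and the apparatus may further include: a model verification unit configured to use the verification data to perform model verification before the service performance model and the system simulation model are used to perform system simulation under a given resource configuration set to obtain system performance KPIs, where, when the model verification fails, the operations of the model acquisition unit, the resource limit setting acquisition unit, the service workflow creation unit, and the service performance model training unit are performed again.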
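According to another aspect of the present disclosure, a computing device is provided, including: at least one processor; and a memory coupled to the at least one processor and configured to store instructions that, when executed by the at least one processor, cause the at least one processor to perform the resource configuration determination method described above.
According to another aspect of the present disclosure, a machine-readable storage medium is provided, which stores executable instructions that, when executed, cause the machine to perform the resource configuration determination method described above.
According to another aspect of the present disclosure, a computer program product is provided, which is tangibly stored on a computer-readable medium and includes computer-executable instructions that, when executed, cause at least one processor to perform the resource configuration determination method described above.
A further understanding of the nature and advantages of the content of this specification can be obtained by referring to the following drawings. In the drawings, similar components or features may have the same reference numerals.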
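FIG. 1 shows an example schematic diagram of an interactive elastic cloud service system.
FIG. 2 shows an example schematic diagram of a resource configuration determination architecture for determining the resource configuration of an interactive elastic cloud service system according to an embodiment of the present disclosure.
FIG. 3 shows an example schematic diagram of a model library of a system simulation tool according to an embodiment of the present disclosure.
FIG. 4 shows an example schematic diagram of a processing mechanism of an auto-scaling service according to an embodiment of the present disclosure.
FIG. 5 shows an example flowchart of a method for determining the resource configuration of an interactive elastic cloud service system according to an embodiment of the present disclosure.
FIG. 6 shows a schematic diagram of one example of a visual operation interface of the system simulation tool according to an embodiment of the present disclosure.
FIG. 7 shows a schematic diagram of another example of a visual operation interface of the system simulation tool according to an embodiment of the present disclosure.
FIG. 8 shows a schematic diagram of another example of a visual operation interface of the system simulation tool according to an embodiment of the present disclosure.
FIG. 9 shows an example schematic diagram of resource limit settings according to an embodiment of the present disclosure.
FIG. 10 shows a schematic diagram of a result after service workflow creation according to an embodiment of the present disclosure.
FIG. 11 shows an example schematic diagram of simulation results under different resource configurations according to an embodiment of the present disclosure.
FIG. 12 shows a block diagram of a resource configuration determining apparatus according to an embodiment of the present disclosure.
FIG. 13 shows a schematic diagram of a computing device for implementing the resource configuration determination process of a cloud service system according to an embodiment of the present disclosure.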
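Reference Numerals
100 Interactive elastic cloud service system
110-1, 101-1, 301-1 Cloud service A
110-2, 101-2, 301-2 Cloud service B
110-3, 101-3, 301-3 Cloud service C
103 Load generator
120, 102, 202, 302 Auto-scaling Service
1 Example environment
10 Interactive elastic cloud service system in the test environment
20 System simulation tool
201 Model library
202 Resource configuration determining device
30 Interactive elastic cloud service system in the production environment
40 Test data
50 Resource configuration
300 Model library
310, 611, 711, 811 Basic hardware model set
311 CPU
312 Bandwidth
313 Disk
314 Memory
320, 613, 713, 813 Cloud infrastructure model set
321 VM
322 Container
323 Auto-scaling
330, 615, 715, 815 Basic software model set
331 Event generator
332 Event processor
333 Event transmitter
334 Resource pool
340, 617, 717, 817 Service model set
341 Standard service workflow model
342 Standard service performance model
343 Standard cloud service performance model
400 Auto-scaling service processing mechanism
410 Service model
420 Auto-scaling service
430 VM model
440 KPI value
450 Resource parameter modification
500 Resource configuration determination process
510 Model acquisition
520 Resource limit setting acquisition
530 Service workflow creation
540 Service performance model training
550 System simulation
560 Resource configuration determination
600, 700, 800 Visual operation interface
620, 720 Model display area
820 Model acquisition area
630, 730, 830 Model editing area
1200 Resource configuration determining apparatus
1210 Model acquisition unit
1220 Resource limit setting acquisition unit
1230 Service workflow creation unit
1240 Service performance model training unit
1250 System simulation unit
1260 Resource configuration determining unit
1300 Computing device
1310 Processor
1320 Memory
1330 Internal memory
1340 Communication interface
1360 Bus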
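The subject matter described herein will now be discussed with reference to example embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and thereby implement the subject matter described herein, and not to limit the scope of protection, applicability, or examples set forth in the claims. The functions and arrangement of the elements discussed may be changed without departing from the scope of protection of this specification. Various examples may omit, substitute, or add various procedures or components as needed. For example, the described methods may be performed in an order different from that described, and steps may be added, omitted, or combined. In addition, features described with respect to some examples may also be combined in other examples.
As used herein, the term "including" and its variants are open-ended terms meaning "including but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first", "second", and so on may refer to different or identical objects. Other definitions, whether explicit or implicit, may be included below. Unless the context clearly indicates otherwise, the definition of a term is consistent throughout the specification.
A cloud service refers to obtaining a required service over a network in an on-demand, easily scalable manner. Such a service may be related to IT, software, and the Internet, or may be another kind of service. Cloud services can place the software, hardware, and data an enterprise needs on the network, and allow different IT devices to connect to one another at any time and place for purposes such as data access and computation. Examples of cloud services may include public clouds (Public Cloud) and private clouds (Private Cloud).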
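FIG. 1 shows an example schematic diagram of an interactive elastic cloud service system 100. As shown in FIG. 1, the interactive elastic cloud service system 100 includes an interactive cloud service A 110-1, an interactive cloud service B 110-2, an interactive cloud service C 110-3, and an auto-scaling service 120. Note that FIG. 1 shows only an illustrative example of an interactive elastic cloud service system. In other examples, more or fewer interactive cloud services may be included.
The interactive cloud services A 110-1, B 110-2, and C 110-3 each provide a different service capability, and they interact with one another by calling the interfaces of other cloud services or by sending packets to other cloud services over the network.
The auto-scaling service (Auto-scaling Service) 120 may be provided by a cloud provider such as AWS or Alibaba Cloud, and is used to achieve the elastic characteristics of the interactive services A 110-1, B 110-2, and C 110-3. With the auto-scaling service 120, the deployment of the interactive services A 110-1, B 110-2, and C 110-3 can be changed when certain conditions are met. For example, when CPU utilization is high, the number of virtual machines is increased to improve the availability and performance of the service.
A cloud service system needs to run on the cloud throughout its entire life cycle, including its development and operation. In this situation, it is necessary to know how to make full use of cloud services with a good resource configuration, especially for cloud service systems that require large amounts of cloud services and infrastructure resources.
Performance modeling methods have been applied to resource configuration optimization of cloud service systems. However, performance modeling and simulation require in-depth knowledge of computer systems as well as mathematical or statistical skills. In addition, because a performance model must take many parameters and conditions into account, creating one is time-consuming and error-prone, especially for complex systems with interacting services. Moreover, most performance modeling methods are only applicable to resource configuration optimization of steady-state systems, and not to dynamic or elastic systems.
In view of the above, the present disclosure provides a solution for determining the resource configuration of a cloud service system using a system simulation tool. In this solution, the system simulation tool includes a model library, and the model library includes a basic hardware model set, a cloud infrastructure model set, and a basic software model set. When the resource configuration is determined, a system simulation model is created from the standardized models in the model library, and a service performance model training module automatically trains the model parameters of the system simulation model from the test data collected by the cloud service system in the test environment. After the system simulation model is obtained in this way, it is used for model simulation to determine the optimal resource configuration. In this manner, the process of determining the resource configuration of a cloud service system becomes simpler and more convenient.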
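FIG. 2 shows a schematic diagram of an example environment 1 for determining the resource configuration of an interactive elastic cloud service system according to an embodiment of the present disclosure.
As shown in FIG. 2, the example environment 1 includes an interactive elastic cloud service system 10 in a test environment, a system simulation tool 20, and an interactive elastic cloud service system 30 in a production environment.
In addition to an interactive service A 101-1, an interactive cloud service B 101-2, an interactive cloud service C 101-3, and an auto-scaling service 102, the interactive elastic cloud service system 10 further includes a load generator 103. The structures and functions of the interactive service A 101-1, the interactive cloud service B 101-2, the interactive cloud service C 101-3, and the auto-scaling service 102 are the same as those of the corresponding components shown in FIG. 1.
The load generator 103 is configured to simulate concurrent users in the production environment. During testing, the load generator 103 may, for example, generate different numbers of concurrent requests or different request contents, as well as ramp-up/ramp-down strategies.
Benchmark tests can be performed on the interactive service A 101-1, the interactive cloud service B 101-2, and the interactive cloud service C 101-3 under different loads to obtain the test data 40. For example, benchmark tests can be performed with 500/1000/1500 concurrent users or threads.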
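A ramp-up/ramp-down strategy for the load generator can be sketched as a per-step concurrency schedule. The linear ramp shape, the step counts, and the function name `ramp_schedule` are assumptions for illustration; the source does not describe how the strategy is parameterized.

```python
def ramp_schedule(peak, ramp_up_steps, hold_steps, ramp_down_steps):
    """Return the number of concurrent users for each time step:
    a linear ramp up to `peak`, a hold phase, then a linear ramp down to 0."""
    up = [round(peak * (i + 1) / ramp_up_steps)
          for i in range(ramp_up_steps)]
    hold = [peak] * hold_steps
    down = [round(peak * (ramp_down_steps - i - 1) / ramp_down_steps)
            for i in range(ramp_down_steps)]
    return up + hold + down


# Ramp to the 500-concurrency benchmark level mentioned in the text.
schedule = ramp_schedule(peak=500, ramp_up_steps=5, hold_steps=2,
                         ramp_down_steps=5)
```

The same function could be called with a peak of 1000 or 1500 to produce the other benchmark levels.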
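The test data 40 may include system resource status and service performance data. Examples of system resource status may include, but are not limited to, CPU utilization and network input/output volume per second. Examples of service performance data may include, but are not limited to, QPS and response time.
The test data 40 can be collected with available system monitoring tools or services. Examples of system monitoring tools or services may include, but are not limited to, cloud monitoring services provided by cloud providers, such as CloudWatch from AWS and CloudMonitor from Alibaba Cloud, and OSS service monitoring tools, such as SkyWalking and Zipkin. These system monitoring tools or services collect and store the test data, and provide API interfaces for data query/reading so that other components (for example, the system simulation tool 20) can query/read the data.
The system simulation tool 20 includes a model library 201 and a resource configuration determining device 202. The model library 201 includes multiple model sets, and the models in each model set may be standardized models. Models from the model sets can be combined to obtain more complex models. Each model can be reused in different simulation schemes, or configured for different simulation schemes. In addition, each model can be replaced with a more suitable model.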
图3示出了根据本公开的实施例的系统仿真工具的模型库300的示例示意图。如图3所示,模型库300可以包括基本硬件模型集310、云基础架构模型集320、基本软件模型集330和服务模型集340。在一个示例中,基本硬件模型集310、云基础架构模型集320、基本软件模型集330和服务模型集340可以以分层的方式布置在模型库300中。
基本硬件模型集310包括计算机硬件的资源模型,所述计算机硬件例如可以包括CPU 311、带宽(Bandwidth)312、磁盘(Disk)313和存储器(Memory)314。基本硬件模型可以被使用来创建更多的复杂模型,并且可以被扩展。例如,在不同的CPU具有不同的调度策略的情况下,可以扩展出多个CPU调度模型来支持云提供商所提供的不同硬件。
基本硬件模型是每个服务模型的基础。基本硬件模型可以仿真硬件的行为,包括在并发状态下资源如何消耗,使用某些资源量将耗费多少时间等等,从而基本硬件模型的模型准确性是影响最终的系统仿真结果的关键因素。
云基础架构模型集320可以包括云提供商提供的基本服务模型,其是运行交互式服务的资源以及资源优化的目标。由于交互式弹性服务在云基础架构的资源上运行,资源的限制对于建模和评估很重要。云基础架构的示例例如可以包括但不限于虚拟机(VM)、容器(Container)和自动伸缩服务(Auto-scaling Service)。每个云基础架构可以被建模以及是可配置的。相应地,云基础架构模型集320可以包括虚拟机(VM)模型321、容器(Container)模型322和自动伸缩服务(Auto-scaling Service)模型323。每个云基础架构模型可以与其它模型集成在一起使用。
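Cloud providers offer auto-scaling services so that virtual machine users can make full use of their virtual machines. In some cases, some virtual machines sit idle and unused for long periods; yet if the user releases them, the service may be unable to sustain the load when it suddenly rises. An auto-scaling service can be used to make a user's service elastic.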
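The auto-scaling model simulates the behavior of an auto-scaling service and provides the following functions: (1) auto-scaling policy configuration, for example, configuring a KPI and a threshold; (2) automatic checking, for example, automatically detecting whether the configured KPI has reached the threshold; (3) scaling behavior simulation; and (4) resource parameter modification, for example, changing the number of virtual machines of a service.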
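FIG. 4 shows an example schematic diagram of the processing mechanism of the auto-scaling service according to an embodiment of the present disclosure. As shown in FIG. 4, the user can define the conditions under which virtual machines should be added or removed. For example, if CPU utilization reaches 80%, one virtual machine should be added to improve the availability and performance of the service.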
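In auto-scaling processing, the KPI and threshold are configured first. The service then automatically detects whether the configured KPI has reached the threshold. If it has, the resource parameters are modified, for example, by changing the number of virtual machines of the service.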
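The configure, check, and modify steps above can be sketched as follows. This is a minimal illustration under stated assumptions: `ScalingPolicy`, `apply_policy`, the scale-in threshold, and the VM bounds are hypothetical names and parameters introduced here; only the "add one VM when CPU utilization reaches 80%" rule comes from the text.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical auto-scaling policy: one KPI with two thresholds."""
    kpi: str              # name of the monitored KPI, e.g. "cpu_utilization"
    scale_out_at: float   # add VMs when the KPI reaches this value
    scale_in_at: float    # remove VMs when the KPI drops to this value (assumed)
    step: int = 1         # VMs added or removed per scaling action
    min_vms: int = 1      # assumed lower bound on the VM count
    max_vms: int = 10     # assumed upper bound on the VM count


def apply_policy(policy: ScalingPolicy, kpis: dict, vm_count: int) -> int:
    """Check the configured KPI against the thresholds and return the new VM count."""
    value = kpis[policy.kpi]
    if value >= policy.scale_out_at:
        return min(policy.max_vms, vm_count + policy.step)
    if value <= policy.scale_in_at:
        return max(policy.min_vms, vm_count - policy.step)
    return vm_count  # KPI is between the thresholds: leave the VM count unchanged
```

For example, with `ScalingPolicy("cpu_utilization", scale_out_at=0.80, scale_in_at=0.20)`, a reading of 0.85 on two VMs yields three VMs, while a reading of 0.50 leaves the count at two.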
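The basic software model set 330 may include an event generator model 331, an event processor model 332, an event transmitter model 333, and a resource pool model 334. The event is the basic concept of discrete event simulation. An event is an object that carries attributes describing it, and it can be sent from one service to another.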
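The event generator 331 may be the counterpart of the load generator 103 in the test environment. The event processor 332 can be configured with a service's logic and latency. The event transmitter 333 sends events from one service to another. The resource pool model 334 simulates the resource consumption of software and the resource limits within it, such as latches, connection limits, and token limits.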
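To make the four roles concrete, here is a small discrete event simulation sketch in which requests pass through a fixed chain of services. The function `simulate`, its arguments, the fixed per-service delay, and the first-come-first-served slot rule are assumptions chosen to keep the example short; they are not taken from the disclosure.

```python
import heapq
import itertools


def simulate(arrival_times, route, delay, pool_size):
    """Run requests through the services listed in `route`, in order.

    arrival_times: times at which requests enter the first service
    delay:         service name -> processing time per request (seconds)
    pool_size:     service name -> number of requests it can process at once
    Returns the end-to-end response time of each request, in completion order.
    """
    seq = itertools.count()  # tie-breaker for events with equal timestamps
    # Event generator: one event per request, addressed to the first service.
    queue = [(t, next(seq), t, 0) for t in arrival_times]
    heapq.heapify(queue)
    # Resource pool: for each service, the times at which its slots become free.
    free_at = {name: [0.0] * pool_size[name] for name in route}
    response_times = []
    while queue:
        now, _, created, hop = heapq.heappop(queue)
        name = route[hop]
        # Event processor: wait for the earliest free slot, then apply the delay.
        start = max(now, heapq.heappop(free_at[name]))
        done = start + delay[name]
        heapq.heappush(free_at[name], done)
        if hop + 1 < len(route):
            # Event transmitter: forward the event to the next service.
            heapq.heappush(queue, (done, next(seq), created, hop + 1))
        else:
            response_times.append(done - created)
    return response_times
```

Because events are always taken from the queue in timestamp order, each service sees its requests in arrival order, so keeping only the "slot free at" times is enough to model a limited pool without an explicit waiting queue.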
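The service model set 340 may include a standard service workflow model 341, a standard service performance model 342, and a standard cloud service performance model 343. A service workflow model is the workflow model of a locally created service. A service performance model is a performance model built for a locally created service and is used for service performance analysis. Such a model may, for example, output the resource consumption of running a single request on the service under different input loads. A cloud service performance model is a performance model built for a cloud service offered by a cloud provider and is used to analyze the performance of that cloud service. Such a model may, for example, output the service's response time under different input loads. In this specification, the term "standard model" refers to a model that has been modularized as a standard model for modular use because it is commonly used and fits many application scenarios.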
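The resource configuration determination apparatus 202 is configured to create a system simulation model using the model library and to automatically train the service performance model of the system simulation model using test data collected from the cloud service system in the test environment; the service performance model is used together with the system simulation model. After the system simulation model is obtained in this way, the resource configuration determination apparatus 202 runs simulations with it and determines the optimal resource configuration 50 from the simulation results.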
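After the resource configuration is determined, the resource configuration determination apparatus 202 provides the determined resource configuration 50 to the interactive elastic cloud service system 30 in the production environment to carry out the production plan.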
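FIG. 5 shows an example flowchart of a method 500 for determining the resource configuration of an interactive elastic cloud service system according to an embodiment of the present disclosure. The method is performed by the resource configuration determination apparatus 202.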
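As shown in FIG. 5, at block 510, the required models for each service of the interactive elastic cloud service system are acquired from the model library.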
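In one example, the model acquisition process may be initiated in response to a resource configuration determination request, which may include service configuration information of the interactive elastic cloud service system or service identification information of each service. Accordingly, the resource configuration determination apparatus 202 can automatically acquire the required models for each service from the model library based on the service configuration information or the service identification information.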
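In one example, the system simulation tool 20 may have a visual operation interface, such as the visual operation interfaces 600, 700, and 800 shown in FIG. 6, FIG. 7, and FIG. 8.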
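The visual operation interface 600 shown in FIG. 6 includes a model display area 620 and a model editing area 630. The basic hardware model set 611, cloud infrastructure model set 613, basic software model set 615, and service model set 617 are displayed in the model display area 620 in a layered manner.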
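In the example shown in FIG. 6, model acquisition can be completed by performing a selection operation on the models in the model library; the selection operation may be performed by a user or automatically by a machine. The selection operation may be, for example, a click or a drag, as shown in FIG. 6. Accordingly, in response to a model selection operation on the model library in the model display area, the resource configuration determination apparatus 202 acquires the selected model.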
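The visual operation interface 700 shown in FIG. 7 includes a model display area 720 and a model editing area 730. The basic hardware model set 711, cloud infrastructure model set 713, basic software model set 715, and service model set 717 are displayed in the model display area 720 as menus. Accordingly, models can be selected from drop-down menus.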
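The visual operation interface 800 shown in FIG. 8 includes a model acquisition area 820 and a model editing area 830. In the model acquisition area 820, the basic hardware model set 811, cloud infrastructure model set 813, basic software model set 815, and service model set 817 each have an information input field for entering the service identification information or service configuration information of each service of the cloud service system. Accordingly, in response to that information being entered in the model acquisition area, the resource configuration determination apparatus 202 acquires the required models for each service from the model library. Note that FIG. 8 provides one information input field per model set. Optionally, a single information input field may be provided for all model sets.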
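Optionally, after the required models for each service are acquired as above, the acquired models may be presented in the model editing area 630/730/830.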
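At block 520, resource limit settings are made for the required models of each service. FIG. 9 shows an example schematic diagram of resource limit settings according to an embodiment of the present disclosure. As shown in FIG. 9, after a model in a service is clicked, for example the model "GW-CPU-Process", a dialog box may pop up for entering the resource limit settings, for example, limiting the number of VMs to 4 and the number of CPU cores to 2.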
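After the required models for each service are obtained as above, at block 530 a service workflow is created among the acquired models to obtain the system simulation model of the cloud service system. In the present disclosure, the service workflow indicates the interaction workflow among the models and the model execution order within a single service. Once the service workflow has been created, events are sent from one service to another, and the whole simulation model is triggered to run the event flow. FIG. 10 shows a schematic diagram of the result after service workflow creation according to an embodiment of the present disclosure.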
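In one example, the resource configuration determination apparatus 202 may create the service workflow among the acquired models based on the operation flow information of each service.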
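In another example, where the acquired models are presented in the model editing area, the service workflow can be created by performing link operations between the corresponding endpoints of the acquired models in the model editing area. Accordingly, in response to link operations between the corresponding endpoints of the acquired models in the model editing area, the resource configuration determination apparatus 202 creates the service workflow among the models, thereby obtaining the system simulation model.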
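One simple way to picture a service workflow built from link operations is as a directed graph whose edges are the links between model endpoints. The sketch below derives an execution order from such links; the function `build_workflow` and the model names `LoadGenerator` and `Backend` are hypothetical, and the sketch assumes the workflow contains no cycles, which the disclosure does not state. It relies on the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def build_workflow(links):
    """Turn (source_model, target_model) link pairs into an execution order.

    Each pair stands for one link drawn between two model endpoints.
    Raises graphlib.CycleError if the links form a cycle.
    """
    predecessors = {}
    for source, target in links:
        # TopologicalSorter expects a mapping of node -> set of predecessors.
        predecessors.setdefault(target, set()).add(source)
        predecessors.setdefault(source, set())
    return list(TopologicalSorter(predecessors).static_order())
```

For a simple chain such as `[("LoadGenerator", "GW-CPU-Process"), ("GW-CPU-Process", "Backend")]`, the resulting order is the chain itself; a real workflow with request and response paths between services would need a richer representation than this.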
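After the system simulation model is obtained as above, at block 540 the service performance model of the system simulation model is trained using training data; the service performance model defines the resource consumption of running a single request on a service in the system simulation model. Here, the training data may include some or all of the test data collected from the cloud service system in the test environment. For example, the input of the service performance model may include the input parameters of the load test, such as the payload of an API request. The output of the service performance model may include the resource consumption of a single request, for example, MI (million instructions) on the CPU, the increase in memory, and the packet size on the network. In addition, where the service is a cloud service offered by a cloud provider (for example, Redis or Table Store), the service operates as a black box, so the output of the service performance model may be characterized by the service's response time.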
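The disclosure does not say which learning method is used to train the service performance model. As one possible illustration, the sketch below fits a straight line by ordinary least squares that maps a request's payload size to its CPU cost in MI. The function names, the choice of a linear model, and the single input feature are all assumptions made for this example.

```python
def fit_linear(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train_service_performance_model(samples):
    """Train on (payload_kb, cpu_mi_per_request) pairs taken from test data.

    Returns a function that predicts the CPU cost (in MI) of a single request.
    """
    slope, intercept = fit_linear([s[0] for s in samples], [s[1] for s in samples])
    return lambda payload_kb: slope * payload_kb + intercept
```

The returned function plays the role of the trained service performance model: the simulation can call it for each simulated request and pass the predicted cost to a hardware model. For a black-box cloud service, the same fitting step could target response time instead of CPU cost.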
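At block 550, system simulation is performed using the service performance model and the system simulation model, under a given resource configuration set and the resource limit settings, to obtain system performance KPIs. With a service performance model that describes the resource consumption of a single request or the response time of a service, the system behavior and system performance KPIs under different loads can be computed through underlying models such as the hardware models and software models.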
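At block 560, the resource configuration with the best system performance KPI in the resource configuration set is determined as the resource configuration of the cloud service system.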
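FIG. 11 shows an example schematic diagram of simulation results under different public bandwidths according to an embodiment of the present disclosure. As shown in FIG. 11, when the public bandwidth is set to 300 Mbps, the QPS and response time of the service reach their best values. Increasing the bandwidth further does not improve service performance. The optimal public bandwidth is therefore 300 Mbps, and a public bandwidth of 300 Mbps is determined as the resource configuration of the cloud service system.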
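In addition, in one example, a target performance KPI may be set for the system simulation model. Accordingly, resource configurations whose system performance KPI exceeds the target performance KPI are removed from the resource configuration set. The resource configuration with the best system performance KPI in the remaining set is then determined as the resource configuration of the cloud service system.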
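The selection step, including the optional target-KPI filter, can be sketched as below. The dictionary keys, the reading of "exceeds the target" as "response time above a maximum", and the tie-breaking rule that prefers the smaller bandwidth are assumptions for illustration; the disclosure does not define how "best" is ranked across several KPIs.

```python
def choose_configuration(results, max_response_ms=None):
    """Pick the best simulated configuration.

    results: list of dicts with keys "bandwidth_mbps", "qps", and "response_ms",
             one per simulated resource configuration (hypothetical layout).
    max_response_ms: optional target KPI; configurations whose response time
             exceeds it are removed before ranking.
    """
    candidates = [
        r for r in results
        if max_response_ms is None or r["response_ms"] <= max_response_ms
    ]
    if not candidates:
        raise ValueError("no configuration meets the target KPI")
    # Rank by highest QPS, then lowest response time, then smallest bandwidth,
    # so extra bandwidth that brings no improvement is not selected.
    return min(
        candidates,
        key=lambda r: (-r["qps"], r["response_ms"], r["bandwidth_mbps"]),
    )
```

With made-up results in which 300 Mbps and 400 Mbps both reach the same QPS and response time, this rule returns the 300 Mbps entry, matching the reasoning that bandwidth beyond the saturation point is wasted.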
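In addition, optionally, in one example, the test data includes validation data. Before system simulation is performed using the service performance model and the system simulation model under the given resource configuration set and resource limit settings to obtain the system performance KPIs, the method may further include performing model validation using the validation data, wherein, if model validation fails, the creation of the system simulation model and the service performance model is performed again.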
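For example, benchmark tests may be run at different concurrency levels, such as 200/500/1000/1500 concurrent users, to obtain test data. Data from the 200/500-concurrency tests is then selected to build the system simulation model. Once the system simulation model is built, the simulation results at 1000/1500 concurrency are compared with the test data to check whether the system simulation model predicts system behavior well under different loads.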
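The comparison between simulated and measured results could be expressed as a relative-error check per load level, as in the sketch below. The function name `validate`, the 10% default tolerance, and the use of relative error as the pass criterion are assumptions; the disclosure only says that the two sets of results are compared.

```python
def validate(simulated, measured, tolerance=0.10):
    """Compare simulated KPI values with measured ones at held-out load levels.

    simulated, measured: dicts mapping concurrency level -> KPI value
    tolerance: maximum accepted relative error (assumed default of 10%)
    Returns (passed, errors), where errors maps each load to its relative error.
    """
    errors = {}
    for load, actual in measured.items():
        if actual == 0:
            raise ValueError(f"measured value at load {load} is zero")
        errors[load] = abs(simulated[load] - actual) / abs(actual)
    passed = all(error <= tolerance for error in errors.values())
    return passed, errors
```

If the check fails, the method described above would go back and rebuild the system simulation model and service performance model before running the full simulation.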
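In addition, optionally, the method may further include performing a performance test on the determined resource configuration, for example, an NFR test.
In addition, optionally, where the model library includes a service model set, the method may further include: in response to the services of the cloud service system including a standard service, acquiring the corresponding standard service workflow model, standard service performance model, and/or standard cloud service performance model from the service model set.
Note also that the examples above are described using an interactive elastic cloud service system. Other embodiments of this specification may use other types of cloud service systems.
Note also that, in other embodiments of this specification, the model library need not include a service model set.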
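FIG. 12 shows a block diagram of a resource configuration determination apparatus 1200 according to an embodiment of the present disclosure. As shown in FIG. 12, the resource configuration determination apparatus 1200 includes a model acquisition unit 1210, a resource limit setting acquisition unit 1220, a service workflow creation unit 1230, a service performance model training unit 1240, a system simulation unit 1250, and a resource configuration determination unit 1260.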
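The model acquisition unit 1210 is configured to acquire the required models for each service of the cloud service system from the model library. For the operation of the model acquisition unit 1210, refer to block 510 described above with reference to FIG. 5.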
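The resource limit setting acquisition unit 1220 is configured to acquire the resource limit settings of each required model. For the operation of the resource limit setting acquisition unit 1220, refer to block 520 described above with reference to FIG. 5.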
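The service workflow creation unit 1230 is configured to create a service workflow among the acquired models to obtain the system simulation model of the cloud service system, the service workflow indicating the interaction workflow among the models and the model execution order within a single service. For the operation of the service workflow creation unit 1230, refer to block 530 described above with reference to FIG. 5.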
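The service performance model training unit 1240 is configured to train the service performance model of the system simulation model using training data, the training data including some or all of the test data collected from the cloud service system in the test environment, and the service performance model defining the resource consumption of running a single request on a service in the system simulation model. For the operation of the service performance model training unit 1240, refer to block 540 described above with reference to FIG. 5.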
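The system simulation unit 1250 is configured to perform system simulation using the service performance model and the system simulation model, under a given resource configuration set and the resource limit settings, to obtain system performance KPIs. For the operation of the system simulation unit 1250, refer to block 550 described above with reference to FIG. 5.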
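The resource configuration determination unit 1260 is configured to determine the resource configuration with the best system performance KPI in the resource configuration set as the resource configuration of the cloud service system. For the operation of the resource configuration determination unit 1260, refer to block 560 described above with reference to FIG. 5.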
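In addition, in one example, the resource configuration determination apparatus 1200 may further include a target performance KPI acquisition unit (not shown) and a resource configuration removal unit (not shown). The target performance KPI acquisition unit is configured to acquire a target performance KPI of the system simulation model. The resource configuration removal unit is configured to remove from the resource configuration set the resource configurations whose system performance KPI exceeds the target performance KPI. Accordingly, the resource configuration determination unit 1260 determines the resource configuration with the best system performance KPI in the remaining set as the resource configuration of the cloud service system.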
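In addition, optionally, the system simulation tool may have a visual operation interface that includes a model display area. In response to a model selection operation on the model library in the model display area, the model acquisition unit 1210 acquires the selected model.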
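In addition, optionally, the system simulation tool has a visual operation interface that includes a model acquisition area. In response to the service identification information or service configuration information of each service of the cloud service system being entered in the model acquisition area, the model acquisition unit 1210 acquires the required models for each service from the model library.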
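In addition, optionally, where the model library further includes a service model set, the model acquisition unit 1210 is further configured to, in response to the services of the cloud service system including a standard service, acquire the corresponding standard service workflow model, standard service performance model, and/or standard cloud service performance model from the service model set.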
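In addition, optionally, the visual operation interface may further include a model editing area. Accordingly, the resource configuration determination apparatus 1200 further includes a model presentation unit (not shown), which is configured to present the acquired models in the model editing area. In response to link operations between the corresponding endpoints of the acquired models in the model editing area, the service workflow creation unit 1230 creates the service workflow among the models.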
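In addition, optionally, in one example, the service workflow creation unit 1230 may be configured to create the service workflow among the acquired models based on the operation flow information of each service.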
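In addition, optionally, where the test data includes validation data, the resource configuration determination apparatus 1200 may further include a model validation unit (not shown). The model validation unit is configured to perform model validation using the validation data before system simulation is performed using the service performance model and the system simulation model under the given resource configuration set and resource limit settings to obtain the system performance KPIs. If model validation fails, the operations of the model acquisition unit, the resource limit setting acquisition unit, the service workflow creation unit, and the service performance model training unit are performed again.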
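The resource configuration determination method and the resource configuration determination apparatus according to the present disclosure have been described above with reference to FIG. 1 to FIG. 12. The resource configuration determination apparatus may be implemented in hardware, in software, or in a combination of hardware and software.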
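FIG. 13 shows a schematic diagram of a computing device for implementing the resource configuration determination process of a cloud service system according to an embodiment of the present disclosure. As shown in FIG. 13, the computing device 1300 may include at least one processor 1310, a storage 1320 (for example, a non-volatile memory), a memory 1330, and a communication interface 1340, which are connected together via a bus 1360. The at least one processor 1310 executes at least one computer-readable instruction (that is, the elements described above as implemented in software) stored or encoded in the storage.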
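In one embodiment, computer-executable instructions are stored in the storage which, when executed, cause the at least one processor 1310 to: acquire the required models for each service of the cloud service system from the model library; acquire the resource limit settings of each required model; create a service workflow among the acquired models to obtain the system simulation model of the cloud service system, the service workflow indicating the interaction workflow among the models and the model execution order within a single service; train, via a service performance model training module and using training data, the service performance model of the system simulation model, the training data including some or all of the test data collected from the cloud service system in the test environment, and the service performance model defining the resource consumption of a single request on each service in the system simulation model; perform, via a discrete event simulation engine, system simulation using the service performance model and the system simulation model under a given resource configuration set and the resource limit settings to obtain system performance KPIs; and determine the resource configuration with the best system performance KPI in the resource configuration set as the resource configuration of the cloud service system.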
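It should be understood that the computer-executable instructions stored in the storage, when executed, cause the at least one processor 1310 to perform the various operations and functions described above in connection with FIG. 1 to FIG. 12 in the embodiments of this specification.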
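According to one embodiment, a program product such as a machine-readable medium (for example, a non-transitory machine-readable medium) is provided. The machine-readable medium may carry instructions (that is, the elements described above as implemented in software) which, when executed by a machine, cause the machine to perform the various operations and functions described above in connection with FIG. 1 to FIG. 12 in the embodiments of this specification. Specifically, a system or apparatus equipped with a readable storage medium may be provided, the readable storage medium storing software program code that implements the functions of any of the above embodiments, and a computer or processor of the system or apparatus reads and executes the instructions stored in the readable storage medium.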
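In this case, the program code read from the readable medium can itself implement the functions of any of the above embodiments, so the machine-readable code and the readable storage medium storing it form part of the present invention.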
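Examples of readable storage media include floppy disks, hard disks, magneto-optical disks, optical discs (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, and DVD-RW), magnetic tapes, non-volatile memory cards, and ROM. Alternatively, the program code may be downloaded from a server computer or from the cloud over a communication network.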
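Those skilled in the art will understand that various variations and modifications can be made to the embodiments disclosed above without departing from the essence of the invention. The scope of protection of the present invention should therefore be defined by the appended claims.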
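Note that not all of the steps and units in the above flows and system structure diagrams are required; some steps or units may be omitted as needed. The order in which the steps are performed is not fixed and may be determined as needed. The apparatus structures described in the above embodiments may be physical structures or logical structures; that is, some units may be implemented by the same physical entity, some units may be implemented by multiple physical entities, or some units may be implemented jointly by certain components in multiple independent devices.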
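In the above embodiments, a hardware unit or module may be implemented mechanically or electrically. For example, a hardware unit, module, or processor may include permanently dedicated circuitry or logic (such as a dedicated processor, FPGA, or ASIC) to perform the corresponding operations. A hardware unit or processor may also include programmable logic or circuitry (such as a general-purpose processor or other programmable processor) that is temporarily configured by software to perform the corresponding operations. The specific implementation (mechanical, dedicated permanent circuitry, or temporarily configured circuitry) may be chosen based on cost and time considerations.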
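The detailed description set forth above in connection with the drawings describes exemplary embodiments and does not represent all embodiments that can be implemented or that fall within the scope of the claims. The term "exemplary" as used throughout this specification means "serving as an example, instance, or illustration" and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described techniques. These techniques may, however, be practiced without these specific details. In some instances, well-known structures and apparatuses are shown in block diagram form to avoid obscuring the concepts of the described embodiments.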
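The above description of the present disclosure is provided to enable any person of ordinary skill in the art to make or use the present disclosure. Various modifications to the present disclosure will be apparent to those of ordinary skill in the art, and the general principles defined herein may be applied to other variations without departing from the scope of protection of the present disclosure. The present disclosure is therefore not limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.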
Claims (21)
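- A method (500) for determining a resource configuration of a cloud service system (30) using a system simulation tool (20), the system simulation tool (20) comprising a model library (201) and a resource configuration determination apparatus (202), the model library (201) comprising a basic hardware model set (310), a cloud infrastructure model set (320), and a basic software model set (330), the method comprising: acquiring (510), from the model library (201), required models for each service of the cloud service system (30); acquiring (520) resource limit settings of each required model; creating (530) a service workflow among the acquired models to obtain a system simulation model of the cloud service system (30), the service workflow indicating the interaction workflow among the models and the model execution order within a single service; training (540), using training data, a service performance model of each service in the system simulation model, the training data comprising some or all of the test data (40) collected from the cloud service system (30) in a test environment, the service performance model defining the resource consumption of running a single request on a service in the system simulation model; performing system simulation (550) using the service performance model and the system simulation model, under a given resource configuration set and the resource limit settings, to obtain system performance KPIs; and determining (560) the resource configuration with the best system performance KPI in the resource configuration set as the resource configuration (50) of the cloud service system (30).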
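- The method (500) of claim 1, wherein the cloud service system (30) comprises an interactive elastic cloud service system, and the cloud infrastructure model set (320) comprises an auto-scaling service model.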
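- The method (500) of claim 1, further comprising: acquiring a target performance KPI of the system simulation model; and removing from the resource configuration set the resource configurations whose system performance KPI exceeds the target performance KPI, wherein determining (560) the resource configuration with the best system performance KPI in the resource configuration set as the resource configuration (50) of the cloud service system (30) comprises: determining the resource configuration with the best system performance KPI in the resource configuration set after removal as the resource configuration (50) of the cloud service system (30).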
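- The method (500) of claim 1, wherein the system simulation tool (20) has a visual operation interface (600, 700), the visual operation interface (600, 700) comprising a model display area (620, 720), and acquiring the required models for each service of the cloud service system (30) from the model library (201) comprises: in response to a model selection operation on the model library (201) in the model display area (620, 720), acquiring the selected model.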
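- The method (500) of claim 1, wherein the system simulation tool (20) has a visual operation interface (800), the visual operation interface (800) comprising a model acquisition area (820), and acquiring the required models for each service of the cloud service system (30) from the model library (201) comprises: in response to service identification information or service configuration information of each service of the cloud service system (30) being entered in the model acquisition area (820), acquiring the required models for each service from the model library (201).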
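- The method (500) of claim 4 or 5, wherein the visual operation interface further comprises a model editing area (630, 730, 830), the method further comprising: presenting the acquired models in the model editing area (630, 730, 830), wherein creating (530) a service workflow among the acquired models to obtain the system simulation model of the cloud service system comprises: in response to link operations between the corresponding endpoints of the acquired models in the model editing area (630, 730, 830), creating the service workflow among the models.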
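- The method (500) of claim 1, wherein creating (530) a service workflow among the acquired models to obtain the system simulation model of the cloud service system (30) comprises: creating the service workflow among the acquired models based on operation flow information of each service, to obtain the system simulation model of the cloud service system (30).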
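- The method (500) of claim 1, wherein the model library (201) further comprises a service model set (340), the service model set (340) comprising a standard service workflow model (341), a standard service performance model (342), and/or a standard cloud service performance model (343), the method further comprising: in response to the services of the cloud service system (30) including a standard service, acquiring the corresponding standard service workflow model (341), standard service performance model (342), and/or standard cloud service performance model (343) from the service model set (340).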
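- The method (500) of claim 1, wherein the test data (40) comprises validation data, and, before system simulation is performed using the service performance model and the system simulation model under the given resource configuration set to obtain the system performance KPIs, the method further comprises: performing model validation using the validation data, wherein, if model validation fails, the creation of the system simulation model and the service performance model is performed again.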
- The method (500) of any one of claims 1 to 9, further comprising: performing a performance test on the determined resource configuration.
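- An apparatus (1200) for determining a resource configuration of a cloud service system (30), the apparatus being applied to a system simulation tool (20), the system simulation tool (20) comprising a model library (201), the model library (201) comprising a basic hardware model set (310), a cloud infrastructure model set (320), and a basic software model set (330), the apparatus (1200) comprising: a model acquisition unit (1210) configured to acquire, from the model library (201), required models for each service of the cloud service system (30); a resource limit setting acquisition unit (1220) configured to acquire resource limit settings of each required model; a service workflow creation unit (1230) configured to create a service workflow among the acquired models to obtain a system simulation model of the cloud service system (30), the service workflow indicating the interaction workflow among the models and the model execution order within a single service; a service performance model training unit (1240) configured to train, using training data, a service performance model of each service in the system simulation model, the training data comprising some or all of the test data (40) collected from the cloud service system (30) in a test environment, the service performance model defining the resource consumption of running a single request on a service in the system simulation model; a system simulation unit (1250) configured to perform system simulation using the service performance model and the system simulation model, under a given resource configuration set and the resource limit settings, to obtain system performance KPIs; and a resource configuration determination unit (1260) configured to determine the resource configuration with the best system performance KPI in the resource configuration set as the resource configuration (50) of the cloud service system (30).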
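- The apparatus (1200) of claim 11, further comprising: a target performance KPI acquisition unit configured to acquire a target performance KPI of the system simulation model; and a resource configuration removal unit configured to remove from the resource configuration set the resource configurations whose system performance KPI exceeds the target performance KPI, wherein the resource configuration determination unit (1260) is configured to determine the resource configuration with the best system performance KPI in the resource configuration set after removal as the resource configuration (50) of the cloud service system (30).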
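- The apparatus (1200) of claim 11, wherein the system simulation tool (20) has a visual operation interface (600, 700), the visual operation interface comprising a model display area (620, 720), and the model acquisition unit (1210) is configured to acquire the selected model in response to a model selection operation on the model library (201) in the model display area (620, 720).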
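- The apparatus (1200) of claim 11, wherein the system simulation tool (20) has a visual operation interface (800), the visual operation interface comprising a model acquisition area (820), and the model acquisition unit (1210) is configured to acquire the required models for each service from the model library (201) in response to service identification information or service configuration information of each service of the cloud service system (30) being entered in the model acquisition area (820).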
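- The apparatus (1200) of claim 13 or 14, wherein the visual operation interface (600, 700, 800) comprises a model editing area (630, 730, 830), the apparatus (1200) further comprising: a model presentation unit configured to present the acquired models in the model editing area (630, 730, 830), wherein the service workflow creation unit (1230) is configured to create the service workflow among the models in response to link operations between the corresponding endpoints of the acquired models in the model editing area (630, 730, 830).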
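- The apparatus (1200) of claim 11, wherein the service workflow creation unit (1230) is configured to create the service workflow among the acquired models based on operation flow information of each service.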
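- The apparatus (1200) of claim 11, wherein the model library (201) further comprises a service model set (340), the service model set (340) comprising a standard service workflow model (341), a standard service performance model (342), and/or a standard cloud service performance model (343), and the model acquisition unit (1210) is further configured to, in response to the services of the cloud service system (30) including a standard service, acquire the corresponding standard service workflow model (341), standard service performance model (342), and/or standard cloud service performance model (343) from the service model set (340).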
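- The apparatus (1200) of any one of claims 11 to 17, wherein the test data (40) comprises validation data, the apparatus (1200) further comprising: a model validation unit configured to perform model validation using the validation data before system simulation is performed using the service performance model and the system simulation model under the given resource configuration set and the resource limit settings to obtain the system performance KPIs, wherein, if model validation fails, the operations of the model acquisition unit (1210), the resource limit setting acquisition unit (1220), the service workflow creation unit (1230), and the service performance model training unit (1240) are performed again.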
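- A computing device (1300), comprising: at least one processor (1310); and a storage (1320) coupled to the at least one processor and configured to store instructions which, when executed by the at least one processor (1310), cause the at least one processor to perform the method of any one of claims 1 to 10.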
- A machine-readable storage medium storing executable instructions which, when executed, cause a machine to perform the method of any one of claims 1 to 10.
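- A computer program product tangibly stored on a computer-readable medium and comprising computer-executable instructions which, when executed, cause at least one processor to perform the method of any one of claims 1 to 10.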
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP20940588.5A EP4152715A4 (en) | 2020-06-16 | 2020-06-16 | METHOD AND DEVICE FOR DETERMINING THE RESOURCE CONFIGURATION OF A CLOUD SERVICE SYSTEM |
CN202080101241.9A CN115668895A (zh) | 2020-06-16 | 2020-06-16 | 云服务系统的资源配置确定方法及装置 |
US18/001,900 US11750471B2 (en) | 2020-06-16 | 2020-06-16 | Method and apparatus for determining resource configuration of cloud service system |
PCT/CN2020/096398 WO2021253239A1 (zh) | 2020-06-16 | 2020-06-16 | 云服务系统的资源配置确定方法及装置 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/096398 WO2021253239A1 (zh) | 2020-06-16 | 2020-06-16 | 云服务系统的资源配置确定方法及装置 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021253239A1 true WO2021253239A1 (zh) | 2021-12-23 |
Family
ID=79268821
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/096398 WO2021253239A1 (zh) | 2020-06-16 | 2020-06-16 | 云服务系统的资源配置确定方法及装置 |
Country Status (4)
Country | Link |
---|---|
US (1) | US11750471B2 (zh) |
EP (1) | EP4152715A4 (zh) |
CN (1) | CN115668895A (zh) |
WO (1) | WO2021253239A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117453150B (zh) * | 2023-12-25 | 2024-04-05 | 杭州阿启视科技有限公司 | 录像存储调度服务多实例的实现方法 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103986669A (zh) * | 2014-05-07 | 2014-08-13 | 华东师范大学 | 一种云计算中资源分配策略的评估方法 |
US20160357584A1 (en) * | 2015-06-04 | 2016-12-08 | International Business Machines Corporation | Hybrid simulation in a cloud computing environment |
CN106845746A (zh) * | 2016-06-15 | 2017-06-13 | 曹大海 | 一种支持大规模实例密集型应用的云工作流管理系统 |
CN109412829A (zh) * | 2018-08-30 | 2019-03-01 | 华为技术有限公司 | 一种资源配置的预测方法及设备 |
US20200089533A1 (en) * | 2018-03-13 | 2020-03-19 | Aloke Guha | Methods and systems for cloud application optimization |
CN111194539A (zh) * | 2017-12-29 | 2020-05-22 | 西门子股份公司 | 提供云平台虚拟资源的方法、装置和计算机可读存储介质 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9223634B2 (en) * | 2012-05-02 | 2015-12-29 | Cisco Technology, Inc. | System and method for simulating virtual machine migration in a network environment |
US9137110B1 (en) * | 2012-08-16 | 2015-09-15 | Amazon Technologies, Inc. | Availability risk assessment, system modeling |
US10235480B2 (en) * | 2016-06-15 | 2019-03-19 | International Business Machines Corporation | Simulation of internet of things environment |
GB2540902B (en) * | 2016-11-10 | 2017-07-19 | Metaswitch Networks Ltd | Optimising a mapping of virtualised network functions onto physical resources in a network using dependency models |
Non-Patent Citations (2)
Title |
---|
See also references of EP4152715A4 * |
ZHU MINGFA ., QIU SHI-DA;TAO YUAN;QIN GUANG-JUN;LIU RUI: "A Resource Elastic Scaling Method of Cloud Platform Based on Benefit Analysis", JILIN NORMAL UNIVERSITY JOURNAL, NATURAL SCIENCE EDITION, vol. 37, no. 4, 30 November 2016 (2016-11-30), pages 51 - 58, XP055881167, ISSN: 1674-3873, DOI: 10.16862/j.cnki.issn1674-3873.2016.04.009 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220303219A1 (en) * | 2021-03-22 | 2022-09-22 | Fujitsu Limited | Non-transitory computer-readable recording medium, service management device, and service management method |
US11627085B2 (en) * | 2021-03-22 | 2023-04-11 | Fujitsu Limited | Non-transitory computer-readable recording medium, service management device, and service management method |
Also Published As
Publication number | Publication date |
---|---|
CN115668895A (zh) | 2023-01-31 |
US20230188432A1 (en) | 2023-06-15 |
EP4152715A4 (en) | 2023-12-20 |
US11750471B2 (en) | 2023-09-05 |
EP4152715A1 (en) | 2023-03-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20940588; Country of ref document: EP; Kind code of ref document: A1 |
 | ENP | Entry into the national phase | Ref document number: 2020940588; Country of ref document: EP; Effective date: 20221213 |
 | NENP | Non-entry into the national phase | Ref country code: DE |