WO2012050224A1 - Computer resource management system - Google Patents

Computer resource management system

Info

Publication number
WO2012050224A1
WO2012050224A1 · PCT/JP2011/073842 · JP2011073842W
Authority
WO
WIPO (PCT)
Prior art keywords
computer resource
server
control system
resource control
computer
Prior art date
Application number
PCT/JP2011/073842
Other languages
English (en)
Japanese (ja)
Inventor
英裕 最首
Original Assignee
株式会社イーシー・ワン
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社イーシー・ワン
Publication of WO2012050224A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3055Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/08Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning

Definitions

  • The present invention relates to a service level management technique for stably operating complex, large-scale computer resources, as typified by the term "cloud".
  • Virtualization technology has made it possible to use one physical computer as a plurality of computers. In other words, virtualization technology makes it possible to treat hardware as software, so a necessary number of servers can be secured simply by copying a server image.
  • Distributed systems that improve performance by sharing processing among multiple small servers instead of a single large server have been put to practical use in various fields, for example as mechanisms that store and retrieve large amounts of data at high speed and as mechanisms that improve the performance of large-scale batch systems.
  • Such a distributed system operates as if it were a single computer, while the functions previously performed by the system in one place are distributed among a plurality of computers.
  • As a main cloud service, Amazon Web Services (trademark) is known (see Non-Patent Document 1).
  • the present invention is intended to provide a solution that can monitor and control the status of computer resources of a monitored system in real time. It also seeks to ensure scalability and fault tolerance performance for such a solution itself.
  • a computer resource control system monitors the status of computer resources and performs control according to the status.
  • The computer resource control system comprises: a processing unit that determines, based on data collected from a monitoring agent, whether or not the computer resource needs to be controlled, and that outputs an instruction to execute control of the computer resource when it determines that control is needed; a message queue for exchanging data between the monitoring agent and the processing unit; and a first monitoring agent for monitoring the status of the message queue. Whether or not the computer resource needs to be controlled is also determined based on the data collected from the first monitoring agent.
  • A computer resource control system comprises a management server that compares data collected from a monitoring agent with a predefined control rule to determine whether or not an action is required for a computer resource, and an execution server that outputs an instruction to execute an action on the computer resource when it is determined that an action is required. At least one of the servers in the computer resource control system includes the monitoring agent.
  • the action includes control of any computer resource.
  • the action includes processing for increasing or decreasing the number of servers included in the computer resource control system.
  • the amount of computer resources to be input can be dynamically controlled.
  • The action includes processing such as job execution on the server, program start on the server, various setting changes, workflow control, physical server control, network device control, or cooperation between computer resources.
  • the computer resource control system further comprises a message queue server including a message queue for asynchronously exchanging data between the monitoring agent and a server in the computer resource control system.
  • the message queue server preferably includes a monitoring agent that monitors the data exchange status. According to this, computer resources can be appropriately controlled according to the data amount of the message queue.
  • the message queue server, the management server, and the execution server are each configured by a plurality of servers, and the action includes control for these servers.
  • Each server is configured as a virtual server, and the action preferably includes at least one of a process that increases or decreases the number of servers constituting the message queue server, a process that increases or decreases the number of servers constituting the management server, and a process that increases or decreases the number of servers constituting the execution server. Because the system is thus configured as a distributed structure without a single point of failure, a system can be constructed that does not go down as a whole even if a failure occurs in any single function.
  • The message queue server preferably includes at least one of: a data management queue into which data collected from the plurality of monitoring agents is sequentially input and which is sequentially read by the management server; an execution queue into which action instructions are input from the management server and which is sequentially read by the execution server; and a management queue into which processing data for executing actions on computer resources is sequentially input from the execution server and which is sequentially read by the corresponding monitoring agent.
  • The first monitoring agent monitors the queue amount of the data management queue, the execution queue, and the management queue. Monitoring each queue in this way makes finer control possible.
  • Each server in the computer resource control system includes a monitoring agent that monitors the operating status of that server, and the management server determines whether or not an action is required for the computer resource based on the operating status of each server. This makes it possible to control computer resources, for example by starting and stopping instances, according to the operating status of each server.
  • The computer resource control system preferably further includes at least one of: a database server for storing data; a collection server for reading the data collected from the plurality of monitoring agents from the message queue server and registering it in the database; and a dashboard server that reads the data stored in the database, edits it, and transmits it to the user terminal device. This makes it possible to provide a dashboard that displays the monitoring status to the user in real time.
  • the database server, the collection server, and the dashboard server are each configured by a plurality of servers, and the action preferably includes control for these servers.
  • Each server is configured as a virtual server, and the action includes at least one of a process that increases or decreases the number of servers constituting the database server, a process that increases or decreases the number of servers constituting the collection server, and a process that increases or decreases the number of servers constituting the dashboard server.
  • a fault tolerant system can be provided with a distributed structure having no single point of failure.
  • a computer resource control method is a method in which a processing apparatus included in a control system performs processing in a control system that controls computer resources.
  • The processing device performs the steps of: determining whether or not the computer resource needs to be controlled based on data collected from the monitoring agent; outputting an instruction to execute control of the computer resource when it is determined that control is needed; and exchanging data between the monitoring agent and the control system via a message queue.
  • the determining step determines whether or not the computer resource needs to be controlled based on data collected from the first monitoring agent that monitors the status of the message queue.
  • a computer resource control method is a method in which a processing apparatus included in a control system performs processing in a control system that monitors the status of computer resources and performs control according to the status.
  • The processing device performs the steps of: comparing data collected from a plurality of monitoring agents with a predefined control rule to determine whether or not an action is required for a computer resource; and outputting an instruction to execute an action on the computer resource when it is determined that an action is required.
  • the control system is configured to include a plurality of servers, and at least one of the plurality of servers includes a monitoring agent.
  • system includes not only a system constituted by a physical computer but also a system virtually constructed on the computer.
  • computer resource includes all levels of hardware and software related to a computer, regardless of whether it is physically configured or configured virtually.
  • server may include both real servers and virtual servers.
  • queue may include any configuration having the feature that previously input data is output first.
  • The drawings include a block diagram illustrating an embodiment of a computer resource control system, an example of a control rule, an example of a dashboard, a block diagram showing another embodiment of a computer resource control system, and an example of a flowchart of processing in a computer resource control system.
  • FIG. 1 is a diagram showing a schematic configuration of a cloud computing environment (cloud environment) which is a premise of a computer resource control system according to the present invention.
  • a user terminal device 12 is connected to a cloud 10 via a network N.
  • the cloud 10 is a generic term for systems that provide users with computing resources such as software, hardware, and data storage areas as services through the network N.
  • The cloud 10 includes a plurality of server devices operated in the network. The cloud 10 can be said to be a broader concept encompassing ASP services, utility computing, grid computing, SaaS/PaaS, and the like.
  • the cloud 10 is on the other side of the network N and can be said to be a generic term for computer resources that provide some service to the user terminal device 12.
  • the present invention is applicable to all cloud environments including public clouds, private clouds, and hybrid clouds.
  • physical disks and physical servers distributed on the network in the cloud 10 are virtualized and logically managed.
  • the computer resource control system registers inactive resources among resources managed in a virtualized manner in the resource pool, and dynamically extracts resources from the resource pool in response to changing requests. Then, the computer resource control system assigns a task to the extracted resource, and ensures scalable service provision.
  • the user terminal device 12 is a terminal device for a user to use the cloud 10, and includes a connection environment to the network N and a browser that runs on the user terminal device 12.
  • Such user terminal devices 12 include personal computers (PCs), personal digital assistants (PDAs), tablet terminal devices, mobile phones, smartphones, and the like.
  • the network N is a communication line for transmitting and receiving data and the like between the cloud 10 and the user terminal device 12.
  • it may be any of the Internet, a LAN, a dedicated line, a packet communication network, a telephone line, a corporate network, other communication lines, combinations thereof, and the like, regardless of whether they are wired or wireless.
  • FIG. 2 is a diagram showing an overview of the virtualization technology and the decentralized technology which are the premise of the computer resource control system according to the present invention.
  • the physical computer device group 20 operates as if it is a single computer 22 while the functions and processing are distributed in the computer device group 20 by the decentralization technique.
  • the computer device group 20 operates as if an operating system (OS) 223 is running on a single virtual piece of hardware 222 via the network 221.
  • the computer device group 20 that operates like a single computer 22 can be virtually used as a plurality of computers (including servers) 24 by the virtualization technology.
  • In other words, virtualization technology turns hardware into software.
  • Because the virtualized server 24 can be copied with this virtualization technology, replicas of the same server can be created. The user can therefore secure the necessary number of servers by copying the server image, and can reduce the number of servers by deleting server images.
  • For example, KVM (Kernel-based Virtual Machine) or another virtual environment 226 such as the Java Virtual Machine (JVM; Java is a registered trademark) runs, and the middleware 227 and the application 228 run on top of it.
  • The individual physical computers constituting the computer device group 20 preferably include a processing device such as a CPU for controlling the operation and processing of the computer, a memory and a storage device functioning as work areas for data storage and processing, an input/output interface, a communication interface, and a bus connecting them. The computer device group 20 may be constituted by a single computer or by a plurality of computers distributed on a network. Each computer functions as various function realizing means by its processing device executing a predetermined program stored in the memory or storage device.
  • FIG. 3 is a block diagram showing an example of a schematic configuration of the computer resource control system 1 according to the present invention.
  • the computer resource control system 1 according to the present embodiment includes a message network 32 and a processing unit 34 that processes data.
  • The computer resource control system 1 according to the present embodiment collects monitoring data (341) via the message network 32 from monitoring agents 30 incorporated at the monitoring points to be monitored. Based on the collected monitoring data (342), the computer resource control system 1 then monitors the monitoring target (343), predicts the demand (344) for the computer resources the monitoring target needs, and dynamically controls (345) the monitored computer resources.
  • The computer resource control 345 may include any control over any computer resource in the cloud 10, such as increasing or decreasing the number of virtual servers, job execution, program activation, various setting changes, workflow control, physical server control including power on/off, network device control, and coordination between servers.
  • the monitoring agent 30, the message network 32, and the processing unit 34 are all preferably configured inside the cloud 10.
  • the cloud 10 for constructing the computer resource control system 1 according to the present invention can be constructed in any environment regardless of whether it is a public cloud or a private cloud, as long as an API for managing server resources and the like is implemented. It is also possible to construct a combination of multiple environments. Amazon Web Services is an example of a cloud environment that can be constructed.
  • the monitoring agent 30 is a small software module and is incorporated in a monitoring point to be monitored, and the monitoring agent 30 collects monitoring information.
  • Examples of monitoring targets include computer resource monitoring in the computer resource control system 1, application monitoring, log file monitoring, process monitoring, and job monitoring. It can also be applied to monitoring unique sensor networks and factory lines.
  • the monitoring agent 30 transmits the collected monitoring information data to the message network 32.
  • The monitoring contents of the monitoring agent 30 can be changed dynamically from the processing unit 34 side.
  • the message network 32 is for realizing data exchange between the monitoring agent 30 embedded in the monitoring point and the processing unit 34.
  • If the number of monitoring targets becomes large, data could otherwise be lost due to limited throughput on the processing unit 34 side.
  • the processing unit 34 collects data from the monitoring agent 30 and stores it in the database. Further, it has a function of controlling the computer resources to be monitored in the cloud environment based on the data from the monitoring agent 30.
  • the operation of the processing unit 34 is defined by DSL (Domain Specific Language) created by the user.
  • the data acquired by the monitoring agent 30 may include numerical data, character data, log files, and other data related to any event that has occurred in the monitoring target.
  • the term measurement value means any data acquired by the monitoring agent 30.
  • Each element of the processing unit 34 is preferably distributed, as in the embodiments described below. As a result, the processing unit 34 has no single point of failure and can compensate for performance degradation by increasing the number of virtualized computer resources.
  • FIG. 4 is a block diagram showing an embodiment of the computer resource control system 1 according to the present invention.
  • The computer resource control system 1 includes a message queue server 41, a collection server 42, a management server 43, an execution server 44, a database server 45, and a dashboard server 46. Each of these servers is preferably distributed across a plurality of virtual servers having the same server image.
  • the computer resource control system 1 receives monitoring data from the monitoring agent 30 incorporated in the monitoring target application server 40 included in the monitoring target cluster 50 in the environment of the cloud 10. Further, the computer resource control system 1 provides a dashboard 48 that can be browsed by a browser to the user terminal device 12.
  • The message network 32 in FIG. 3 corresponds to the message queue server 41, and the processing unit 34 in FIG. 3 corresponds to the collection server 42, the management server 43, the execution server 44, the database server 45, and the dashboard server 46.
  • the computer resource control system 1 and the monitoring target application server 40 operate on the cloud 10.
  • the computer resource control system 1 can monitor a plurality of monitoring target application servers 40.
  • Each monitoring target application server 40 may be distributed by a plurality of replications. That is, each monitoring target application server 40 may be configured by a plurality of virtual servers, and the number of virtual servers constituting each monitoring target application server 40 may be dynamically changed. For example, assuming that there are 1,000 real servers to be monitored and 10 virtual servers are started up for each real server, the total number of virtual servers is 10,000. Assuming that there are 20 monitoring points for each virtual server, there are a total of 200,000 monitoring points. Further, the monitoring target is not limited to only an application server in a narrow sense that provides a service related to an application. It goes without saying that various servers, applications, processes, jobs, and other computer resources that exist in the environment of the cloud 10 can be monitored.
  • a monitoring agent 30 for measuring monitoring data is incorporated in the monitoring point of the monitoring target application server 40 to be monitored.
  • the user installs a monitoring agent program in an instance of the monitoring target application server 40 to be monitored in advance.
  • the computer resource control system 1 uses the monitoring agent 30 to manage monitoring targets in a predetermined logical unit called a monitoring target cluster 50.
  • the monitoring agent 30 includes one of a system agent 401 and a log file agent 402.
  • the system agent 401 is a module that manages a process being executed. The process being executed includes not only the OS and middleware but also applications.
  • the system agent 401 captures changes and behaviors occurring in the process, notifies the collected data to the computer resource control system 1 periodically or in response to a predetermined trigger, etc.
  • the log file agent 402 is a module that monitors files written in the monitoring target. There are many applications that use log files to monitor the status of applications, and by monitoring such logs, it becomes possible to monitor them according to the intention of the application developer.
  • the information collected by the log file agent 402 is notified to the computer resource control system 1 periodically or non-periodically in response to a predetermined trigger or the like, similar to the system agent 401.
  • As for the monitoring targets, it is preferable to collectively monitor everything from the OS-level status to middleware such as the JVM and the monitored applications.
  • The monitoring agent 30 can monitor the usage status of a specific service, middleware congestion, CPU load, which server a job is assigned to, the progress of each job, blacklisted servers, and so on. Furthermore, by dynamically changing the monitoring points, what is monitored can be changed according to the operating status of the monitoring target. For example, the inventory of a campaign product can be added to the monitoring targets during a campaign, or the service content can be switched when the campaign product sells out, all without stopping the monitoring target.
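For illustration, the following is a minimal Python sketch of what such a monitoring agent might look like: it periodically collects a few measurements and hands them off to a message queue. The metric names, the `queue_client` object, and its `put()` method are assumptions for this sketch, not part of the patent.

```python
import json
import os
import time


def collect_metrics():
    """Gather a few illustrative measurements; a real agent would read OS
    counters, middleware statistics, and application log files."""
    load1, _load5, _load15 = os.getloadavg()
    return {
        "timestamp": time.time(),
        "cpu_load_1min": load1,
        "process_count": len(os.listdir("/proc")) if os.path.isdir("/proc") else None,
    }


def run_agent(queue_client, interval_sec=10):
    """Periodically push collected measurements to the message queue.
    queue_client is assumed to expose a put(message) method."""
    while True:
        queue_client.put(json.dumps(collect_metrics()))  # asynchronous hand-off
        time.sleep(interval_sec)
```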
  • The message queue server 41 provides message queues for asynchronously exchanging data between the monitoring agents 30 incorporated in the monitoring target application servers 40 and the computer resource control system 1, and for exchanging data between services within the computer resource control system 1.
  • Data exchange between the monitoring agent 30 and the collection server 42, the management server 43, the execution server 44, and the dashboard server 46, as well as data exchange among the collection server 42, the management server 43, the execution server 44, and the dashboard server 46, is performed asynchronously via message queues in the message queue server 41.
  • The data exchanged here includes not only data but also tasks.
  • the message queue in the message queue server 41 includes a data collection queue 411, a data management queue 412, a management queue 413, and an execution queue 414.
  • Each queue preferably holds data in a first-in first-out (FIFO: First In First Out) list structure.
  • Each queue can have a redundant configuration, and message queue data can be prevented from being lost by performing inter-queue communication.
  • Data collected from the monitoring agent 30 to be monitored is sequentially input to the data collection queue 411 and read sequentially by the collection server 42.
  • Data collected from the monitoring agent 30 to be monitored is sequentially input to the data management queue 412 and read sequentially by the management server 43.
  • Data (including tasks) for controlling the monitored servers is sequentially input to the management queue 413 from the management server 43, the execution server 44, and the dashboard server 46, and is sequentially read out by the corresponding monitoring agents 30.
  • Execution instructions such as instance activation control and warning transmission are sequentially input from the management server 43 to the execution queue 414 and sequentially read by the execution server 44.
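As a rough illustration of the four queues and their first-in first-out behavior, the following Python sketch uses in-memory `queue.Queue` objects as stand-ins; a real deployment would use a replicated message-queue service, and the variable names are assumptions.

```python
from queue import Queue  # thread-safe FIFO

# In-memory stand-ins for the four queues described above.
data_collection_queue = Queue()  # 411: agent data -> collection server
data_management_queue = Queue()  # 412: agent data -> management server
management_queue = Queue()       # 413: control data/tasks -> monitoring agents
execution_queue = Queue()        # 414: action instructions -> execution server


def fan_out(monitoring_message: str) -> None:
    """Each monitoring message is delivered to both the collection path
    (for storage/dashboard) and the management path (for rule evaluation)."""
    data_collection_queue.put(monitoring_message)
    data_management_queue.put(monitoring_message)
```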
  • the collection server 42 executes processing for registering data transmitted from the monitoring agent 30 in the distributed database.
  • the collection server 42 sequentially retrieves data input to the data collection queue 411 and passes it to the database server 45.
  • the management server 43 refers to the control rules set in advance based on the data transmitted from the monitoring agent 30, and performs a process of determining whether or not an action such as instance activation control or warning transmission is necessary.
  • the management server 43 sequentially takes out the data input to the data management queue 412 and compares it with a preset control rule.
  • the control rule may include a plurality of control rules.
  • Each control rule preferably includes a definition of a server group to be managed, a setting of a threshold value of information collected by the monitoring agent, and a definition of control contents when the threshold value is exceeded.
  • Each control rule may include a change in the setting contents of the monitoring agent.
  • As the control rules, rules predefined by the computer resource control system 1 as defaults may be used, or the user may define rules in advance.
  • a rule editor is provided for the user to define control rules.
  • With this rule editor, control rules can be set for the cluster that is the unit to be monitored, such as control for planned fluctuations, control for passive fluctuations according to data, monitoring agent setting changes, and warning settings.
  • Various control rules can be set according to the conditions. Because a control rule can be described using, for example, a Ruby-based domain specific language (DSL), it can be written in an intuitive and easy-to-understand form. A control rule may also be set with a graphical editor, in which case even a user who is not familiar with the DSL can write rules intuitively.
  • FIG. 5 is an example of a control rule.
  • The example in the figure defines the condition and control contents: "If the number of pending threads within the instance has exceeded the specified number for more than 5 seconds, increase the number of instances from the same server image."
  • A control rule for planned fluctuations can stipulate contents such as: "At a given time of day, add a given number of servers here for a given number of minutes, and when that time has passed, return the number of servers to the original number."
  • As a control rule for passive fluctuations based on collected data, for example, it can be defined that the number of servers to be allocated is increased or decreased according to the amount of data to be processed.
  • Different scaling rules can be set as control rules for each application. For example, to switch the service content when a product sells out, a rule can be set that monitors the application and changes the system configuration.
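The patent describes the rules themselves in a Ruby-based DSL; the following is only a hedged Python sketch of the kind of information such a rule carries and how the FIG. 5 example might be expressed. The field names and the threshold value are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ControlRule:
    """Illustrative rule structure: which cluster it governs, the metric and
    threshold to watch, how long the breach must persist, and the action."""
    cluster: str
    metric: str
    threshold: float
    duration_sec: float
    action: str            # e.g. "scale_out", "scale_in", "warn"
    action_args: dict


# Rough analogue of the FIG. 5 example: if pending threads stay above the
# specified number for more than 5 seconds, add an instance from the same image.
pending_thread_rule = ControlRule(
    cluster="web-cluster",
    metric="pending_threads",
    threshold=100,          # hypothetical "specified number"
    duration_sec=5,
    action="scale_out",
    action_args={"count": 1, "server_image": "web-app-image"},
)


def needs_action(rule: ControlRule, value: float, breached_for_sec: float) -> bool:
    """True when the monitored value has exceeded the threshold long enough."""
    return value > rule.threshold and breached_for_sec > rule.duration_sec
```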
  • The management server 43 autonomously learns how increases or decreases in resources affect performance, derives an optimal control solution, and rewrites the control rules.
  • The management server 43 compares the data collected from the monitoring agent 30 with the control rules and determines whether or not an action is required for a computer resource, such as a server in the monitored system. When the monitoring data does not satisfy the condition defined in a control rule, the management server 43 determines that no action is required. When the monitoring data does satisfy the condition, the management server 43 determines that an action is necessary and outputs an instruction to execute the action specified by the control rule on the computer resource, such as a managed server, defined in the control rule. This instruction is input to the execution queue 414.
  • the execution server 44 performs processing for executing specific actions such as starting and stopping of instances.
  • The execution server 44 sequentially reads action instructions from the execution queue 414 and, in accordance with the instructions, executes various controls on the predetermined computer resources (including the monitoring target application server 40) defined by the control rules.
  • As for the content of control, it is preferable to support a wide range of control levels, from the system level to the cloud level.
  • Examples of system-level control include calling a method of a specific function of an application, changing an internal variable, and the like.
  • Examples of control at the cloud level include starting, duplicating, and stopping an instance, changing an allocated resource, and changing a setting of a starting instance.
  • the virtual server can be activated / replicated / deleted or the virtual server setting can be changed.
  • the database server 45 is a database that stores various data including data collected by the monitoring agent 30.
  • This database is preferably not a mechanism based on a single database server but a distributed database structure in which a plurality of servers cooperate to increase performance. Thereby, even if the amount of data becomes enormous, the capacity can be accommodated by increasing the number of participating servers.
  • a database without a single point of failure is constructed.
  • the database performance can be maintained by adding the number of servers.
  • a distributed KVS server is applicable.
  • A distributed KVS (Key-Value Store) server attaches an arbitrary label (Key) to the data (Value) to be saved and stores the (Key, Value) pair; when saved data is retrieved, the label (Key) is specified and the corresponding data (Value) is obtained. It is a scale-out type in which data is distributed and stored across multiple servers, so a large amount of data can be handled by adding servers.
  • the database server 45 is preferably a KVS database server, but is not limited to KVS, and may use a database server of another method.
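To illustrate the (Key, Value) model described above, here is a minimal single-node Python stand-in for a KVS; a real distributed KVS would shard the keys across many servers. The class name and key format are assumptions.

```python
from typing import Optional


class TinyKVS:
    """Single-node stand-in for a distributed KVS: store (Key, Value) pairs and
    retrieve a Value by specifying its Key. A real distributed KVS spreads the
    pairs over many servers, so capacity grows by adding servers."""

    def __init__(self) -> None:
        self._store: dict = {}

    def put(self, key: str, value: str) -> None:
        self._store[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._store.get(key)


kvs = TinyKVS()
kvs.put("metrics/web-01/2011-10-17T00:00:00", '{"cpu_load_1min": 0.42}')
print(kvs.get("metrics/web-01/2011-10-17T00:00:00"))
```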
  • the dashboard server 46 is a server that provides the user terminal device 12 with a dashboard 48 for displaying various types of information to the user and accepting operations from the user.
  • the screen of the dashboard 48 is for displaying information such as predetermined monitoring items on the client device, and its appearance and function are important.
  • The screen of the dashboard 48 covers monitoring of the system status and job execution status, as well as batch job flows defined by DSL and monitoring and control of the real-time system, and it functions as the interface between the user and the computer resource control system 1.
  • The dashboard server 46 receives access via the web service from the user terminal device 12, reads the data stored in the database server 45, edits the screen of the dashboard 48, and transmits the screen information to the user terminal device 12.
  • the operation manager performs system monitoring, configuration management, and control settings on the dashboard 48.
  • the dashboard server 46 displays a warning on the dashboard 48 and sends an email notification when the monitoring information exceeds a preset threshold value. Thereby, the user can monitor efficiently.
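As a hedged sketch of the threshold-triggered warning described above, the following Python fragment sends an e-mail when a monitored value exceeds a preset threshold; the addresses, threshold value, and the assumption of a local SMTP relay are illustrative only.

```python
import smtplib
from email.message import EmailMessage

CPU_WARN_THRESHOLD = 0.9   # hypothetical preset threshold


def notify_if_exceeded(metric_name: str, value: float,
                       threshold: float = CPU_WARN_THRESHOLD) -> None:
    """Send a warning e-mail when a monitored value exceeds its threshold."""
    if value <= threshold:
        return
    msg = EmailMessage()
    msg["Subject"] = f"[dashboard] {metric_name} exceeded threshold ({value:.2f} > {threshold:.2f})"
    msg["From"] = "dashboard@example.com"      # hypothetical addresses
    msg["To"] = "operator@example.com"
    msg.set_content(f"{metric_name} is currently {value:.2f}; check the system configuration view.")
    with smtplib.SMTP("localhost") as smtp:    # assumes a local mail relay
        smtp.send_message(msg)
```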
  • By exchanging the dashboard server 46, it is possible to provide display and operation tailored to a user's requirements. This makes it easy to sell the computer resource control system as part of another system or as an OEM product.
  • FIG. 6 is an example of the dashboard 48.
  • Examples of the displayed screen include a metrics view, a system configuration view, a job net monitoring view, a log monitoring view, and a notification list.
  • the metrics view is a screen for monitoring data transmitted from the monitoring agent 30 in real time. A graph corresponding to the measurement item is displayed, and the graph is updated in real time. In addition, past data may be displayed.
  • the system configuration view is a screen for monitoring the system configuration to be monitored from a bird's-eye view. It displays the operating status of each server, displays the operating status of processes running in the server, displays process dependencies between servers, and so on. When the server resource exceeds the threshold or an error occurs in the application, it is displayed so that it can be detected in the system configuration view.
  • the job net monitoring view is a screen for monitoring the execution status of the batch job net managed by the computer resource control system 1. The icon color is changed according to the execution status to visually indicate the execution status.
  • the log monitoring view is a screen that monitors the output contents of the log file and browses the hit locations that should be monitored. Used for application error detection and batch job progress monitoring.
  • the user can freely set what and how the dashboard 48 displays.
  • the manager can realize a console function from the manager's viewpoint, and the system administrator can realize a user interface that meets the needs of the user, such as a monitoring control console for system operation.
  • FIG. 7 is a block diagram showing another embodiment of the computer resource control system 1 according to the present invention.
  • the configuration of the embodiment shown in the figure is almost the same as that of FIG. 4 except that the monitoring agent 30 is incorporated in the server in the computer resource control system 1.
  • Each server in the computer resource control system 1 is distributed across a plurality of virtual servers having the same server image. That is, the computer resource control system 1 includes a plurality of message queue servers 41, a plurality of collection servers 42, a plurality of management servers 43, a plurality of execution servers 44, a plurality of database servers 45, and a plurality of dashboard servers 46.
  • the number of servers having the same server image may temporarily become one when a failure occurs.
  • The monitoring agent 30 incorporated in each server of the computer resource control system 1 inputs the monitoring data it collects into the data collection queue 411 and the data management queue 412 of the message queue server 41, in the same manner as the monitoring data collected from the monitored servers.
  • the subsequent processing is the same as in FIG.
  • The management server 43 performs the same processing on the data collected from the monitoring agents 30 incorporated in the servers of the computer resource control system 1 as on the data collected from the monitoring agents 30 incorporated in the monitoring target application servers 40; that is, the necessity of an action on the computer resource is determined with reference to a plurality of predefined control rules.
  • the execution server 44 controls computer resources in the computer resource control system 1 in accordance with fluctuations in the data amount and processing amount for each cluster.
  • the execution server 44 executes processing such as increasing / decreasing the number of each server in the computer resource control system 1 to maintain an optimal system configuration.
  • The monitoring agent 30 incorporated in the message queue server 41 monitors the amount of data input to each queue in the message queue server 41, that is, the queue amounts of the data collection queue 411, the data management queue 412, the management queue 413, and the execution queue 414. In addition, the separate monitoring agents 30 incorporated in the message queue server 41, the collection server 42, the management server 43, the execution server 44, the database server 45, and the dashboard server 46 monitor the operating status of each server.
  • For example, a rule for increasing or decreasing the number of collection servers 42 according to the state of the data collection queue 411 is defined. That is, a condition and control contents are defined such that when the queue amount of the data collection queue 411 exceeds a predetermined threshold, instance activation control is performed and a predetermined number of replications of the collection server 42 are created to increase the number of virtual servers, while when the queue amount is equal to or below the threshold, instance stop control is performed and a predetermined number of replications of the collection server 42 are discarded (deleted) to reduce the number of virtual servers.
  • Similar control rules are defined for other queues.
  • A rule for increasing or decreasing the number of management servers 43 according to the state of the data management queue 412 is defined. That is, a condition and control contents are defined such that when the queue amount of the data management queue 412 exceeds a predetermined threshold, a predetermined number of replications of the management server 43 are created to increase the number of virtual servers, while when the queue amount is equal to or below the threshold, a predetermined number of replications of the management server 43 are discarded to reduce the number of virtual servers. A rule for increasing or decreasing the number of execution servers 44 according to the state of the execution queue 414 is defined in the same way: when the queue amount of the execution queue 414 exceeds a predetermined threshold, a predetermined number of replications of the execution server 44 are created to increase the number of virtual servers, while when the queue amount is equal to or below the threshold, a predetermined number of replications of the execution server 44 are discarded to reduce the number of virtual servers.
  • Likewise, a rule for increasing or decreasing the number of arbitrary servers according to the state of the management queue 413 is defined: when the queue amount exceeds a predetermined threshold, a predetermined number of replications of the specific server are created to increase the number of virtual servers, while when the queue amount is equal to or below the threshold, a predetermined number of replications of the specific server are discarded to reduce the number of virtual servers.
  • a rule for increasing or decreasing the number of message queue servers 41 may be defined according to the status of the entire queue in the message queue server 41.
  • a rule for dynamically controlling server replication or destruction is defined according to the operating status of each server. That is, when the operating status of a server exceeds a predetermined threshold, replication of the server is created, and when the operating status is equal to or lower than the predetermined threshold, the server is discarded. Further, when the server shows an abnormal behavior, a warning is notified to the user. If the problem persists, the server may be restarted by discarding the server and creating a new server replication.
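The queue-depth scaling rules above amount to a simple threshold decision; the following Python sketch shows one way such a decision could be expressed. The threshold values are assumptions.

```python
def desired_replica_change(queue_depth: int, high_water: int, low_water: int,
                           step: int = 1) -> int:
    """Return how many server replicas to add (+) or remove (-) based on the
    queue depth, mirroring the threshold rules described above."""
    if queue_depth > high_water:
        return +step          # create replications from the same server image
    if queue_depth <= low_water:
        return -step          # discard (delete) surplus replications
    return 0


# Hypothetical thresholds for the data collection queue (411) -> collection servers (42)
change = desired_replica_change(queue_depth=5000, high_water=1000, low_water=100)
print(change)  # -> 1, i.e. start one more collection server
```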
  • In this way, the computer resource control system 1 can dynamically control the computer resources included in the computer resource control system 1 itself by the same mechanism used to monitor the monitored system in the cloud environment.
  • The computer resource control system 1 thus not only grasps the status of monitoring targets such as the monitored application server 40 in real time and controls them without delay, but also grasps the status of the computer resource control system 1 itself in real time and controls it without delay.
  • Because this embodiment is composed of a distributed structure without a single point of failure, the system as a whole does not go down even if a failure occurs in any single function. Moreover, it can dynamically cope with planned or sudden increases in load, for example by increasing the number of servers constituting the computer resource control system 1 in response to an increase in users or monitoring targets, and it is configured to maintain the service level by controlling computer resources.
  • the computer resource control system 1 is preferably provided as an API (Application Program Interface).
  • FIG. 8 is an example of a flowchart of processing in the computer resource control system 1.
  • the monitoring agent 30 embedded in the monitoring point in the cloud environment collects monitoring data and transmits it to the message queue server 41 (S81). Note that the monitoring agent 30 continues to send monitoring data to the message queue server 41 periodically or irregularly.
  • the message queue server 41 puts the received data into the data collection queue 411 and the data management queue 412.
  • the collection server 42 sequentially reads the monitoring data from the data collection queue 411 and registers the monitoring data in the data store of the database server 45 (S82). After completing the registration of the monitoring data, the collection server 42 reads the next monitoring data from the message queue, and repeats the process of S82.
  • the dashboard server 46 creates a dashboard 48 for browsing the status of the monitoring target in response to a request from the user, and transmits it to the user terminal device 12 via the network N (S83).
  • the user terminal device 12 displays the received dashboard 48 on the browser (S84).
  • the management server 43 reads the monitoring data from the data management queue 412 and compares it with the control rule (S85) to determine whether an action is required for the computer resource (S86). If the monitoring data does not satisfy the conditions defined in the control rule, it is determined that no action is required (S86: No). On the other hand, if the monitoring data satisfies the conditions defined in the control rule, it is determined that an action is necessary (S86: Yes), and a specific action instruction is transmitted to the execution queue 414 of the message queue server 41. (S87). Thereafter, the management server 43 reads the monitoring data from the data management queue 412 again, and repeats a series of processes from S85 to S87.
  • the execution server 44 reads an action instruction from the execution queue 414 and transmits processing data for executing a specific action for the computer resource, such as start or stop of the instance, to the management queue 413 of the message queue server 41 ( S88). Thereafter, the execution server 44 reads the action instruction again from the execution queue and repeats the process of S88.
  • the processing data input to the management queue 413 is sequentially read by the monitoring agent 30 that is the target of the action, and actions such as server duplication and destruction are executed (S89).
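Putting steps S85 to S87 together, the following Python skeleton shows how a management-server loop might read monitoring data, evaluate rules, and enqueue action instructions. The queue objects with blocking `get()`/`put()` methods and the rule fields are assumptions (they mirror the `ControlRule` sketch shown earlier).

```python
import json


def management_loop(data_management_queue, execution_queue, rules):
    """Skeleton of steps S85-S87: read monitoring data, evaluate the control
    rules, and enqueue an action instruction when a rule condition is met."""
    while True:
        record = json.loads(data_management_queue.get())      # S85: read monitoring data
        for rule in rules:
            value = record.get(rule.metric)
            if value is not None and value > rule.threshold:  # S86: action required?
                instruction = {
                    "action": rule.action,                    # e.g. start/stop an instance
                    "args": rule.action_args,
                    "cluster": rule.cluster,
                }
                execution_queue.put(json.dumps(instruction))  # S87: hand off to execution server
```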
  • The present invention is not limited to the above-described embodiments and can be implemented in various other forms without departing from its gist. The above embodiments are therefore merely illustrative in all respects and should not be interpreted restrictively.
  • the above-described processing steps can be executed in any order or in parallel as long as there is no contradiction in the processing contents.
  • In the above embodiments, the collection server 42, the management server 43, and the execution server 44 are configured as separate servers, but needless to say, the functions of any two or all of these servers may be configured in one server. For example, a processing server combining the collection server 42 and the management server 43 may be provided, and this processing server may be distributed across a plurality of virtual servers having the same server image. Likewise, a processing server combining the management server 43 and the execution server 44, or a processing server combining the collection server 42, the management server 43, and the execution server 44, may be provided. The monitoring agent 30 may then be incorporated in these processing servers so that the computer resource control system 1 monitors and controls them.
  • In the above embodiments, the monitoring agent 30 includes one of the system agent 401 and the log file agent 402, but the monitoring agent 30 is not limited to these; an agent that monitors an arbitrary event may also be applied.
  • 1 computer resource control system, 10 cloud, 12 user terminal device, 20 computer device group, 30 monitoring agent, 32 message network, 34 processing unit, 40 monitored application server, 41 message queue server, 42 collection server, 43 management server, 44 execution server, 45 database server, 46 dashboard server, 48 dashboard, 50 monitored cluster, N network

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Environmental & Geological Engineering (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to a system that makes it possible to identify and control, in real time, the status of a computer resource included in a monitored system. The status of the management system itself can also be identified and controlled. This computer resource management system comprises a plurality of servers including: a management server for determining whether or not an action is required for the computer resource on the basis of data collected from a monitoring agent; and an execution server for issuing an instruction to execute the action on the computer resource when it has been determined that an action is required for the computer resource. At least one of the servers in the computer resource management system comprises the monitoring agent.
PCT/JP2011/073842 2010-10-15 2011-10-17 Système de gestion de ressource informatique WO2012050224A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2010232513A JP4811830B1 (ja) 2010-10-15 2010-10-15 コンピュータリソース制御システム
JP2010-232513 2010-10-15

Publications (1)

Publication Number Publication Date
WO2012050224A1 true WO2012050224A1 (fr) 2012-04-19

Family

ID=45044185

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2011/073842 WO2012050224A1 (fr) 2010-10-15 2011-10-17 Système de gestion de ressource informatique

Country Status (2)

Country Link
JP (1) JP4811830B1 (fr)
WO (1) WO2012050224A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024004102A1 (fr) * 2022-06-29 2024-01-04 楽天モバイル株式会社 Détermination d'état d'un système de communication sur la base de données de valeur d'indice de performance stockées dans une file d'attente

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9992024B2 (en) 2012-01-25 2018-06-05 Fujitsu Limited Establishing a chain of trust within a virtual machine
EP2862077A4 (fr) 2012-06-15 2016-03-02 Cycle Computing Llc Procédé et système de détection et de résolution automatiques de défauts d'infrastructure dans une infrastructure de nuage
US9002982B2 (en) 2013-03-11 2015-04-07 Amazon Technologies, Inc. Automated desktop placement
US10313345B2 (en) 2013-03-11 2019-06-04 Amazon Technologies, Inc. Application marketplace for virtual desktops
US10142406B2 (en) 2013-03-11 2018-11-27 Amazon Technologies, Inc. Automated data center selection
JP6186817B2 (ja) * 2013-04-05 2017-08-30 富士通株式会社 情報処理装置、情報処理プログラム及び情報処理方法
US10623243B2 (en) 2013-06-26 2020-04-14 Amazon Technologies, Inc. Management of computing sessions
US20150019705A1 (en) * 2013-06-26 2015-01-15 Amazon Technologies, Inc. Management of computing sessions
JP7030412B2 (ja) * 2017-01-24 2022-03-07 キヤノン株式会社 情報処理システム、及び制御方法
JP2021026577A (ja) * 2019-08-07 2021-02-22 三菱電機株式会社 制御装置、演算装置、制御方法、及び制御プログラム

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000358068A (ja) * 1999-06-15 2000-12-26 Nec Corp インテリジェントネットワークの輻輳制御システム
JP2004288183A (ja) * 2003-03-21 2004-10-14 Hewlett-Packard Development Co Lp コンピューティングリソースを自動的に割り振る方法
JP2006268193A (ja) * 2005-03-22 2006-10-05 Fuji Xerox Co Ltd 管理システム、管理センタ、管理方法
JP2008077266A (ja) * 2006-09-20 2008-04-03 Nec Corp サービス制御装置、分散サービス制御システム、サービス制御方法、及び、プログラム
JP2010033292A (ja) * 2008-07-28 2010-02-12 Nippon Telegraph & Telephone West Corp 仮想サーバリソース調整システム、リソース調整装置、仮想サーバリソース調整方法、及び、コンピュータプログラム

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11143843A (ja) * 1997-11-06 1999-05-28 Hitachi Ltd 複数ノード構成システムの稼働状態管理方法
JP2002073576A (ja) * 2000-08-31 2002-03-12 Toshiba Corp バッチジョブ制御システム
JP3879471B2 (ja) * 2001-10-10 2007-02-14 株式会社日立製作所 計算機資源割当方法
JP2003281007A (ja) * 2002-03-20 2003-10-03 Fujitsu Ltd 動的構成制御装置および動的構成制御方法
JP2006011860A (ja) * 2004-06-25 2006-01-12 Fujitsu Ltd システム構成管理プログラム及びシステム構成管理装置
JP2007133453A (ja) * 2005-11-08 2007-05-31 Hitachi Software Eng Co Ltd メッセージキューイングサーバ及びその監視方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000358068A (ja) * 1999-06-15 2000-12-26 Nec Corp インテリジェントネットワークの輻輳制御システム
JP2004288183A (ja) * 2003-03-21 2004-10-14 Hewlett-Packard Development Co Lp コンピューティングリソースを自動的に割り振る方法
JP2006268193A (ja) * 2005-03-22 2006-10-05 Fuji Xerox Co Ltd 管理システム、管理センタ、管理方法
JP2008077266A (ja) * 2006-09-20 2008-04-03 Nec Corp サービス制御装置、分散サービス制御システム、サービス制御方法、及び、プログラム
JP2010033292A (ja) * 2008-07-28 2010-02-12 Nippon Telegraph & Telephone West Corp 仮想サーバリソース調整システム、リソース調整装置、仮想サーバリソース調整方法、及び、コンピュータプログラム

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024004102A1 (fr) * 2022-06-29 2024-01-04 楽天モバイル株式会社 Détermination d'état d'un système de communication sur la base de données de valeur d'indice de performance stockées dans une file d'attente

Also Published As

Publication number Publication date
JP2012088770A (ja) 2012-05-10
JP4811830B1 (ja) 2011-11-09

Similar Documents

Publication Publication Date Title
JP4811830B1 (ja) コンピュータリソース制御システム
US7992032B2 (en) Cluster system and failover method for cluster system
JP5440273B2 (ja) スナップショット管理方法、スナップショット管理装置、及びプログラム
JP5140633B2 (ja) 仮想化環境において生じる障害の解析方法、管理サーバ、及びプログラム
JP4920391B2 (ja) 計算機システムの管理方法、管理サーバ、計算機システム及びプログラム
US9760413B2 (en) Power efficient brokered communication supporting notification blocking
CN108369544B (zh) 计算系统中延期的服务器恢复方法和设备
JP4609380B2 (ja) 仮想サーバ管理システムおよびその方法ならびに管理サーバ装置
US11157373B2 (en) Prioritized transfer of failure event log data
JP2008293117A (ja) 仮想計算機の性能監視方法及びその方法を用いた装置
EP2645635B1 (fr) Moniteur de grappe, procédé permettant de surveiller une grappe et support d'enregistrement lisible par ordinateur
JP5427504B2 (ja) サービス実行装置、サービス実行方法
KR102176028B1 (ko) 실시간 통합 모니터링 시스템 및 그 방법
US10540202B1 (en) Transient sharing of available SAN compute capability
US11853383B2 (en) Systems and methods for generating a snapshot view of virtual infrastructure
JP2013117889A (ja) 広域分散構成変更システム
JP2012243096A (ja) ゲストos管理装置、ゲストos管理方法及びゲストos管理プログラム
JP5360000B2 (ja) 仮想サーバ管理システムおよびその方法ならびに管理サーバ装置
JP2012089109A (ja) コンピュータリソース制御システム
JP6065843B2 (ja) サービスレベル管理装置、プログラム、及び、方法
WO2019241199A1 (fr) Système et procédé de maintenance prédictive de dispositifs en réseau
JP2020038506A (ja) 情報処理システム、情報処理方法、及び、プログラム
JP4883492B2 (ja) 仮想マシン管理システムおよび計算機、並びに、プログラム
WO2022009438A1 (fr) Dispositif, système, procédé de commande de système, et programme de maintenance de serveur
CA2504336A1 (fr) Methode et dispositif de creation d'un systeme de controleur autonome

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11832649

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC DATED 26.07.2013

122 Ep: pct application non-entry in european phase

Ref document number: 11832649

Country of ref document: EP

Kind code of ref document: A1