Detailed Description
In one embodiment, a method for dynamically partitioning an x86 server into hard partitions is suitable for an x86 server configured with a plurality of multicore Intel Xeon processors and a large amount of memory. Specifically, x86 is used as the hardware platform, and a Linux operating system and an Oracle Database 12c database are adopted to realize the x86 server dynamic hard partition technology. As shown in fig. 1, the method comprises the following steps:
Step S110: performing partition initialization on the corresponding node through the core engine of each node of the Oracle database deployed on the x86 server in the control group (CGroup).
Control Groups (CGroup) is a mechanism provided by the Linux kernel that can limit, account for, and isolate the physical resources (e.g., CPU, memory, I/O) used by groups of processes. Specifically, CGroup is a Linux kernel function that performs grouped management of arbitrary processes; it is itself an infrastructure providing the functions and interfaces for grouped process management, and specific resource management functions such as I/O or memory allocation control are realized on top of it. These specific resource management functions are referred to as CGroup subsystems or controllers. The CGroup subsystems comprise a memory controller for controlling memory, a CPU controller for controlling process scheduling, and the like. The CGroup subsystems usable by the running kernel are listed in /proc/cgroups. CGroup provides the CGroup virtual file system as the user interface for group management and subsystem setup. To use CGroup, the CGroup file system must be mounted, and the subsystem to be used is specified by a mount option. The types of files supported by CGroup are shown in table 1.
TABLE 1
In CGroup, a task is a process of the system. A control group is a set of processes divided according to some criterion. Resource control in CGroup is realized with the control group as the unit. A process can join a control group and can also migrate from one control group to another. The processes of a control group may use the resources that CGroup allocates to that group, subject to the limitations that CGroup sets on that group.
Control groups can be organized into a hierarchy, i.e., a control group tree. A child control group on the control group tree is a child of its parent control group and inherits specific attributes of the parent. A subsystem is a resource controller; for example, the CPU subsystem is the controller that controls the allocation of CPU time. A subsystem must be attached to a hierarchy to function, and after a subsystem is attached to a hierarchy, all control groups in that hierarchy are controlled by that subsystem. Each time a new hierarchy is created in the system, all tasks in the system are initial members of the default CGroup of that hierarchy (which may be referred to as the root CGroup; it is created automatically when the hierarchy is created, and CGroups created later in the hierarchy are descendants of this CGroup). A subsystem can be attached to at most one hierarchy, while multiple subsystems can be attached to one hierarchy. A task may be a member of multiple CGroups, but these CGroups must be in different hierarchies. When a process (task) in the system creates a child process (task), the child task automatically becomes a member of the CGroup of its parent process. The child task can then be moved to a different CGroup as needed, but at the beginning it always inherits the CGroup of its parent task.
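As a hedged illustration of this user interface (the mount point, the group name DB1, and the core and memory-node numbers below are examples introduced here, not taken from the source), mounting the cpuset subsystem and creating one control group might look like the following; note that child processes of the moved shell would inherit its CGroup, as described above:

```shell
# Mount the CGroup virtual file system with the cpuset subsystem
# (must be run as root; all paths and names are illustrative).
mkdir -p /cgroup/cpuset
mount -t cgroup -o cpuset cpuset /cgroup/cpuset

# Create a control group and assign it cores 0-3 and memory node 0.
mkdir /cgroup/cpuset/DB1
echo 0-3 > /cgroup/cpuset/DB1/cpuset.cpus
echo 0   > /cgroup/cpuset/DB1/cpuset.mems

# Move the current shell into the group; children it spawns
# automatically become members of the same CGroup.
echo $$ > /cgroup/cpuset/DB1/tasks
```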
In this embodiment, the Oracle database is Oracle Database 12c, which introduces the processor_group_name parameter (Oracle, Oracle Database 12c Release 1 (12.1.0.1) New Features, 2016). By using this parameter, Oracle Database 12c can be tightly combined with Linux or Solaris, so that, by using a processor set of the operating system and its related resources, the database instance's processes and memory are bound to a specific processor and its memory resources at the operating-system level. This feature is mainly aimed at the NUMA (non-uniform memory access) architecture adopted by current x86 server designs. NUMA is a memory design for multiprocessor computers in which memory access time depends on the location of the memory relative to the processor. Under NUMA, a processor accesses its own local memory more quickly than non-local memory (memory located on another processor or shared between processors). NUMA is characterized as follows: the shared memory is physically distributed, and the set of all of this memory forms the global address space. Accessing local memory is obviously faster than accessing the global shared memory or remotely accessing foreign memory. Additionally, memory in NUMA may be hierarchical: local memory, shared memory within a cluster, and global shared memory.
Based on current Intel Xeon processor technology, a current 4-way server may be configured with 64 cores and 128 threads; a 2-way server may be configured with 32 cores and 64 threads. For many small and medium-sized enterprises, if a single database is operated on such a platform, the utilization rate is usually insufficient, and business integration is inevitable. The core count of a current 8-way server may be 192 cores and 384 threads. With common 4-way and 8-way server configurations, current hardware processing capacity equals, and even exceeds, that of many UNIX servers in performance. However, surpassing performance alone does not bring an enterprise a stable operating environment, whereas the partitioning technology widely used on traditional UNIX servers has strong practicability, because it provides the necessary isolation for enterprise applications while fully utilizing hardware resources, thereby ensuring the stability of those applications.
When the Linux operating system and Oracle Database 12c run on an x86 server for service integration, dynamic hard partitioning of the database instances is realized to ensure the service quality of the databases. Dynamic hard partitioning means that a database instance deployed on the same x86 PC server exclusively occupies one CPU, or one or more computing cores in one CPU; when its exclusive computing resources are exhausted, the database instance does not occupy the computing resources allocated to other database instances, so that database service quality on the x86 server is guaranteed.
Specifically, the core engines in the control group are arranged at each node of the Oracle database, and the number of core engines corresponds to the number of nodes. The core engine is a microkernel-style product: all components are built into the core engine and are tightly coupled, which ultimately ensures the stability and reliability of the core engine. The core engine is responsible for initializing the server's computing resource controllers and ensures the availability of each computing resource during operation.
Step S130: performing load monitoring, through the core engine, on the partitions of the corresponding node after the initialization processing, to obtain monitoring data.
The core engine monitors the loads of all partitions in the current node, and the acquired monitoring data serves as the basis for subsequent partition adjustment.
Step S140: evaluating, by the core engine, the state of the corresponding partition according to the monitoring data and the stored management policy to obtain an evaluation result.
The management policy can be received in advance and stored, and a new management policy can be received in real time to update the stored policy. The kind of management policy is not unique. Depending on the applicable scope, management policies may include unified policies and separate policies. A unified policy is common to all nodes in the cluster, and each change to the policy must be replicated to the whole cluster to ensure that each partition (or database instance) has consistent computing resources. A separate policy is valid only on a single server and does not have to be replicated to the whole cluster. From the point of view of use, management policies may include expansion policies and contraction policies. An expansion policy means that, if the computing resources of the current partition reach a preset upper threshold, the computing resources generally need to be expanded for stable operation of the service, and the expansion generally proceeds according to the upper threshold. A contraction policy means that, if the computing resources of the current partition are idle and need to be reserved for other partitions of the current node, the computing resources of the current partition are contracted, usually according to a lower threshold. The contraction policy is not a simple inverse of the expansion policy: it requires stopping the database instance before the resources are released.
Whether to expand or contract is determined by the core engine in combination with the monitoring data to obtain an evaluation result, and the final expansion or contraction is then carried out according to the evaluation result. In one embodiment, the management policy includes a correspondence between partition states and threshold values, and step S140 includes step 142 and step 144.
Step 142: obtaining the target state of the corresponding partition according to the monitoring data and the correspondence between partition states and threshold values.
The expansion and contraction of partition computing resources depend on threshold values set by the policy manager, and the setting of these thresholds generally depends on the hardware configuration of the current node and the operation-and-maintenance monitoring specification. The target state is the state to which the partition needs to migrate. For example, suppose the partition states include A, B, and C, with corresponding threshold values a, b, and c. If the acquired monitoring data matches threshold c, the target state of the partition is C.
Step 144: obtaining the evaluation result of the corresponding partition according to the current state and the target state of each partition.
Similarly, taking C as the target state of the partition, if the current state of the partition is A, the obtained evaluation result is A->C: the partition needs to migrate from state A to state C.
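The threshold matching and state evaluation described above can be sketched as a small shell helper. This is a minimal illustration that assumes the three states S1/S2/S3 and the 25%/50%/75% example thresholds given later in table 2; all function names are hypothetical, not from the source:

```shell
#!/bin/sh
# Map a measured partition load (in percent) to a target state,
# using the example thresholds of table 2:
#   <=25% -> S1, <=50% -> S2, otherwise -> S3.
load_to_state() {
  load="$1"
  if [ "$load" -le 25 ]; then echo "S1"
  elif [ "$load" -le 50 ]; then echo "S2"
  else echo "S3"
  fi
}

# The evaluation result pairs the current state with the target
# state, e.g. "S1->S2" meaning the partition should migrate to S2.
evaluate() {
  current="$1"; load="$2"
  echo "${current}->$(load_to_state "$load")"
}

evaluate S1 40   # prints S1->S2
```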
Step S150: dynamically adjusting, by the core engine, the corresponding partition according to the evaluation result.
After the evaluation result of each partition is obtained, the core engine directly and dynamically adjusts the corresponding partition according to the evaluation result. Specifically, corresponding to the management policy possibly including an expansion policy and a contraction policy, step S150 includes: expanding or contracting the computing resources of the corresponding partition according to the evaluation result.
In one embodiment, as shown in fig. 2, after step S110 and before step S130, the method further comprises step S120.
Step S120: performing an isolation test on the initialized partitions through the core engine.
After the partitions are configured, isolation tests are usually required; the scripts described below can be run independently in each partition to verify the high isolation of the partitions, thereby improving the reliability of the dynamic hard partitions of the x86 server. Step S130 is performed after the isolation test passes.
In one embodiment, after step S140, the method further includes a step of transmitting messages between the core engines through a message bus in the control group.
The control group further comprises a message bus connected to each core engine. The message bus is responsible for message transmission among all nodes; this specifically includes transmission of the unified policy, ensuring consistency of the policy within the cluster. The message passing may also include passing partition migration instructions to ensure consistency of the partition states of other nodes within the cluster. It can be understood that, if the system has only one node, a single core engine is configured and no message bus needs to be deployed; the message bus needs to be deployed for message passing between core engines only in a cluster environment.
In order to better understand the above x86 server dynamic hard partition method, the following detailed explanation is made in conjunction with specific embodiments.
The implementation of dynamic hard partitioning depends on CGroup of the Linux operating system and an important feature of Oracle Database 12c, processor_group_name. When CGroup and the Oracle database are combined, a dynamic hard partition function similar to that of a UNIX server can be implemented on an x86 server; the implementation architecture of the dynamic hard partition technology is shown in fig. 3. A core engine for the dynamic hard partitions is deployed on each node, and messages are transmitted between the core engines through a message bus outside the nodes. The x86 server dynamic hard partitions are driven and managed by the core engine, whose architecture is shown in fig. 4. The core engine includes a monitor, a state machine, a policy manager, and an executor. The monitor is responsible for monitoring the load of all partitions in the current node; the state machine evaluates the state of each partition in combination with the configured policy; the executor is responsible for executing the state machine's final evaluation result on the capacity of each partition; and the policy manager is responsible for configuration of the policies and replication of the policies within the entire cluster, ensuring consistency of the policies within the cluster. The core engines of the nodes are peers, and an operation initiated by any engine can be broadcast or replicated to the engines of the other nodes.
Before CGroup is used, it depends on the cpuset subsystem to run normally. The core engine is responsible for initializing this subsystem; the specific script is as follows:
# mkdir /etc/cpuset
# touch /etc/cpuset.conf
# mkdir /dev/cpuset
# mount -t cpuset cpuset /dev/cpuset
# load_partition
# service cgconfig start
# service cgred start
Here load_partition is responsible for reading the correct configuration of the partitions from /etc/cpuset.conf.
The core engine is responsible for setting the memory in the computing resource set to an exclusive state and ensuring that memory pages migrate together with the partition, thereby ensuring the stability and exclusivity of the database instance's running state. The specific script is as follows:
# echo 1 > /dev/cpuset/PARTITION/mem_exclusive
# echo 1 > /dev/cpuset/PARTITION/memory_migrate
After the partition is initialized, the database instance can monopolize computing resources (CPU and memory) in an exclusive state; even if the instance exhausts the resources in its partition, it cannot apply for the resources of other partitions. The specific script is as follows:
SQL> alter system set processor_group_name=ORACLE scope=spfile;
SQL> shutdown immediate
SQL> startup
The core engine does not introduce any third-party commercial software and therefore adds no additional investment cost.
The monitor is responsible for monitoring and storing the load of each partition. Load monitoring uses the system's default monitoring tool sar. Since several partitions usually exist on the same node, monitoring first obtains the cpuset value of the monitored partition, then performs the monitoring, and finally stores the collected load data in a local workload library.
If the system is not configured with a system activity monitoring collector, a monitoring collector needs to be customized. The first step of monitoring is to acquire the partition information defined in the system, which can be obtained through the following script:
#!/bin/bash
cd /cgroup/cpuset
find ./ -type d | egrep -o "[[:alnum:]]*"
The CPU information in the partition is then acquired:
#!/bin/bash
cgget -g cpuset:$1 | grep cpuset.cpus | cut -d':' -f2
Note that the retrieved result is the starting value of a CPU core range and requires further processing. Monitoring then proceeds according to the acquired cpuset information, for example with the following command:
# sar -P 0,1 5
If the system is configured with a system activity monitoring collector, the monitor can periodically analyze the monitoring results of sar to obtain the relevant results.
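The "further processing" of the retrieved cpuset.cpus value mentioned above can be sketched as follows. This is a hedged example that assumes a single contiguous range such as 0-3; the helper name is hypothetical:

```shell
#!/bin/sh
# Expand a cpuset.cpus range such as "0-3" into the comma-separated
# core list that `sar -P` expects, e.g. "0,1,2,3".
# Illustrative helper; assumes one contiguous "min-max" range.
range_to_list() {
  min="${1%-*}"
  max="${1#*-}"
  seq -s, "$min" "$max"
}

range_to_list "0-3"   # prints 0,1,2,3
# Monitoring the partition's cores for 5-second intervals would
# then look like:  sar -P "$(range_to_list "$CPUS")" 5
# where $CPUS holds the cpuset.cpus value read with cgget above.
```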
The policy manager is responsible for managing the monitoring policies and for managing and replicating the expansion and contraction policies for the partitions' computing resources. Given that different service databases may be integrated in one cluster, i.e., the loads of the nodes in the same cluster may not be consistent, the policy manager should distinguish different policies.
The expansion and contraction of the partition computing resources depend on a threshold value set by a policy manager, and the setting of the threshold value generally depends on the hardware configuration of the current node and the operation and maintenance monitoring specification.
Table 2 shows the relation between partition state transitions and normal operation-and-maintenance monitoring indices set in this example, with the threshold values increasing in 25% steps. After a preset threshold is reached, the computing cores usually need to be expanded or contracted by a certain amount.
| State | Threshold 1 | Threshold 2 | Threshold 3 |
| S1    | ≤25%        |             |             |
| S2    |             | ≤50%        |             |
| S3    |             |             | ≤75%        |
TABLE 2
As shown in table 3, partition expansion in this example proceeds according to a progressive baseline policy, mainly so that the actual requirement of the business load is met when a partition performs cross-state migration.
| State | Expansion 1 | Expansion 2 | Expansion 3 |
| S1    | +2          |             |             |
| S2    |             | +4          |             |
| S3    |             |             | +6          |
TABLE 3
The partition contraction relation table set in this example is shown in table 4.
| State | Contraction 1 | Contraction 2 | Contraction 3 |
| S1    | -2            |               |               |
| S2    |               | -4            |               |
| S3    |               |               | -6            |
TABLE 4
Partition contraction is a complex process. Because computing resources are exclusive, a partition cannot be contracted dynamically and smoothly simply by reducing its computing resources. A reliable and stable method releases the resources by shutting down the partition (database instance), contracting the partition's computing resources, and then restarting the partition. This is also the main mode for contracting the computing resources of current virtualization containers.
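The expansion and contraction amounts of tables 3 and 4 can be sketched as a small lookup helper. This is a minimal illustration using the example values +2/+4/+6 and -2/-4/-6 from the tables; the function name is hypothetical:

```shell
#!/bin/sh
# Return the core-count adjustment for a target state, per the
# example values of tables 3 and 4: S1 -> 2 cores, S2 -> 4 cores,
# S3 -> 6 cores; added on "expand", released on "contract".
cores_delta() {
  state="$1"; direction="$2"
  case "$state" in
    S1) n=2 ;;
    S2) n=4 ;;
    S3) n=6 ;;
    *)  echo "unknown state: $state" >&2; return 1 ;;
  esac
  if [ "$direction" = "contract" ]; then echo "-$n"; else echo "+$n"; fi
}

cores_delta S2 expand     # prints +4
cores_delta S3 contract   # prints -6
```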
The expansion or contraction of the partitions' computing resources is effectively and stably controlled through a finite state machine, ensuring stable and controllable operation of the partitions. The state machine is responsible for adjusting the partitions according to the policy set by the policy manager and the monitoring data; the specific adjustment instructions are executed by the executor and transmitted to each node in the whole cluster. FIG. 5 is a diagram illustrating the state evaluation of the state machine. Partition state expansion transitions and partition state contraction transitions are explained separately below.
FIG. 6 is a diagram illustrating partition state expansion transitions. FIG. 7 shows the monitoring data of a low-load partition that runs smoothly throughout; based on the partition expansion threshold set by the policy manager, it does not require expansion of computing resources. The state migration path of this partition is S1->S1. FIG. 8 shows the monitoring data of a partition with medium load fluctuation; its load fluctuation is obvious, and based on the partition expansion threshold set by the policy manager, computing resources should be expanded for it, adding 2 cores. The migration path of this partition is S1->S2.
The state machine evaluates, according to the policy formulated by the policy manager, whether resources are available to expand the current partition; if so, the state machine generates a specific computing-resource expansion instruction for the executor:
# echo min-current+2 > /dev/cpuset/PARTITION/cpus
If no resources are available for expansion, the state machine sends alarm information to the executor. The expansion of the current partition's computing resources can be completed quickly, and the partition expansion instruction can be transmitted to the whole cluster at extremely high speed so that the computing resources of each partition are expanded quickly.
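The notation min-current+2 in the instruction above can be read as writing an enlarged contiguous core range into the partition's cpus file. A minimal sketch of that range computation follows, assuming cores are allocated as one contiguous range starting at min; the helper name is hypothetical:

```shell
#!/bin/sh
# Compute a new contiguous CPU range "min-max" for a partition,
# given its current range and a core delta (positive to expand,
# negative to contract). Illustrative helper, not the source's code.
new_cpu_range() {
  range="$1"   # e.g. "0-15" as read from the partition's cpus file
  delta="$2"   # e.g. 2 to add two cores, -6 to release six
  min="${range%-*}"
  max="${range#*-}"
  echo "${min}-$((max + delta))"
}

new_cpu_range "0-15" 2    # prints 0-17 (expand by two cores)
new_cpu_range "0-15" -6   # prints 0-9  (contract by six cores)
# The result would then be written back, e.g.:
#   echo "$(new_cpu_range "$(cat /dev/cpuset/PARTITION/cpus)" 2)" \
#     > /dev/cpuset/PARTITION/cpus
```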
FIG. 9 is a schematic diagram of partition state contraction transitions. If the resource utilization of a partition drops substantially, the partition's resources generally need to be reclaimed. FIG. 10 is a graph of the monitoring data of a partition operating at a higher load; if the load drops to the level shown in fig. 11, the partition needs to contract its resources. Contracting resources is a time-consuming and complex process: the resources cannot be reclaimed from the partition directly; the database instance generally must be stopped first and the resources then reclaimed, after which similar operations are performed at the other nodes.
$ srvctl stop instance -db orcl -instance orcl1
# echo min-current-6 > /dev/cpuset/PARTITION/cpus
The executor is the final instruction execution body in the core engine. It is responsible for executing the monitoring instructions issued by the monitor, the policy-change synchronization instructions issued by the policy manager, the partition migration instructions issued by the state machine, and the instructions delivered to the message bus by other executors; the executor is also responsible for loading and delivering unified instructions to the message bus to ensure the consistency of the partition states of other nodes in the cluster. FIG. 12 is a flow chart of the executor instructions. The message bus is responsible for passing the messages of each node.
After the partitions are configured, isolation tests are generally required; a test script can be run independently in each partition to verify the high isolation of the partitions.
When the script is run, the other partitions must be guaranteed to be in a zero-load running state. If, with this condition met, the test shows the other partitions in a higher-load running state, there is an error in the configuration of low-level resources such as CGroup or cpuset. Under normal conditions, the partition load is as shown in FIG. 13.
According to the above x86 server dynamic hard partitioning method, the control group is combined with the Oracle database; the core engine deployed at each node of the Oracle database monitors and evaluates the data of each partition, and the partitions are dynamically adjusted according to the evaluation results. This realizes the dynamic hard partition function of the x86 server, provides effective physical isolation capability, and can effectively guarantee the service quality of the Oracle database.
In one embodiment, an x86 server dynamic hard partition device is suitable for an x86 server configured with a plurality of multicore Intel Xeon processors and configured with a large amount of memory. As shown in fig. 14, the apparatus includes a partition initialization module 110, a data monitoring module 130, a state evaluation module 140, and a partition adjustment module 150.
The partition initialization module 110 is configured to perform partition initialization on the corresponding node through the core engine of each node of the Oracle database deployed on the x86 server in the control group.
Specifically, the core engines in the control group are arranged at each node of the Oracle database, and the number of core engines corresponds to the number of nodes. The core engine is a microkernel-style product: all components are built into the core engine and are tightly coupled, which ultimately ensures the stability and reliability of the core engine. The core engine is responsible for initializing the server's computing resource controllers and ensures the availability of each computing resource during operation.
The data monitoring module 130 is configured to perform load monitoring on the partition after the initialization processing of the corresponding node by using the core engine to obtain monitoring data.
The core engine monitors the loads of all partitions in the current node, and the acquired monitoring data serves as the basis for subsequent partition adjustment.
The state evaluation module 140 is configured to evaluate, by the core engine, the state of the corresponding partition according to the monitoring data and the stored management policy to obtain an evaluation result.
The management policy can be received in advance and stored, and a new management policy can be received in real time to update the stored policy. In one embodiment, the management policy includes a correspondence between partition states and threshold values, and the state evaluation module 140 includes a target state acquisition unit and an evaluation result acquisition unit.
The target state acquisition unit is used for obtaining the target state of the corresponding partition according to the monitoring data and the correspondence between partition states and threshold values.
The expansion and contraction of partition computing resources depend on threshold values set by the policy manager, and the setting of these thresholds generally depends on the hardware configuration of the current node and the operation-and-maintenance monitoring specification. The target state is the state to which the partition needs to migrate. For example, suppose the partition states include A, B, and C, with corresponding threshold values a, b, and c. If the acquired monitoring data matches threshold c, the target state of the partition is C.
The evaluation result acquisition unit is used for respectively obtaining the evaluation results of the corresponding partitions according to the current state and the target state of each partition.
Similarly, taking C as the target state of the partition, if the current state of the partition is A, the obtained evaluation result is A->C: the partition needs to migrate from state A to state C.
The partition adjusting module 150 is configured to dynamically adjust, by the core engine, the corresponding partition according to the evaluation result.
After the evaluation result of each partition is obtained, the core engine directly and dynamically adjusts the corresponding partition according to the evaluation result. Specifically, corresponding to the management policy possibly including an expansion policy and a contraction policy, the partition adjusting module 150 expands or contracts the computing resources of the corresponding partition according to the evaluation result.
In one embodiment, as shown in FIG. 15, the x86 server dynamic hard partition device also includes an isolation test module 120.
The isolation test module 120 is configured to perform an isolation test on the initialized partitions through the core engine after the partition initialization module 110 performs partition initialization on the corresponding node and before the data monitoring module 130 performs load monitoring on the initialized partitions of the corresponding node through the core engine to obtain monitoring data; after the isolation test passes, it controls the data monitoring module 130 to perform that load monitoring through the core engine to obtain the monitoring data.
In one embodiment, the x86 server dynamic hard partition device further comprises a message transfer module.
The message transmission module is configured to transmit messages between the core engines through the message bus in the control group after the state evaluation module 140 evaluates, by the core engine, the state of the corresponding partition according to the monitoring data and the stored management policy to obtain the evaluation result.
The control group further comprises a message bus connected to each core engine. The message bus is responsible for message transmission among all nodes; this specifically includes transmission of the unified policy, ensuring consistency of the policy within the cluster. The message passing may also include passing partition migration instructions to ensure consistency of the partition states of other nodes within the cluster. It can be understood that, if the system has only one node, a single core engine is configured and no message bus needs to be deployed; the message bus needs to be deployed for message passing between core engines only in a cluster environment.
The above x86 server dynamic hard partition device combines the control group with the Oracle database, performs data monitoring and evaluation on each partition using the core engine deployed at each node of the Oracle database, and dynamically adjusts the partitions according to the evaluation results, thereby realizing the dynamic hard partition function of the x86 server, providing effective physical isolation capability, and effectively guaranteeing the service quality of the Oracle database.
In an embodiment, a computer-readable storage medium has stored thereon a computer program which, when executed by a processor, carries out the steps of the above method. The storage medium may be a floppy disk, an optical disk, a DVD, a hard disk, a flash memory, a USB disk, etc.; the specific type is not unique.
The computer-readable storage medium combines the control group with the Oracle database, performs data monitoring and evaluation on each partition using the core engine deployed at each node of the Oracle database, and dynamically adjusts the partitions according to the evaluation results, thereby realizing the dynamic hard partition function of the x86 server, providing effective physical isolation capability, and effectively guaranteeing the service quality of the Oracle database.
In one embodiment, a computer device includes a memory, an x86 server, and a computer program stored on the memory and executable on the x86 server, the x86 server implementing the steps of the above method when executing the program.
The computer device combines the control group with the Oracle database, performs data monitoring and evaluation on each partition using the core engine deployed at each node of the Oracle database, and dynamically adjusts the partitions according to the evaluation results, thereby realizing the dynamic hard partition function of the x86 server, providing effective physical isolation capability, and effectively guaranteeing the service quality of the Oracle database.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.