CN115421927A - Load balancing method, computing device and storage medium - Google Patents


Info

Publication number: CN115421927A (granted as CN115421927B)
Application number: CN202211342457.1A
Authority: CN (China); original language Chinese (zh)
Inventor: 王晓华 (Wang Xiaohua)
Assignee (original and current): Uniontech Software Technology Co Ltd
Legal status: Active (granted)
Prior art keywords: memory, scanning, threshold, scan, load balancing

Classifications

    • G06F9/5016: Allocation of resources to service a request, the resource being the memory
    • G06F9/5022: Mechanisms to release resources
    • G06F9/505: Allocation of resources to service a request, the resource being a machine (e.g. CPUs, servers, terminals), considering the load
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention relates to the field of operating systems, and in particular to a load balancing method, a computing device, and a storage medium. The computing device comprises a plurality of processors and a plurality of memory modules, with one or more processes running in the processors. The method comprises the following steps: setting scan parameters for a process running in a processor; in response to a scan request for the process, scanning the process according to the scan parameters and determining the page-fault count of each process; and load balancing the memory corresponding to each process in the memory modules according to its page-fault count. Because scan parameters can be set for each process, each process is scanned according to its own parameters and balanced individually according to its own page-fault count, so that memory is migrated with the differences between processes taken into account and the load-balancing effect is improved.

Description

Load balancing method, computing device and storage medium
Technical Field
The present invention relates to the field of operating systems, and in particular, to a load balancing method, a computing device, and a storage medium.
Background
With the development of computer technology, more and more work is done on computers with non-uniform memory access architectures. Non-uniform memory access (NUMA) is a memory architecture designed for multiprocessor computers, in which memory access time depends on the location of the memory relative to the processor. Advantages of NUMA architectures include: a single operating system image is simple to deploy and replicate, the application programming model is easy to manage, and the architecture scales well.
Under a NUMA architecture, a processor accesses its own local memory faster than non-local memory (memory attached to another processor, or memory shared between processors). Because the access pattern affects how quickly a process executes its tasks, migration operations are performed on processes or memory to maintain execution efficiency. However, such migration sometimes consumes too many resources itself, which works against overall operating efficiency.
For this reason, a new load balancing method is required.
Disclosure of Invention
To this end, the present invention provides a load balancing method in an attempt to solve, or at least alleviate, the problems presented above.
According to an aspect of the present invention, there is provided a load balancing method adapted to be executed in a computing device, the computing device including a plurality of processors and a plurality of memory modules, with one or more processes running in the processors, the method including: setting scan parameters for a process running in a processor; in response to a scan request for the process, scanning the process according to the scan parameters and determining the page-fault count of each process; and load balancing the memory corresponding to each process in the memory modules according to its page-fault count.
Optionally, the method according to the present invention further comprises: providing a parameter setting interface; and setting the scan parameters of a process running in a processor comprises: setting the scan parameters of the process through the parameter setting interface.
Optionally, in the method according to the invention, the scan parameters comprise a scan memory threshold, which is the size of the memory that the process accesses in the memory modules and that is scanned.
Optionally, in the method according to the invention, the scan parameters comprise a scan interval threshold, and the method further comprises: determining a scan request initiation interval from the scan interval threshold, so that scan requests are initiated at that interval.
Optionally, in the method according to the invention, scanning the process according to the scan parameters comprises: scanning the process according to the scan memory threshold, and determining the number of page faults the process incurs within the scan interval threshold.
Optionally, in the method according to the invention, load balancing the memory corresponding to the process in the memory modules according to the page-fault count of each process comprises: determining, from the process's page-fault count, the number of page faults generated against each of the one or more memory modules the process accesses; and balancing the memory load according to those per-module page-fault counts.
Optionally, the method further comprises: in response to the process migrating from its current processor to a target processor, generating a second scan interval threshold from the first scan interval threshold of the process, the second scan interval threshold being smaller than the first.
Optionally, the method further comprises: in response to no migration of the process occurring within the scan interval threshold, generating a third scan interval threshold from the first scan interval threshold of the process, the third scan interval threshold being greater than the first.
Optionally, the method further comprises: in response to the process allocating memory from the memory modules, generating a second scan memory threshold from the first scan memory threshold of the process, the second scan memory threshold being greater than the first.
Optionally, the method further comprises: in response to the process releasing memory in the memory modules, generating a third scan memory threshold from the first scan memory threshold of the process, the third scan memory threshold being smaller than the first.
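The four adaptive rules above (shrink the scan interval after a migration, grow it when no migration occurs, grow the scan memory threshold on allocation, shrink it on release) can be sketched as follows. The concrete adjustment factors and clamping bounds are illustrative assumptions; the claims only require the stated inequalities between the first, second, and third thresholds.

```python
# Sketch of the adaptive threshold rules from the optional claims.
# The factors (halving/doubling) and bounds are illustrative assumptions.

def adjust_scan_interval(interval_ms, migrated, lo=100, hi=60000):
    """Shrink the scan interval after a process migration (scan sooner);
    grow it when no migration occurred within the last interval."""
    if migrated:
        return max(lo, interval_ms // 2)   # second threshold < first
    return min(hi, interval_ms * 2)        # third threshold > first

def adjust_scan_size(size_mb, delta_mb):
    """Grow the scan memory threshold when the process allocates memory
    (delta_mb > 0); shrink it when memory is released (delta_mb < 0)."""
    return max(1, size_mb + delta_mb)
```

A process that just migrated is scanned again sooner, so its new access pattern is learned quickly; a stable process is scanned less often, saving CPU.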
According to another aspect of the present invention, there is provided a computing device comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the load balancing method according to the present invention.
According to yet another aspect of the invention, there is provided a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform a method of load balancing according to the invention.
The invention discloses a load balancing method adapted to be executed in a computing device, the computing device comprising a plurality of processors and a plurality of memory modules, with one or more processes running in the processors, the method comprising: setting scan parameters for a process running in a processor; in response to a scan request for the process, scanning the process according to the scan parameters and determining the page-fault count of each process; and load balancing the memory corresponding to each process in the memory modules according to its page-fault count. Because scan parameters can be set for each process, each process is scanned according to its own parameters and balanced individually according to its own page-fault count, so that memory is migrated with the differences between processes taken into account and the load-balancing effect is improved.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description when read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.
FIG. 1 shows a flow diagram of a load balancing method 100 according to an exemplary embodiment of the invention;
FIG. 2 illustrates a block diagram of a computing device 200, according to an exemplary embodiment of the invention;
FIG. 3 is a diagram illustrating a processor and memory module connection according to an exemplary embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. Like reference numbers generally refer to like parts or elements.
Fig. 1 shows a flow diagram of a load balancing method 100 according to an exemplary embodiment of the invention.
One load balancing method of the present invention is adapted to be executed in a computing device. FIG. 2 illustrates a block diagram of a computing device, according to an exemplary embodiment of the invention.
In a basic configuration, computing device 200 includes at least one processing unit 220 and system memory 210. According to one aspect, depending on the configuration and type of computing device, system memory 210 includes, but is not limited to, volatile storage (e.g., random access memory), non-volatile storage (e.g., read-only memory), flash memory, or any combination of such memories. According to one aspect, system memory 210 includes an operating system 211.
According to one aspect, the operating system 211, for example, is adapted to control the operation of the computing device 200. Moreover, examples are practiced in conjunction with a graphics library, other operating systems, or any other application program and are not limited to any particular application or system. This basic configuration is illustrated in fig. 2 by those components within dashed line 215. According to one aspect, computing device 200 has additional features or functionality. For example, according to one aspect, computing device 200 includes additional data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape.
As stated hereinabove, according to one aspect, program modules 212 are stored in system memory 210. According to one aspect, program modules 212 may include one or more applications, the invention not being limited to the type of application, e.g., applications further include: email and contacts applications, word processing applications, spreadsheet applications, database applications, slide show applications, drawing or computer-aided applications, web browser applications, and the like.
According to one aspect, examples may be practiced in a circuit comprising discrete electronic elements, a packaged or integrated electronic chip containing logic gates, a circuit utilizing a microprocessor, or on a single chip containing electronic elements or microprocessors. For example, an example may be practiced via a system on a chip (SOC) in which each or many of the components shown in fig. 2 may be integrated on a single integrated circuit. According to one aspect, such SOC devices may include one or more processing units, graphics units, communication units, system virtualization units, and various application functions, all integrated (or "burned") onto a chip substrate as a single integrated circuit. When operating via an SOC, the functions described herein may be operated via application-specific logic integrated with other components of the computing device 200 on a single integrated circuit (chip). Embodiments of the invention may also be practiced using other technologies capable of performing logical operations (e.g., AND, OR, and NOT), including but not limited to mechanical, optical, fluidic, and quantum technologies. In addition, embodiments of the invention may be practiced within a general purpose computer or in any other circuits or systems.
According to one aspect, computing device 200 may also have one or more input devices 231, such as a keyboard, mouse, pen, voice input device, touch input device, or the like. Output device(s) 232 such as a display, speakers, printer, etc. may also be included. The foregoing devices are examples and other devices may also be used. Computing device 200 may include one or more communication connections 233 that allow communication with other computing devices 240. Examples of suitable communication connections 233 include, but are not limited to: RF transmitter, receiver and/or transceiver circuitry; universal Serial Bus (USB), parallel, and/or serial ports. Computing device 200 may be communicatively connected to other computing devices 240 via communication connection 233.
Embodiments of the present invention also provide a non-transitory readable storage medium storing instructions for causing the computing device to perform a method according to embodiments of the present invention. The readable media of the present embodiments include permanent and non-permanent, removable and non-removable media, and the storage of information may be accomplished by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of readable storage media include, but are not limited to: phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transitory readable storage medium.
According to one aspect, communication media is embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal (e.g., a carrier wave or other transport mechanism) and includes any information delivery media. According to one aspect, the term "modulated data signal" describes a signal that has one or more feature sets or that has been altered in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio Frequency (RF), infrared, and other wireless media.
It is noted that although the computing device depicted above shows only processing unit 220, system memory 210, input device 231, output device 232, and communication connection 233, in particular implementations, the device may include other components necessary for proper operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
According to one embodiment of the invention, the computing device comprises a plurality of processors and a plurality of memory modules, with one or more processes running in the processors. FIG. 3 is a diagram illustrating a processor and memory module connection according to an exemplary embodiment of the invention. As shown in FIG. 3, the computing device includes a plurality of NUMA nodes, NUMA Node 1 through NUMA Node 4. Each NUMA node includes a processor; for example, NUMA Node 1 includes CPU1. The processor in each NUMA node is connected to a plurality of memory modules; for example, in NUMA Node 1, CPU1 is connected to the memory modules Memory A.1 and Memory A.2. A memory module here is a logical memory region, and CPU1 to CPU4 are logical CPUs. Under a NUMA architecture, the physical processor and physical memory resources of the computing device are partitioned into a plurality of NUMA nodes.
A NUMA architecture manages CPUs and main memory through NUMA nodes. It consists of multiple NUMA nodes, each of which may own several CPUs and memory modules. The memory modules under a NUMA node are local memory for the CPUs of that node, while the memory modules under other nodes are remote memory for them. NUMA nodes are connected by an interconnect bus, and a CPU sees lower latency and higher performance when accessing the local memory of its own node. For a given NUMA node, the latency of accessing memory modules on different remote nodes also differs, because the distances between nodes differ.
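As a concrete illustration of local versus remote access, the inter-node distances of a NUMA machine can be inspected with a tool such as numactl --hardware. The toy 4-node distance matrix below is a hypothetical example (real values come from the firmware's SLIT table); lower distance means lower access latency.

```python
# Toy NUMA node distance matrix (cf. the "node distances" section of
# `numactl --hardware`). Values are hypothetical; 10 = local access.

DISTANCE = [
    [10, 21, 21, 31],
    [21, 10, 31, 21],
    [21, 31, 10, 21],
    [31, 21, 21, 10],
]

def is_local(cpu_node, mem_node):
    """Local memory: the memory module sits on the CPU's own node."""
    return cpu_node == mem_node

def nearest_remote_node(cpu_node):
    """Pick the remote node with the smallest distance from cpu_node."""
    candidates = [n for n in range(len(DISTANCE)) if n != cpu_node]
    return min(candidates, key=lambda n: DISTANCE[cpu_node][n])
```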
In the prior art, the operating system's memory manager by default allocates memory from the NUMA node on which the current process runs, and applies to a neighboring NUMA node when the current node's memory is insufficient. To fully utilize CPU computing power, the multi-core load-balancing mechanism migrates processes between CPUs, which may leave a process running on another node while its memory stays on the original node, so that the process accesses remote memory.
To address the performance degradation caused by remote memory access, the operating system kernel uses an Automatic NUMA Balancing mechanism to migrate remote memory to the node where the process runs, or to migrate the process to the node whose memory it accesses most, thereby increasing the probability that the process accesses local memory and improving system performance.
The NUMA automatic balancing mechanism uses page faults to count how often a process accesses the memory modules in each NUMA node. When a process requests memory, a virtual address space is allocated first; when the contents of the memory are actually read or written, a page fault is triggered and the mapping between the virtual address and the physical address is established. A virtual memory area (VMA) is created, and the virtual-to-physical mappings are managed through VMAs. A virtual address space is a virtual, contiguous, independent address space.
When the scheduler clock reaches the preset scan time, the NUMA automatic balancing mechanism scans the VMAs of the process currently running on the CPU and clears the process's permission to access its memory pages, so that the next access to such a page triggers a page fault.
In the NUMA page-fault handler, the counts of the process's accesses to the memory modules in the different NUMA nodes are updated; at the same time, the performance loss caused by remote access is reduced by migrating the accessed remote memory pages to the local node, or by migrating the process to the node it accesses most.
Memory accessed by only one process is private memory; memory accessed by multiple processes is called shared memory. For private memory, the NUMA automatic balancing mechanism migrates remote memory pages to the local node. For shared memory, the processes accessing the same memory form a NUMA group; the sum of the memory-access counts of all processes in the group is the group's access count, and remote pages are migrated to the local node, or the processes are migrated to the node with the most accesses, according to the group's per-node access counts.
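The shared-memory accounting described above can be sketched as follows: per-process fault counts per node are summed over the NUMA group, and the node with the highest aggregate count becomes the preferred target. The data layout and function names are illustrative assumptions.

```python
# Sketch of NUMA-group accounting for shared memory.
# per_process_counts: one {node_id: fault_count} dict per group member.

def group_counts(per_process_counts):
    """Sum the per-node fault counts over all processes in the group."""
    total = {}
    for counts in per_process_counts:
        for node, n in counts.items():
            total[node] = total.get(node, 0) + n
    return total

def preferred_node(per_process_counts):
    """The node with the highest aggregate access count for the group."""
    total = group_counts(per_process_counts)
    return max(total, key=total.get)
```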
According to one embodiment of the invention, the operating system provides several system-wide parameters that control the memory-scan period of processes and the size of the address space scanned each time:
numa_balancing_scan_period_min_ms, which controls the minimum interval between two successive memory scans of a process;
numa_balancing_scan_period_max_ms, which controls the maximum interval between two successive memory scans of a process;
numa_balancing_scan_size_mb, which controls the size of the address space scanned.
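These three knobs correspond to the Linux sysctl files of the same names under /proc/sys/kernel/. The sketch below models how a candidate scan period is clamped into the [min, max] window; the default values shown are the commonly cited kernel defaults and should be treated as assumptions here.

```python
# Modeled system-wide scan parameters (cf. /proc/sys/kernel/numa_balancing_*).
# Defaults are assumptions matching commonly cited kernel defaults.

SCAN_PERIOD_MIN_MS = 1000    # numa_balancing_scan_period_min_ms
SCAN_PERIOD_MAX_MS = 60000   # numa_balancing_scan_period_max_ms
SCAN_SIZE_MB = 256           # numa_balancing_scan_size_mb

def clamp_scan_period(period_ms):
    """Keep a candidate scan interval within the system-wide bounds."""
    return max(SCAN_PERIOD_MIN_MS, min(SCAN_PERIOD_MAX_MS, period_ms))
```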
However, under the prior-art NUMA automatic balancing mechanism, the global numa_balancing_scan_period and numa_balancing_scan_size may not give the system optimal performance. Scanning a process's address space and modifying page table entries (PTEs) triggers page faults that consume CPU resources; the more frequent the scans and the more memory scanned, the more CPU is consumed. On the other hand, scanning the process page tables and performing NUMA automatic balancing reduces the process's remote memory accesses, improving performance and reducing the CPU consumed by remote access.
The performance improvement that NUMA automatic balancing brings to a process can be quantified with the following model: the CPU time saved on remote memory access after balancing, minus the CPU time consumed by process page-table scanning. However, no single empirical value of numa_balancing_scan_period and numa_balancing_scan_size currently exists that lets all service applications obtain the maximum performance improvement.
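The model above can be made concrete with a small helper: the net benefit of balancing one process is the remote-access CPU time saved minus the CPU time spent on page-table scanning and the induced page faults. All cost figures below are hypothetical inputs, not measured values.

```python
# Quantifying the stated model: net benefit = remote-access CPU saved
# minus scan/fault overhead. All arguments share one unit (e.g. CPU-ms/s).

def net_benefit(remote_cost_before, remote_cost_after, scan_cost):
    """CPU saved on remote access after balancing, minus scan overhead."""
    return (remote_cost_before - remote_cost_after) - scan_cost

def worth_balancing(remote_cost_before, remote_cost_after, scan_cost):
    """Balancing pays off only while savings exceed the scan overhead."""
    return net_benefit(remote_cost_before, remote_cost_after, scan_cost) > 0
```

This makes the trade-off in the text explicit: scanning more often raises scan_cost, while scanning too rarely leaves remote_cost_after high, and the optimum differs per process.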
Different processes achieve their maximum performance improvement at different scan frequencies and scan address-space sizes. A computing device usually runs many processes, and processes of different types need different scan periods and per-scan memory sizes; using a single system-wide numa_balancing_scan_period and numa_balancing_scan_size therefore makes the load-balancing policy inflexible and limits the improvement of overall system efficiency.
Referring to FIG. 1, step 110 is executed to set scan parameters for a process running in a processor.
According to an embodiment of the present invention, to make the scan parameters settable, the process structure of the process may be modified by adding the scan parameters to it, i.e. adding specific settable parameter fields.
According to one embodiment of the invention, the scan parameters include a scan memory threshold and a scan interval threshold, where the scan memory threshold is the size of the memory that the process accesses in the memory modules and that is scanned.
According to one embodiment of the present invention, the scan-request initiation interval is determined from the scan interval threshold, so that scan requests are initiated at that interval. The scan interval threshold thus sets the interval between two consecutive scans of a process.
According to an embodiment of the present invention, the parameter field corresponding to the scan memory threshold may be named numa_balancing_scan_size, and the field corresponding to the scan interval threshold numa_balancing_scan_period. The invention does not limit the concrete form of these parameter fields.
According to one embodiment of the invention, the process structure can be modified, and the parameter fields for the scan parameters added, when the process is created. After a field is added, an initial value can be written to it, setting the process's scan parameter to the system default threshold. The invention does not limit the specific value of the system default threshold.
According to one embodiment of the invention, a parameter setting interface is provided in advance, and when the scan parameters of a process running in a processor are set, they are set through this interface.
According to an embodiment of the present invention, the parameter setting interface is used to read the scan-parameter values in the process structure, i.e. the values of the corresponding parameter fields, and to modify them.
According to an embodiment of the present invention, each time the scan parameters are modified through the parameter setting interface, the modified values are fetched through the interface and the process is scanned according to them.
According to an embodiment of the invention, after a process is created, its process structure is modified and its scan parameters are set to the system default thresholds. The defaults can then be modified through the parameter setting interface into the process's customized scan parameters, so that the process is scanned according to its customized parameters.
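A minimal sketch of the per-process scan parameters and the parameter setting interface, under the assumption that the fields live in the process structure and are initialized to system defaults at process creation. In a real kernel these would be new task-structure fields exposed through e.g. a procfs file or a prctl-style call; all names and default values here are illustrative.

```python
# Sketch: per-process scan parameters with a get/set interface.
# Field names and defaults are illustrative assumptions.

DEFAULT_SCAN_PERIOD_MS = 1000   # system default thresholds
DEFAULT_SCAN_SIZE_MB = 256

class ProcessScanParams:
    def __init__(self):
        # Initialized to system defaults when the process is created.
        self.numa_balancing_scan_period = DEFAULT_SCAN_PERIOD_MS
        self.numa_balancing_scan_size = DEFAULT_SCAN_SIZE_MB

    # The parameter setting interface: read and modify the fields.
    def get(self, field):
        return getattr(self, field)

    def set(self, field, value):
        setattr(self, field, value)
```

Usage: a single process can be tuned without touching the rest of the system, e.g. ProcessScanParams().set("numa_balancing_scan_period", 500) shortens only that process's scan interval.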
Then, step 120 is executed: in response to a scan request for the processes, each process is scanned according to its scanning parameters, and the number of page fault interrupts of each process is determined. Specifically, a process is scanned according to the scan memory threshold, and the number of page fault interrupts of the process within the scan interval threshold is determined.
According to one embodiment of the invention, when a process is scanned according to the scanning parameters, the scanning parameters are read from the process structure of the process in order to scan it.
The scan memory threshold specifies the size of memory scanned in the memory area corresponding to the process during each scan; the scan interval threshold specifies the interval between two adjacent scans of the process.
By counting the page fault interrupts of a process, the access pattern of the process over its memory areas can be determined, that is, an estimate of the number of accesses to each memory module.
Because scanning is performed at process granularity, the scanning parameters set for a process also apply at process granularity; by setting a process's scanning parameters, its scanning behavior can be controlled per process.
Finally, step 130 is executed: load balancing is performed on the memory corresponding to the processes in the memory modules according to the number of page fault interrupts of each process.
According to one embodiment of the invention, during load balancing, the number of page fault interrupts generated by accessing each memory module when the process accesses one or more memory modules is determined from the process's page fault interrupt counts, and the memory load is then balanced according to the number of page fault interrupts generated by the process's accesses to each memory module.
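As a rough illustration of how per-node fault statistics can drive balancing decisions, the following sketch (function names are hypothetical; the patent does not specify this logic) counts faults per memory node and flags remote nodes that attract more faults than the local node:

```python
def node_fault_stats(fault_nodes):
    """Count page fault interrupts per memory node for one process.

    fault_nodes: list of node ids, one entry per recorded fault.
    """
    stats = {}
    for node in fault_nodes:
        stats[node] = stats.get(node, 0) + 1
    return stats

def remote_nodes_to_rebalance(stats, local_node):
    """Remote nodes drawing more faults than the local node are
    candidates for rebalancing (e.g. migrating pages toward the process)."""
    local = stats.get(local_node, 0)
    return [n for n, c in stats.items() if n != local_node and c > local]
```

For example, five recorded faults split 2-on-node-0 and 3-on-node-1 would flag node 1 for rebalancing when node 0 is local.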
According to an embodiment of the present invention, when load balancing is performed, memory pages with many accesses to remote memory may be migrated to local memory; the present invention does not limit the specific load balancing manner.
According to an embodiment of the present invention, the scan interval threshold and the scan memory threshold of a process are adjusted according to whether the process migrates between different NUMA nodes in the current scan period, and according to the memory allocation situation.
According to one embodiment of the invention, the inventive method comprises: in response to a process migrating from a current processor to a target processor, a second scan interval threshold is generated from a first scan interval threshold of the process, the second scan interval threshold being less than the first scan interval threshold.
In response to no process migration of the process occurring within the scan interval threshold, a third scan interval threshold is generated from the first scan interval threshold of the process, the third scan interval threshold being greater than the first scan interval threshold.
Migration of the process between nodes indicates that the mobility of the process across nodes has increased, so the scanning frequency of the process can be raised.
For a process running on a node, the default policy is to allocate memory preferentially from the local node. One factor that leads to accessing remote memory is that the process has been migrated to another node. If node migration occurred in the scan period, the scan interval threshold is shortened; for example, the second scan interval threshold is set to 3/4 of the first scan interval threshold.
According to an embodiment of the present invention, the scan interval threshold may be bounded below by a minimum of 200 ms.
If no migration between nodes occurred, the scan interval threshold is lengthened; for example, the third scan interval threshold is set to 5/4 of the first scan interval threshold.
According to an embodiment of the present invention, the scan interval threshold may be bounded above by a maximum of 5 s.
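The interval adjustment rules above (shrink to 3/4 after a node migration, grow to 5/4 otherwise, clamped to [200 ms, 5 s]) can be expressed as a small sketch. The function name and the use of integer arithmetic are assumptions:

```python
MIN_SCAN_PERIOD_MS = 200    # lower bound stated in the description
MAX_SCAN_PERIOD_MS = 5000   # upper bound stated in the description

def next_scan_period(period_ms, migrated_between_nodes):
    """Shorten the scan interval to 3/4 of its value after a node
    migration, lengthen it to 5/4 otherwise, clamped to [200 ms, 5 s]."""
    if migrated_between_nodes:
        period_ms = period_ms * 3 // 4
    else:
        period_ms = period_ms * 5 // 4
    return max(MIN_SCAN_PERIOD_MS, min(MAX_SCAN_PERIOD_MS, period_ms))
```

Repeated calls without migration back off toward the 5 s ceiling, while migrations pull the interval toward the 200 ms floor.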
According to one embodiment of the invention, the method comprises: in response to the process applying for memory in a memory module, generating a second scan memory threshold from the first scan memory threshold of the process, the second scan memory threshold being greater than the first scan memory threshold.
In response to the process releasing memory in a memory module, a third scan memory threshold is generated from the first scan memory threshold of the process, the third scan memory threshold being smaller than the first scan memory threshold.
When a process applies for memory, only an unallocated region is found in the process's virtual address space and reserved; a page fault interrupt is raised and physical memory is applied for only when the memory is actually accessed. When physical memory is allocated to the process in the page fault interrupt, the amount of memory to be scanned each time grows. Therefore, when the process applies for memory in a memory module, the memory to be scanned increases and the scan memory threshold is raised; when the process releases memory in a memory module, the scan memory threshold is lowered.
According to an embodiment of the present invention, if the process applies for memory in a memory module, the amount by which the second scan memory threshold is increased relative to the first scan memory threshold is: size of the newly allocated memory × (first scan memory threshold / total memory size of the current process).
According to an embodiment of the present invention, if the process releases memory in a memory module, the amount by which the third scan memory threshold is decreased relative to the first scan memory threshold is: size of the released memory × (first scan memory threshold / total memory size of the current process).
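The two adjustment formulas above can be written as one function. The function name is hypothetical and floating-point arithmetic is assumed; the units (e.g. MB) just have to be consistent across the arguments:

```python
def adjust_scan_size(scan_size, total_mem, delta, allocated):
    """Scale the scan memory threshold by the changed fraction of the
    process's total memory: delta * (scan_size / total_mem).
    Adds the change on allocation, subtracts it on release."""
    change = delta * (scan_size / total_mem)
    return scan_size + change if allocated else scan_size - change
```

For instance, with a 256 MB threshold and 1024 MB of process memory, allocating 128 MB more raises the threshold by 128 × (256/1024) = 32 MB.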
According to one embodiment of the invention, statistics of non-NUMA page fault interrupts are also considered when determining the number of page fault interrupts of a process. A non-NUMA page fault interrupt is a page fault not caused by NUMA scanning; it may include, for example, a page fault caused by the process lacking access permission to a memory page. When determining the number of page fault interrupts of a process, the process and node information that triggered the fault are obtained, and the process's per-node statistical counts are incremented. The total number of page fault interrupts of the process is then computed as: NUMA page fault count + 0.2 × non-NUMA page fault count.
The counts of other types of page fault interrupts (those not caused by NUMA scanning) are thus folded into the per-node memory access statistics. With the scan period and the per-scan memory size unchanged, this improves the accuracy of the NUMA page fault statistics, so that memory pages and processes can be migrated more precisely and process performance is improved.
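The weighted total described above is simple enough to state directly. The function name is an assumption; the 0.2 weight for non-NUMA faults is taken from the description:

```python
def total_fault_count(numa_faults, non_numa_faults):
    """Weighted total page fault count: NUMA-scan faults count fully,
    faults from other causes are weighted at 0.2."""
    return numa_faults + 0.2 * non_numa_faults
```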
According to an embodiment of the present invention, when migrating a memory page, if the local memory of the target node is insufficient, migration to a nearby node is considered. When the remaining memory of the local node's memory module reaches the node's minimum memory threshold, the node closest to the current node is determined and the memory page is migrated to that nearby node. This reduces the time the process spends accessing remote memory and improves process performance.
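The nearest-node fallback can be sketched as follows. The names, the free-memory map, and the distance table are hypothetical stand-ins for the kernel's per-node memory watermarks and node distance data:

```python
def pick_migration_node(preferred, free_mem, low_watermark, distance):
    """Use the preferred (local) node if its free memory is above the
    low watermark; otherwise fall back to the nearest node that is."""
    if free_mem[preferred] > low_watermark:
        return preferred
    candidates = [n for n in free_mem
                  if n != preferred and free_mem[n] > low_watermark]
    # Nearest eligible node by distance from the preferred node.
    return min(candidates, key=lambda n: distance[preferred][n], default=None)
```

With node 0 below its watermark and nodes 1 and 2 at distances 10 and 20, the page would go to node 1.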
The invention discloses a load balancing method adapted to be executed in a computing device, the computing device comprising a plurality of processors and a plurality of memory modules, one or more processes running in the processors. The method comprises: setting scanning parameters of a process running in a processor; in response to a scan request for the process, scanning the process according to the scanning parameters and determining the number of page fault interrupts of each process; and load balancing the memory corresponding to the processes in the memory modules according to the number of page fault interrupts of each process. The invention can set scanning parameters for each process, scan each process according to its own scanning parameters, and perform individualized load balancing according to each process's page fault interrupts, thereby migrating memory while taking the differences between processes into account and improving the effect of load balancing.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects.
Those skilled in the art will appreciate that the modules or units or groups of devices in the examples disclosed herein may be arranged in a device as described in this embodiment, or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may additionally be divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. Modules or units or groups in embodiments may be combined into one module or unit or group, and may furthermore be divided into sub-modules or sub-units or sub-groups. All of the features disclosed in this specification, and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification may be replaced by an alternative feature serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Moreover, those skilled in the art will appreciate that although some embodiments described herein include some features included in other embodiments, not others, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments.
Additionally, some of the embodiments are described herein as a method or combination of method elements that can be implemented by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for carrying out the method or method elements thus forms a means for carrying out the method or method elements. Further, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is used to implement the functions performed by the elements for the purpose of carrying out the invention.
The various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Wherein the memory is configured to store program code; the processor is configured to perform the load balancing method of the present invention according to instructions in said program code stored in the memory.
By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media store information such as computer readable instructions, data structures, program modules or other data. Communication media typically embody computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. Combinations of any of the above are also included within the scope of computer readable media.
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to practitioners skilled in this art. The disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention.

Claims (12)

1. A load balancing method adapted to be executed in a computing device including a plurality of processors and a plurality of memory modules, one or more processes running in the processors, the method comprising:
setting scanning parameters of a running process in the processor;
in response to a scan request for the process, scanning the process according to the scanning parameters, and determining the number of page fault interrupts of each process;
and load balancing the memory corresponding to the process in the memory module according to the number of page fault interrupts of each process.
2. The method of claim 1, wherein the method further comprises:
setting a parameter setting interface;
the setting of the scanning parameters of the running process in the processor comprises the following steps:
and setting the scanning parameters of the process according to the parameter setting interface.
3. The method of claim 1, wherein the scan parameters comprise: a scan memory threshold, the scan memory threshold comprising: the size of the memory scanned among the memory that the process accesses in the memory module.
4. The method of claim 1, wherein the scan parameters comprise: a scan interval threshold, the method further comprising:
and determining a scan request initiation interval according to the scan interval threshold, so as to initiate scan requests according to the scan request initiation interval.
5. The method of claim 3, wherein scanning the process according to the scan parameters comprises:
and scanning the process according to the scan memory threshold, and determining the number of page fault interrupts of the process within the scan interval threshold.
6. The method according to claim 1, wherein load balancing the memory corresponding to the process in the memory module according to the number of page fault interrupts of each process comprises:
determining, according to the number of page fault interrupts of the process, the number of page fault interrupts generated by accessing each memory module when the process accesses one or more memory modules;
and balancing the memory load according to the number of page fault interrupts generated by the process accessing each memory module.
7. The method of any of claims 1-6, further comprising:
in response to the process migrating from the current processor to the target processor, generating a second scan interval threshold from the first scan interval threshold for the process, the second scan interval threshold being less than the first scan interval threshold.
8. The method of any of claims 1-6, further comprising:
and in response to no process migration of the process occurring within the scan interval threshold, generating a third scan interval threshold according to the first scan interval threshold of the process, the third scan interval threshold being greater than the first scan interval threshold.
9. The method of any of claims 1-6, further comprising:
and in response to the process applying for memory in the memory module, generating a second scan memory threshold according to the first scan memory threshold of the process, the second scan memory threshold being greater than the first scan memory threshold.
10. The method of any of claims 1-6, further comprising:
and in response to the process releasing memory in the memory module, generating a third scan memory threshold according to the first scan memory threshold of the process, the third scan memory threshold being smaller than the first scan memory threshold.
11. A computing device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the method of any of claims 1-10.
12. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform the method of any of claims 1-10.
CN202211342457.1A 2022-10-31 2022-10-31 Load balancing method, computing device and storage medium Active CN115421927B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211342457.1A CN115421927B (en) 2022-10-31 2022-10-31 Load balancing method, computing device and storage medium


Publications (2)

Publication Number Publication Date
CN115421927A true CN115421927A (en) 2022-12-02
CN115421927B CN115421927B (en) 2023-03-24

Family

ID=84207659

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211342457.1A Active CN115421927B (en) 2022-10-31 2022-10-31 Load balancing method, computing device and storage medium

Country Status (1)

Country Link
CN (1) CN115421927B (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101859282A (en) * 2010-05-26 2010-10-13 浙江大学 Disk page swap-in method of virtual platform based on dual tracking
US20130232315A1 (en) * 2012-03-02 2013-09-05 Samsung Electronics Co., Ltd. Scalable, customizable, and load-balancing physical memory management scheme
CN104239184A (en) * 2014-09-22 2014-12-24 北京金山安全软件有限公司 Method and device for identifying abnormal application program of terminal and mobile terminal
CN114416310A (en) * 2021-12-29 2022-04-29 统信软件技术有限公司 Multiprocessor load balancing method, computing device and storage medium
CN114461375A (en) * 2021-07-30 2022-05-10 荣耀终端有限公司 Memory resource management method and electronic equipment
CN114675959A (en) * 2020-12-24 2022-06-28 华为技术有限公司 Memory recovery method and related device


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116737481A (en) * 2023-08-07 2023-09-12 麒麟软件有限公司 Operating system optimization method for scanning size in automatic NUMA balance characteristic
CN116737481B (en) * 2023-08-07 2023-11-24 麒麟软件有限公司 Operating system optimization method for scanning size in automatic NUMA balance characteristic

Also Published As

Publication number Publication date
CN115421927B (en) 2023-03-24

Similar Documents

Publication Publication Date Title
US10242022B1 (en) Systems and methods for managing delayed allocation on clustered file systems
CN107621959B (en) Electronic device and software training method and computing system thereof
US9229878B2 (en) Memory page offloading in multi-node computer systems
US9792220B2 (en) Microcontroller for memory management unit
US11921632B2 (en) Method and apparatus to use DRAM as a cache for slow byte-addressable memory for efficient cloud applications
US10642727B1 (en) Managing migration events performed by a memory controller
US11287999B2 (en) Multi-instance 2LM architecture for SCM applications
CN115421927B (en) Load balancing method, computing device and storage medium
US10459662B1 (en) Write failure handling for a memory controller to non-volatile memory
TW201939515A (en) Method and system for machine learning training
CN114416310A (en) Multiprocessor load balancing method, computing device and storage medium
KR20210143611A (en) Storage device supporting multi tenancy and operating method thereof
US10901914B2 (en) Method for writing multiple copies into storage device, and storage device
JP2017033375A (en) Parallel calculation system, migration method, and migration program
KR20150090621A (en) Storage device and method for data processing
KR20230156062A (en) Increased address space layout randomization entropy through page remapping and rotations
CN115061954B (en) Missing page interrupt processing method, computing device and storage medium
US11714753B2 (en) Methods and nodes for handling memory
CN115658324B (en) Process scheduling method, computing device and storage medium
US20230342458A1 (en) Techniques to mitigate cache-based side-channel attacks
US20240202031A1 (en) Resource Management Method and Corresponding Apparatus
CN114879987A (en) Kernel upgrading and using method, computing device and storage medium
CN114706828A (en) File loading method, computing device and storage medium
CN114741337A (en) Page table releasing method and computing equipment
CN114880097A (en) Process scheduling method and computing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant