CN116048821A - High-utilization AI server and resource allocation method thereof - Google Patents

High-utilization AI server and resource allocation method thereof

Info

Publication number
CN116048821A
Authority
CN
China
Prior art keywords
server
unit
resource utilization
servers
combination
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310344475.1A
Other languages
Chinese (zh)
Other versions
CN116048821B (en)
Inventor
欧阳浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Haoyuan Nuoxin Technology Co ltd
Original Assignee
Shenzhen Haoyuan Nuoxin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Haoyuan Nuoxin Technology Co ltd filed Critical Shenzhen Haoyuan Nuoxin Technology Co ltd
Priority to CN202310344475.1A priority Critical patent/CN116048821B/en
Publication of CN116048821A publication Critical patent/CN116048821A/en
Application granted granted Critical
Publication of CN116048821B publication Critical patent/CN116048821B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention belongs to the technical field of servers, and particularly relates to a high-utilization AI server and a resource allocation method thereof. The server includes: a server array formed by a plurality of unit servers that can be connected to one another, and a server combination unit that acquires usage rights to unit servers from the server array according to the target task type and combines the unit servers whose usage rights have been acquired in a specific form. The server builds a server array from unit servers and then acquires a specific number of unit servers according to the task type of the target task and combines them in a specific form, so that the current task can be executed with higher efficiency. Meanwhile, while the unit servers are running, an equalizing network is built within the unit server group to equalize resource utilization, thereby achieving high utilization of the AI server.

Description

High-utilization AI server and resource allocation method thereof
Technical Field
The invention belongs to the technical field of servers, and particularly relates to a high-utilization-rate AI server and a resource allocation method thereof.
Background
With the development of 5G, the Internet of Things and artificial intelligence technologies, video monitoring systems are being built and deployed at scale, and the intelligent construction of society is steadily maturing. Intelligent construction is currently applied mainly to intelligent cities, intelligent communities, intelligent construction sites, intelligent tourism, intelligent travel, intelligent street lamps, intelligent parking lots and the like, and is spreading into many industries. Intelligent video analysis plays a major role in this construction. Applications with video AI analysis capability are built by combining existing video monitoring with AI visual analysis and recognition technology. They provide functions such as facial feature recognition, human behavior analysis, body temperature monitoring, crowd density monitoring, detection of illegal intrusion into dangerous areas, smoke and fire alarms, and illegal occupation analysis, and play an irreplaceable role in the security field and other video data acquisition fields. Implementing video AI analysis requires high-performance service equipment, a large amount of software and algorithm services, and frequent data recording, storage, analysis and processing. Resource consumption and usage cost are therefore high, which makes comprehensive adoption across the industry difficult.
Generally, the server resource problem can be addressed from two directions: improving the performance of the server, and improving the utilization efficiency of the server. The former requires upgrading and optimizing the hardware of the whole server, which is relatively costly. The latter only requires algorithmic optimization on existing server equipment, which improves the utilization efficiency of server resources, reduces their consumption, and lowers the usage cost.
Therefore, it is important to develop an efficient AI server and a resource allocation method thereof. Patent application No. CN202110916339.6A discloses a dynamic allocation method for adjusting AI server resources based on variable-frequency computing, which comprises: a data acquisition service step: collecting video data from different sources and converting it into video stream data that can be analyzed in the data analysis service step; and a data analysis service step: reading the latest analysis frequency configuration, analyzing the video data, extracting frames from the video pictures and recording video in sequence according to the AI analysis frequency configuration, generating AI analysis unit data, performing the AI analysis service, and performing the variable-frequency analysis service according to the result data returned by the AI analysis service.
In essence, that method relies on the result of AI variable-frequency analysis to supply resources on demand. However, it can only process video streams, the analysis process itself consumes considerable system resources, and the allocation accuracy is low.
Disclosure of Invention
In view of the above, the main objective of the present invention is to provide a high-utilization AI server and a resource allocation method thereof. The server builds a server array from unit servers and then acquires a specific number of unit servers according to the task type of the target task and combines them in a specific form, so that the current task can be executed with higher efficiency. Meanwhile, while the unit servers are running, an equalizing network is built within the unit server group to equalize resource utilization, thereby achieving high utilization of the AI server.
In order to achieve the above purpose, the technical scheme of the invention is realized as follows:
A high-utilization AI server, the server comprising: a server array formed by a plurality of unit servers that can be connected to one another, and a server combination unit that acquires usage rights to unit servers from the server array according to the target task type and combines the unit servers whose usage rights have been acquired in a specific form.
Further, each unit server in the server array is provided with a server ID; each unit server comprises a mode allocator, and when the server combination unit acquires usage rights to unit servers from the server array according to the target task type, the mode allocator in each unit server whose usage rights have been acquired assigns the working mode of that unit server in real time according to the target task type.
Further, each working mode of each unit server corresponds to at least one specific form combination; the number of categories of specific form combinations of the unit servers is greater than or equal to the number of categories of working modes of each unit server.
Further, when the server combination unit combines the unit servers in a specific form, it sequentially acquires a specified number of unit servers from the server array according to the target task type and in the order of the server IDs, and then combines the acquired unit servers in a specific form according to the target task type to obtain a unit server combination; while the unit server combination is running, each unit server executes the working mode corresponding to the target task type.
Further, each unit server further comprises an equalizer; in the unit server combination operation process, an equalizer of each unit server forms an equalizing network to equalize the resources of each unit server.
A resource allocation method for a high-utilization AI server, the method comprising:
Step 1: obtaining a target task type;
Step 2: acquiring a specific number of unit servers from the server array based on the target task type;
Step 3: based on the target task type, combining the acquired unit servers in a specific form to obtain a unit server combination;
Step 4: the unit server combination executes the target task, and each unit server executes the working mode corresponding to the target task type;
Step 5: in the process of executing the target task, the resources of each unit server in the unit server combination are balanced.
Further, in step 5, the method for balancing the resources of each unit server in the unit server combination during execution of the target task comprises: providing an equalizer in each unit server, wherein when the unit servers are combined into a unit server combination, the equalizer of each unit server is connected with the equalizers of the other unit servers in the combination to form an equalizing network, each equalizer being a node in the equalizing network; while the unit server combination executes the target task, the equalizing network dynamically adjusts the resource utilization within the unit server combination to complete the resource balancing.
Further, the working mode of each unit server includes the following parameters: an upper limit of the resource utilization rate of the unit server, a lower limit of the resource utilization rate of the unit server, an initial resource utilization rate of the unit server, and an instantaneous rate of change of the resources of the unit server.
Further, the method for dynamically adjusting the resource utilization in the unit server combination by the balancing network to complete the resource balancing comprises the following steps: the equalizing network acquires the parameters of each unit server in the current working mode in real time, acquires the real-time resource utilization rate of each unit server in real time, and calculates the resource utilization weight value by using the following formula:
[Formula image: Figure SMS_2]
where the symbols, which appear as images in the original, are defined as follows:
Figure SMS_5: the upper limit of the resource utilization rate of the unit server;
Figure SMS_7: the lower limit of the resource utilization rate of the unit server;
Figure SMS_3: the initial resource utilization rate of the unit server;
Figure SMS_4: the real-time resource utilization rate of the unit server;
Figure SMS_6: the instantaneous rate of change of the resources of the unit server;
Figure SMS_8: the resource utilization weight;
Figure SMS_1: the number of adjacent unit servers of each unit server under the current specific form combination.
The equalizing network continuously adjusts the real-time resource utilization rate of each unit server according to the resource utilization weight calculated in real time, so that the resource utilization weight keeps increasing, thereby completing the resource balancing.
Further, the method by which the equalizing network continuously adjusts the real-time resource utilization rate of each unit server according to the resource utilization weight calculated in real time, so that the resource utilization weight keeps increasing and the resource balancing is completed, comprises: recording the resource utilization weight calculated in real time, plotting the change curve of the resource utilization weight in a two-dimensional coordinate system, and obtaining the slope of the curve at each moment from the plotted curve; when the slope changes from a positive value to a negative value, the resource utilization weight at that point is taken as the maximum value of the resource utilization weight, and the real-time resource utilization rate of each unit server at that weight is maintained, thereby completing the resource balancing.
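The slope test described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the weight samples are hypothetical, the slope is approximated by the difference between neighboring samples, and the function name first_peak_index is introduced here only for illustration.

```python
from typing import Optional, Sequence


def first_peak_index(weights: Sequence[float]) -> Optional[int]:
    """Return the index of the first sample at which the slope of the
    recorded weight curve turns from positive to negative."""
    for i in range(1, len(weights) - 1):
        rising = weights[i] - weights[i - 1] > 0   # slope just before sample i
        falling = weights[i + 1] - weights[i] < 0  # slope just after sample i
        if rising and falling:
            return i
    return None  # the curve has not turned downward yet


# Hypothetical resource utilization weights recorded at successive moments.
samples = [0.20, 0.35, 0.50, 0.62, 0.58, 0.41]
peak = first_peak_index(samples)
```

In this sketch the utilization rates in force at index peak would be the ones to hold, since the weight starts to fall afterwards.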
The high-utilization AI server and the resource allocation method thereof have the following beneficial effects:
1. Higher efficiency: the AI server of the invention is built from a server array of unit servers. Unlike a traditional server, it resembles a matrix in which each unit server is an element. When a target task is executed, unit servers are acquired from the server array in sequence and then combined into a form, so that the overall server resources are broken down into parts and can be used more efficiently in a fine-grained manner. The form combination can also be adapted to different target task types, so that tasks are executed more efficiently.
2. Higher accuracy: in the invention, a secondary allocation of server resources is also performed within each unit server group, so that each unit server group can operate at its highest efficiency, further improving accuracy and efficiency.
Drawings
Fig. 1 is a flow chart of the resource allocation method of the high-utilization AI server according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the server array of the high-utilization AI server and resource allocation method thereof according to an embodiment of the present invention;
Fig. 3 shows a specific form combination of unit servers of the high-utilization AI server and resource allocation method thereof according to an embodiment of the present invention;
Fig. 4 shows a specific form combination of unit servers of the high-utilization AI server and resource allocation method thereof according to an embodiment of the present invention;
Fig. 5 shows a specific form combination of unit servers of the high-utilization AI server and resource allocation method thereof according to an embodiment of the present invention;
Fig. 6 shows a specific form combination of unit servers of the high-utilization AI server and resource allocation method thereof according to an embodiment of the present invention;
Fig. 7 shows a specific form combination of unit servers of the high-utilization AI server and resource allocation method thereof according to an embodiment of the present invention.
Detailed Description
The method of the present invention will be described in further detail with reference to the accompanying drawings.
Example 1
Referring to Fig. 1, Fig. 2, Fig. 3, Fig. 4, Fig. 5, Fig. 6 and Fig. 7, a high-utilization AI server comprises: a server array formed by a plurality of unit servers that can be connected to one another, and a server combination unit that acquires usage rights to unit servers from the server array according to the target task type and combines the unit servers whose usage rights have been acquired in a specific form.
Specifically, the function of each unit server in the server array may be the same or different.
In case the functions of the cell servers are different, more types of tasks can be performed. However, since the server functions are different, the type of each unit server needs to be determined according to the target task type when the unit server is acquired, and thus the complexity of the whole process is greater.
Specifically, when the types of unit servers in the server array are various, for example, including: image server, video server, text server and voice server; aiming at the task types of the target task, firstly acquiring unit servers according to the sequence, judging the type of each acquired unit server, and if the target task type is an image task, releasing the acquired servers of other types back to the server array, and acquiring only the image server.
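The acquire-then-release behavior described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the UnitServer class, the kind labels and the acquire_by_type function are hypothetical names, and the server array is modeled as a simple ordered queue.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class UnitServer:
    server_id: int
    kind: str  # hypothetical labels: "image", "video", "text", "voice"


def acquire_by_type(array: Deque[UnitServer], wanted_kind: str, count: int) -> List[UnitServer]:
    """Take servers from the array in order, keep those of the wanted kind,
    and release every other server back to the array."""
    acquired: List[UnitServer] = []
    released: List[UnitServer] = []
    while array and len(acquired) < count:
        server = array.popleft()
        (acquired if server.kind == wanted_kind else released).append(server)
    # Released servers return to the front of the array in their original order.
    array.extendleft(reversed(released))
    return acquired


array = deque([
    UnitServer(1, "image"), UnitServer(2, "video"), UnitServer(3, "image"),
    UnitServer(4, "text"), UnitServer(5, "image"),
])
image_servers = acquire_by_type(array, "image", 2)
```

With this sample array, the video server inspected along the way is put back, so the array still offers it to the next task.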
Unit servers are server units capable of independent operation, which can be interconnected to form a server array. The unit servers may be interconnected via a high-speed network, such as Ethernet or InfiniBand, to enable high-speed data transmission and communication.
For different task types, the server combination unit selects suitable unit servers from the server array and combines them in a specific form into a unit server combination. The combination is typically chosen according to the task type and the required resource configuration, so as to maximize the utilization of server resources and complete the task efficiently.
Within a unit server combination, communication and data transfer between the unit servers also require efficient management and scheduling. Typically, distributed computing and parallel processing are used: the task is divided into a plurality of subtasks, the subtasks are processed in parallel by different unit servers, and the processing results are finally aggregated, which improves computing efficiency and throughput.
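The split, process-in-parallel and aggregate pattern mentioned above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: worker threads from the standard library stand in for unit servers, the subtask is a plain summation, and the function names split and run_on_combination are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split(items: Sequence[int], parts: int) -> List[Sequence[int]]:
    """Split items into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_on_combination(items: Sequence[int], worker_count: int,
                       subtask: Callable[[Sequence[int]], int]) -> int:
    """Process each chunk on its own worker and aggregate the partial results."""
    chunks = split(items, worker_count)
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        partials = list(pool.map(subtask, chunks))
    return sum(partials)


total = run_on_combination(list(range(10)), 3, sum)
```

A real deployment would dispatch the chunks over the network to separate machines; the thread pool only illustrates the control flow.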
In addition, in order to further improve the utilization rate of the server, a virtualization technology may be further adopted to divide the physical server into a plurality of virtual servers, so as to achieve more flexible and efficient resource management and utilization.
In summary, the high-utilization AI server realizes the maximum utilization of the server resources through the combination and the efficient management of the plurality of unit servers, thereby improving the performance and the throughput of the server.
Example 2
On the basis of the above embodiment, each unit server in the server array is provided with a server ID; each unit server comprises a mode allocator, and when the server combination unit acquires usage rights to unit servers from the server array according to the target task type, the mode allocator in each unit server whose usage rights have been acquired assigns the working mode of that unit server in real time according to the target task type.
Specifically, the server ID marks the position of each unit server in the server array, and unit servers are acquired from the server array in that order.
Each unit server is provided with a server ID, which uniquely identifies the server so that the server array can accurately identify and manage each unit server.
Each unit server further comprises a mode allocator, which assigns the unit server a working mode according to the target task type. A working mode typically includes a particular hardware and software configuration suited to the requirements of a given kind of task.
When the server combination unit acquires the usage rights of unit servers from the server array, the mode allocator assigns the working mode of the unit server in real time according to the target task type. A working mode typically includes a hardware configuration, such as CPU, GPU and memory, and a software configuration, such as operating system and programming language. The mode allocator can automatically select and adjust the working mode of the unit server according to the requirements of the task, so as to meet those requirements as fully as possible and ensure efficient use of server resources.
For example, when performing an image recognition task, the mode allocator may assign the unit server to use the GPU for accelerated computation, to increase the speed and accuracy of image recognition; when performing a data analysis task, the mode allocator may assign the unit server a large amount of memory for storage and processing, to meet the memory requirements of the data analysis task.
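The two examples above can be sketched in Python as a lookup performed by a per-server mode allocator. This is an illustration only: the WorkingMode fields, the MODE_TABLE entries and the ModeAllocator class are assumptions made for the sketch, not structures defined by the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkingMode:
    name: str              # hypothetical label for the mode
    primary_resource: str  # resource that the mode emphasizes


# Hypothetical table built from the two examples in the text.
MODE_TABLE = {
    "image_recognition": WorkingMode("gpu_accelerated", "GPU"),
    "data_analysis": WorkingMode("large_memory", "memory"),
}


class ModeAllocator:
    """Sits inside one unit server and sets its working mode per task type."""

    def __init__(self, server_id: int) -> None:
        self.server_id = server_id
        self.current_mode: Optional[WorkingMode] = None

    def assign(self, task_type: str) -> WorkingMode:
        try:
            mode = MODE_TABLE[task_type]
        except KeyError:
            raise ValueError(f"no working mode defined for task type {task_type!r}") from None
        self.current_mode = mode
        return mode


allocator = ModeAllocator(server_id=7)
mode = allocator.assign("image_recognition")
```

Rejecting unknown task types explicitly is a design choice of this sketch; the text does not say how such a case is handled.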
Example 3
On the basis of the above embodiment, each working mode of each unit server corresponds to at least one specific form combination; the number of categories of specific form combinations of the unit servers is greater than or equal to the number of categories of working modes of each unit server.
Specifically, the specific form combinations correspond to the target task types. In practice, the form in which the unit servers are combined significantly affects the efficiency of executing each task, depending on the target task type. If a task has high parallelism requirements, arranging the unit servers symmetrically in parallel improves execution efficiency. For some tasks, the data stream must first be processed by a single unit server and only then processed in parallel; in that case a single unit server is placed first, followed by unit servers connected in parallel, to improve efficiency.
In the high-utilization AI server, each unit server can support a plurality of different working modes, which are typically selected and configured according to the task type and the required resource configuration. Each working mode corresponds to at least one specific form combination, so that the server combination unit can select the appropriate specific form combination according to the task type and the required resource configuration, achieving efficient use and management of server resources.
For example, when performing deep learning tasks, one unit server may support CPU, GPU, FPGA and other different working modes, where each working mode corresponds to different hardware and software configurations, so as to meet the computing requirements of different deep learning models. Each working mode at least corresponds to a specific form combination, such as collaborative calculation by using a plurality of CPUs and GPUs, or acceleration calculation by using an FPGA, and the like.
It should be noted that the number of categories of the specific form combination of the unit servers is greater than or equal to the number of categories of the operation mode of each unit server. This is because a particular modality combination can generally support multiple different modes of operation simultaneously to meet the needs and requirements of different tasks. For example, a specific form combination can simultaneously support different working modes such as a CPU, a GPU and an FPGA, so that the server combination unit can flexibly select and adjust the working modes according to task types and required resource configuration, and the requirements and demands of tasks can be met to the greatest extent.
In summary, each operation mode of each unit server corresponds to at least one specific form combination, and the number of categories of the specific form combination of the unit server is greater than or equal to the number of categories of the operation mode of each unit server. The design can utilize server resources to the greatest extent, and achieve the purposes of high performance and high throughput.
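The two conditions stated in this embodiment can be checked mechanically. The Python sketch below assumes a hypothetical mapping from working-mode names to sets of form names; the names "parallel" and "serial_then_parallel" are illustrative labels, not terms defined by the patent.

```python
from typing import Dict, Set


def mapping_is_valid(mode_to_forms: Dict[str, Set[str]]) -> bool:
    """Check that every working mode maps to at least one form and that the
    number of distinct forms is not smaller than the number of working modes."""
    every_mode_has_form = all(len(forms) > 0 for forms in mode_to_forms.values())
    distinct_forms = set().union(*mode_to_forms.values())
    return every_mode_has_form and len(distinct_forms) >= len(mode_to_forms)


valid = mapping_is_valid({
    "cpu": {"parallel"},
    "gpu": {"parallel", "serial_then_parallel"},
})
too_few_forms = mapping_is_valid({
    "cpu": {"parallel"},
    "gpu": {"parallel"},
})
```

The second mapping fails because two working modes share a single form, so the form count falls below the mode count.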
Example 4
On the basis of the above embodiment, when the server combination unit performs specific form combination on the unit servers, according to the target task types, a specified number of unit servers are sequentially obtained from the server array according to the sequence of the server IDs, then, according to the target task types, the obtained unit servers are subjected to specific form combination to obtain a unit server combination, and in the operation process of the unit server combination, each unit server performs a corresponding working mode according to the target task types.
In particular, different modes of operation will be used for different types of target tasks. The parameters of each working mode can improve the execution efficiency of the task according to different task types.
In the AI server with high utilization, when the server combination unit performs specific form combination on the unit servers, a specified number of unit servers are sequentially obtained from the server array according to the target task type and the sequence of the server IDs. The obtained unit servers can be combined in a specific form according to the target task types so as to meet the requirements and demands of the tasks.
Specifically, the server combination unit generally calculates the number of required unit servers and the working mode of each unit server according to the task type and the required resource configuration, and then sequentially obtains the designated number of unit servers from the server array according to the sequence of the server IDs so as to meet the requirements and demands of the task. The obtained unit servers can be combined in a specific form according to the target task types, so that efficient utilization and management of server resources are realized.
For example, when performing an image recognition task, the server combination unit may first calculate the number of unit servers required and the working mode of each unit server, such as using multiple GPUs to accelerate the computation. It then sequentially acquires the specified number of GPU servers from the server array in the order of the server IDs and combines them in a specific form, so as to make maximum use of GPU resources.
While the unit server combination is running, each unit server executes the working mode corresponding to the target task type so that the task is completed efficiently. The unit servers are typically interconnected through a high-speed network, such as Ethernet or InfiniBand, to achieve high-speed data transmission and communication, thereby further improving server performance and throughput.
Example 5
In accordance with the above embodiment, each unit server further includes an equalizer; in the unit server combination operation process, an equalizer of each unit server forms an equalizing network to equalize the resources of each unit server.
In the high-utilization AI server, each unit server further comprises an equalizer, and the equalizer is used for equalizing the resources of each unit server in the unit server combination operation process so as to realize more efficient task processing and resource utilization.
In particular, each cell server in a cell server assembly may perform different tasks, which typically require different computing resources, such as CPU, GPU, memory, etc. The equalizer can monitor the resource utilization condition of each unit server and perform resource scheduling and equalization according to the requirement so as to ensure the equalization and maximization of the resource utilization of each unit server.
In the combined operation process of the unit servers, an equalizer of each unit server forms an equalizing network, and the resources of each unit server are equalized through message transmission and cooperation in the network. For example, when the CPU utilization of a certain unit server is too high, the equalizer may request CPU resources from other idle unit servers through network messages, and allocate the requested resources to required tasks for processing, so as to achieve balanced and maximum utilization of the CPU resources.
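The CPU example above can be sketched in Python as a simple rebalancing loop. This is not the weight formula of the invention; it is a simplified stand-in that assumes utilization is expressed in whole percent, that load can be moved in fixed steps, and that the function name rebalance and its parameters are hypothetical.

```python
from typing import Dict


def rebalance(utilization: Dict[int, int], upper: int, step: int = 5) -> Dict[int, int]:
    """Shift load in fixed steps from servers above `upper` (percent) to the
    currently least-loaded server, as long as that server has headroom."""
    if step <= 0:
        raise ValueError("step must be positive")
    load = dict(utilization)  # work on a copy; keys are server IDs
    for server_id in sorted(load):
        while load[server_id] > upper:
            target = min(load, key=load.get)  # least-loaded server right now
            if target == server_id or load[target] + step > upper:
                break  # no other server can absorb more load
            load[server_id] -= step
            load[target] += step
    return load


before = {1: 95, 2: 30, 3: 40}  # hypothetical CPU utilization in percent
after = rebalance(before, upper=80)
```

Integer percentages are used here to avoid floating-point drift in the comparison against the upper limit.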
It should be noted that the equalizer needs to consider multiple factors for resource scheduling and balancing, such as task types, resource requirements, network loads, etc., to implement optimal scheduling and balancing of unit server resources. In addition, in the unit server combination, the equalizer needs to consider the transmission and communication efficiency of data, so as to realize efficient data transmission and communication, thereby further improving the performance and throughput of the server.
Example 6
A method for efficiently allocating the resources of an AI server, the method comprising the following steps:
step 1: obtaining a target task type;
step 2: acquiring a specific number of unit servers from the server array based on the target task type;
step 3: based on the target task type, combining the acquired unit servers in a specific form to obtain a unit server combination;
step 4: executing the target task with the unit server combination, each unit server running the working mode that corresponds to the target task type;
step 5: while the target task is being executed, balancing the resources of the unit servers in the unit server combination.
Specifically, the AI server is implemented as a server array built from unit servers. Unlike a traditional server, the server array resembles a matrix in which each unit server is an element. When a target task is executed, unit servers are acquired from the server array in sequence and then combined in a specific form. The resources of the server as a whole are thereby broken up into smaller parts, so that they can be used more efficiently and at a finer granularity. At the same time, the specific-form combination adapts better to different target task types, which makes task execution more efficient.
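Steps 1 to 4 can be sketched in Python as follows. This is a hedged illustration, not the implementation of the embodiment: the `TASK_PROFILES` table, the task type names, the "ring" and "chain" forms, the working mode names, and the functions `build_links` and `allocate` are all hypothetical. Step 5 (balancing) is left out because it is handled by the equalizers.

```python
from dataclasses import dataclass

# Hypothetical lookup table: unit server count, combination form, and
# working mode per target task type. None of these values are from the source.
TASK_PROFILES = {
    "training": {"count": 4, "form": "ring", "mode": "high_throughput"},
    "inference": {"count": 2, "form": "chain", "mode": "low_latency"},
}


@dataclass
class UnitServer:
    server_id: int
    in_use: bool = False
    mode: str = "idle"


def build_links(ids, form):
    """Connect consecutive unit servers; a ring also closes the loop."""
    links = list(zip(ids, ids[1:]))
    if form == "ring" and len(ids) > 2:
        links.append((ids[-1], ids[0]))
    return links


def allocate(server_array, task_type):
    profile = TASK_PROFILES[task_type]                 # step 1: task type
    free = sorted((s for s in server_array if not s.in_use),
                  key=lambda s: s.server_id)
    if len(free) < profile["count"]:
        raise RuntimeError("not enough free unit servers")
    acquired = free[:profile["count"]]                 # step 2: acquire
    for server in acquired:
        server.in_use = True
    ids = [server.server_id for server in acquired]
    links = build_links(ids, profile["form"])          # step 3: combine
    for server in acquired:                            # step 4: set mode
        server.mode = profile["mode"]
    return {"servers": acquired, "form": profile["form"], "links": links}
```

Acquiring the free unit servers in ascending order of server ID is one possible reading of "sequentially"; any other deterministic order would work equally well in this sketch.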
Example 7
On the basis of the above embodiment, in step 5, the method for balancing the resources of the unit servers in the unit server combination while the target task is being executed includes: providing an equalizer in each unit server, wherein, when unit servers are combined to form a unit server combination, the equalizer of each unit server is interconnected with the equalizers of the other unit servers in the combination to form a balancing network, each equalizer being a node of the balancing network; and, while the unit server combination executes the target task, dynamically adjusting, by the balancing network, the resource utilization within the unit server combination to complete the resource balancing.
In step 5, balancing the resources of the unit servers in the unit server combination while the target task is being executed generally involves the following two key steps:
First, an equalizer is provided in each unit server to form a balancing network.
In this step, in order to balance the resources of the unit servers in the unit server combination, each unit server is provided with an equalizer. When unit servers are combined to form a unit server combination, the equalizer of each unit server is interconnected with the equalizers of the other unit servers in the combination to form a balancing network, and each equalizer is a node of that network. The equalizer is typically a software component that monitors the resource utilization of its unit server and schedules and balances resources as needed.
Second, while the unit server combination executes the target task, the balancing network dynamically adjusts the resource utilization within the unit server combination to complete the resource balancing.
In this step, the balancing network monitors the resource utilization of each unit server and schedules and balances resources as needed, so that resource utilization across the unit servers is balanced and maximized. For example, when the CPU utilization of a unit server is too high, its equalizer may request CPU resources from idle unit servers through network messages and allocate the obtained resources to the tasks that need them, so that CPU resources are used in a balanced way and to the fullest. When scheduling and balancing resources, the balancing network generally considers multiple factors, such as task type, resource requirements, and network load, in order to schedule and balance unit server resources optimally.
It should be noted that the balancing network needs to consider the efficiency of data transmission and communication, which further improves the performance and throughput of the server. In addition, the balancing network needs to consider security and reliability to ensure that the server operates stably and securely.
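How equalizers might form a balancing network and adjust utilization can be sketched as follows. The adjustment rule used here is a simple diffusion-style averaging toward the mean of the peers, chosen purely for illustration; it is an assumption and not the adjustment rule of this embodiment. The names `Equalizer`, `form_network`, and `balance_round`, the `rate` parameter, and the premise that all unit servers have equal capacity are likewise hypothetical.

```python
class Equalizer:
    """Hypothetical equalizer: one node of the balancing network."""

    def __init__(self, server_id, utilization):
        self.server_id = server_id
        self.utilization = utilization  # fraction of capacity in use, 0..1
        self.peers = []                 # equalizers this node is linked to


def form_network(equalizers):
    """Interconnect every equalizer with all others in the combination."""
    for eq in equalizers:
        eq.peers = [peer for peer in equalizers if peer is not eq]


def balance_round(equalizers, rate=0.5):
    """One adjustment round: each node moves toward its peers' mean.

    All nodes read the same snapshot, so the result does not depend on
    the order in which equalizers are visited. With a fully connected
    network and equal capacities, the total utilization is preserved.
    """
    snapshot = {eq.server_id: eq.utilization for eq in equalizers}
    for eq in equalizers:
        if not eq.peers:
            continue
        peer_mean = sum(snapshot[p.server_id] for p in eq.peers) / len(eq.peers)
        own = snapshot[eq.server_id]
        eq.utilization = own + rate * (peer_mean - own)
```

With three equalizers at utilizations 0.9, 0.3, and 0.3, one round with `rate=0.5` yields 0.6, 0.45, and 0.45: the total stays at 1.5 while the gap between the busiest and the idlest unit server shrinks.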
Example 8
On the basis of the above embodiment, the parameters included in the working mode of each unit server are: the upper limit of the resource utilization rate of the unit server, the lower limit of the resource utilization rate of the unit server, the initial resource utilization rate of the unit server, and the instantaneous rate of change of the resources of the unit server.
Specifically, the granularity of resource allocation is a factor that affects its accuracy: the finer the allocation, the more accurate and efficient it can be.
In the high-utilization AI server, the parameters included in the working mode of each unit server are the upper limit of the resource utilization rate of the unit server, the lower limit of the resource utilization rate of the unit server, the initial resource utilization rate of the unit server, and the instantaneous rate of change of the resources of the unit server. These parameters are used to control and adjust the resource utilization of the unit servers so that the unit servers are used and managed optimally.
Specifically, the upper and lower limits of the resource utilization rate bound the range within which the resources of the unit server are used, which avoids both wasted resources and over-utilization. The initial resource utilization rate determines the initial state of the unit server, and the instantaneous rate of change of the resources is used to monitor and adjust how the resource utilization of the unit server changes, which enables dynamic management and scheduling of the unit server.
Meanwhile, to further improve the accuracy and efficiency of resource allocation, the invention allocates server resources a second time. Specifically, resources are allocated and scheduled once more within each unit server combination so that each combination can operate at its highest efficiency. This secondary allocation typically takes into account factors such as task type, server resource requirements, and network load in order to use and schedule unit server resources optimally, thereby further improving server performance and throughput.
In summary, in the high-utilization AI server, the working mode of each unit server includes the upper limit of the resource utilization rate of the unit server, the lower limit of the resource utilization rate of the unit server, the initial resource utilization rate of the unit server, and the instantaneous rate of change of the resources of the unit server, which are used to control and adjust the resource utilization of the unit server so that the unit server is used and managed optimally. In addition, the secondary allocation of server resources further improves accuracy and efficiency.
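The four working mode parameters can be held in a small data structure, sketched here in Python. The class name `WorkingMode`, its field names, the convention that utilization is a fraction between 0 and 1, and the reading of the instantaneous rate of change as "utilization change per second" are assumptions made for illustration; the embodiment does not fix units or a representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkingMode:
    """Hypothetical container for the four working mode parameters."""
    upper_limit: float                # upper limit of resource utilization
    lower_limit: float                # lower limit of resource utilization
    initial_utilization: float        # utilization when the mode starts
    instantaneous_change_rate: float  # assumed: utilization change per second

    def __post_init__(self):
        if not 0.0 <= self.lower_limit <= self.upper_limit <= 1.0:
            raise ValueError("limits must satisfy 0 <= lower <= upper <= 1")
        if not self.lower_limit <= self.initial_utilization <= self.upper_limit:
            raise ValueError("initial utilization must lie within the limits")

    def clamp(self, utilization: float) -> float:
        """Keep a utilization value inside the permitted range."""
        return min(max(utilization, self.lower_limit), self.upper_limit)

    def project(self, utilization: float, dt: float) -> float:
        """Estimate utilization after `dt` seconds at the current rate."""
        return self.clamp(utilization + self.instantaneous_change_rate * dt)
```

Clamping to the two limits mirrors their stated purpose of avoiding both wasted resources and over-utilization, while `project` shows one way the rate of change could feed a scheduling decision.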
Example 9
On the basis of the above embodiment, the method for dynamically adjusting the resource utilization in the unit server combination by the balancing network to complete the resource balancing includes: the equalizing network acquires the parameters of each unit server in the current working mode in real time, acquires the real-time resource utilization rate of each unit server in real time, and calculates the resource utilization weight value by using the following formula:
[Formula image: Figure SMS_11]
wherein the symbols in the formula (each shown as an image in the original) denote:
Figure SMS_13: the upper limit of the resource utilization rate of the unit server;
Figure SMS_15: the lower limit of the resource utilization rate of the unit server;
Figure SMS_10: the initial resource utilization rate of the unit server;
Figure SMS_12: the real-time resource utilization rate of the unit server;
Figure SMS_14: the instantaneous rate of change of the resources of the unit server;
Figure SMS_16: the resource utilization weight;
Figure SMS_9: the number of unit servers adjacent to each unit server under the current specific-form combination.
The balancing network continuously adjusts the real-time resource utilization rate of each unit server according to the resource utilization weight calculated in real time, so that the resource utilization weight keeps increasing, thereby completing the resource balancing.
Example 10
On the basis of the above embodiment, the method by which the balancing network continuously adjusts the real-time resource utilization rate of each unit server according to the resource utilization weight calculated in real time, so that the resource utilization weight keeps increasing and the resource balancing is completed, includes: recording the resource utilization weight calculated in real time; plotting the change curve of the resource utilization weight in a two-dimensional coordinate system; obtaining the slope of the curve at each moment from the plotted curve; taking the resource utilization weight at which the slope changes from a positive value to a negative value as the maximum value of the resource utilization weight; and maintaining the real-time resource utilization rate of each unit server at the values corresponding to that weight, thereby completing the resource balancing.
In the high-utilization AI server, this adjustment is an adaptive method based on the slope of the change curve. Specifically, the balancing network records the resource utilization weight calculated in real time and plots its change curve in a two-dimensional coordinate system. It then obtains the slope of the curve at each moment from the plotted curve, finds the resource utilization weight at which the slope changes from a positive value to a negative value, and takes that weight as the maximum value of the resource utilization weight. Finally, it maintains the real-time resource utilization rate of each unit server at the values corresponding to that weight to complete the resource balancing.
Specifically, when resource utilization in the unit server combination is unbalanced, the balancing network continuously adjusts the real-time resource utilization rate of each unit server so as to increase the resource utilization weight as much as possible. The change curve of the resource utilization weight has different slopes at different moments: a steeper part of the curve indicates that the weight is changing faster, and a flatter part indicates that it is changing more slowly. An optimal resource utilization weight can therefore be found where the sign of the slope changes. When the slope changes from a positive value to a negative value, the weight has stopped increasing and begun to decrease, so the weight at that point is a maximum and is taken as the optimal resource utilization weight. The real-time resource utilization rate of each unit server at that weight is then maintained to complete the resource balancing.
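The slope-based search can be sketched in Python using finite differences between recorded samples. This is a minimal illustration: the function names `find_weight_peak` and `utilization_to_hold`, and the idea of storing one utilization snapshot per recorded weight, are assumptions; how the weight itself is computed is outside the scope of the sketch.

```python
def find_weight_peak(times, weights):
    """Index of the first sample where the slope turns from + to -.

    slopes[k] is the finite-difference slope between samples k and k + 1,
    so a positive slopes[i - 1] followed by a negative slopes[i] means
    that sample i is a local maximum of the recorded weight curve.
    Returns None if no such sign change has been recorded yet.
    """
    slopes = [
        (weights[i + 1] - weights[i]) / (times[i + 1] - times[i])
        for i in range(len(weights) - 1)
    ]
    for i in range(1, len(slopes)):
        if slopes[i - 1] > 0 and slopes[i] < 0:
            return i
    return None


def utilization_to_hold(times, weights, utilization_snapshots):
    """Return the per-server utilization recorded at the weight peak.

    `utilization_snapshots[i]` is assumed to map server IDs to the
    real-time utilization observed when `weights[i]` was calculated.
    """
    peak = find_weight_peak(times, weights)
    if peak is None:
        return None  # the weight is still rising, so keep adjusting
    return utilization_snapshots[peak]
```

For a recorded weight sequence 0.2, 0.5, 0.7, 0.8, 0.75, 0.6 at unit time steps, the slope is positive up to the fourth sample and negative afterwards, so the fourth sample (weight 0.8) is identified as the maximum and its utilization snapshot is the one to maintain.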
It should be noted that the system provided in the foregoing embodiments is illustrated only by the division into the functional units described above. In practical applications, the functions may be assigned to different functional units as needed; that is, the units or steps in the embodiments of the present invention may be further decomposed or combined. For example, the units of the foregoing embodiments may be combined into one unit or further split into multiple sub-units to perform all or part of the functions described above. The names of the units and steps in the embodiments of the present invention serve only to distinguish the units or steps and are not to be construed as unduly limiting the invention.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the storage device and the processing device described above, and the related descriptions, may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
Those skilled in the art will appreciate that the units and method steps of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both, and that the programs corresponding to the software units and method steps may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, the components and steps of the examples have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends on the particular application and the design constraints of the technical solution. Those skilled in the art may implement the described functionality in different ways for each particular application, but such implementations should not be considered to go beyond the scope of the present invention.
The terms "first," "another portion," and the like, are used for distinguishing between similar objects and not for describing a particular sequential or chronological order.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or unit/apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or unit/apparatus.
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but those skilled in the art will readily understand that the scope of protection of the present invention is not limited to these specific embodiments. Those skilled in the art may make equivalent modifications or substitutions to the related technical features without departing from the principles of the present invention, and the technical solutions resulting from such modifications or substitutions fall within the scope of protection of the present invention.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention.

Claims (10)

1. A high-utilization AI server, the server comprising: a server array formed by a plurality of interconnectable unit servers; and a server combination unit configured to acquire, according to a target task type, use permission for unit servers from the server array and to combine the unit servers whose use permission has been acquired in a specific form.
2. The high-utilization AI server of claim 1, wherein each unit server in the server array is provided with a server ID; each unit server comprises a mode distributor; and when the server combination unit acquires use permission for unit servers from the server array according to the target task type, the mode distributor in each unit server whose permission has been acquired assigns the working mode of that unit server in real time according to the target task type.
3. The high-utilization AI server of claim 2, wherein each working mode of each unit server corresponds to at least one specific-form combination; and the number of kinds of specific-form combinations of the unit servers is greater than or equal to the number of kinds of working modes of each unit server.
4. The high-utilization AI server of claim 3, wherein, when combining unit servers in a specific form, the server combination unit sequentially acquires, according to the target task type, a specified number of unit servers from the server array in the order of their server IDs, and then combines the acquired unit servers in a specific form according to the target task type to obtain a unit server combination; and, while the unit server combination is operating, each unit server runs the working mode corresponding to the target task type.
5. The high-utilization AI server of claim 4, wherein each unit server further comprises an equalizer; and, while the unit server combination is operating, the equalizers of the unit servers form a balancing network that balances the resources of the unit servers.
6. A resource allocation method for the high-utilization AI server of any one of claims 1 to 5, characterized in that the method performs the following steps:
step 1: obtaining a target task type;
step 2: acquiring a specific number of unit servers from the server array based on the target task type;
step 3: based on the target task type, combining the acquired unit servers in a specific form to obtain a unit server combination;
step 4: executing the target task with the unit server combination, each unit server running the working mode that corresponds to the target task type;
step 5: in the process of executing the target task, the resources of each unit server in the unit server combination are balanced.
7. The method of claim 6, wherein, in step 5, balancing the resources of the unit servers in the unit server combination while the target task is being executed comprises: providing an equalizer in each unit server, wherein, when unit servers are combined to form a unit server combination, the equalizer of each unit server is interconnected with the equalizers of the other unit servers in the combination to form a balancing network, each equalizer being a node of the balancing network; and, while the unit server combination executes the target task, dynamically adjusting, by the balancing network, the resource utilization within the unit server combination to complete the resource balancing.
8. The method of claim 7, wherein the parameters included in the working mode of each unit server are: the upper limit of the resource utilization rate of the unit server, the lower limit of the resource utilization rate of the unit server, the initial resource utilization rate of the unit server, and the instantaneous rate of change of the resources of the unit server.
9. The method of claim 8, wherein the balancing network dynamically adjusting the resource utilization within the unit server combination to complete the resource balancing comprises: the balancing network acquiring, in real time, the parameters of each unit server in its current working mode and the real-time resource utilization rate of each unit server, and calculating the resource utilization weight using the following formula:
[Formula image: Figure QLYQS_1]
wherein the symbols in the formula (each shown as an image in the original) denote:
Figure QLYQS_5: the upper limit of the resource utilization rate of the unit server;
Figure QLYQS_7: the lower limit of the resource utilization rate of the unit server;
Figure QLYQS_3: the initial resource utilization rate of the unit server;
Figure QLYQS_4: the real-time resource utilization rate of the unit server;
Figure QLYQS_6: the instantaneous rate of change of the resources of the unit server;
Figure QLYQS_8: the resource utilization weight;
Figure QLYQS_2: the number of unit servers adjacent to each unit server under the current specific-form combination;
and the balancing network continuously adjusts the real-time resource utilization rate of each unit server according to the resource utilization weight calculated in real time, so that the resource utilization weight keeps increasing, thereby completing the resource balancing.
10. The method of claim 9, wherein the balancing network continuously adjusting the real-time resource utilization rate of each unit server according to the resource utilization weight calculated in real time, so that the resource utilization weight keeps increasing and the resource balancing is completed, comprises: recording the resource utilization weight calculated in real time; plotting the change curve of the resource utilization weight in a two-dimensional coordinate system; obtaining the slope of the curve at each moment from the plotted curve; taking the resource utilization weight at which the slope changes from a positive value to a negative value as the maximum value of the resource utilization weight; and maintaining the real-time resource utilization rate of each unit server at the values corresponding to that weight, thereby completing the resource balancing.
CN202310344475.1A 2023-04-03 2023-04-03 High-utilization AI server and resource allocation method thereof Active CN116048821B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310344475.1A CN116048821B (en) 2023-04-03 2023-04-03 High-utilization AI server and resource allocation method thereof

Publications (2)

Publication Number Publication Date
CN116048821A true CN116048821A (en) 2023-05-02
CN116048821B CN116048821B (en) 2023-06-16

Family

ID=86125909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310344475.1A Active CN116048821B (en) 2023-04-03 2023-04-03 High-utilization AI server and resource allocation method thereof

Country Status (1)

Country Link
CN (1) CN116048821B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030069972A1 (en) * 2001-10-10 2003-04-10 Yutaka Yoshimura Computer resource allocating method
CN102088735A (en) * 2010-03-24 2011-06-08 电信科学技术研究院 Method and equipment for balancing inter-sub-frame traffic load and processing inter-cell interference (ICI)
US11561815B1 (en) * 2020-02-24 2023-01-24 Amazon Technologies, Inc. Power aware load placement
CN112416516A (en) * 2020-11-20 2021-02-26 中国电子科技集团公司第二十八研究所 Cloud data center resource scheduling method for resource utility improvement
CN113010576A (en) * 2021-03-19 2021-06-22 中国建设银行股份有限公司 Method, device, equipment and storage medium for capacity evaluation of cloud computing system
CN113946436A (en) * 2021-07-29 2022-01-18 西北大学 Resource pre-scheduling method based on load balancing
CN115408152A (en) * 2022-08-23 2022-11-29 吉兴信(广东)信息技术有限公司 Adaptive resource matching obtaining method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BARATH RAGHAVAN 等: "Cloud control with distributed rate limiting", 《ACM SIGCOMM COMPUTER COMMUNICATION REVIEW》, vol. 37, no. 4, pages 337 - 348 *
MOHIT KUMAR 等: "Dynamic load balancing algorithm for balancing the workload among virtual machine in cloud computing", 《PROCEDIA COMPUTER SCIENCE》, vol. 115, pages 322 - 329, XP085212803, DOI: 10.1016/j.procs.2017.09.141 *
金明 等: "基于深度强化学习的服务功能链映射算法", 《计算机应用研究》, vol. 37, no. 11, pages 3456 - 3460 *

Also Published As

Publication number Publication date
CN116048821B (en) 2023-06-16

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant