CN116048821A - High-utilization AI server and resource allocation method thereof

Info

Publication number: CN116048821A
Application number: CN202310344475.1A
Authority: CN
Inventors: 欧阳浩
Original assignee: Shenzhen Haoyuan Nuoxin Technology Co ltd
Current assignee: Shenzhen Haoyuan Nuoxin Technology Co ltd
Priority date: 2023-04-03
Filing date: 2023-04-03
Publication date: 2023-05-02
Anticipated expiration: 2043-04-03
Also published as: CN116048821B

Abstract

The invention belongs to the technical field of servers, and particularly relates to a high-utilization AI server and a resource allocation method thereof. The server comprises: a server array formed by a plurality of unit servers that can be interconnected; and a server combination unit that acquires the usage rights of unit servers from the server array according to the target task type and combines the unit servers whose usage rights have been acquired into a specific form. The server builds a server array from unit servers, then creatively acquires a specific number of unit servers according to the task type of the target task and combines them into a specific form, so that the current task can be executed with higher efficiency. Meanwhile, while the unit servers are running, an equalizing network is built within the unit server group to equalize resource utilization, thereby achieving high utilization of the AI server.

Description

High-utilization AI server and resource allocation method thereof

Technical Field

The invention belongs to the technical field of servers, and particularly relates to a high-utilization-rate AI server and a resource allocation method thereof.

Background

With the development of 5G, the Internet of Things and artificial intelligence technologies, video monitoring systems are being vigorously built and popularized, and the intelligent construction of society is gradually being improved and enriched. Intelligent construction is currently applied mainly to smart cities, smart communities, smart construction sites, smart tourism, smart travel, smart street lamps, smart parking lots and the like, and has permeated various industries. Intelligent video analysis plays a major role in this construction. Applications with video AI analysis capability are formed by combining existing video monitoring with AI visual analysis and recognition technology. They realize functions such as facial feature recognition, human behavior analysis, body temperature monitoring, crowd density monitoring, detection of illegal intrusion into dangerous areas, smoke and fire alarms, and illegal occupation analysis, and play an irreplaceable role in the security field and other video data acquisition fields. Realizing video AI analysis requires high-performance service equipment, a large amount of software and algorithm services, and frequent data recording, storage, analysis and processing. Resource consumption and usage cost are therefore high, which makes comprehensive popularization and application across the industry difficult.

Generally, the problem of server resources can be addressed from two aspects: improving the performance of the server, and improving the utilization efficiency of the server. The former requires upgrading and optimizing the hardware of the whole server, which results in relatively high cost. The latter only requires algorithm optimization on existing server equipment, which improves the utilization efficiency of server resources, reduces their consumption, and lowers the usage cost.

Therefore, it is important to develop an efficient AI server and a resource allocation method thereof. Patent application CN202110916339.6A discloses a dynamic allocation method for adjusting AI server resources based on variable-frequency computing, which comprises the following steps. Data acquisition service step: collecting video data from different sources and converting it into video stream data that can be analyzed in the data analysis service step. Data analysis service step: reading the latest analysis frequency configuration, parsing the video data, extracting frames from the video pictures and recording video in sequence according to the AI analysis frequency configuration, generating AI analysis unit data, performing the AI analysis service, and performing the variable-frequency analysis service according to the result data returned by the AI analysis service.

In essence, that method supplies resources on demand based on the results of AI variable-frequency analysis. However, it can only process video streams, the analysis process itself consumes considerable system resources, and the allocation accuracy is low.

Disclosure of Invention

In view of the above, the main objective of the present invention is to provide a high-utilization AI server and a resource allocation method thereof, wherein the server constructs a server array through unit servers, and then creatively obtains a specific number of unit servers according to the task type of a target task to perform specific form combination, so that when executing a current task, the current task can be performed with higher efficiency, and meanwhile, when the unit servers are running, an equalization network is constructed in a unit server group to equalize the resource utilization, thereby realizing the high utilization of the AI server.

In order to achieve the above purpose, the technical scheme of the invention is realized as follows:

A high-utilization AI server, the server comprising: a server array formed by a plurality of unit servers that can be interconnected; and a server combination unit that acquires the usage rights of unit servers from the server array according to the target task type and combines the unit servers whose usage rights have been acquired into a specific form.

Further, each unit server in the server array is provided with a server ID; each unit server comprises a mode distributor, when the server combination unit acquires the use authority of the unit server from the server array according to the target task type, the mode distributor in the unit server with acquired authority can assign the working mode of the unit server according to the target task type in real time.

Further, each working mode of each unit server corresponds to at least one specific form combination; the number of categories of specific form combinations of the unit servers is greater than or equal to the number of categories of working modes of each unit server.

Further, when the server combination unit performs specific form combination on the unit servers, according to the target task types and the sequence of the server IDs, a specified number of unit servers are sequentially obtained from the server array, then the obtained unit servers are subjected to specific form combination according to the target task types to obtain unit server combinations, and in the operation process of the unit server combinations, each unit server executes a corresponding working mode according to the target task types.

Further, each unit server further comprises an equalizer; in the unit server combination operation process, an equalizer of each unit server forms an equalizing network to equalize the resources of each unit server.

A resource allocation method for an efficient AI server, the method comprising:

step 1: obtaining a target task type;

step 2: acquiring a specific number of unit servers from the server array based on the target task type;

step 3: based on the target task type, combining the acquired unit servers in a specific form to obtain a unit server combination;

step 4: the unit server combination executes the target task, and each unit server executes the corresponding working mode according to the target task type;

step 5: in the process of executing the target task, the resources of each unit server in the unit server combination are balanced.

Further, in step 5, the method for balancing the resources of each unit server in the unit server combination while the target task is executed includes: setting an equalizer in each unit server, wherein, when the unit servers are combined to form a unit server combination, the equalizer of each unit server is interconnected with the equalizers of the other unit servers in the combination to form an equalizing network, each equalizer being a node in the equalizing network; while the unit server combination executes the target task, the equalizing network dynamically adjusts resource utilization within the unit server combination to complete the resource balancing.

Further, the working mode of each unit server includes the following parameters: an upper limit of the resource utilization rate of the unit server, a lower limit of the resource utilization rate of the unit server, an initial resource utilization rate of the unit server, and an instantaneous rate of change of the resources of the unit server.

Further, the method for dynamically adjusting the resource utilization in the unit server combination by the balancing network to complete the resource balancing comprises the following steps: the equalizing network acquires the parameters of each unit server in the current working mode in real time, acquires the real-time resource utilization rate of each unit server in real time, and calculates the resource utilization weight value by using the following formula:

[Formula not reproduced in the source text.]

Wherein the quantities in the formula are: the upper limit of the resource utilization rate of the unit server; the lower limit of the resource utilization rate of the unit server; the initial resource utilization rate of the unit server; the real-time resource utilization rate of the unit server; the instantaneous rate of change of the resources of the unit server; the resource utilization weight; and the number of adjacent unit servers of each unit server under the current specific form combination. The equalizing network continuously adjusts the real-time resource utilization rate of each unit server according to the resource utilization weight calculated in real time, so that the resource utilization weight continuously increases, thereby completing the resource balancing.

Further, the method by which the equalizing network continuously adjusts the real-time resource utilization rate of each unit server according to the resource utilization weight calculated in real time, so that the resource utilization weight continuously increases and the resource balancing is completed, includes: recording the resource utilization weight calculated in real time; drawing a change curve of the resource utilization weight in a two-dimensional coordinate system; obtaining the slope of the curve at each moment from the drawn curve; and, when the slope changes from a positive value to a negative value, taking the corresponding resource utilization weight as the maximum value of the resource utilization weight and keeping the real-time resource utilization rate of each unit server at its value under that weight, thereby completing the resource balancing.
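The slope-based stopping rule can be sketched as follows. This is a minimal illustration only, assuming the weight is sampled at uniform time steps so that the slope can be approximated by the difference between consecutive samples; the function name `find_weight_peak` and the sample values are hypothetical and do not come from the patent.

```python
from typing import Optional, Sequence


def find_weight_peak(weights: Sequence[float]) -> Optional[int]:
    """Return the index of the first sample where the slope of the weight
    curve turns from positive to negative, or None if it never does.

    Assumption: samples are uniformly spaced in time, so the slope is
    approximated by the finite difference between neighbouring samples.
    """
    for i in range(1, len(weights) - 1):
        slope_before = weights[i] - weights[i - 1]
        slope_after = weights[i + 1] - weights[i]
        if slope_before > 0 and slope_after < 0:
            return i
    return None


# Hypothetical recorded weights: rising, then falling after index 3.
recorded = [0.20, 0.35, 0.50, 0.62, 0.58, 0.41]
peak_index = find_weight_peak(recorded)
```

Under this sketch, the utilization rates observed at `peak_index` would be the ones held fixed.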

The high-utilization AI server and the resource allocation method thereof have the following beneficial effects:

1. The efficiency is higher: the AI server of the invention is realized through a server array built from unit servers. Unlike a traditional server, the server array is similar to a matrix, and each unit server is an element of the array. When a target task is executed, unit servers are acquired from the server array in sequence and then combined into a form, so that the server resources as a whole are broken up into parts and can be used more efficiently and at a finer granularity. Meanwhile, the form combination can be adapted to different target task types, so that task execution efficiency is higher.

2. The accuracy is higher: in the invention, secondary allocation for server resources is also arranged in each unit server group, so that each unit server group can operate at the highest efficiency, and the accuracy and efficiency are further improved.

Drawings

Fig. 1 is a flow chart of a resource allocation method of a high-utilization AI server according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a server array of the high-utilization AI server and the resource allocation method thereof according to an embodiment of the present invention;

Fig. 3 is a specific form combination of unit servers of the high-utilization AI server and the resource allocation method thereof according to an embodiment of the present invention;

Fig. 4 is a specific form combination of unit servers of the high-utilization AI server and the resource allocation method thereof according to an embodiment of the present invention;

Fig. 5 is a specific form combination of unit servers of the high-utilization AI server and the resource allocation method thereof according to an embodiment of the present invention;

Fig. 6 is a specific form combination of unit servers of the high-utilization AI server and the resource allocation method thereof according to an embodiment of the present invention;

Fig. 7 is a specific form combination of unit servers of the high-utilization AI server and the resource allocation method thereof according to an embodiment of the present invention.

Detailed Description

The method of the present invention will be described in further detail with reference to the accompanying drawings.

Example 1

Referring to Figs. 1 to 7, a high-utilization AI server comprises: a server array formed by a plurality of unit servers that can be interconnected; and a server combination unit that acquires the usage rights of unit servers from the server array according to the target task type and combines the unit servers whose usage rights have been acquired into a specific form.

Specifically, the function of each unit server in the server array may be the same or different.

When the functions of the unit servers differ, more types of tasks can be performed. However, because the server functions differ, the type of each unit server must be determined against the target task type when the unit server is acquired, so the overall process is more complex.

Specifically, when the server array contains several types of unit servers, for example image servers, video servers, text servers and voice servers, unit servers are first acquired in sequence for the task type of the target task, and the type of each acquired unit server is then checked. If the target task type is an image task, acquired servers of other types are released back to the server array, and only image servers are retained.
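The type-filtered acquisition described above might look like the following sketch. The `UnitServer` class, its fields, and the `acquire_by_type` helper are hypothetical names introduced for illustration; "releasing" a server is modelled simply by never marking it as in use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UnitServer:
    server_id: int
    kind: str            # e.g. "image", "video", "text", "voice"
    in_use: bool = False


def acquire_by_type(array: List[UnitServer], task_type: str, count: int) -> List[UnitServer]:
    """Walk the array in server-ID order and keep only free servers whose
    type matches the task type, up to `count` servers."""
    acquired: List[UnitServer] = []
    for server in sorted(array, key=lambda s: s.server_id):
        if len(acquired) == count:
            break
        if server.in_use:
            continue
        if server.kind == task_type:
            server.in_use = True
            acquired.append(server)
        # Servers of other types stay unmarked, which models
        # releasing them back to the server array.
    return acquired


array = [
    UnitServer(1, "image"), UnitServer(2, "video"),
    UnitServer(3, "image"), UnitServer(4, "text"),
    UnitServer(5, "image"),
]
image_group = acquire_by_type(array, "image", 2)
```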

A unit server is a server unit capable of independent operation; unit servers can be interconnected to jointly form a server array. The unit servers may be interconnected through a high-speed network, such as Ethernet or InfiniBand, to enable high-speed data transmission and communication.

For different task types, the server selects a proper unit server from the server array for use, and combines the unit servers into a server combined unit in a specific form. This combination is typically based on the task type and the required resource configuration to maximize the utilization of server resources and achieve efficient completion of the task.

Within a unit server combination, communication and data transfer between the unit servers also require efficient management and scheduling. Distributed computing and parallel processing techniques are generally used: the task is divided into a plurality of subtasks, the subtasks are processed in parallel by different unit servers, and the processing results are finally summarized, thereby improving computing efficiency and throughput.
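The divide, process in parallel, and summarize pattern can be illustrated with a small sketch. Threads stand in for unit servers here, and the sum-of-squares workload and all function names are hypothetical placeholders rather than anything specified by the patent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split(data: List[int], parts: int) -> List[List[int]]:
    """Divide the task input into `parts` roughly equal subtasks."""
    return [data[i::parts] for i in range(parts)]


def process_subtask(chunk: List[int]) -> int:
    """Stand-in for the work one unit server does on its subtask."""
    return sum(x * x for x in chunk)


def run_on_combination(data: List[int], num_servers: int) -> int:
    """Process the subtasks in parallel (threads stand in for unit
    servers) and summarize the partial results."""
    with ThreadPoolExecutor(max_workers=num_servers) as pool:
        partials = list(pool.map(process_subtask, split(data, num_servers)))
    return sum(partials)


total = run_on_combination(list(range(10)), num_servers=3)
```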

In addition, in order to further improve the utilization rate of the server, a virtualization technology may be further adopted to divide the physical server into a plurality of virtual servers, so as to achieve more flexible and efficient resource management and utilization.

In summary, the high-utilization AI server realizes the maximum utilization of the server resources through the combination and the efficient management of the plurality of unit servers, thereby improving the performance and the throughput of the server.

Example 2

On the basis of the above embodiment, each unit server in the server array is provided with a server ID; each unit server comprises a mode distributor, when the server combination unit acquires the use authority of the unit server from the server array according to the target task type, the mode distributor in the unit server with acquired authority can assign the working mode of the unit server according to the target task type in real time.

Specifically, the server ID marks the order of each unit server in the server array; when unit servers are acquired, they are acquired from the server array in that order.

Each unit server is provided with a server ID, which uniquely identifies the server so that the server array can accurately identify and manage each unit server.

Each unit server further comprises a mode distributor, which assigns a corresponding working mode to the unit server according to the target task type. A working mode typically includes different hardware and software configurations to meet the requirements of different tasks.

When the server combination unit acquires the usage rights of unit servers from the server array, the mode distributor designates the working mode of each unit server in real time according to the target task type. A working mode typically includes a hardware configuration, such as CPU, GPU and memory, and a software configuration, such as operating system and programming language. The mode distributor can automatically select and adjust the working mode of the unit server according to the requirements of the task, so as to meet those requirements to the greatest extent and ensure efficient utilization of server resources.

For example, when an image recognition task is performed, the mode distributor may assign the unit server to use the GPU for accelerated computation, to increase the speed and accuracy of image recognition; when a data analysis task is performed, the mode distributor may assign the unit server a large amount of memory for storage and processing, to meet the memory requirements of the data analysis task.
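A mode distributor of this kind could be sketched as a lookup from task type to working mode. The `WorkingMode` fields, the task-type strings, and the memory sizes below are all illustrative assumptions; the patent does not specify concrete values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkingMode:
    name: str
    use_gpu: bool
    memory_gb: int


# Hypothetical mapping from target task type to working mode.
MODE_TABLE = {
    "image_recognition": WorkingMode("gpu_accelerated", use_gpu=True, memory_gb=16),
    "data_analysis": WorkingMode("large_memory", use_gpu=False, memory_gb=128),
}
# Hypothetical fallback for task types that have no dedicated mode.
DEFAULT_MODE = WorkingMode("general", use_gpu=False, memory_gb=8)


def assign_mode(task_type: str) -> WorkingMode:
    """Mode distributor: pick the working mode for a target task type."""
    return MODE_TABLE.get(task_type, DEFAULT_MODE)


mode = assign_mode("image_recognition")
```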

Example 3

On the basis of the above embodiment, each working mode of each unit server at least corresponds to one specific form combination; the number of categories of the specific form combination of the unit servers is greater than or equal to the number of categories of the working mode of each unit server.

Specifically, the specific form combinations correspond to the target task types. In practice, depending on the target task type, the combined form of the unit servers significantly affects the efficiency with which each task is executed. If the parallelism requirement is high, arranging the unit servers symmetrically in parallel improves task execution efficiency. For some tasks, the data stream must first be processed and then processed in parallel; in that case, a single unit server is placed first, followed by unit servers connected in parallel, to improve efficiency.
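The two forms mentioned above can be sketched as adjacency maps between server IDs. The form names `"parallel"` and `"serial_then_parallel"`, and the choice to model a purely parallel form with no links between servers, are assumptions made for illustration; the actual forms are those shown in the patent's figures.

```python
from typing import Dict, List


def build_combination(server_ids: List[int], form: str) -> Dict[int, List[int]]:
    """Return an adjacency map {server_id: [neighbour ids]} for a form.

    "parallel": all servers work side by side, with no links between them.
    "serial_then_parallel": the first server pre-processes the stream and
    feeds every remaining server, which then work in parallel.
    """
    links: Dict[int, List[int]] = {sid: [] for sid in server_ids}
    if form == "parallel":
        return links
    if form == "serial_then_parallel":
        head, *workers = server_ids
        for worker in workers:
            links[head].append(worker)
            links[worker].append(head)
        return links
    raise ValueError(f"unknown form: {form}")


combo = build_combination([1, 2, 3, 4], "serial_then_parallel")
```

The length of each neighbour list gives the number of adjacent unit servers for that server under the chosen form.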

In the high-utilization AI server, each unit server can support a plurality of different working modes, which are typically selected and configured according to the task type and the required resource configuration. Each working mode corresponds to at least one specific form combination, so that the server combination unit can select the corresponding specific form combination according to the task type and the required resource configuration, thereby realizing efficient utilization and management of server resources.

For example, when performing deep learning tasks, one unit server may support CPU, GPU, FPGA and other different working modes, where each working mode corresponds to different hardware and software configurations, so as to meet the computing requirements of different deep learning models. Each working mode at least corresponds to a specific form combination, such as collaborative calculation by using a plurality of CPUs and GPUs, or acceleration calculation by using an FPGA, and the like.

It should be noted that the number of categories of the specific form combination of the unit servers is greater than or equal to the number of categories of the operation mode of each unit server. This is because a particular modality combination can generally support multiple different modes of operation simultaneously to meet the needs and requirements of different tasks. For example, a specific form combination can simultaneously support different working modes such as a CPU, a GPU and an FPGA, so that the server combination unit can flexibly select and adjust the working modes according to task types and required resource configuration, and the requirements and demands of tasks can be met to the greatest extent.

In summary, each operation mode of each unit server corresponds to at least one specific form combination, and the number of categories of the specific form combination of the unit server is greater than or equal to the number of categories of the operation mode of each unit server. The design can utilize server resources to the greatest extent, and achieve the purposes of high performance and high throughput.

Example 4

On the basis of the above embodiment, when the server combination unit performs specific form combination on the unit servers, according to the target task types, a specified number of unit servers are sequentially obtained from the server array according to the sequence of the server IDs, then, according to the target task types, the obtained unit servers are subjected to specific form combination to obtain a unit server combination, and in the operation process of the unit server combination, each unit server performs a corresponding working mode according to the target task types.

In particular, different modes of operation will be used for different types of target tasks. The parameters of each working mode can improve the execution efficiency of the task according to different task types.

In the AI server with high utilization, when the server combination unit performs specific form combination on the unit servers, a specified number of unit servers are sequentially obtained from the server array according to the target task type and the sequence of the server IDs. The obtained unit servers can be combined in a specific form according to the target task types so as to meet the requirements and demands of the tasks.

Specifically, the server combination unit generally calculates the number of required unit servers and the working mode of each unit server according to the task type and the required resource configuration, and then sequentially obtains the designated number of unit servers from the server array according to the sequence of the server IDs so as to meet the requirements and demands of the task. The obtained unit servers can be combined in a specific form according to the target task types, so that efficient utilization and management of server resources are realized.

For example, when an image recognition task is performed, the server combination unit may first calculate the number of unit servers required and the working mode of each unit server, such as using multiple GPUs to accelerate the computation. It then acquires the specified number of GPU servers from the server array in the order of the server IDs and combines them into a specific form, to maximize the utilization of GPU resources.
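As a rough sketch of this two-stage procedure, the server count can be derived first and the servers then taken in ID order. The sizing rule (workload divided by per-server capacity, rounded up) and both function names are hypothetical; the patent does not state how the required number is calculated.

```python
import math
from typing import List


def required_servers(workload_units: int, capacity_per_server: int) -> int:
    """Hypothetical sizing rule: enough servers to cover the workload."""
    return math.ceil(workload_units / capacity_per_server)


def acquire_in_id_order(free_ids: List[int], count: int) -> List[int]:
    """Take the `count` lowest server IDs from the free pool."""
    if count > len(free_ids):
        raise RuntimeError("not enough free unit servers in the array")
    return sorted(free_ids)[:count]


needed = required_servers(workload_units=10, capacity_per_server=4)
group = acquire_in_id_order([7, 2, 9, 4, 5], needed)
```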

During operation of the unit server combination, each unit server executes the corresponding working mode according to the target task type, so that the task is completed efficiently. The unit servers are typically interconnected through a high-speed network, such as Ethernet or InfiniBand, to achieve high-speed data transmission and communication, thereby further improving server performance and throughput.

Example 5

In accordance with the above embodiment, each unit server further includes an equalizer; in the unit server combination operation process, an equalizer of each unit server forms an equalizing network to equalize the resources of each unit server.

In the high-utilization AI server, each unit server further comprises an equalizer, and the equalizer is used for equalizing the resources of each unit server in the unit server combination operation process so as to realize more efficient task processing and resource utilization.

In particular, each unit server in a unit server combination may perform different tasks, which typically require different computing resources, such as CPU, GPU and memory. The equalizer monitors the resource utilization of each unit server and performs resource scheduling and balancing as needed, to keep the resource utilization of each unit server balanced and maximized.

In the combined operation process of the unit servers, an equalizer of each unit server forms an equalizing network, and the resources of each unit server are equalized through message transmission and cooperation in the network. For example, when the CPU utilization of a certain unit server is too high, the equalizer may request CPU resources from other idle unit servers through network messages, and allocate the requested resources to required tasks for processing, so as to achieve balanced and maximum utilization of the CPU resources.
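The CPU example above can be sketched as a simple load-shifting routine. This is only an illustration under stated assumptions: utilization is a fraction in [0, 1], load can be moved freely between servers, and the `rebalance` function with its single `upper` threshold is a hypothetical stand-in for the message exchange between equalizers, not the patent's weight formula.

```python
from typing import Dict


def rebalance(utilization: Dict[int, float], upper: float) -> Dict[int, float]:
    """Shift excess load from servers above `upper` to the currently
    least-loaded server. Values are utilization fractions in [0, 1]."""
    load = dict(utilization)
    for sid in sorted(load):
        excess = load[sid] - upper
        if excess <= 0:
            continue
        idle = min(load, key=load.get)          # least-loaded server
        room = max(upper - load[idle], 0.0)     # spare capacity on it
        moved = min(excess, room)
        load[sid] -= moved
        load[idle] += moved
    return load


balanced = rebalance({1: 0.95, 2: 0.30, 3: 0.50}, upper=0.80)
```

In this example, server 1 hands its excess over to server 2, the least-loaded member, and the total load across the combination is unchanged.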

It should be noted that the equalizer needs to consider multiple factors for resource scheduling and balancing, such as task types, resource requirements, network loads, etc., to implement optimal scheduling and balancing of unit server resources. In addition, in the unit server combination, the equalizer needs to consider the transmission and communication efficiency of data, so as to realize efficient data transmission and communication, thereby further improving the performance and throughput of the server.

Example 6

A resource allocation method for a high-utilization AI server, the method comprising the following steps:

step 1: obtaining a target task type;

Specifically, the AI server is realized through a server array built from unit servers. Unlike a traditional server, the server array is similar to a matrix, and each unit server is an element of the array. When a target task is executed, unit servers are acquired from the server array in sequence and then combined into a form, so that the server resources as a whole are broken up into parts and can be used more efficiently and at a finer granularity. Meanwhile, the form combination can be adapted to different target task types, so that task execution efficiency is higher.

Example 7

On the basis of the above embodiment, in step 5, the method for balancing the resources of each unit server in the unit server combination while the target task is executed includes: setting an equalizer in each unit server, wherein, when the unit servers are combined to form a unit server combination, the equalizer of each unit server is interconnected with the equalizers of the other unit servers in the combination to form an equalizing network, each equalizer being a node in the equalizing network; while the unit server combination executes the target task, the equalizing network dynamically adjusts resource utilization within the unit server combination to complete the resource balancing.

In the step 5, in the process of executing the target task, the method for balancing the resources of each unit server in the unit server combination generally includes the following two key steps:

an equalizer is set in each unit server to form an equalizing network.

In this step, to balance the resources of each unit server in the unit server combination, each unit server is provided with an equalizer. When the unit servers are combined to form the unit server combination, the equalizer of each unit server is interconnected with the equalizers of the other unit servers in the combination to form an equalizing network, each equalizer being a node in the network. The equalizer is typically a software component that monitors the resource utilization of each unit server and performs resource scheduling and balancing as needed.

In the process of executing the target task by the unit server combination, the balance network dynamically adjusts the resource utilization in the unit server combination to finish the resource balance.

In this step, the balancing network monitors the resource utilization of each unit server and performs resource scheduling and balancing as needed to ensure that the resource utilization of each unit server is balanced and maximized. For example, when the CPU utilization of a certain unit server is too high, the equalizer may request CPU resources from other idle unit servers through network messages, and allocate the requested resources to required tasks for processing, so as to achieve balanced and maximum utilization of the CPU resources. The balancing network generally performs resource scheduling and balancing by considering a plurality of factors, such as task types, resource demands, network loads, and the like, so as to implement optimal scheduling and balancing of unit server resources.

It should be noted that the equalization network needs to consider the transmission and communication efficiency of data to achieve efficient data transmission and communication, so as to further improve the performance and throughput of the server. In addition, the balancing network needs to consider security and reliability to ensure the operation stability and security of the server.

Example 8

On the basis of the above embodiment, the working mode of each unit server includes the following parameters: an upper limit of the resource utilization rate of the unit server, a lower limit of the resource utilization rate of the unit server, an initial resource utilization rate of the unit server, and an instantaneous rate of change of the resources of the unit server.

Specifically, how finely resources are allocated is a factor affecting the accuracy of resource allocation: the more refined the allocation, the more accurate and efficient it can be.

In the high-utilization AI server, the parameters included in the working mode of each unit server are an upper limit of the resource utilization rate of the unit server, a lower limit of the resource utilization rate of the unit server, an initial resource utilization rate of the unit server, and an instantaneous rate of change of the resources of the unit server. These parameters are used to control and adjust the resource utilization of the unit servers, so as to achieve optimal utilization and management of the unit servers.

Specifically, the upper and lower limits of the resource utilization of the unit server are used to control the resource utilization range of the unit server to avoid the situations of resource waste and over-utilization. The initial utilization rate of the resources is used for determining the initial state of the unit server, and the instantaneous change rate of the resources is used for monitoring and adjusting the change condition of the utilization of the resources of the unit server so as to realize the dynamic management and scheduling of the unit server.
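The four working-mode parameters can be pictured as a small data structure. The Python sketch below is an assumption-laden illustration: the `WorkingMode` class and the `next_utilization` function are hypothetical, and the instantaneous rate of change is interpreted here as a maximum change in utilization per second, which the source does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkingMode:
    upper: float    # upper limit of the resource utilization rate (fraction)
    lower: float    # lower limit of the resource utilization rate (fraction)
    initial: float  # initial resource utilization rate (fraction)
    rate: float     # assumed meaning: maximum change in utilization per second


def next_utilization(current, target, mode, dt=1.0):
    """Move `current` toward `target`, limited by the mode's rate and bounds."""
    max_step = mode.rate * dt
    # Limit how fast the utilization may change in one step.
    delta = max(-max_step, min(max_step, target - current))
    # Keep the result inside the [lower, upper] band of the working mode.
    return max(mode.lower, min(mode.upper, current + delta))


mode = WorkingMode(upper=0.9, lower=0.2, initial=0.5, rate=0.1)
u = next_utilization(mode.initial, target=1.0, mode=mode)
```

Starting from the initial rate of 0.5, one step toward a target of 1.0 moves the utilization by at most 0.1, and no step can leave the band between the lower and upper limits.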

Meanwhile, in order to further improve the accuracy and efficiency of resource allocation, the invention adopts secondary allocation for server resources. In particular, within each unit server group, resource allocation and scheduling are performed a second time so that each unit server group can operate at the highest efficiency. This secondary allocation typically takes into account factors such as task type, server resource requirements, and network load to achieve optimal utilization and scheduling of unit server resources, thereby further improving server performance and throughput.
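A two-level allocation of this kind can be sketched as follows. The sketch assumes, purely for illustration, that work is split in proportion to free capacity at both levels; the function names `allocate` and `two_level_allocate` and the proportional rule are hypothetical and are not taken from the source.

```python
def allocate(total, capacities):
    """Split `total` across keys in proportion to their free capacity."""
    cap_sum = sum(capacities.values())
    if cap_sum == 0:
        return {key: 0.0 for key in capacities}
    return {key: total * cap / cap_sum for key, cap in capacities.items()}


def two_level_allocate(total, groups):
    """First allocate across unit server groups, then within each group.

    `groups` maps a group name to {server_id: free_capacity}.
    """
    # Level 1: share the total load among groups by their summed capacity.
    group_caps = {name: sum(members.values()) for name, members in groups.items()}
    per_group = allocate(total, group_caps)
    # Level 2: share each group's load among its own unit servers.
    return {name: allocate(per_group[name], groups[name]) for name in groups}


groups = {"a": {1: 2.0, 2: 2.0}, "b": {3: 4.0, 4: 0.0}}
plan = two_level_allocate(8.0, groups)
```

Here each group receives half of the load, after which group "a" splits its share evenly and group "b" gives its whole share to server 3, since server 4 has no free capacity.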

In summary, in the high-utilization AI server, the working mode of each unit server includes an upper limit of the resource utilization rate, a lower limit of the resource utilization rate, an initial resource utilization rate, and an instantaneous rate of change of the resources of the unit server. These parameters are used to control and adjust the resource utilization of the unit server, so as to achieve optimal utilization and management of the unit server. Meanwhile, secondary allocation of server resources is adopted, which further improves accuracy and efficiency.

Example 9

On the basis of the above embodiment, the method by which the balancing network dynamically adjusts the resource utilization in the unit server combination to complete the resource balancing includes: the balancing network acquires, in real time, the parameters of each unit server in its current working mode and the real-time resource utilization rate of each unit server, and calculates the resource utilization weight using the following formula:

[Formula not reproduced in this text.]

The quantities in the formula are: the upper limit of the resource utilization rate of the unit server; the lower limit of the resource utilization rate of the unit server; the real-time resource utilization rate of the unit server; the instantaneous rate of change of the resources of the unit server; and the resource utilization weight.

Example 10

On the basis of the above embodiment, the balancing network continuously adjusts the real-time resource utilization rate of each unit server according to the resource utilization weight calculated in real time, so that the resource utilization weight keeps increasing. The method for completing the resource balancing comprises: recording the resource utilization weight calculated in real time; drawing a change curve of the resource utilization weight in a two-dimensional coordinate system; obtaining the slope of the curve at each moment from the drawn curve; and, when the slope changes from a positive value to a negative value, taking the corresponding resource utilization weight as the maximum value of the resource utilization weight and maintaining the real-time resource utilization rate of each unit server at that weight, thereby completing the resource balancing.

In the high-utilization AI server, this method is an adaptive adjustment based on the slope of the change curve. The balancing network records the resource utilization weight calculated in real time and draws its change curve in a two-dimensional coordinate system. It then obtains the slope of the curve at each moment, finds the resource utilization weight at which the slope changes from a positive value to a negative value, and takes that weight as the maximum value of the resource utilization weight. Finally, it maintains the real-time resource utilization rate of each unit server at that weight to complete the resource balancing.

Specifically, when resource utilization within the unit server combination is unbalanced, the balancing network continuously adjusts the real-time resource utilization rate of each unit server so as to increase the resource utilization weight as much as possible. The change curve of the resource utilization weight shows different slopes at different moments: a steeper portion indicates that the weight is changing quickly, and a flatter portion indicates that it is changing slowly. The optimal resource utilization weight can therefore be located where the slope of the curve changes sign. When the slope changes from a positive value to a negative value, the weight has stopped increasing and begun to decrease, so the weight at that point is a maximum of the curve and is taken as the optimal resource utilization weight. The real-time resource utilization rate of each unit server at that weight is then maintained to complete the resource balancing.
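The slope test described above can be sketched in Python using finite differences between consecutive recorded samples. This is a minimal sketch under stated assumptions: the function name `find_weight_peak` and the sample values are hypothetical, the weights are taken as already computed, and flat segments (zero slope) are not treated specially.

```python
def find_weight_peak(times, weights):
    """Return the index of the first sample at which the slope of the
    weight curve changes from positive to negative, or None if it never does."""
    prev_slope = None
    for i in range(1, len(weights)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Finite-difference estimate of the slope on the segment ending at i.
        slope = (weights[i] - weights[i - 1]) / dt
        if prev_slope is not None and prev_slope > 0 and slope < 0:
            return i - 1  # the sample between the rising and falling segments
        prev_slope = slope
    return None


times = [0, 1, 2, 3, 4]
weights = [0.2, 0.5, 0.7, 0.6, 0.4]
peak = find_weight_peak(times, weights)
```

With these sample values the curve rises until the third sample and falls afterwards, so the function returns index 2. The utilization rates recorded at that sample would be the ones to maintain.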

It should be noted that the system provided in the foregoing embodiments is illustrated only by the division of the functional units described above. In practical applications, the functions may be allocated to different functional units as needed; that is, the units or steps in the embodiments of the present invention may be further decomposed or combined. For example, the units in the foregoing embodiments may be combined into one unit, or further split into multiple sub-units, so as to complete all or part of the functions described above. The names of the units and steps in the embodiments of the invention are used only to distinguish the units or steps, and are not to be construed as unduly limiting the invention.

It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the storage device and the processing device described above and the related description may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.

Those of skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the programs corresponding to the software elements and method steps may be stored in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality using different approaches for each particular application, but such implementations should not be considered to go beyond the scope of the present invention.

The terms "first," "second," and the like are used for distinguishing between similar objects and not for describing a particular sequential or chronological order.

The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or unit/apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or unit/apparatus.

Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but those skilled in the art will readily understand that the scope of protection of the present invention is not limited to these specific embodiments. Those skilled in the art may make equivalent modifications or substitutions to the relevant technical features without departing from the principles of the present invention, and the technical solutions resulting from such modifications or substitutions will fall within the scope of protection of the present invention.

The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention.

Claims

1. A high-utilization AI server, the server comprising: a server array formed by a plurality of unit servers that can be connected to one another; and a server combination unit configured to acquire use permission for unit servers from the server array according to the type of a target task, and to combine, in a specific form, the unit servers for which use permission has been acquired.

2. The high-utilization AI server of claim 1, wherein each unit server in the server array is provided with a server ID; each unit server comprises a mode distributor; and when the server combination unit acquires the use permission of a unit server from the server array according to the target task type, the mode distributor in the unit server whose permission has been acquired assigns, in real time, the working mode of the unit server according to the target task type.

3. The high-utilization AI server of claim 2, wherein each working mode of each unit server corresponds to at least one specific form combination; and the number of categories of specific form combinations of the unit servers is greater than or equal to the number of categories of working modes of each unit server.

4. The high-utilization AI server of claim 3, wherein, when the server combination unit combines the unit servers in a specific form, it obtains a specified number of unit servers from the server array according to the target task type, sequentially in the order of the server IDs, and then combines the obtained unit servers in a specific form according to the target task type to obtain a unit server combination; and during operation of the unit server combination, each unit server operates in the working mode corresponding to the target task type.

5. The high-utilization AI server of claim 4, wherein each unit server further comprises an equalizer; and during operation of the unit server combination, the equalizers of the unit servers form a balancing network to balance the resources of each unit server.

6. A resource allocation method for the high-utilization AI server of any one of claims 1 to 5, characterized in that the method performs the following steps:

step 1: obtaining a target task type;

7. The method of claim 6, wherein balancing, in step 5, the resources of each unit server in the unit server combination during execution of the target task comprises: providing an equalizer in each unit server, wherein, when the unit servers are combined to form a unit server combination, the equalizer of each unit server is connected with the equalizers of the other unit servers in the unit server combination to form a balancing network, each equalizer being a node in the balancing network; and, while the unit server combination executes the target task, the balancing network dynamically adjusts the resource utilization in the unit server combination to complete the resource balancing.

8. The method of claim 7, wherein the working mode of each unit server includes the following parameters: an upper limit of the resource utilization rate of the unit server, a lower limit of the resource utilization rate of the unit server, an initial resource utilization rate of the unit server, and an instantaneous rate of change of the resources of the unit server.

9. The method of claim 8, wherein the method by which the balancing network dynamically adjusts the resource utilization in the unit server combination to complete the resource balancing comprises: the balancing network acquires, in real time, the parameters of each unit server in its current working mode and the real-time resource utilization rate of each unit server, and calculates the resource utilization weight using the following formula:

[Formula not reproduced in this text.]

The quantities in the formula are: the upper limit of the resource utilization rate of the unit server; the lower limit of the resource utilization rate of the unit server; the real-time resource utilization rate of the unit server; the instantaneous rate of change of the resources of the unit server; and the resource utilization weight.

10. The method of claim 9, wherein the balancing network continuously adjusts the real-time resource utilization rate of each unit server according to the resource utilization weight calculated in real time, so that the resource utilization weight keeps increasing, and the method for completing the resource balancing comprises: recording the resource utilization weight calculated in real time; drawing a change curve of the resource utilization weight in a two-dimensional coordinate system; obtaining the slope of the curve at each moment from the drawn curve; and, when the slope changes from a positive value to a negative value, taking the corresponding resource utilization weight as the maximum value of the resource utilization weight and maintaining the real-time resource utilization rate of each unit server at that weight, thereby completing the resource balancing.