CN117632510A - Service distribution method, chip, electronic device and computer readable storage medium - Google Patents

Service distribution method, chip, electronic device and computer readable storage medium

Info

Publication number
CN117632510A
Authority
CN
China
Prior art keywords
core
service
time
target
force
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311704300.3A
Other languages
Chinese (zh)
Inventor
季永国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TP Link Technologies Co Ltd
Original Assignee
TP Link Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TP Link Technologies Co Ltd filed Critical TP Link Technologies Co Ltd
Priority to CN202311704300.3A priority Critical patent/CN117632510A/en
Publication of CN117632510A publication Critical patent/CN117632510A/en
Pending legal-status Critical Current


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to the field of computer technologies and provides a service distribution method, a chip, an electronic device, and a computer readable storage medium. The method includes: acquiring the pre-calculated computing-power time consumption required to run a service to be distributed on each of a plurality of AI cores; acquiring the currently available computing-power duration of each of the plurality of AI cores; acquiring a target AI core from the plurality of AI cores, where the currently available computing-power duration of the target AI core meets the pre-calculated computing-power time consumption of the service to be distributed; and distributing the service to be distributed to the target AI core. The technical scheme provided by the application can improve the rationality with which the server deploys services to be distributed and the computing-power utilization rate of the plurality of AI cores on the server.

Description

Service distribution method, chip, electronic device and computer readable storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a service distribution method, a chip, an electronic device, and a computer readable storage medium.
Background
A server can be used to perform high-performance computing and data-processing tasks; it has strong computing capability and high-speed data-processing capability and can be used for complex scientific computing, data analysis, artificial-intelligence training, inference, and other tasks.
Servers are typically configured with a CPU (Central Processing Unit), large-capacity memory, and a GPU (Graphics Processing Unit) or NPU (Neural Processing Unit) that provides parallel computing capability to the server. When such a processor employs AI (Artificial Intelligence) technology, it may be referred to as an AI core. Because computing power cannot be shared between different AI cores, different tasks or services need to be distributed to different AI cores for processing through reasonable distribution, so as to improve the processing efficiency of the server. In the prior art, different tasks or services are bound to different AI cores through user operations.
When users manually bind AI cores to different tasks or services, uneven computing-power distribution or computing-power overload easily occurs.
Disclosure of Invention
The embodiments of the present application provide a service distribution method, a device, a chip, an electronic device, and a computer readable storage medium, which can reduce the waste of the server's computing-power resources and improve the rationality of the server's computing-power distribution and its computing-power utilization rate.
In a first aspect, an embodiment of the present application provides a service allocation method, including:
Acquiring the pre-calculated computing-power time consumption required to run the service to be distributed on each of a plurality of AI cores.
Acquiring the currently available computing-power duration of each of the plurality of AI cores.
Acquiring a target AI core from the plurality of AI cores, where the currently available computing-power duration of the target AI core meets the pre-calculated computing-power time consumption of the service to be distributed.
Distributing the service to be distributed to the target AI core.
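The four steps of the first aspect can be sketched as follows. This is a minimal illustrative sketch, not the patent's actual implementation: the class name `AICore`, the field names, and the greedy choice of the core with the longest available duration (one of the embodiments below) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AICore:
    core_id: int
    available_ms: float  # currently available computing-power duration (ms per second)

def allocate(cores: list[AICore], required_ms: float) -> Optional[AICore]:
    """Pick an AI core whose currently available computing-power duration
    meets the service's pre-calculated time consumption."""
    # Prefer the core with the longest currently available duration.
    target = max(cores, key=lambda c: c.available_ms, default=None)
    if target is not None and target.available_ms > required_ms:
        target.available_ms -= required_ms  # reserve the computing power
        return target
    return None  # no core currently meets the service's time consumption

cores = [AICore(0, 300.0), AICore(1, 700.0)]
chosen = allocate(cores, 225.0)  # service needs 225 ms per second
```

Here core 1 is chosen and its available duration drops to 475 ms.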
In some embodiments, the target AI core is the AI core, among the plurality of AI cores, with the longest currently available computing-power duration.
In some embodiments, the allocating the service to be allocated to the target AI core includes:
determining whether the currently available computing-power duration of the target AI core is greater than the target pre-calculated computing-power time consumption, where the target pre-calculated computing-power time consumption is the computing-power time consumption occupied by running the service to be distributed on the target AI core.
If yes, the service to be distributed is distributed to the target AI core.
In some embodiments, the target pre-calculated computing-power time consumption is the aforementioned pre-calculated computing-power time consumption.
In some embodiments, the above method further comprises:
And if the currently available computing-power duration of the target AI core is less than the target pre-calculated computing-power time consumption, acquiring the predicted lowest computing-power time consumption required to run the service to be distributed on the target AI core.
And when the currently available computing-power duration of the target AI core is greater than or equal to the predicted lowest computing-power time consumption, distributing the service to be distributed to the target AI core.
In some embodiments, the currently available computing-power duration is a currently available optimal computing-power duration, and the method further comprises:
When the currently available optimal computing-power duration of the target AI core is less than the predicted lowest computing-power time consumption, acquiring the currently available lowest computing-power duration of each of the plurality of AI cores.
Selecting, from the plurality of AI cores, the one with the longest currently available lowest computing-power duration as the target AI core.
Determining whether the currently available lowest computing-power duration of the target AI core is greater than or equal to the predicted lowest computing-power time consumption.
If yes, distributing the service to be distributed to the target AI core.
In some embodiments, the acquiring the target AI core from the plurality of AI cores includes:
determining whether the currently available computing-power duration of a first AI core is greater than the pre-calculated computing-power time consumption required for the first AI core to run the service to be distributed, the first AI core being any one of the plurality of AI cores.
If yes, adding the first AI core to a candidate AI core set.
And screening the target AI core out of the candidate AI core set, where among the services already running on the target AI core there exists a service of the same service type as the service to be distributed.
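The candidate-set screening above can be sketched as follows. The data shapes (per-core dicts, a set of running service types) and the fallback to the first candidate when no same-type match exists are illustrative assumptions.

```python
def pick_target_core(available_ms, required_ms, running_types, service_type):
    """available_ms / required_ms: dicts keyed by core id (ms);
    running_types: dict core id -> set of service types already running."""
    # Step 1: keep only cores whose currently available duration exceeds the
    # pre-calculated time consumption of the service on that specific core.
    candidates = [c for c in available_ms if available_ms[c] > required_ms[c]]
    # Step 2: prefer a candidate already running a service of the same type.
    for c in candidates:
        if service_type in running_types.get(c, set()):
            return c
    return candidates[0] if candidates else None

core = pick_target_core(
    available_ms={0: 500.0, 1: 600.0},
    required_ms={0: 225.0, 1: 225.0},
    running_types={0: {"face_snapshot"}, 1: {"plate_recognition"}},
    service_type="face_snapshot",
)
```

Although core 1 has more available duration, core 0 is selected because it already runs a service of the same type.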
In some embodiments, the above method further comprises:
the pre-occupation calculation time length of each AI core in the plurality of AI cores is obtained.
And carrying out risk marking on each of the plurality of AI cores according to the pre-calculated force duration of each AI core, a preset risk threshold and a preset safety threshold, so as to obtain at least one safety AI core and at least one risk AI core, wherein the at least one safety AI core and the at least one risk AI core are the AI cores in the plurality of AI cores, and the preset risk threshold is larger than the preset safety threshold.
And migrating the executed service on the risk AI core to the security AI core.
In some embodiments, migrating the running service on the risk AI core to the safe AI core includes:
acquiring the predicted occupied computing-power duration of each of a plurality of running services, where the plurality of running services run on a first risk AI core, and the first risk AI core is any one of the at least one risk AI core.
And screening a service to be migrated out of the plurality of running services according to the predicted occupied computing-power duration of each running service.
And migrating the service to be migrated to a target safe AI core, where the target safe AI core is one of the at least one safe AI core.
In some embodiments, the above method further comprises:
The difference of the predicted occupied computing-power duration of the first risk AI core minus the predicted occupied computing-power duration of the service to be migrated is a first difference.
The first difference is less than the preset risk threshold and greater than a second difference.
The second difference is the difference of the predicted occupied computing-power duration of the first risk AI core minus the predicted occupied computing-power duration of another running service, the other running service being any one of the plurality of running services on the first risk AI core other than the service to be migrated.
In some embodiments, the number of the at least one safe AI core is plural, and migrating the service to be migrated to the target safe AI core includes:
acquiring the predicted occupied computing-power duration of each of the at least one safe AI core.
And selecting, from the at least one safe AI core, the one with the shortest predicted occupied computing-power duration as the target safe AI core.
And migrating the service to be migrated to the target safe AI core.
In some embodiments, migrating the service to be migrated to the target safe AI core comprises:
determining whether the sum of the predicted occupied computing-power duration of the service to be migrated and the predicted occupied computing-power duration of the target safe AI core is less than the preset risk threshold.
If yes, migrating the service to be migrated to the target safe AI core.
If not, returning to the step of acquiring the predicted occupied computing-power duration of each of the plurality of running services.
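The migration decision above can be sketched as follows: pick the safe core with the shortest predicted occupied duration, then migrate only if the combined load stays below the preset risk threshold. The function and parameter names are illustrative assumptions, not the patent's implementation.

```python
def migrate_service(service_ms, safe_core_ms, risk_threshold_ms):
    """service_ms: predicted occupied duration of the service to migrate (ms);
    safe_core_ms: dict safe core id -> predicted occupied duration (ms).
    Returns the chosen safe core id, or None if migration would breach
    the preset risk threshold."""
    if not safe_core_ms:
        return None
    # Select the safe core with the shortest predicted occupied duration.
    target = min(safe_core_ms, key=safe_core_ms.get)
    # Migrate only if the combined load stays below the risk threshold;
    # otherwise the caller re-screens for another service to migrate.
    if service_ms + safe_core_ms[target] < risk_threshold_ms:
        safe_core_ms[target] += service_ms
        return target
    return None

loads = {0: 400.0, 1: 250.0}
chosen = migrate_service(300.0, loads, risk_threshold_ms=800.0)
```

Core 1 is chosen (shortest predicted duration) since 300 + 250 = 550 ms stays under the 800 ms threshold.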
In some embodiments, acquiring the predicted occupied computing-power duration of each of the plurality of AI cores comprises:
acquiring the predicted occupied computing-power duration of a first AI core, where the first AI core is any one of the plurality of AI cores, and the predicted occupied computing-power duration is a computing-power duration obtained by weighted summation of a plurality of instantaneous occupied computing-power durations according to preset weight coefficients.
The plurality of instantaneous occupied computing-power durations are occupied computing-power durations sampled at a plurality of time points, in one-to-one correspondence with those time points, which are mutually different time points within a preset period. A first instantaneous occupied computing-power duration is sampled at a first time point, and a second instantaneous occupied computing-power duration is sampled at a second time point; the first weight coefficient corresponding to the first instantaneous occupied computing-power duration is greater than the second weight coefficient corresponding to the second instantaneous occupied computing-power duration, the first time point being closer than the second time point to the end time of the preset period, and the first and second time points being different time points among the plurality of time points.
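The weighted summation described above can be sketched as follows. Linearly increasing weights are an illustrative assumption; the patent only requires that samples closer to the end of the preset period carry larger weight coefficients.

```python
def predict_occupied_ms(samples_ms):
    """samples_ms: instantaneous occupied computing-power durations sampled
    at successive time points within the preset period, earliest first.
    Returns the weight-normalized predicted occupied duration."""
    weights = range(1, len(samples_ms) + 1)  # later samples weigh more
    total_weight = sum(weights)
    return sum(w * s for w, s in zip(weights, samples_ms)) / total_weight

# Three samples within the period; occupancy is trending upward.
predicted = predict_occupied_ms([100.0, 200.0, 300.0])
```

The prediction (about 233 ms) is pulled above the plain average (200 ms) because the most recent sample dominates.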
In some embodiments, acquiring the predicted occupied computing-power duration of the first AI core includes:
predicting the predicted occupied computing-power duration of the first AI core based on the pattern of the historical occupied computing-power durations of the first AI core.
Determining whether the predicted occupied computing-power duration of the first AI core meets a preset confidence threshold.
If yes, executing the step of risk-marking each of the plurality of AI cores according to its predicted occupied computing-power duration, the preset risk threshold, and the preset safety threshold.
If not, executing the step of acquiring the predicted occupied computing-power duration of the first AI core again.
In a second aspect, embodiments of the present application provide a chip for performing the method as in any one of the first aspects above.
In a third aspect, embodiments of the present application provide an electronic device comprising a processor and a memory, the processor being configured to execute a computer program stored in the memory to implement a method as in any of the first aspects above. Alternatively, the electronic device comprises a chip as in the second aspect.
In some embodiments, the electronic device is an AI server on which a plurality of AI cores are disposed.
The chip is any one of a plurality of AI cores. Alternatively, the chip and the plurality of AI cores are different.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements a method as in any of the first aspects described above.
In the technical scheme provided by the application, the server can obtain the pre-calculated computing-power time consumption required to run the service to be distributed on each of the plurality of AI cores and the currently available computing-power duration of each of the plurality of AI cores; determine, as the target AI core, the AI core whose currently available computing-power duration meets the pre-calculated computing-power time consumption of the service to be distributed; and distribute the service to be distributed to the target AI core. According to this scheme, by calculating the computing-power time consumption required to run the service to be distributed on the AI cores together with the currently available computing-power durations of the AI cores, the computing power of each AI core of the server can be reasonably distributed, reducing the waste of the server's computing-power resources and improving the rationality with which the server deploys services to be distributed and the computing-power utilization rate of the plurality of AI cores on the server.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments or the description of the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
Fig. 1 is a schematic block diagram of a service allocation method according to an embodiment of the present application.
Fig. 2 is a schematic flow chart of a service allocation method according to an embodiment of the present application.
Fig. 3 is another flow chart of a service allocation method according to an embodiment of the present application.
Fig. 4 is a schematic illustration of a migration flow of a service allocation method according to an embodiment of the present application.
Fig. 5 is a schematic diagram of another migration flow of a service allocation method according to an embodiment of the present application.
Fig. 6 is a schematic structural diagram of a service distribution device according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when" or "in response to a determination" or "in response to detection". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining" or "in response to determining" or "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
The cloud server brings great convenience to our life and work, and can be applied to the fields of Internet of things, artificial intelligence, big data, intelligent manufacturing, smart cities and the like. The computing power of the server may be composed of the computing power of multiple AI cores, e.g., the computing power of the video AI server may be composed of the computing power of multiple CPUs, multiple GPUs, and multiple NPUs. Because the calculation power among different AI cores cannot be shared, different tasks or services can be distributed to different AI cores for processing in order to reasonably distribute the calculation power of the server and improve the calculation power utilization rate of the server.
Specifically, computing power can be described by the duration occupied per second by a service running on an AI core. For example, a first service running on a first AI core occupies 300 ms, and a second service running on a second AI core occupies 280 ms of computing-power duration; if only the first service runs on the first AI core, the remaining computing power of the first AI core is 700 ms, and if only the second service runs on the second AI core, the remaining computing power of the second AI core is 720 ms.
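Under this per-second occupancy model, each core's budget is 1000 ms of computing power per second, and remaining computing power is the budget minus the sum of occupied durations. A minimal sketch (the constant and function names are illustrative):

```python
PER_SECOND_BUDGET_MS = 1000.0  # one second of computing power per AI core

def remaining_ms(occupied_ms):
    """Remaining computing power of a core, given the per-second occupied
    computing-power durations of the services running on it."""
    return PER_SECOND_BUDGET_MS - sum(occupied_ms)

first_core_remaining = remaining_ms([300.0])   # first service occupies 300 ms
second_core_remaining = remaining_ms([280.0])  # second service occupies 280 ms
```

This reproduces the example's 700 ms and 720 ms remaining figures.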
In some embodiments, computing power may also be described in terms of the number of image frames processed per second by the AI core, i.e., FPS (Frames Per Second); for example, the computing power of the first AI core is 30 FPS and that of the second AI core is 40 FPS.
The present embodiments may be described by way of example using the length of time that a service running on an AI core occupies per second.
In the embodiments of the present application, before a service is run on an AI core, a developer can first measure the computing power of the AI core, and the computing power of the plurality of AI cores can be measured against a normalized computing-power standard.
Illustratively, taking the example of running a video AI service, the AI operation logic for each video frame of video data may include: AI model reasoning operation of AI core processing and other business logic operation, wherein the AI model reasoning operation can be IPU (Intelligence Processing Unit, intelligent processor) processing or NPU processing; other business logic operations may be processed by the CPU.
For example, take the face-snapshot service algorithm of a video AI service: suppose the face-snapshot algorithm needs two inference models, such as a target detection model and a foreground detection model, and processing one frame of video requires running the target detection model once plus the foreground detection model five times. Taking a video AI computing-power test run on one algorithm platform as an example, the steps of obtaining the AI computing power may be:
the first step: and acquiring videos of N target application scenes as a test video set, wherein the target application scenes can be scenes such as subway ports, company front platforms, intersections and the like, and N is a positive integer.
And a second step of: inputting a first test video into a target inference model (a combined model of a target detection model and a foreground detection model), and adjusting the sampling inference frame rate of the target inference model per second to be a first frame rate to obtain a test result (comprising a face snapshot result, model inference time consumption and total processing time consumption of each frame); the first test video is one of the test video sets, the first frame rate is one of M preset frame rates, and M is a positive integer.
And a third step of: and traversing the test results of the first test video in the target inference model under M frame rates to obtain a test result set.
Fourth step: and traversing the videos of N target application scenes in the test video set to obtain test result sets under N different frame rates.
Fifth step: and analyzing the test result sets under N different frame rates to obtain AI accounting force analysis results (optimal accounting force time and lowest accounting force time).
The step of analyzing the N test result sets may include:
1. Extract, from the test result sets at the N different frame rates, a plurality of preferred frame rates at which the face-snapshot results of the N videos meet the preset result. For example, the plurality of preferred frame rates may include 8 FPS, 6 FPS, and 5 FPS.
2. Take the lowest frame rate among the plurality of preferred frame rates as the optimal frame rate. For example, where the preferred frame rates include 8 FPS, 6 FPS, and 5 FPS, the optimal frame rate is 5 FPS.
3. Extract, from the test result sets at the N different frame rates, a plurality of base frame rates at which the face-snapshot results of the N videos fail to meet the preset result only in special scenes. For example, the plurality of base frame rates may include 4 FPS and 3 FPS.
4. Take the lowest frame rate among the plurality of base frame rates as the lowest frame rate. For example, where the base frame rates include 4 FPS and 3 FPS, the lowest frame rate is 3 FPS.
5. Calculate the average processing time consumption of each model. For example, if processing one frame of video requires running the target detection model once plus the foreground detection model five times, the average time consumption of each target-detection inference is time1 (10 ms) and the average time consumption of each foreground-detection inference is time2 (5 ms), so the model-inference time consumption of each frame of video is time1 + 5 × time2 = 35 ms. Besides the model time consumption, other logic processing also takes time, which is generally independent of the scene; if an average value time3 (10 ms) is obtained from the test statistics, the average processing time consumption of one frame of video face snapshot is: total time = time1 + 5 × time2 + time3 = 45 ms.
6. Based on the average processing time consumption per frame, combined with the optimal frame rate and the lowest frame rate, the optimal computing-power time consumption and the lowest computing-power time consumption can be obtained. For example, optimal computing-power time consumption = 5 × 45 ms = 225 ms, and lowest computing-power time consumption = 3 × 45 ms = 135 ms.
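The arithmetic of steps 5 and 6 can be checked directly. The constant names are illustrative; the values are the ones from the worked example above.

```python
TIME1_MS = 10.0  # average per-inference time of the target detection model
TIME2_MS = 5.0   # average per-inference time of the foreground detection model
TIME3_MS = 10.0  # average per-frame time of other business-logic processing

# One frame = 1 target-detection inference + 5 foreground-detection inferences,
# plus the other business-logic processing.
total_per_frame_ms = TIME1_MS + 5 * TIME2_MS + TIME3_MS

OPTIMAL_FPS = 5  # lowest of the preferred frame rates (step 2)
LOWEST_FPS = 3   # lowest of the base frame rates (step 4)

optimal_power_ms = OPTIMAL_FPS * total_per_frame_ms  # optimal computing-power time
lowest_power_ms = LOWEST_FPS * total_per_frame_ms    # lowest computing-power time
```

This yields 45 ms per frame, 225 ms optimal, and 135 ms lowest, matching the example.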
The above example is only one possible method of acquiring AI computing power in one possible scenario; other acquisition methods in other scenarios may be obtained based on the same concept, which is not limited herein.
In the embodiment of the application, by acquiring and measuring the computing power of a plurality of AI cores, the distribution of a plurality of services to be distributed and the migration of a plurality of services running on a plurality of AI cores can be realized according to the computing power information of the plurality of AI cores.
It should be noted that for different calculation methods and computing-power measurement standards, the calculated computing-power information of the same AI core differs; but whichever calculation method is adopted, the currently available computing-power time consumption of the plurality of AI cores under the same standard can be calculated, and the currently available optimal computing-power durations / currently available lowest computing-power durations of the plurality of AI cores can be normalized, so as to realize unified distribution of a plurality of services and improve the rationality of the server's computing-power distribution.
In the prior art, a task/service to be allocated can be manually allocated, according to the computing power it requires, to an appropriate AI core for processing. Before manual binding, an AI core whose remaining computing power is higher than the computing power required by the task/service to be allocated can be matched according to that required computing power.
However, while a task/service is running, the computing power it requires is related to the environment and scene of the task being processed, and the computing power required by different tasks/services differs in different scenes. For example, for a video service, when the video picture is still, the required computing power is low; when the scene in the video picture changes, the required computing power gradually increases as the degree of change of the picture increases. Because a task/service is bound to its AI core according to its initially required computing power, the computing-power occupancy of the AI cores may become unevenly loaded as the required computing power grows. In the prior art, when the AI computing power is excessive or overloaded, the uneven distribution of computing power leads to a low computing-power utilization rate.
The embodiments of the present application provide a service computing-power distribution method, which can flexibly distribute the computing power of the server, reduce the waste of the server's computing-power resources, and improve the rationality of the server's computing-power distribution and its computing-power utilization rate.
As shown in fig. 1, a schematic block diagram of a service computing-power distribution method according to an embodiment of the present application may include: a plurality of AI computing-power statistics modules (e.g., a 1st AI computing-power statistics module, a 2nd AI computing-power statistics module, ..., an Nth AI computing-power statistics module), an AI server computing-power statistics module, an algorithm configuration input module, and an AI server computing-power distribution/migration module.
The AI computing-power statistics modules can be deployed on the AI cores and are used to count the currently available computing-power duration and the currently occupied computing-power duration of the AI cores. By acquiring these durations and sending them to the AI server computing-power statistics module, the plurality of AI computing-power statistics modules enable the AI server computing-power distribution/migration module to dynamically migrate the plurality of services deployed on the plurality of AI cores.
The AI server computing-power statistics module may be deployed alone on the processor, or deployed on one AI core together with one of the plurality of computing-power statistics modules, and is configured to receive the currently available computing-power durations and currently occupied computing-power durations sent by the plurality of AI computing-power statistics modules. It may further forward the received durations of the plurality of AI cores to the AI server computing-power distribution/migration module, which can then analyze and calculate the predicted occupied computing-power duration of each of the plurality of AI cores and of each of the plurality of running services, so as to realize dynamic migration of services.
The algorithm configuration input module may be used to receive the algorithm and algorithm model deployed by a user and forward the received information to the AI server computing power distribution/migration module; it may also receive the service to be allocated and forward it to that module. The algorithm configuration input module is further configured to receive static input information when the plurality of AI cores are deployed and installed for the first time, where the static input information may include static computing power occupation information of the plurality of AI cores, memory resource occupation requirements of the AI server, and/or minimum video resolution requirements of the AI service. When service allocation is performed for the first time, the AI server computing power distribution/migration module does not need to obtain the current computing power occupation information or the currently available computing power duration of the plurality of AI cores through the plurality of AI computing power statistics modules; it only needs to perform the first allocation from the static input information.
The AI server computing power distribution/migration module may be configured to receive the currently available computing power duration and the currently occupied computing power duration of each of the plurality of AI cores sent by the AI server computing power statistics module; to receive the service to be allocated input by the algorithm configuration input module; to calculate the predicted computing power time consumption required for the service to be allocated to run on each of the plurality of AI cores; and to allocate the service to a target AI core, thereby allocating as many services as possible while maximally satisfying the computing power effect.
The AI server computing power distribution/migration module can dynamically migrate services away from AI cores whose computing power may become overloaded, so that the resource load on each AI core is balanced. It thus implements a statistics and management mechanism for the computing power occupied by AI algorithms and maximizes the computing power resource utilization of each AI core in the server.
The service allocation method provided by the embodiment of the present application covers both allocation and migration of services, i.e., at least the functions of service allocation and dynamic service migration.
In the first service allocation stage, the workflow of each module may be: the algorithm configuration input module responds to AI algorithm enabling information entered by a user, where the AI algorithm enabling information may include the type of algorithm service, video channel information (such as the video resolution and/or highest frame rate that can be provided) and a guard time (such as the time period during which the algorithm runs), and sends the received AI algorithm enabling information to the AI server computing power distribution/migration module. The AI server computing power distribution/migration module then allocates the service corresponding to the AI algorithm enabling information to the corresponding target AI core based on the received information.
In subsequent allocation stages, the workflow of each module may be: the algorithm configuration input module responds to the AI algorithm enabling information input by the user and sends it to the AI server computing power distribution/migration module; the distribution/migration module allocates the corresponding service to the corresponding target AI core based on the received AI algorithm enabling information combined with the currently available computing power duration of each AI core sent by the AI server computing power statistics module.
In the dynamic service migration stage, the workflow of each module may be: the AI server computing power distribution/migration module periodically obtains the currently available and currently occupied computing power durations of the plurality of AI cores sent by the AI server computing power statistics module, together with the history records of those durations, and calculates the predicted occupied computing power duration of the plurality of AI cores and of each of the running services, so that it can dynamically migrate services.
As shown in fig. 2, a flowchart of a service allocation method according to an embodiment of the present application is provided, where the method may include the following steps:
Step S201: acquire the predicted computing power time consumption required for the service to be allocated to run on each of the plurality of AI cores.
In this embodiment of the present application, the plurality of AI cores may all be of the same type, or may be of different types, which is not limited herein.
For example, the server may be configured with 3 CPUs, 1 GPU and 2 NPUs, with the multiple AI cores comprising: 3 CPU AI cores, 1 GPU AI core and 2 NPU AI cores.
Each AI core may be configured with an AI computing power statistics module, and the plurality of AI computing power statistics modules on the plurality of AI cores are used for counting the computing power of the plurality of AI cores.
Since the types of the plurality of AI cores may differ and their processing power varies, the predicted computing power time consumption required for running the same service to be allocated on different AI cores may also differ.
Taking one service to be allocated as an example, its computing power time consumption may be 200 ms on a first AI core and 150 ms on a second AI core with slightly greater processing power.
By separately calculating the predicted computing power time consumption of the service to be allocated on different AI cores, the computing power resources of the AI cores can be utilized more reasonably and fully, improving the rationality of computing power resource allocation.
Step S202: the current available computing power duration of each of the plurality of AI cores is obtained.
In this embodiment of the present application, the plurality of AI computing power statistics modules may calculate the currently available computing power duration of the plurality of AI cores. The currently available computing power duration of an AI core may be the difference between the total computing power duration of the AI core and the computing power duration occupied by the services already running on it. The plurality of AI computing power statistics modules may send the obtained durations to the AI server computing power statistics module.
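As a hedged illustration, the relation described above (available duration equals the core's total duration minus the duration occupied by already-running services) can be sketched as follows; the function name and millisecond figures are illustrative, not part of the embodiment.

```python
def currently_available_duration(total_duration_ms, occupied_durations_ms):
    """Currently available computing power duration of one AI core: the
    core's total computing power duration minus the computing power
    duration occupied by the services already running on it."""
    return total_duration_ms - sum(occupied_durations_ms)

# Illustrative: a core with a 1000 ms budget running services that
# occupy 200 ms and 300 ms has 500 ms currently available.
print(currently_available_duration(1000, [200, 300]))  # 500
```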
Step S203: acquire a target AI core from the plurality of AI cores, where the currently available computing power duration of the target AI core satisfies the predicted computing power time consumption of the service to be allocated.
In this embodiment of the present application, the target AI core may be the AI core with the longest currently available computing power duration among the plurality of AI cores. After receiving the currently available computing power durations of the plurality of AI cores, the AI server computing power statistics module may send them to the AI server computing power distribution/migration module, which takes the AI core with the longest currently available computing power duration as the target AI core and checks whether the currently available computing power duration of the target AI core is greater than the target predicted computing power time consumption, i.e., the computing power duration the service to be allocated would occupy when running on the target AI core.
If the currently available computing power duration of the target AI core is greater than the target predicted computing power time consumption, step S204 is performed.
If the currently available computing power duration of the target AI core is less than or equal to the target predicted computing power time consumption, step S201 is re-executed to obtain the predicted computing power time consumption required for the service to be allocated to run on each of the plurality of AI cores. In this embodiment of the present application, the currently available computing power duration of the plurality of AI cores is not constant, so when allocation of the service fails, step S201 may be executed again to detect whether an assignable target AI core exists at the next moment, until the service to be allocated is allocated successfully.
Step S204: and distributing the service to be distributed to the target AI core.
When the currently available computing power duration of the target AI core is greater than the target predicted computing power time consumption, the server allocates the service to be allocated to the target AI core for running through the AI server computing power distribution/migration module.
In this embodiment of the present application, the currently available computing power duration of the plurality of AI cores may be the currently available optimal computing power duration. One or more identical or different services run on each AI core; when the services on an AI core all run in the optimal state, the computing power duration currently remaining on that core is its currently available optimal computing power duration. The target AI core may be the AI core with the longest currently available optimal computing power duration among the plurality of AI cores.
The target predicted computing power time consumption may be the computing power time consumption occupied when the service to be allocated runs on the target AI core. The predicted optimal computing power time consumption may be the computing power time consumption required for the service to be allocated to run on the target AI core in the optimal operating state.
For example, taking a video AI server, where the service to be allocated processes images collected by a monitoring camera: when the service runs on a first AI core with its predicted optimal computing power time consumption satisfied, the AI server can capture all face data in the image; if there are ten faces in the current picture, all ten are captured. When the service runs on the first AI core without its optimal computing power time consumption satisfied, the AI server cannot completely capture the face data collected by the monitoring camera; if there are ten faces in the current picture, perhaps only five are captured.
In this embodiment of the present application, if the currently available optimal computing power duration of the target AI core is greater than or equal to the predicted optimal computing power time consumption of the service to be allocated when running on the target AI core, the server allocates the service to the target AI core for running. The server may mark the target AI core as being in a computing-power-sufficient state.
For example, suppose there are five AI cores: a first, second, third, fourth and fifth AI core, whose currently available optimal computing power durations are 100 ms, 150 ms, 200 ms, 250 ms and 300 ms respectively; the target AI core is then the fifth AI core. If the predicted optimal computing power time consumption of the service to be allocated is 100 ms, the currently available optimal computing power duration of the fifth AI core (300 ms) is greater than the predicted optimal computing power time consumption (100 ms), and the server allocates the service to the fifth AI core for running.
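The selection rule of steps S203 and S204 can be sketched as follows. This is an illustrative Python sketch under the assumptions stated in its comments (function names are invented; the strict greater-than comparison follows the text above), not the embodiment's actual implementation.

```python
def pick_target_core(available_ms):
    """Target AI core: the core with the longest currently available
    computing power duration (returned as an index into the core list)."""
    return max(range(len(available_ms)), key=lambda i: available_ms[i])

def try_allocate(available_ms, predicted_ms):
    """Allocate the service if the target core's currently available
    duration is greater than the service's predicted computing power
    time consumption; otherwise report failure (retry at step S201)."""
    target = pick_target_core(available_ms)
    return target if available_ms[target] > predicted_ms else None

# The five-core example above: durations 100..300 ms, predicted 100 ms.
print(try_allocate([100, 150, 200, 250, 300], 100))  # 4 (the fifth AI core)
```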
If the currently available optimal computing power duration of the target AI core is smaller than the predicted optimal computing power time consumption of the service to be allocated when running on the target AI core, the server obtains the predicted minimum computing power time consumption of the service to be allocated when running on the target AI core.
The predicted minimum computing power time consumption may be the lowest computing power time consumption required to keep the service to be allocated in a normal operating state when running on the target AI core.
For example, with the same five AI cores whose currently available optimal computing power durations are 100 ms, 150 ms, 200 ms, 250 ms and 300 ms, the target AI core is the fifth AI core. If the predicted optimal computing power time consumption of the service to be allocated is 400 ms, the currently available optimal computing power duration of the fifth AI core (300 ms) is smaller than the predicted optimal computing power time consumption (400 ms), meaning the service cannot be allocated according to the predicted optimal computing power time consumption; the server then acquires the predicted minimum computing power time consumption required to run the service on the fifth AI core.
If the currently available optimal computing power duration of the target AI core is greater than or equal to the predicted minimum computing power time consumption, the service to be allocated is allocated to the target AI core, where it can then run in the minimum computing power state. The server may mark the target AI core as being in a computing-power-saturated state.
If the currently available optimal computing power duration of the target AI core is smaller than the predicted minimum computing power time consumption, the currently available minimum computing power duration of each of the plurality of AI cores is acquired.
When the services deployed on each AI core run at minimum computing power, the computing power duration currently remaining on the plurality of AI cores is the currently available minimum computing power duration. The target AI core may be the AI core with the longest currently available minimum computing power duration among the plurality of AI cores.
If the currently available minimum computing power duration of the target AI core is greater than or equal to the predicted minimum computing power time consumption, the service to be allocated is allocated to the target AI core, where it can then run in the minimum computing power state. The server may mark the target AI core as being in a computing-power-saturated state.
For example, suppose the currently available minimum computing power durations of the five AI cores are 100 ms, 150 ms, 200 ms, 300 ms and 250 ms respectively; the target AI core is then the fourth AI core. If the predicted minimum computing power time consumption of the service to be allocated is 280 ms, the currently available minimum computing power duration of the fourth AI core (300 ms) is greater than the predicted minimum computing power time consumption (280 ms), and the server allocates the service to the fourth AI core.
If the currently available minimum computing power duration of the target AI core is smaller than the predicted minimum computing power time consumption, allocation of the service to be allocated fails.
In this embodiment of the present application, after allocation of the service fails, the server may execute the above method flow once every preset time interval until the service to be allocated is allocated successfully. With the technical solution provided by the present application, the currently available computing power of the AI cores can be obtained dynamically, improving the computing power utilization of the AI cores in the server.
By introducing the currently available optimal computing power duration and currently available minimum computing power duration of the AI cores, together with the predicted optimal and predicted minimum computing power time consumption of the service to be allocated, the computing power effect of each service can be satisfied to the greatest degree while ensuring that every service runs normally, and more AI services can be accommodated. This allocation mechanism for AI services reduces idle waste of computing power resources and thereby improves the computing power resource utilization of the plurality of AI cores.
In some embodiments, the target AI core may also be an AI core, among the plurality of AI cores, whose currently available computing power duration is greater than the predicted computing power time consumption required by the service to be allocated and whose deployed running services include one of the same service type as the service to be allocated.
For example, if M AI cores have a currently available computing power duration greater than the predicted computing power time consumption required by the service to be allocated, the M AI cores are added to a set of candidate AI cores; if N AI cores in the candidate set run a service of the same type as the service to be allocated, the target AI core may be the AI core with the longest currently available computing power duration among those N AI cores.
For example, suppose the currently available computing power durations of the five AI cores are 100 ms, 150 ms, 200 ms, 300 ms and 250 ms respectively, and the predicted computing power time consumption of the service to be allocated is 180 ms; the server may then add the third, fourth and fifth AI cores to the set of candidate AI cores. If, within the candidate set, the services running on the third and fourth AI cores include a service of the same type as the service to be allocated, the fourth AI core may be taken as the target AI core.
In this embodiment of the present application, since services of the same type can share part of the memory resources, deploying services of the same type on the same AI core for processing reduces memory resource consumption and saves memory.
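A minimal sketch of this candidate-set variant follows. It assumes, where the text leaves the case open, that a core running no same-type service falls back to the full candidate set; that fallback, the function name and the sample service types are illustrative, not part of the embodiment.

```python
def pick_target_core_by_type(available_ms, running_types, predicted_ms, service_type):
    """Candidate set: cores whose currently available duration exceeds the
    predicted time consumption; among those, prefer cores already running
    a service of the same type, then take the longest available duration."""
    candidates = [i for i, a in enumerate(available_ms) if a > predicted_ms]
    same_type = [i for i in candidates if service_type in running_types[i]]
    pool = same_type or candidates  # fallback to full set is an assumption
    return max(pool, key=lambda i: available_ms[i]) if pool else None

# The example above: cores 3-5 qualify on duration (200/300/250 ms > 180 ms);
# the third and fourth cores run a same-type service; the fourth wins on duration.
available = [100, 150, 200, 300, 250]
types = [set(), set(), {"face"}, {"face"}, {"vehicle"}]
print(pick_target_core_by_type(available, types, 180, "face"))  # 3 (the fourth AI core)
```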
In the technical solution provided by the present application, the server can obtain the predicted computing power time consumption required for running the service to be allocated on each of the plurality of AI cores and the currently available computing power duration of each AI core; it determines the AI core whose currently available computing power duration satisfies the predicted computing power time consumption of the service as the target AI core, and allocates the service to it. This makes it possible to distribute the computing power of each AI core of the server reasonably. By checking, in decreasing order of priority, the predicted optimal computing power time consumption against the target AI core's currently available optimal computing power duration, then the predicted minimum computing power time consumption against that same duration, and finally the predicted minimum computing power time consumption against the target AI core's currently available minimum computing power duration, the computing power requirement of the service to be allocated is satisfied to the greatest extent, the services deployed on the plurality of AI cores can run fully, the effect of the algorithms is guaranteed, computing power waste is avoided, and both the rationality of service deployment and the computing power utilization of the plurality of AI cores on the server are improved.
For example, as shown in fig. 3, another flow chart of a service allocation method provided in an embodiment of the present application may include the following steps:
Step S301: acquire the predicted optimal computing power time consumption and predicted minimum computing power time consumption required for the service to be allocated to run on each of the plurality of AI cores.
Step S302: acquire the currently available optimal computing power duration of each of the plurality of AI cores, and determine the AI core with the longest currently available optimal computing power duration as the target AI core.
Step S303: judge whether the currently available optimal computing power duration of the target AI core is greater than the predicted optimal computing power time consumption.
If yes, execute step S304 to allocate the service to be allocated to the target AI core.
If not, execute step S305 to judge whether the currently available optimal computing power duration of the target AI core is greater than the predicted minimum computing power time consumption.
Step S304: allocate the service to be allocated to the target AI core, and prompt computing-power-sufficient information.
In this embodiment of the present application, after the service is allocated according to the currently available optimal computing power duration of the target AI core and the predicted optimal computing power time consumption, the server prompts computing-power-sufficient information, which indicates that the target AI core is currently in a computing-power-sufficient state.
Step S305: judge whether the currently available optimal computing power duration of the target AI core is greater than the predicted minimum computing power time consumption.
If yes, execute step S306 to allocate the service to be allocated to the target AI core.
If not, execute step S307 to acquire the currently available minimum computing power duration of each of the plurality of AI cores and determine the AI core with the longest currently available minimum computing power duration as the target AI core.
Step S306: allocate the service to be allocated to the target AI core, and prompt computing-power-saturated information.
In this embodiment of the present application, after the service is allocated according to the currently available optimal computing power duration of the target AI core and the predicted minimum computing power time consumption, the server prompts computing-power-saturated information, which indicates that the target AI core is currently in a computing-power-saturated state.
Step S307: acquire the currently available minimum computing power duration of each of the plurality of AI cores, and determine the AI core with the longest currently available minimum computing power duration as the target AI core.
Step S308: judge whether the currently available minimum computing power duration of the target AI core is greater than the predicted minimum computing power time consumption.
If yes, execute step S306 to allocate the service to be allocated to the target AI core.
If not, execute step S309 to prompt allocation failure information.
Step S309: prompt the allocation failure information.
In this embodiment of the present application, after allocation of the service fails, the server may execute the method flow provided in steps S301 to S309 once every preset time interval until the service to be allocated is allocated successfully.
For the specific process of the method flow provided in steps S301 to S309, reference may be made to the descriptions of steps S201 to S204 above, which are not repeated here.
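The tiered decision of steps S301 to S309 can be condensed into the following sketch. The list-of-durations interface and the return labels are illustrative assumptions; the strict greater-than comparisons follow the judgments in steps S303, S305 and S308.

```python
def allocate_service(opt_avail, min_avail, pred_opt, pred_min):
    """Sketch of steps S301-S309: try the optimal tier, then the saturated
    tier on the same core, then retarget by minimum duration, else fail."""
    # S302: target = core with the longest currently available optimal duration.
    t = max(range(len(opt_avail)), key=lambda i: opt_avail[i])
    if opt_avail[t] > pred_opt:        # S303 -> S304: computing power sufficient
        return t, "sufficient"
    if opt_avail[t] > pred_min:        # S305 -> S306: computing power saturated
        return t, "saturated"
    # S307: retarget = core with the longest currently available minimum duration.
    t = max(range(len(min_avail)), key=lambda i: min_avail[i])
    if min_avail[t] > pred_min:        # S308 -> S306: computing power saturated
        return t, "saturated"
    return None, "failed"              # S309: retry after a preset interval

# Durations from the examples above (ms): the optimal tier succeeds on core 5.
print(allocate_service([100, 150, 200, 250, 300], [100, 150, 200, 300, 250], 100, 80))
# (4, 'sufficient')
```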
In the technical solution provided in this embodiment, while the method flow of steps S201 to S204 is executed, the predicted occupied computing power duration of the next time period of each AI core may be calculated at preset time intervals, and part of the services on AI cores whose predicted occupied computing power duration exceeds a preset risk threshold may be migrated to AI cores whose predicted occupied computing power duration is below a preset safety threshold. This avoids computing power overload during AI core operation and improves the operating efficiency of the server.
As shown in fig. 4, a schematic illustration of the computing power migration flow of a service allocation method according to an embodiment of the present application; the computing power migration method may include the following steps:
Step S401: acquire the predicted occupied computing power duration of each of the plurality of AI cores.
In the technical solution provided by the present application, taking a first AI core, which may be any one of the plurality of AI cores, as an example, the method for obtaining the predicted occupied computing power duration of the first AI core may be:
First, based on the pattern of the historical occupied computing power durations of the first AI core, predict the predicted occupied computing power duration of the next time period of the first AI core, and acquire a preset confidence threshold of the first AI core.
In this embodiment of the present application, the predicted occupied computing power duration of the next time period of the first AI core may be calculated based on the historical statistical pattern of the first AI core. The preset confidence threshold of the first AI core may be determined based on the similarity of the historical occupied computing power durations within the same time period of the first AI core's history. Specifically, a statistical algorithm may be used to calculate a variance value: the smaller the calculated variance, the smaller the fluctuation of the first AI core's historical occupied computing power duration, and the more reliable the preset confidence threshold of the first AI core.
For example, the method for obtaining the preset confidence threshold of the first AI core may specifically be:
1. Acquire a plurality of prior computing power time consumptions within a certain time period before the current moment, over a preset time range.
For example, the preset time range may be seven or eight days, and the certain time period may be the one-hour period ending at the current moment. If the current time is 8:00 and the preset time range is seven days, the computing power time consumptions during 7:00-8:00 on each of the previous seven days may respectively be: TimeDay1Pre, TimeDay2Pre, ..., TimeDay7Pre.
2. Acquire a plurality of posterior computing power time consumptions within a certain time period after the current moment, over the preset time range.
For example, the preset time range may be seven or eight days, and the certain time period after the current moment may be the one-hour period starting at the current moment. If the current time is 8:00 and the preset time range is seven days, the computing power time consumptions during 8:00-9:00 on each of the previous seven days may respectively be: TimeDay1Next, TimeDay2Next, ..., TimeDay7Next.
3. Calculate the variance of the differences between the plurality of prior computing power time consumptions and the plurality of posterior computing power time consumptions as the preset confidence threshold.
For example, the variance of the computing power differences may be { (TimeDay1Next − TimeDay1Pre)^2 + (TimeDay2Next − TimeDay2Pre)^2 + … + (TimeDay7Next − TimeDay7Pre)^2 } / 7.
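As a hedged numeric sketch of item 3, the preset confidence threshold is the mean squared difference between the paired posterior and prior time consumptions; the function name and sample values below are invented for illustration.

```python
def preset_confidence_threshold(pre_ms, next_ms):
    """Mean squared difference between posterior (TimeDayNNext) and prior
    (TimeDayNPre) computing power time consumptions over the preset range."""
    assert len(pre_ms) == len(next_ms) and pre_ms
    return sum((n - p) ** 2 for p, n in zip(pre_ms, next_ms)) / len(pre_ms)

# Seven days of illustrative 7:00-8:00 (pre) and 8:00-9:00 (next) samples, ms:
pre = [100, 110, 105, 95, 100, 108, 102]
nxt = [104, 112, 103, 99, 101, 110, 100]
print(preset_confidence_threshold(pre, nxt))  # 7.0
```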
Then, it is determined whether the predicted occupied computing power duration of the first AI core satisfies the preset confidence threshold.
In this embodiment of the present application, whether the predicted occupied computing power duration of the first AI core is reliable may be judged from the predicted occupied computing power duration and the calculated preset confidence threshold.
For example, the higher the similarity of the first AI core's historical occupied computing power durations within the same time period, i.e., the smaller the variance calculated from them, the smaller the first AI core's computing power fluctuation; the more reliable the predicted occupied computing power duration obtained from the pattern of its history, and the more easily it satisfies the preset confidence threshold.
In this embodiment of the present application, the method for calculating the predicted occupied computing power duration of the first AI core may be:
First, acquire the plurality of posterior computing power time consumptions within the certain time period after the current moment, over the preset time range.
For example, the preset time range may be seven or eight days, and the certain time period after the current moment may be the one-hour period starting at the current moment. If the current time is 8:00 and the preset time range is seven days, the computing power time consumptions during 8:00-9:00 on each of the previous seven days may respectively be: TimeDay1Next, TimeDay2Next, ..., TimeDay7Next.
Then, take the average of the plurality of posterior computing power time consumptions as the predicted occupied computing power duration.
For example, the pre-occupancy calculation duration TimeAverageNext is:
TimeAverageNext=(TimeDay1Next+TimeDay2Next+…+TimeDay7Next)/7。
For example, the method for judging whether the predicted occupied computing power duration of the first AI core satisfies the preset confidence threshold may be:
If the preset confidence threshold of the first AI core is greater than the variance between the plurality of posterior computing power time consumptions and the predicted occupied computing power duration, it is determined that the predicted occupied computing power duration of the first AI core satisfies the preset confidence threshold.
For example, the preset confidence threshold of the first AI core may be: { (TimeDay1Next − TimeDay1Pre)^2 + (TimeDay2Next − TimeDay2Pre)^2 + … + (TimeDay7Next − TimeDay7Pre)^2 } / 7.
The variance between the plurality of posterior computing power time consumptions and the predicted occupied computing power duration may be: { (TimeDay1Next − TimeAverageNext)^2 + (TimeDay2Next − TimeAverageNext)^2 + … + (TimeDay7Next − TimeAverageNext)^2 } / 7.
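Combining the formulas above, the whole confidence check can be sketched as follows. The function interface and the sample values are illustrative assumptions: the predicted occupied duration is TimeAverageNext, and it satisfies the check when the preset confidence threshold exceeds the variance of the posterior samples around that mean.

```python
def confidence_check(pre_ms, next_ms):
    """Returns (satisfied, predicted_occupied_duration). The threshold is the
    mean squared pre/next difference; the predicted duration is the mean of
    the posterior samples (TimeAverageNext); the check passes when the
    threshold exceeds the variance of the posterior samples around the mean."""
    n = len(next_ms)
    threshold = sum((b - a) ** 2 for a, b in zip(pre_ms, next_ms)) / n
    time_average_next = sum(next_ms) / n
    variance = sum((x - time_average_next) ** 2 for x in next_ms) / n
    return threshold > variance, time_average_next

# Illustrative samples: the posterior hour is stable (small variance) while
# pre/next differ day to day (large threshold), so the prediction is trusted.
pre = [100, 90, 110, 95, 108, 92, 111]
nxt = [104, 105, 103, 104, 105, 104, 103]
print(confidence_check(pre, nxt))  # (True, 104.0)
```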
Finally, if the pre-occupied computing-power duration of the first AI core meets the preset confidence threshold, the first AI core is risk-marked according to its pre-occupied computing-power duration, the preset risk threshold, and the preset safety threshold.
If the pre-occupied computing-power duration of the first AI core does not meet the preset confidence threshold, the pre-occupied computing-power duration of the first AI core is instead obtained according to preset weight coefficients, as follows:
A plurality of instantaneous occupied computing-power durations corresponding to a plurality of time points within a preset time period are acquired, where the instantaneous occupied computing-power durations correspond one-to-one to the time points, and the time points are mutually different time points within the preset time period.
A first instantaneous occupied computing-power duration and a second instantaneous occupied computing-power duration may be acquired, where the first is sampled at a first time point and the second at a second time point. The first weight coefficient, corresponding to the first instantaneous occupied computing-power duration, is greater than the second weight coefficient, corresponding to the second instantaneous occupied computing-power duration; the first time point is closer to the end of the preset time period than the second time point; and the first and second time points are different time points among the plurality of time points.
For example, the preset time period may be the 4 seconds nearest the current time, and the points 1 s, 2 s, 3 s, and 4 s before the current time may be taken as the first, second, third, and fourth time points, respectively. The occupied computing-power durations sampled at these four time points are the first, second, third, and fourth instantaneous occupied computing-power durations, and the corresponding weight coefficients may be 0.4, 0.3, 0.2, and 0.1, respectively; that is, the closer to the current time an instantaneous occupied computing-power duration is measured, the larger its weight coefficient.
Then, the instantaneous occupied computing-power durations are weighted and summed according to the preset weight coefficients to obtain the pre-occupied computing-power duration.
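The fallback weighted sum over the last four instantaneous samples can be sketched as follows; the sample values are illustrative, while the weights follow the example above.

```python
# Minimal sketch: weighted sum of instantaneous occupied computing-power
# durations, with the most recent sample weighted highest (0.4, 0.3, 0.2, 0.1).

def weighted_occupied_duration(instant_durations, weights):
    # instant_durations[0] is the sample nearest the current time.
    assert len(instant_durations) == len(weights)
    return sum(d * w for d, w in zip(instant_durations, weights))

# Samples (ms) taken 1 s, 2 s, 3 s, and 4 s before the current time:
duration = weighted_occupied_duration([800, 700, 600, 500], [0.4, 0.3, 0.2, 0.1])
# duration is weighted toward the most recent (800 ms) sample
```

The result here is roughly 700 ms, pulled toward the newest sample; a plain average of the same four samples would be 650 ms.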
Step S402: risk-mark each AI core of the plurality of AI cores according to its pre-occupied computing-power duration, a preset risk threshold, and a preset safety threshold, to obtain at least one safe AI core and at least one risk AI core.
The at least one safe AI core and the at least one risk AI core are AI cores among the plurality of AI cores, and the preset risk threshold is greater than the preset safety threshold.
In this embodiment of the present application, an AI core whose pre-occupied computing-power duration exceeds the preset risk threshold may be marked as a risk AI core, and an AI core whose pre-occupied computing-power duration is below the preset safety threshold may be marked as a safe AI core.
For example, suppose there are four AI cores, namely a first, second, third, and fourth AI core, whose pre-occupied computing-power durations are 900 ms, 700 ms, 840 ms, and 300 ms, respectively. If the preset risk threshold is 800 ms and the preset safety threshold is 500 ms, the first and third AI cores are marked as risk AI cores, and the fourth AI core is a safe AI core.
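The marking rule of step S402, applied to the four-core example above, can be sketched as follows (core names are illustrative):

```python
# Minimal sketch of risk marking: a core is a risk core if its pre-occupied
# duration exceeds the risk threshold, a safe core if it is below the safety
# threshold, and unmarked otherwise.

def mark_cores(pre_durations, risk_threshold, safety_threshold):
    risk, safe = [], []
    for core, duration in pre_durations.items():
        if duration > risk_threshold:
            risk.append(core)
        elif duration < safety_threshold:
            safe.append(core)
    return risk, safe

durations = {"core1": 900, "core2": 700, "core3": 840, "core4": 300}
risk, safe = mark_cores(durations, risk_threshold=800, safety_threshold=500)
# risk: core1 and core3; safe: core4; core2 stays unmarked
```

Note that the second core (700 ms) falls between the two thresholds and is neither a risk nor a safe core, matching the example in the text.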
Step S403: migrate running services on the risk AI cores to the safe AI cores.
In this embodiment of the present application, the risk AI cores may be sorted by pre-occupied computing-power duration from longest to shortest, and one or more running services on each risk AI core may be migrated to safe AI cores in that order, so that the pre-occupied computing-power duration of each risk AI core falls below the preset risk threshold.
Specifically, taking a first risk AI core, one of the risk AI cores, as an example, migrating a running service from the first risk AI core to a safe AI core may include the following steps:
First, the pre-occupied computing-power time consumption of each of the plurality of running services on the first risk AI core is acquired.
In this embodiment of the present application, the method for obtaining the pre-occupied computing-power time consumption of the running services on the first risk AI core may be the same as or different from the method for obtaining the pre-occupied computing-power duration of each AI core, which is not limited here.
Then, the service to be migrated is screened out from the running services according to the pre-occupied computing-power time consumption of each running service.
In this embodiment of the present application, after the service to be migrated on the first risk AI core is migrated to the target safe AI core, the pre-occupied computing-power duration of the first risk AI core is smaller than the preset risk threshold; moreover, among the candidate services, the selected service is the one whose migration leaves the pre-occupied computing-power duration of the first risk AI core closest to the preset risk threshold.
In this embodiment of the present application, the closer the post-migration pre-occupied computing-power duration of the first risk AI core is to the preset risk threshold, the higher the computing-power utilization of that AI core.
For example, let the first difference be the pre-occupied computing-power duration of the first risk AI core minus the pre-occupied computing-power time consumption of the service to be migrated, and let a second difference be the pre-occupied computing-power duration of the first risk AI core minus the pre-occupied computing-power time consumption of another running service, where the other running service is any running service on the first risk AI core other than the service to be migrated. The first difference is smaller than the preset risk threshold and greater than any second difference that is also smaller than the preset risk threshold.
For example, suppose the pre-occupied computing-power duration of the first risk AI core is 900 ms, and the pre-occupied computing-power time consumptions of the running services on it are 400 ms for a first service, 300 ms for a second service, 150 ms for a third service, and 50 ms for a fourth service. Since 900 ms minus the 50 ms of the fourth service (850 ms) is still greater than the preset risk threshold, one of the first, second, or third services must be migrated. Since 900 ms minus the 150 ms of the third service (750 ms) is greater than the corresponding differences for the first service (500 ms) and the second service (600 ms), the third service may be determined as the service to be migrated.
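The screening rule in this example can be sketched as follows. Service names and durations follow the example above; the selection keeps the core's remaining duration below, but as close as possible to, the risk threshold.

```python
# Minimal sketch: among the running services, pick the one whose removal brings
# the core's pre-occupied duration below the risk threshold while leaving it as
# close to the threshold as possible (i.e. the largest qualifying remainder).

def pick_service_to_migrate(core_duration, service_durations, risk_threshold):
    best = None
    for name, dur in service_durations.items():
        remaining = core_duration - dur          # the "first difference"
        if remaining < risk_threshold:
            if best is None or remaining > core_duration - service_durations[best]:
                best = name
    return best                                  # None if no single service suffices

services = {"svc1": 400, "svc2": 300, "svc3": 150, "svc4": 50}
to_migrate = pick_service_to_migrate(900, services, risk_threshold=800)
# 900 - 150 = 750 ms is below 800 ms and the largest such remainder, so svc3
```

Removing the 50 ms service leaves 850 ms, still above the threshold, so it is excluded, and svc3 wins over svc1 and svc2 because its remainder (750 ms) is closest to 800 ms.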
In this embodiment of the present application, when each AI core is risk-marked according to its pre-occupied computing-power duration, the preset risk threshold, and the preset safety threshold, three cases may arise. If only safe AI cores are detected and no risk AI core, only the safe AI cores are marked. If only risk AI cores are detected and no safe AI core, the risk AI cores are marked and the system continues to monitor whether a safe AI core appears, so that migration from the risk AI cores can then be performed. If neither risk AI cores nor safe AI cores are detected, no migration is performed.
In some embodiments, there may be no single service to be migrated among the running services; that is, no running service exists whose pre-occupied computing-power time consumption, subtracted from the pre-occupied computing-power duration of the first risk AI core, yields a difference smaller than the preset risk threshold. In that case, a plurality of running services may be migrated together, and the first difference may be taken as the pre-occupied computing-power duration of the first risk AI core minus the sum of the pre-occupied computing-power time consumptions of those running services.
Finally, the service to be migrated is migrated to a target safe AI core, which is one of the at least one safe AI core.
In this embodiment of the present application, the service to be migrated may be migrated to the target safe AI core as follows:
First, the pre-occupied computing-power duration of each safe AI core is acquired, and the safe AI core with the shortest pre-occupied computing-power duration is taken as the target safe AI core.
Then, it is determined whether the sum of the pre-occupied computing-power time consumption of the service to be migrated and the pre-occupied computing-power duration of the target safe AI core is smaller than the preset risk threshold.
If the sum is smaller than the preset risk threshold, the service to be migrated is migrated to the target safe AI core.
If the sum is greater than or equal to the preset risk threshold, the pre-occupied computing-power time consumption of each running service on the first risk AI core is re-acquired.
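The target-selection and fit check described above can be sketched as follows (core names and values are illustrative):

```python
# Minimal sketch: choose the safe core with the shortest pre-occupied duration
# as the migration target, and migrate only if adding the service keeps the
# target below the risk threshold.

def migrate_target(safe_durations, service_duration, risk_threshold):
    target = min(safe_durations, key=safe_durations.get)
    if safe_durations[target] + service_duration < risk_threshold:
        return target
    return None  # no fit: re-screen the services on the risk core instead

target = migrate_target({"core4": 300, "core5": 450},
                        service_duration=150, risk_threshold=800)
# core4 is the least-loaded safe core, and 300 + 150 = 450 ms < 800 ms
```

Returning `None` corresponds to the re-acquisition branch above: the caller goes back to screening the running services on the first risk AI core.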
In the technical solution provided in the present application, because the services running on the AI cores may differ across usage scenarios and running time periods, the computing power they require may also differ. To avoid AI cores becoming overloaded or under-utilized, the technical solution pre-computes the pre-occupied computing-power duration of each AI core every preset time period, marks the risk AI cores and safe AI cores in the server, and migrates part of the running services on the risk AI cores to the safe AI cores, thereby improving the running efficiency of the server and the utilization of its computing-power resources. By calculating the pre-occupied computing-power duration of each AI core, dynamic migration of services can be realized, ensuring that as many of the AI cores as possible run in an optimal occupancy state. This avoids both computing-power overload on risk AI cores, which would degrade performance, and wasted resources from excess idle computing power on safe AI cores, thereby achieving resource load balancing among the AI cores.
For example, as shown in fig. 5, another computing-power migration flowchart of a service allocation method provided in an embodiment of the present application may include the following steps:
step S501: the pre-occupation calculation time length of each AI core in the plurality of AI cores is obtained.
Step S502: and marking the AI cores with the pre-occupation time duration exceeding a preset risk threshold value in the plurality of AI cores as risk AI cores, and marking the AI cores with the pre-occupation time duration being lower than a preset safety threshold value in the plurality of AI cores as safety AI cores.
Step S503: the plurality of risk AI kernels are grouped into a set of risk AI kernels.
Step S504: and setting the AI core with the longest calculated force occupation duration in the risk AI core set as a first risk AI core.
Step S505: and acquiring the pre-calculated time length of each executed service in the plurality of services executed on the first risk AI core, and forming the plurality of executed services into a service set to be migrated.
Step S506: and determining the running service with the shortest predicted occupation calculation time length in the service set to be migrated as the first service to be migrated.
Step S507: and judging whether a first difference value obtained by subtracting the estimated time length of the first to-be-migrated service from the estimated time length of the first risk AI core is smaller than a preset risk threshold value.
If yes, go to step S508.
If not, the first service to be migrated is removed from the service set to be migrated, and step S506 is executed again.
Step S508: take the AI core with the shortest pre-occupied computing-power duration among the safe AI cores as the first safe AI core.
Step S509: judge whether the sum of the pre-occupied computing-power duration of the first safe AI core and the pre-occupied computing-power time consumption of the first service to be migrated is smaller than the preset risk threshold.
If yes, step S510 is executed to migrate the first service to be migrated to the first safe AI core.
If not, step S505 above is executed again.
Step S510: migrate the first service to be migrated to the first safe AI core.
In this embodiment of the present application, after the first service to be migrated is migrated to the first safe AI core, the migrated service information is updated, the first risk AI core is removed from the risk AI core set, and step S511 is executed to judge whether any AI core remains in the risk AI core set.
Step S511: and judging whether the AI core exists in the risk AI core set.
If yes, go to step S504.
If not, the current flow ends, and the flow of steps S501 to S511 is executed again in the next time period.
The specific process of the method flow provided in the above steps S501 to S511 may refer to the descriptions of the above steps S401 to S403, which are not described herein in detail.
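The loop of steps S501 to S511 can be sketched end to end as follows. This is a compact, single-pass illustration under the same assumptions as the earlier sketches; all core and service names and durations are invented.

```python
# Minimal sketch of steps S501-S511: repeatedly take the worst risk core, find
# the shortest service whose removal brings it under the risk threshold, and
# move that service to the least-loaded safe core if it fits there.

def rebalance(core_durations, core_services, risk_threshold, safety_threshold):
    """core_services maps core -> {service: duration}; mutates core_durations."""
    migrations = []
    risk = [c for c, d in core_durations.items() if d > risk_threshold]   # S502
    safe = [c for c, d in core_durations.items() if d < safety_threshold]
    for core in sorted(risk, key=core_durations.get, reverse=True):       # S504
        # S506/S507: shortest service first; skip those whose removal is not enough
        for name, dur in sorted(core_services[core].items(), key=lambda kv: kv[1]):
            if core_durations[core] - dur >= risk_threshold:
                continue
            # S508/S509: least-loaded safe core that can still take the service
            target = min(safe, key=lambda c: core_durations[c], default=None)
            if target is not None and core_durations[target] + dur < risk_threshold:
                migrations.append((name, core, target))                   # S510
                core_durations[core] -= dur
                core_durations[target] += dur
                break
    return migrations

durations = {"core1": 900, "core2": 700, "core3": 840, "core4": 300}
services = {
    "core1": {"svc1": 400, "svc2": 300, "svc3": 150, "svc4": 50},
    "core3": {"svc5": 500, "svc6": 340},
}
moves = rebalance(durations, services, risk_threshold=800, safety_threshold=500)
```

With these inputs, svc3 moves from core1 to core4 and svc6 from core3 to core4, leaving every core below the 800 ms risk threshold. A production version would also re-sample durations each period and handle multi-service migration, as the text describes.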
It should be understood that, provided there is no logical conflict, the foregoing embodiments of the present application may be implemented in combination with each other to suit practical application requirements. The specific examples or embodiments obtained from these combinations are within the scope of the present application.
Corresponding to the service allocation method in the above embodiment, the present embodiment provides a service allocation apparatus, which may be implemented as part or all of a computer device by software, hardware or a combination of both, for performing the steps in the service allocation method in the above embodiment.
Fig. 6 shows a schematic structural diagram of a service distribution device 60 according to an embodiment of the present application, and for convenience of explanation, only a portion related to the embodiment of the present application is shown.
Referring to fig. 6, the apparatus 60 includes a first acquisition module 610, a second acquisition module 620, a third acquisition module 630, and a distribution module 640.
The first obtaining module 610 is configured to obtain the pre-occupied computing-power time consumption required for the service to be allocated to run on each AI core of the plurality of AI cores.
A second obtaining module 620 is configured to obtain a current available computing power duration of each AI core of the plurality of AI cores.
A third obtaining module 630, configured to obtain a target AI core from the multiple AI cores, where a currently available computing power duration of the target AI core meets a pre-computing power time consumption of the service to be allocated.
An allocation module 640, configured to allocate the service to be allocated to the target AI core.
In some implementations, the target AI core is the AI core of the plurality of AI cores having the longest currently available computing power duration.
In some implementations, the allocation module 640 includes:
determining whether the currently available computing-power duration of the target AI core is greater than a target pre-occupied computing-power time consumption, where the target pre-occupied computing-power time consumption is the computing-power time consumed by running the service to be allocated on the target AI core;
and if yes, allocating the service to be allocated to the target AI core.
In some embodiments, the target pre-occupied computing-power time consumption is the pre-occupied computing-power time consumption described above.
In some implementations, the assignment module 640 further includes:
if the currently available computing-power duration of the target AI core is less than the target pre-occupied computing-power time consumption, acquiring the predicted lowest computing-power time consumption required to run the service to be allocated on the target AI core;
and when the currently available computing-power duration of the target AI core is greater than or equal to the predicted lowest computing-power time consumption, allocating the service to be allocated to the target AI core.
In some implementations, the assignment module 640 further includes:
when the currently available optimal computing-power duration of the target AI core is less than the predicted lowest computing-power time consumption, acquiring the currently available lowest computing-power duration of each AI core of the plurality of AI cores;
selecting the AI core with the longest currently available lowest computing-power duration among the plurality of AI cores as the target AI core;
judging whether the currently available lowest computing-power duration of the target AI core is greater than or equal to the predicted lowest computing-power time consumption;
and if yes, allocating the service to be allocated to the target AI core.
In some implementations, the assignment module 640 further includes:
it is determined whether a current available computing power duration of the first AI core is greater than a pre-computing power time required by the first AI core to operate the service to be allocated, the first AI core being any one of a plurality of AI cores.
If yes, adding the first AI core into the AI core set to be selected.
And screening out a target AI core from the AI core set to be selected, wherein the service with the same service type as the service to be allocated exists in the operated service operated on the target AI core.
In some embodiments, the first obtaining module 610 further includes:
and the acquisition unit is used for acquiring the pre-occupation calculation time length of each AI core in the plurality of AI cores.
The marking unit is used for marking risks of each AI core in the plurality of AI cores according to the pre-calculated time length of the occupied force of each AI core, the preset risk threshold value and the preset safety threshold value, so as to obtain at least one safe AI core and at least one risk AI core, wherein the at least one safe AI core and the at least one risk AI core are the AI cores in the plurality of AI cores, and the preset risk threshold value is larger than the preset safety threshold value.
And the migration unit is used for migrating the operated service on the risk AI core to the security AI core.
In some embodiments, the migration unit includes:
acquiring the pre-calculated force occupation time of each of a plurality of operated services, wherein the plurality of operated services are operated in a first risk AI core, and the first risk AI core is any one of at least one risk AI core;
screening the service to be migrated from a plurality of operated services according to the pre-calculated time length of each operated service;
and migrating the service to be migrated to a target security AI core, wherein the target security AI core is one of at least one security AI core.
In some embodiments, the migration unit includes:
the difference of the pre-calculated force occupation duration of the first risk AI kernel minus the pre-calculated force occupation duration of the service to be migrated is a first difference.
The first difference is less than a preset risk threshold and greater than the second difference.
The second difference is a difference of the pre-calculated force duration of the first risk AI kernel minus the pre-calculated force duration of other operated services, which are any one of the plurality of operated services operated on the first risk AI kernel different from the service to be migrated.
In some embodiments, the migration unit includes:
and acquiring the pre-calculated force duration of each security AI core in the at least one security AI core.
And selecting one with the shortest pre-calculated force duration from the at least one security AI core as a target security AI core.
And migrating the service to be migrated to the target security AI core.
In some embodiments, the migration unit includes:
and determining whether the sum of the pre-occupation calculation time of the service to be migrated and the pre-occupation calculation time of the target security AI core is smaller than a preset risk threshold.
If yes, the service to be migrated is migrated to the target security AI core.
If not, returning to execute the step of acquiring the estimated occupied computing time length of each of the plurality of operated services.
In some embodiments, the migration unit includes:
the method comprises the steps of obtaining the estimated occupied time length of a first AI core, wherein the first AI core is any one of a plurality of AI cores, the estimated occupied time length is a calculated time length obtained by carrying out weighted summation on a plurality of instantaneous occupied time lengths according to a preset weight coefficient.
The plurality of instantaneous occupation calculation time periods are occupation calculation time periods obtained by sampling at a plurality of time points, the plurality of instantaneous occupation calculation time periods and the plurality of time points are in one-to-one correspondence, the plurality of time points are mutually different time points in a preset period, the first instantaneous occupation time period is instantaneous occupation calculation time period obtained by sampling at a first time point, the second instantaneous occupation time period is instantaneous occupation calculation time period obtained by sampling at a second time point, a first weight coefficient corresponding to the first instantaneous occupation calculation time period is larger than a second weight coefficient corresponding to the second instantaneous occupation calculation time period, the first time point is closer to the end time of the preset period than the second time point, and the first time point and the second time point are different time points in the plurality of time points.
In some embodiments, the migration unit includes:
and predicting the predicted occupation calculation time length of the first AI core based on the rule of the historical occupation calculation time length of the first AI core.
Determining whether the pre-calculated occupancy force duration of the first AI core meets a pre-set confidence threshold.
If yes, executing the step of carrying out risk marking on each AI core in the plurality of AI cores according to the pre-calculated time length of the occupied time of each AI core, the pre-set risk threshold value and the pre-set safety threshold value.
If not, executing the step of acquiring the pre-calculated force occupying time length of the first AI core.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein again.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and should not limit the implementation of the embodiments of the present application in any way.
Based on the same inventive concept, the embodiment of the application also provides electronic equipment.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 7, the electronic device 7 of this embodiment includes: at least one processor 710 (only one is shown in fig. 7), a memory 720, and a communication module 740, the memory 720 storing a computer program 730 that can run on the processor 710. When the processor 710 executes the computer program 730, the steps of the above service allocation method embodiments are implemented, such as steps S201 to S204 shown in fig. 2. Alternatively, when executing the computer program 730, the processor 710 may implement the functions of the modules/units in the above apparatus embodiments, such as the functions of the modules 610 to 640 shown in fig. 6. The communication module 740 may be a separate communication unit for communicating with an external server or terminal device.
The electronic device 7 may include, but is not limited to: processor 710, memory 720. It will be appreciated by those skilled in the art that fig. 7 is merely an example of the electronic device 7 and is not meant to be limiting of the electronic device 7, and may include more or fewer components than shown, or may combine certain components, or different components, e.g., the electronic device 7 may also include an input transmitting device, a network access device, a bus, etc.
The processor 710 may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Memory 720 may be an internal storage unit of electronic device 7 in some embodiments, such as a hard disk or memory of electronic device 7. The memory 720 may also be an external storage device of the electronic device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 7. Memory 720 may also include both internal storage units and external storage devices for electronic device 7. The memory 720 is used to store an operating system, application programs, boot loader (BootLoader), data, and other programs, etc., such as program code of the computer program 730. Memory 720 may also be used to temporarily store data that has been transmitted or is to be transmitted.
In addition, it will be clearly understood by those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The embodiments of the present application provide a computer readable storage medium storing a computer program, which when run on an electronic device causes the electronic device to perform the steps of the method embodiments described above.
The embodiment of the application provides a chip, which comprises a processor and a memory, wherein a computer program is stored in the memory, and the computer program realizes the steps in the method embodiments when being executed by the processor.
Embodiments of the present application provide a computer program product for causing an electronic device to perform the steps of the various method embodiments described above when the computer program product is run on the electronic device.
It should be appreciated that the processors referred to in the embodiments of the present application may be central processing units (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It should also be understood that the memory referred to in the embodiments of the present application may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, each embodiment is described with its own emphasis. For parts that are not detailed or illustrated in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the system embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the present application implements all or part of the flow of the methods of the above embodiments by means of a computer program instructing related hardware; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code, a recording medium, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunications signals.
Finally, it should be noted that the foregoing is merely a specific embodiment of the present application, and the protection scope of the present application is not limited thereto; any change or substitution within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (16)

1. A method of service allocation, the method comprising:
acquiring the predicted computing-power time consumption required to run a service to be allocated on each of a plurality of AI cores;
acquiring the currently available computing-power duration of each of the plurality of AI cores;
acquiring a target AI core from the plurality of AI cores, wherein the currently available computing-power duration of the target AI core satisfies the predicted computing-power time consumption of the service to be allocated; and
allocating the service to be allocated to the target AI core.
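The allocation in claims 1 and 2 can be sketched in a few lines. This is an illustrative Python sketch and not part of the patent; all identifiers (`allocate`, `service_costs`, `available`) are assumptions for illustration.

```python
def allocate(service_costs, available):
    """Pick the AI core whose currently available computing-power duration
    covers the service's predicted computing-power time consumption.

    service_costs[i]: predicted time the service needs on core i
    available[i]:     currently available computing-power duration of core i
    Returns the index of the target core, or None if no core qualifies.
    """
    # Candidates: cores whose available duration satisfies the predicted cost.
    candidates = [i for i in range(len(available))
                  if available[i] >= service_costs[i]]
    if not candidates:
        return None
    # Per claim 2, prefer the core with the longest available duration.
    return max(candidates, key=lambda i: available[i])
```

For example, with costs `[3, 2, 5]` and available durations `[4, 1, 10]`, cores 0 and 2 qualify and core 2 wins on the longest available duration.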
2. The method of claim 1, wherein the target AI core is the AI core of the plurality of AI cores having the longest currently available computing-power duration.
3. The method according to claim 1 or 2, wherein allocating the service to be allocated to the target AI core comprises:
determining whether the currently available computing-power duration of the target AI core is greater than a target predicted computing-power time consumption, wherein the target predicted computing-power time consumption is the computing-power time consumed by running the service to be allocated on the target AI core; and
if so, allocating the service to be allocated to the target AI core.
4. The method according to claim 3, wherein the target predicted computing-power time consumption is a predicted optimal computing-power time consumption.
5. The method according to claim 4, further comprising:
if the currently available computing-power duration of the target AI core is less than the target predicted computing-power time consumption, acquiring the predicted minimum computing-power time consumption required to run the service to be allocated on the target AI core; and
when the currently available computing-power duration of the target AI core is greater than or equal to the predicted minimum computing-power time consumption, allocating the service to be allocated to the target AI core.
6. The method of claim 5, wherein the currently available computing-power duration is a currently available optimal computing-power duration, the method further comprising:
when the currently available optimal computing-power duration of the target AI core is less than the predicted minimum computing-power time consumption, acquiring the currently available minimum computing-power duration of each of the plurality of AI cores;
selecting the one of the plurality of AI cores with the longest currently available minimum computing-power duration as the target AI core;
determining whether the currently available minimum computing-power duration of the target AI core is greater than or equal to the predicted minimum computing-power time consumption; and
if so, allocating the service to be allocated to the target AI core.
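The fallback chain in claims 4 to 6 (optimal cost, then minimum cost, then re-selecting the target by minimum duration) can be sketched as follows. This is a hedged illustration; all names (`opt_cost`, `min_cost`, `avail_opt`, `avail_min`) are assumptions, not terms from the patent.

```python
def allocate_with_fallback(opt_cost, min_cost, avail_opt, avail_min, target):
    """opt_cost[i]/min_cost[i]: predicted optimal/minimum computing-power
    time consumption of the service on core i.
    avail_opt[i]/avail_min[i]: core i's currently available optimal/minimum
    computing-power duration.  target: core index chosen earlier.
    Returns the index of the core to run on, or None."""
    # Claims 3-4: enough optimal computing power on the chosen target?
    if avail_opt[target] >= opt_cost[target]:
        return target
    # Claim 5: fall back to the predicted minimum time consumption.
    if avail_opt[target] >= min_cost[target]:
        return target
    # Claim 6: re-pick the core with the longest available minimum duration.
    target = max(range(len(avail_min)), key=lambda i: avail_min[i])
    if avail_min[target] >= min_cost[target]:
        return target
    return None
```

With `opt_cost=[5, 5]`, `min_cost=[4, 5]`, `avail_opt=[3, 0]`, `avail_min=[2, 6]` and initial target 0, both checks on core 0 fail, and the claim-6 fallback selects core 1.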
7. The method of any one of claims 4 to 6, wherein acquiring the target AI core from the plurality of AI cores comprises:
determining whether the currently available computing-power duration of a first AI core is greater than the predicted computing-power time consumption required by the first AI core to run the service to be allocated, wherein the first AI core is any one of the plurality of AI cores;
if so, adding the first AI core to a candidate AI core set; and
screening the target AI core from the candidate AI core set, wherein among the services already running on the target AI core there is a service of the same service type as the service to be allocated.
8. The method of any one of claims 1, 2, 4, 5, and 6, further comprising:
acquiring the predicted occupied computing-power duration of each of the plurality of AI cores;
performing risk marking on each of the plurality of AI cores according to the predicted occupied computing-power duration of each AI core, a preset risk threshold, and a preset safety threshold, to obtain at least one safe AI core and at least one risk AI core, wherein the at least one safe AI core and the at least one risk AI core are AI cores among the plurality of AI cores, and the preset risk threshold is greater than the preset safety threshold; and
migrating a running service on the risk AI core to the safe AI core.
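The risk marking of claim 8 and the target selection of claim 11 can be sketched together. This is a hedged sketch: the exact comparison directions against the two thresholds are not fully specified by the claim, and the names (`occupied`, `risk_thr`, `safe_thr`) are illustrative assumptions.

```python
def mark_and_pick(occupied, risk_thr, safe_thr):
    """occupied[i]: predicted occupied computing-power duration of core i.
    risk_thr > safe_thr per claim 8.
    Returns (indices of risk cores, index of the safe core with the
    shortest predicted occupied duration per claim 11, or None)."""
    assert risk_thr > safe_thr
    # Assumed convention: above the risk threshold -> risk core,
    # below the safety threshold -> safe core.
    risky = [i for i, t in enumerate(occupied) if t > risk_thr]
    safe = [i for i, t in enumerate(occupied) if t < safe_thr]
    target = min(safe, key=lambda i: occupied[i]) if safe else None
    return risky, target
```

For occupied durations `[9, 2, 4]` with risk threshold 8 and safety threshold 5, core 0 is marked as a risk core and core 1 (the least occupied safe core) becomes the migration target.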
9. The method of claim 8, wherein migrating the running service on the risk AI core to the safe AI core comprises:
acquiring the predicted occupied computing-power duration of each of a plurality of running services, wherein the plurality of running services run on a first risk AI core, and the first risk AI core is any one of the at least one risk AI core;
screening a service to be migrated from the plurality of running services according to the predicted occupied computing-power duration of each running service; and
migrating the service to be migrated to a target safe AI core, wherein the target safe AI core is one of the at least one safe AI core.
10. The method of claim 9, wherein:
the difference obtained by subtracting the predicted occupied computing-power duration of the service to be migrated from the predicted occupied computing-power duration of the first risk AI core is a first difference;
the first difference is less than the preset risk threshold and greater than a second difference; and
the second difference is the difference obtained by subtracting the predicted occupied computing-power duration of another running service from the predicted occupied computing-power duration of the first risk AI core, wherein the other running service is any service, other than the service to be migrated, among the plurality of running services on the first risk AI core.
11. The method of claim 9 or 10, wherein the at least one safe AI core is a plurality of safe AI cores, and migrating the service to be migrated to the target safe AI core comprises:
acquiring the predicted occupied computing-power duration of each of the at least one safe AI core;
selecting the one of the at least one safe AI core with the shortest predicted occupied computing-power duration as the target safe AI core; and
migrating the service to be migrated to the target safe AI core.
12. The method of claim 11, wherein migrating the service to be migrated to the target safe AI core comprises:
determining whether the sum of the predicted occupied computing-power duration of the service to be migrated and the predicted occupied computing-power duration of the target safe AI core is less than the preset risk threshold;
if so, migrating the service to be migrated to the target safe AI core; and
if not, returning to the step of acquiring the predicted occupied computing-power duration of each of the plurality of running services.
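The admission check in claim 12 reduces to a single comparison before migration. This is an illustrative sketch with assumed names (`try_migrate`, `service_cost`, `target_occupied`); it is not the patent's implementation.

```python
def try_migrate(service_cost, target_occupied, risk_thr):
    """Migrate only if the target safe core would stay below the preset
    risk threshold after taking on the service (claim 12).  When this
    returns False, the caller re-enters the selection loop of claim 9."""
    return service_cost + target_occupied < risk_thr
```

For instance, a service with predicted occupied duration 3 fits on a target core occupied for 4 under a risk threshold of 8, but one with duration 5 does not.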
13. The method of claim 8, wherein acquiring the predicted occupied computing-power duration of each of the plurality of AI cores comprises:
acquiring the predicted occupied computing-power duration of a first AI core, wherein the first AI core is any one of the plurality of AI cores, and the predicted occupied computing-power duration is obtained by a weighted summation of a plurality of instantaneous occupied computing-power durations according to preset weight coefficients;
wherein the plurality of instantaneous occupied computing-power durations are obtained by sampling at a plurality of time points in one-to-one correspondence, the plurality of time points being mutually different time points within a preset period; a first instantaneous occupied computing-power duration is sampled at a first time point, and a second instantaneous occupied computing-power duration is sampled at a second time point; a first weight coefficient corresponding to the first instantaneous occupied computing-power duration is greater than a second weight coefficient corresponding to the second instantaneous occupied computing-power duration; the first time point is closer than the second time point to the end time of the preset period; and the first time point and the second time point are different time points among the plurality of time points.
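The weighted summation in claim 13 only requires that samples taken later in the preset period carry larger weights. The exponential decay below is an illustrative choice of weighting scheme, not one prescribed by the patent; `predict_occupied` and `decay` are assumed names.

```python
def predict_occupied(samples, decay=0.5):
    """samples: instantaneous occupied computing-power durations, ordered
    from earliest to latest within the preset period.  Weights grow
    toward the end of the period (the latest sample gets weight 1.0)
    and are normalised so the result is a weighted average."""
    n = len(samples)
    # Earlier samples receive geometrically smaller weights.
    weights = [decay ** (n - 1 - k) for k in range(n)]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, samples)) / total
```

With samples `[2, 4]` and decay 0.5, the weights are `[0.5, 1.0]`, so the prediction `(0.5*2 + 1.0*4) / 1.5` is pulled toward the more recent value 4, as claim 13 intends.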
14. The method of claim 13, wherein acquiring the predicted occupied computing-power duration of the first AI core comprises:
predicting the predicted occupied computing-power duration of the first AI core based on the pattern of historical occupied computing-power durations of the first AI core;
determining whether the predicted occupied computing-power duration of the first AI core satisfies a preset confidence threshold;
if so, performing the step of risk marking each of the plurality of AI cores according to the predicted occupied computing-power duration of each AI core, the preset risk threshold, and the preset safety threshold; and
if not, performing the step of acquiring the predicted occupied computing-power duration of the first AI core.
15. An electronic device comprising a processor and a memory, wherein the processor is configured to execute a computer program stored in the memory to implement the method of any one of claims 1 to 14.
16. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1 to 14.
CN202311704300.3A 2023-12-12 2023-12-12 Service distribution method, chip, electronic device and computer readable storage medium Pending CN117632510A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311704300.3A CN117632510A (en) 2023-12-12 2023-12-12 Service distribution method, chip, electronic device and computer readable storage medium


Publications (1)

Publication Number Publication Date
CN117632510A 2024-03-01

Family

ID=90018197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311704300.3A Pending CN117632510A (en) 2023-12-12 2023-12-12 Service distribution method, chip, electronic device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN117632510A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination