CN111047042A - Operation method and device of reasoning service model - Google Patents

Operation method and device of reasoning service model

Info

Publication number
CN111047042A
Authority
CN
China
Prior art keywords
inferred
objects
throughput
delay value
operation efficiency
Legal status
Granted
Application number
CN201911245117.5A
Other languages
Chinese (zh)
Other versions
CN111047042B (en)
Inventor
郝滋雨
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201911245117.5A
Publication of CN111047042A
Application granted
Publication of CN111047042B
Status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00: Computing arrangements using knowledge-based models
    • G06N 5/04: Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Stored Programmes (AREA)

Abstract

An embodiment of the invention provides an operation method and device for an inference service. The method includes: sequentially inputting a plurality of sets of objects to be inferred, each containing a different number of objects, to a preset inference service; acquiring the operation efficiency parameters of the inference service while it processes the sets of objects to be inferred; determining the operation efficiency of the inference service from those parameters and, when the operation efficiency reaches its peak, configuring the corresponding number of objects as the target object number; and thereafter inputting sets of objects to be inferred containing the target object number to the inference service. By determining the target object number at which the operation efficiency peaks and then feeding sets of objects to be inferred of that size to the inference service, the method keeps the inference service at good operation efficiency, saves the processor resources the inference service consumes, and speeds up the processing of the objects to be inferred.

Description

Operation method and device of reasoning service model
Technical Field
The invention relates to the technical field of deep learning, in particular to an operation method of an inference service model and an operation device of the inference service model.
Background
Inference services generally refer to program objects that provide inference and prediction functions using a trained deep learning model. For example, inference services may be used to provide personalized recommendation, intelligent search, data classification, hot spot prediction, and other functions. An inference service may include a pre-processing module, a model version management module, a service interface encapsulation module, and an inference service model. The inference service model is the trained deep learning model itself. When the inference service model runs, it acquires data to be inferred that is input from the outside, and produces an inference result after the deep learning model processes that data.
Generally speaking, the inference service model requires a large amount of the processor's computational resources while it runs. As demand for the inference service grows, the inference service model has to process more data, and its operation efficiency changes accordingly. To meet this demand, the processor's computational resources must be used more efficiently and the operation efficiency of the inference service model must be improved.
Disclosure of Invention
The embodiment of the invention aims to provide an operation method of an inference service and an operation device of the inference service so as to improve the operation efficiency of an inference service model. The specific technical scheme is as follows:
in a first aspect of the present invention, there is provided a method for operating an inference service model, including:
acquiring a plurality of preset sets of objects to be inferred, wherein different sets of objects to be inferred contain different numbers of objects;
sequentially inputting the sets of objects to be inferred to a preset inference service model, in order of increasing object count, starting with the set containing the fewest objects;
acquiring an operation efficiency parameter of the inference service model in the state of processing each set of objects to be inferred;
configuring the number of objects of the set of objects to be inferred that corresponds to a target operation efficiency parameter as the target object number, wherein the target operation efficiency parameter is the best of the acquired operation efficiency parameters;
and inputting a set of objects to be inferred containing the target object number to the inference service model.
Optionally, the operating efficiency parameter includes: delay values, throughput, and processor utilization.
Optionally, the configuring of the number of objects of the set of objects to be inferred that corresponds to the target operation efficiency parameter as the target object number includes:
calculating a delay value growth rate GL between the delay values of two sets of objects to be inferred whose object counts are adjacent, using the following formula:
GL = (L2 - L1) / L1
where L1 is the delay value corresponding to the set with fewer objects of the two sets of objects to be inferred, and L2 is the delay value corresponding to the set with more objects;
calculating a throughput growth rate GT between the throughputs of the two sets of objects to be inferred whose object counts are adjacent, using the following formula:
GT = (T2 - T1) / T1
where T1 is the throughput corresponding to the set with fewer objects of the two sets of objects to be inferred, and T2 is the throughput corresponding to the set with more objects;
calculating a processor utilization growth rate GU between the processor utilizations of the two sets of objects to be inferred whose object counts are adjacent, using the following formula:
GU = (U2 - U1) / U1
where U1 is the processor utilization corresponding to the set with fewer objects of the two sets of objects to be inferred, and U2 is the processor utilization corresponding to the set with more objects;
comparing, pair by pair and starting with the two sets of objects to be inferred containing the fewest objects, the delay value growth rate, the throughput growth rate, and the processor utilization growth rate corresponding to each pair of sets, and determining whether the delay value growth rate is greater than the throughput growth rate and/or greater than the processor utilization growth rate;
and when the delay value growth rate is greater than the throughput growth rate and/or greater than the processor utilization growth rate, configuring the operation efficiency parameter corresponding to the set with fewer objects of the two sets of objects to be inferred for which those growth rates were calculated as the target operation efficiency parameter, and configuring the object count of that set as the target object number.
Optionally, configuring the number of objects of the set of objects to be inferred that corresponds to the target operation efficiency parameter as the target object number includes:
generating a fitting parameter f for each set of objects to be inferred using the following formula:
f = [formula published as an image in the original document, defined in terms of the delay value L, the throughput T, and the processor utilization U]
where L is the delay value, T is the throughput, and U is the processor utilization;
establishing a coordinate system with the object count of the sets of objects to be inferred on the x axis and the fitting parameter f on the y axis, and generating a fitted curve;
determining the first derivative at the point of the fitted curve corresponding to each set of objects to be inferred;
and when the absolute value of the first derivative is smaller than a preset threshold, determining the operation efficiency parameter of the set of objects to be inferred corresponding to that first derivative as the target operation efficiency parameter, and configuring the object count of that set as the target object number.
In a second aspect of the present invention, there is also provided an apparatus for operating an inference service, including:
a first acquisition module, configured to acquire a plurality of preset sets of objects to be inferred, wherein different sets of objects to be inferred contain different numbers of objects;
a first input module, configured to sequentially input the sets of objects to be inferred to a preset inference service model, in order of increasing object count, starting with the set containing the fewest objects;
a second acquisition module, configured to acquire an operation efficiency parameter of the inference service model in the state of processing each set of objects to be inferred;
a configuration module, configured to configure the number of objects of the set of objects to be inferred that corresponds to a target operation efficiency parameter as the target object number, wherein the target operation efficiency parameter is the best of the acquired operation efficiency parameters;
and a second input module, configured to input a set of objects to be inferred containing the target object number to the inference service model.
Optionally, the operating efficiency parameter includes: delay values, throughput, and processor utilization.
Optionally, the configuration module includes:
a delay value growth rate calculation submodule, configured to calculate a delay value growth rate GL between the delay values of two sets of objects to be inferred whose object counts are adjacent, using the following formula:
GL = (L2 - L1) / L1
where L1 is the delay value corresponding to the set with fewer objects of the two sets of objects to be inferred, and L2 is the delay value corresponding to the set with more objects;
a throughput growth rate calculation submodule, configured to calculate a throughput growth rate GT between the throughputs of the two sets of objects to be inferred whose object counts are adjacent, using the following formula:
GT = (T2 - T1) / T1
where T1 is the throughput corresponding to the set with fewer objects of the two sets of objects to be inferred, and T2 is the throughput corresponding to the set with more objects;
a processor utilization growth rate calculation submodule, configured to calculate a processor utilization growth rate GU between the processor utilizations of the two sets of objects to be inferred whose object counts are adjacent, using the following formula:
GU = (U2 - U1) / U1
where U1 is the processor utilization corresponding to the set with fewer objects of the two sets of objects to be inferred, and U2 is the processor utilization corresponding to the set with more objects;
a comparison submodule, configured to compare, pair by pair and starting with the two sets of objects to be inferred containing the fewest objects, the delay value growth rate, the throughput growth rate, and the processor utilization growth rate corresponding to each pair of sets, and to determine whether the delay value growth rate is greater than the throughput growth rate and/or greater than the processor utilization growth rate;
and a first configuration submodule, configured to, when the delay value growth rate is greater than the throughput growth rate and/or greater than the processor utilization growth rate, configure the operation efficiency parameter corresponding to the set with fewer objects of the two sets of objects to be inferred for which those growth rates were calculated as the target operation efficiency parameter, and configure the object count of that set as the target object number.
Optionally, the configuration module includes:
a parameter generation submodule, configured to generate a fitting parameter f for each set of objects to be inferred using the following formula:
f = [formula published as an image in the original document, defined in terms of the delay value L, the throughput T, and the processor utilization U]
where L is the delay value, T is the throughput, and U is the processor utilization;
a curve generation submodule, configured to establish a coordinate system with the object count of the sets of objects to be inferred on the x axis and the fitting parameter f on the y axis, and to generate a fitted curve;
a derivative determination submodule, configured to determine the first derivative at the point of the fitted curve corresponding to each set of objects to be inferred;
and a second configuration submodule, configured to, when the absolute value of the first derivative is smaller than a preset threshold, determine the operation efficiency parameter of the set of objects to be inferred corresponding to that first derivative as the target operation efficiency parameter, and configure the object count of that set as the target object number.
In another aspect of the present invention, an electronic device includes a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other via the communication bus;
a memory for storing a computer program;
and the processor is configured to implement the steps of the operation method of the inference service according to any embodiment of the invention when executing the program stored in the memory.
In yet another aspect of the present invention, there is also provided a computer-readable storage medium having stored therein instructions, which when executed on a computer, cause the computer to execute the method of operating an inference service as described in any above.
In yet another aspect of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of operating an inference service as described in any above.
The operation method of the inference service according to the embodiment of the invention sequentially inputs the sets of objects to be inferred to a preset inference service model, starting with the set containing the fewest objects; acquires an operation efficiency parameter of the inference service model while it processes each set of objects to be inferred; and configures the number of objects of the set of objects to be inferred that corresponds to the target operation efficiency parameter as the target object number, the target operation efficiency parameter being the best of the acquired operation efficiency parameters. Sets of objects to be inferred containing the target object number are then input to the inference service model, so that the inference service model is kept at good operation efficiency, the processor resources it consumes are saved, and the objects to be inferred are processed faster.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flowchart illustrating steps of a method for operating an inference service model according to an embodiment of the present invention;
FIG. 2 is a block diagram of an embodiment of an apparatus for operating an inference service model according to an embodiment of the present invention;
fig. 3 is a block diagram of an embodiment of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
Referring to fig. 1, a flowchart illustrating steps of an embodiment of an operation method of an inference service model of the present invention is shown, which may specifically include the following steps:
Step 101, acquiring a plurality of preset sets of objects to be inferred, wherein different sets of objects to be inferred contain different numbers of objects;
in the embodiment of the present invention, the inference service model may be a deep learning model after training. The reasoning service model can be used for reasoning service, and functions of providing personalized recommendation, intelligent search, data classification, hotspot prediction and the like are realized.
In the embodiment of the present invention, a set of objects to be inferred may be input to the inference service model each time, and a processing result returned by the inference service model may be obtained. Specifically, the set of objects to be inferred includes a preset number of objects to be inferred. The number of objects may be at least one. As an example of the present invention, when the inference service model is used for performing picture classification, the object to be inferred may be a picture, and the set of objects to be inferred may include at least one picture. When the inference service model is used for performing video classification, the object to be inferred may be a video clip, and the set of objects to be inferred may include at least one video clip. There may be no connection between the video segments, and the video segments may also be obtained by dividing a complete video into several video segments.
In an embodiment of the present invention, the inference service model may be loaded in the processor. Specifically, at least one inference service container may be preset in the processor, and the inference service model may be loaded in the inference service container.
In the embodiment of the present invention, the processor may be a CPU (central processing unit), a GPU (graphics processing unit), and the like, which is not limited in this respect.
In the embodiment of the present invention, the inference service model may obtain the set of objects to be inferred, and return a corresponding processing result for each object to be inferred in the set of objects to be inferred.
In the embodiment of the present invention, the operation efficiency of the inference service model may be influenced by the number of objects in the set of objects to be inferred. When that number changes, the operation efficiency parameters generated by the inference service model during operation change accordingly. Therefore, to improve the operation efficiency of the inference service model, whenever its workload changes, the optimal target object number under the current workload needs to be determined; when sets of objects to be inferred containing the target object number are input to the inference service model, it can operate under the optimal operation efficiency parameters and thus maintain good operation efficiency.
In the embodiment of the present invention, in order to determine the number of target objects, the inference service model may obtain a plurality of preset sets of objects to be inferred, where the number of objects included in different sets of objects to be inferred is different.
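As an illustration of this step, the sketch below builds a plurality of candidate sets of objects to be inferred with different object counts from a pool of pending objects. The helper name build_candidate_sets, the pending_objects pool, and the candidate sizes 10 to 50 are illustrative assumptions of this sketch, not values prescribed by the patent.

```python
# Minimal sketch of step 101 (illustrative): build candidate sets of objects to be
# inferred, each containing a different number of objects, smallest first.
def build_candidate_sets(pending_objects, candidate_sizes=(10, 20, 30, 40, 50)):
    candidate_sets = []
    for size in sorted(candidate_sizes):              # smallest set first, per step 102
        if size <= len(pending_objects):
            candidate_sets.append(pending_objects[:size])
    return candidate_sets
```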
Step 102, sequentially inputting the sets of objects to be inferred to a preset inference service model, in order of increasing object count, starting with the set containing the fewest objects;
in the embodiment of the present invention, the operation efficiency of the inference service model may gradually increase with the increase of the number of objects in the set of objects to be inferred, and then, after a certain number of objects is reached, the operation efficiency may not be continuously increased, but may decrease. Therefore, the set of objects to be inferred can be sequentially input to the preset inference service model according to the sequence of firstly inputting the set of objects to be inferred with the least number of objects.
Step 103, acquiring an operation efficiency parameter of the inference service model in the state of processing each set of objects to be inferred;
in this embodiment of the present invention, the operation efficiency parameter may be a parameter for evaluating the operation efficiency of the inference service model. As an example of the present invention, the operation efficiency parameter may include processor resources occupied by the inference service at runtime, and consumed time for the inference service to process the set of objects to be inferred.
In the embodiment of the invention, the number of the objects in the object set to be inferred can influence the operation efficiency of the inference service. For example, as the number of objects increases, the processor resources consumed by the inference service to process the set of objects to be inferred, and the time consumed to process the set of objects to be inferred, may increase. In order to determine the operation efficiency of the inference service, the operation efficiency parameters of the inference service in the state of processing the set of objects to be inferred of each object number can be obtained.
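As a sketch of how these operation efficiency parameters might be measured for a single set of objects to be inferred, the snippet below times one call to the inference service model and samples processor utilization around it. The callable run_inference stands in for the preset inference service model, and psutil is used here only as one convenient way to sample CPU utilization; both are assumptions of this sketch rather than requirements of the patent.

```python
import time
import psutil  # assumed here only for sampling processor utilization

def measure_efficiency(run_inference, object_set):
    """Return (delay value, throughput, processor utilization) for one set."""
    psutil.cpu_percent(interval=None)          # reset the utilization sampling window
    start = time.perf_counter()
    run_inference(object_set)                  # one call to the inference service model
    elapsed = time.perf_counter() - start
    processor_utilization = psutil.cpu_percent(interval=None) / 100.0
    delay_value = elapsed / len(object_set)    # average time per object to be inferred
    throughput = len(object_set) / elapsed     # objects to be inferred per unit time
    return delay_value, throughput, processor_utilization
```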
Step 104, configuring the number of objects of the set of objects to be inferred that corresponds to a target operation efficiency parameter as the target object number, wherein the target operation efficiency parameter is the best of the acquired operation efficiency parameters;
In the embodiment of the invention, changes in the number of objects in the set of objects to be inferred affect the operation efficiency parameters of the inference service model. For example, as the number of objects increases, the processor resources consumed by the inference service model to process the set of objects to be inferred increase, and so does the number of objects to be inferred processed per unit time. However, the number of objects processed per unit time and the processor resources consumed do not grow at the same rate. Consequently, for some object counts the inference service model can process noticeably more objects to be inferred while using only slightly more processor resources, whereas for other object counts it uses considerably more processor resources without a noticeable increase in the number of objects processed.
Therefore, the set of objects to be inferred with different object quantities can be input into the inference service model, and the operation efficiency parameters of the inference service model when the set of objects to be inferred with different object quantities is processed are obtained. Comparing the operation efficiency parameters corresponding to different object quantities, finding the optimal parameter of the inference service model with the highest operation efficiency from the operation efficiency parameters, and taking the optimal parameter as a target operation efficiency parameter.
In the embodiment of the present invention, the number of objects corresponding to the target operation efficiency parameter may be configured as a target object number. When the inference service model inputs the set of objects to be inferred including the number of the target objects, the inference service model can be expected to be in an optimal state, and the processing speed of the objects to be inferred can be faster by using smaller processing resources.
Step 105, inputting a set of objects to be inferred comprising the number of the target objects to the inference service model.
With the target object number obtained through the above process, the processor achieves its highest operation efficiency. Therefore, when the processor subsequently serves inference requests through the preset inference service model, inputting sets of objects to be inferred containing the target object number keeps the processor's operation efficiency in an optimal state.
In the embodiment of the invention, when sets of objects to be inferred containing the target object number are input to the inference service model, the inference service model achieves a faster processing speed for the objects to be inferred with fewer processing resources, and therefore operates at good efficiency. Consequently, after the target object number is determined, it can be kept unchanged and sets of objects to be inferred of that size can be input to the inference service model, keeping the inference service model at good operation efficiency, saving the processor resources it consumes, and speeding up the processing of the objects to be inferred.
As an example of the invention, suppose the operation efficiency parameters include the delay value, the throughput, and the processor utilization. When the number of objects to be inferred is 10, the delay value is 10 ms, the throughput is 10, and the processor utilization is 0.1. When the number of objects to be inferred is 20, the delay value is 11 ms, the throughput is 20, and the processor utilization is 0.2. Compared with 10 objects, at 20 objects the delay value increases by only 1 ms while the throughput increases by 10 and the processor utilization increases by 0.1. Since the delay value rises only slightly while the throughput and processor utilization rise substantially, 20 is taken as the target object number.
The operation method of the inference service according to the embodiment of the invention sequentially inputs the sets of objects to be inferred to a preset inference service model, starting with the set containing the fewest objects; acquires an operation efficiency parameter of the inference service model while it processes each set of objects to be inferred; and configures the number of objects of the set of objects to be inferred that corresponds to the target operation efficiency parameter as the target object number, the target operation efficiency parameter being the best of the acquired operation efficiency parameters. Sets of objects to be inferred containing the target object number are then input to the inference service model, so that the inference service model is kept at good operation efficiency, the processor resources it consumes are saved, and the objects to be inferred are processed faster.
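Putting steps 101 to 105 together, a minimal end-to-end sketch could look as follows. It reuses the illustrative helpers build_candidate_sets and measure_efficiency sketched earlier; the stand-in selection rule here simply favours the object count with the best throughput-to-delay ratio, while the patent's own selection rules (growth-rate comparison and curve fitting) are sketched separately further below.

```python
# Illustrative sketch of steps 101-105, not the patent's exact procedure.
def tune_and_serve(run_inference, pending_objects, candidate_sizes=(10, 20, 30, 40, 50)):
    # Steps 101-103: sweep the candidate object counts, smallest first, and record
    # the operation efficiency parameters measured for each count.
    params_by_count = {}
    for object_set in build_candidate_sets(pending_objects, candidate_sizes):
        params_by_count[len(object_set)] = measure_efficiency(run_inference, object_set)

    # Step 104: pick the target object number (placeholder rule: best throughput
    # per unit of delay; the patent's own rules are sketched later).
    target_count = max(params_by_count,
                       key=lambda n: params_by_count[n][1] / params_by_count[n][0])

    # Step 105: keep feeding the inference service model sets of the target size.
    for start in range(0, len(pending_objects), target_count):
        run_inference(pending_objects[start:start + target_count])
    return target_count
```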
In an embodiment of the present invention, the step of sequentially inputting sets of objects to be inferred with a plurality of different object counts to a preset inference service model includes:
S11, detecting the workload of the inference service model according to a first preset period;
in the embodiment of the invention, when the inference service model runs, the workload can be configured for the inference service model according to actual needs. The workload may be the number of requests the inference service model expects to process per unit time. The request may be a request sent by a user to the inference service model, and the request may include at least one object to be inferred. For example, when the inference service model is used for picture classification, a user may send a request including a picture, and request the inference service model to classify the picture included in the request.
In the embodiment of the present invention, the workload may be determined from the number of real-time requests currently sent by users, or may be actively configured by a user, which is not limited by the invention. For example, the workload may be set to 100 requests per second according to the request throughput expected by the users of the inference service model. As another example, the workload may be configured dynamically according to the volume of requests the inference service model receives, for instance being set to 100 requests per second when the daytime request volume is large.
In the embodiment of the present invention, when the workload changes, the number of requests the inference service model needs to process per unit time may change. The operation efficiency of the inference service model may then drop, for example because the current target object number no longer matches the number of requests the workload requires, or because the change in workload alters the processor resources occupied by other program objects in the processor. Therefore, the target object number can be re-determined whenever the workload of the inference service model changes. Specifically, a load detection module may be provided and configured to detect the workload of the inference service model according to a first preset period, so as to determine whether the target object number needs to be adjusted.
In the embodiment of the present invention, the first preset period may be 1min, 1h, 1 day, and the like, which is not limited in the present invention.
S12, when the workload changes, sequentially inputting the sets of objects to be inferred with the plurality of object counts to the preset inference service model.
In the embodiment of the present invention, when the workload changes, the number of target objects for which the inference service model can obtain the optimal operation efficiency under the current workload may be re-determined. Therefore, the set of objects to be inferred with a plurality of object numbers can be input to the preset inference service model in turn again, the steps 101 to 105 are continuously executed, the number of new target objects is determined, the inference service model can be kept under good operation efficiency, processor resources consumed by the inference service model are saved, and the processing speed of the objects to be inferred is higher.
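A sketch of such a load detection module is given below: once per first preset period it samples the workload and, when the workload has changed noticeably, triggers a re-run of steps 101 to 105. The hooks get_requests_per_second and retune, the 60-second period, and the 20% change threshold are all illustrative assumptions of this sketch.

```python
import time

def load_detection_loop(get_requests_per_second, retune,
                        first_preset_period_s=60.0, change_ratio=0.2):
    """Detect workload changes once per first preset period and trigger re-tuning."""
    workload_at_last_tuning = get_requests_per_second()
    while True:
        time.sleep(first_preset_period_s)
        workload = get_requests_per_second()
        # Treat a sufficiently large relative change as "the workload has changed".
        if (workload_at_last_tuning > 0 and
                abs(workload - workload_at_last_tuning) / workload_at_last_tuning > change_ratio):
            retune()   # re-run steps 101-105 to determine a new target object number
            workload_at_last_tuning = workload
```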
In an embodiment of the present invention, the step of configuring the number of objects of the set of objects to be inferred that corresponds to the target operation efficiency parameter as the target object number includes:
S21, configuring the number of objects of the set of objects to be inferred that corresponds to the target operation efficiency parameter as the target object number according to a second preset period.
In the embodiment of the invention, frequent changes in the number of objects to be inferred can themselves affect the operation efficiency of the inference service model. Therefore, when the target object number needs to be re-determined, the sets of objects to be inferred with the various object counts can be sequentially input to the preset inference service model within the second preset period, and the operation efficiency parameters of the inference service model for processing these sets can be acquired. When the second preset period ends, the best of these operation efficiency parameters is determined as the target operation efficiency parameter, and the object count corresponding to it is configured as the target object number, thereby avoiding frequent changes of the target object number. When the target object number does not need to be re-determined, it is kept unchanged and the inference service model continues to run. In an embodiment of the invention, the second preset period may be 10 min, 1 h, 1 day, and so on, which is not limited by the invention.
As an example of the invention, suppose the second preset period is 1 h. When the target object number needs to be re-determined, the measurements can be carried out within that hour, the object count corresponding to the target operation efficiency parameter is configured as the target object number at the end of the hour, and sets of objects to be inferred of that size are then input to the inference service model. The re-determination of the target object number is thus completed once per second preset period.
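As a small companion sketch, the re-determination can be gated on the second preset period so that the target object number in use changes at most once per period; the helper names and the one-hour default period are illustrative only.

```python
import time

def retune_once_per_period(measure_all_candidates, select_target_count, apply_target_count,
                           second_preset_period_s=3600.0):
    """Measure within the second preset period and commit the new target only at its end."""
    period_start = time.monotonic()
    params_by_count = measure_all_candidates()        # steps 101-103 within the period
    remaining = second_preset_period_s - (time.monotonic() - period_start)
    if remaining > 0:
        time.sleep(remaining)                          # wait for the period boundary
    apply_target_count(select_target_count(params_by_count))   # commit at period end
```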
In one embodiment of the invention, the operation efficiency parameters include the delay value, the throughput, and the processor utilization.
in an embodiment of the present invention, the operating efficiency parameters may include latency values, throughput, and processor utilization. The delay value may be the average length of time that the inference service model will consume to process a single object to be inferred. The throughput may be the number of average objects to be inferred that the inference service model processes through in a unit time. The processor utilization may be an average utilization of the processors occupied by the inference service model at runtime. The delay values, the throughput, and the processor utilization may be employed to evaluate the operational efficiency of the inference service model.
In this embodiment of the present invention, a change in the number of objects in the set of objects to be inferred may affect a change in the operating efficiency parameter of the inference service model. Therefore, the optimal operation efficiency of the inference service model can be determined by adopting the delay value, the throughput and the processor utilization rate generated by the inference service model when the inference service model processes the set of objects to be inferred with different object numbers. The operation efficiency parameter of the inference service model at the optimal operation efficiency can be an optimal parameter, and the number of objects of the object set to be inferred corresponding to the target operation efficiency parameter is configured as the number of the target objects.
In particular implementations, for the inference service model, the delay value, the throughput, and the processor utilization typically all increase as the number of objects in the set of objects to be inferred increases, but the three do not grow at the same rate.
An increase in the delay value means that the inference service model takes longer, on average, to process a single object to be inferred; a lower delay value is therefore desirable.
An increase in throughput means that the inference service model processes more objects to be inferred per unit time on average; a higher throughput is therefore desirable.
An increase in processor utilization means that the inference service model occupies a larger average share of the processor at runtime; since the application aims to use the processor's computational resources more fully, a higher processor utilization is desirable.
Therefore, as the number of objects in the set of objects to be inferred increases, the growth rates of the delay value, the throughput, and the processor utilization can be compared with one another. In general, as the number of objects increases, the delay value first rises only slightly and then, once a certain number of objects is reached, rises sharply. The throughput keeps increasing until a certain number of objects is reached, after which it cannot increase further. The processor utilization tends to increase fairly uniformly as the number of objects increases.
Thus, as the number of objects increases, the growth rate of the delay value changes from being lower than the throughput growth rate and/or the processor utilization growth rate to being higher than them. At that point, further increasing the number of objects no longer yields better operation efficiency. In other words, when the growth rate of the delay value changes from being lower than the throughput growth rate and/or the processor utilization growth rate to being higher, the inference service model is at its optimal operation efficiency, the corresponding operation efficiency parameter is the optimal parameter, and the corresponding number of objects is the target object number.
In an embodiment of the present invention, configuring the number of objects of the set of objects to be inferred that corresponds to the target operation efficiency parameter as the target object number includes:
S31, calculating a delay value growth rate GL between the delay values of two sets of objects to be inferred whose object counts are adjacent, using the following formula:
GL = (L2 - L1) / L1
where L1 is the delay value corresponding to the set with fewer objects of the two sets of objects to be inferred, and L2 is the delay value corresponding to the set with more objects.
In the embodiment of the invention, the delay value growth rate GL between the delay values of two sets of objects to be inferred whose object counts are adjacent can be calculated, so as to learn how fast the delay value grows as the number of objects increases.
In the embodiment of the invention, the delay value growth rates GL can be calculated pair by pair, starting with the two sets of objects to be inferred containing the fewest objects.
S32, calculating a throughput growth rate GT between the throughputs of the two sets of objects to be inferred whose object counts are adjacent, using the following formula:
GT = (T2 - T1) / T1
where T1 is the throughput corresponding to the set with fewer objects of the two sets of objects to be inferred, and T2 is the throughput corresponding to the set with more objects.
In the embodiment of the invention, the throughput growth rate GT between the throughputs of two sets of objects to be inferred whose object counts are adjacent can be calculated, so as to learn how fast the throughput grows as the number of objects increases.
In the embodiment of the invention, the throughput growth rates GT can be calculated pair by pair, starting with the two sets of objects to be inferred containing the fewest objects.
S33, calculating a processor utilization growth rate GU between the processor utilizations of the two sets of objects to be inferred whose object counts are adjacent, using the following formula:
GU = (U2 - U1) / U1
where U1 is the processor utilization corresponding to the set with fewer objects of the two sets of objects to be inferred, and U2 is the processor utilization corresponding to the set with more objects.
In the embodiment of the invention, the processor utilization growth rate GU between the processor utilizations of two sets of objects to be inferred whose object counts are adjacent can be calculated, so as to learn how fast the processor utilization grows as the number of objects increases.
In the embodiment of the invention, the processor utilization growth rates GU can be calculated pair by pair, starting with the two sets of objects to be inferred containing the fewest objects.
S34, comparing, pair by pair and starting with the two sets of objects to be inferred containing the fewest objects, the delay value growth rate, the throughput growth rate, and the processor utilization growth rate corresponding to each pair of sets, and determining whether the delay value growth rate is greater than the throughput growth rate and/or greater than the processor utilization growth rate.
In the embodiment of the present invention, the growth rates of the delay value, the throughput, and the processor utilization can be compared as the number of objects in the set of objects to be inferred increases. In general, as the number of objects increases, the delay value first rises only slightly and then, once a certain number of objects is reached, rises sharply; the throughput keeps increasing until a certain number of objects is reached, after which it cannot increase further; and the processor utilization increases fairly uniformly. Thus, as the number of objects increases, the growth rate of the delay value changes from being lower than the throughput growth rate and/or the processor utilization growth rate to being higher than them.
In the embodiment of the present invention, the delay value growth rate, the throughput growth rate, and the processor utilization growth rate corresponding to each pair of sets of objects to be inferred are compared in turn, starting with the two sets containing the fewest objects, and it is determined whether the delay value growth rate is greater than the throughput growth rate and/or greater than the processor utilization growth rate, so as to find the inflection point at which the growth rates change.
S35, when the delay value growth rate is greater than the throughput growth rate, and/or the delay value growth rate is greater than the processor utilization growth rate, configuring the operation efficiency parameter corresponding to the set with fewer objects of the two sets of objects to be inferred for which those growth rates were calculated as the target operation efficiency parameter, and configuring the object count of that set as the target object number.
In the embodiment of the present invention, when the delay value growth rate becomes greater than the throughput growth rate and/or greater than the processor utilization growth rate, it can be considered that further increasing the number of objects no longer yields better operation efficiency, and at this point the inference service model is at its optimal operation efficiency.
Therefore, of the two sets of objects to be inferred for which the delay value growth rate, the throughput growth rate, and the processor utilization growth rate were calculated, the operation efficiency parameter corresponding to the set with fewer objects, i.e. the optimal operation efficiency parameter, is configured as the target operation efficiency parameter, and the object count of that set, i.e. the optimal object count, is configured as the target object number.
As an example of the present invention, suppose that when the set of objects to be inferred contains 10 objects, the delay value is 10 ms, the throughput is 10, and the processor utilization is 0.1; when it contains 20 objects, the delay value is 11 ms, the throughput is 20, and the processor utilization is 0.2; and when it contains 30 objects, the delay value is 35 ms, the throughput is 25, and the processor utilization is 0.4. Starting with the pair of sets containing the fewest objects, the delay value growth rate, the throughput growth rate, and the processor utilization growth rate of each pair are compared, and it is determined whether the delay value growth rate is greater than the throughput growth rate and/or greater than the processor utilization growth rate.
Calculation shows that between the set with 10 objects and the set with 20 objects, the delay value grows by 10%, the throughput grows by 100%, and the processor utilization grows by 100%. Between the set with 20 objects and the set with 30 objects, the delay value grows by about 218%, the throughput grows by 25%, and the processor utilization grows by 100%. The delay value growth rate thus changes from being smaller than both the throughput growth rate and the processor utilization growth rate to being larger than both. Accordingly, of the set with 20 objects and the set with 30 objects, the operation efficiency parameter corresponding to the set with fewer objects is configured as the target operation efficiency parameter, and its object count, 20, is configured as the target object number.
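A runnable sketch of the growth-rate comparison in S31 to S35 is given below. It assumes the relative-increase form G = (X2 - X1) / X1 for the formulas that are published only as images (an interpretation consistent with the 10% and 100% figures in the example above); running it on those example figures selects 20 as the target object number.

```python
def growth(prev, curr):
    # Relative increase between two adjacent object counts, e.g. (11 - 10) / 10 = 10%.
    return (curr - prev) / prev

def pick_target_count(params_by_count):
    """params_by_count maps object count -> (delay value, throughput, processor utilization)."""
    counts = sorted(params_by_count)                  # compare the smallest pairs first
    for smaller, larger in zip(counts, counts[1:]):
        l1, t1, u1 = params_by_count[smaller]
        l2, t2, u2 = params_by_count[larger]
        g_l, g_t, g_u = growth(l1, l2), growth(t1, t2), growth(u1, u2)
        # Inflection point: the delay value now grows faster than the throughput
        # and/or the processor utilization, so the smaller count is the target.
        if g_l > g_t or g_l > g_u:
            return smaller
    return counts[-1]                                 # no inflection found: keep the largest count

# Example figures from the description: {object count: (delay ms, throughput, utilization)}
example = {10: (10, 10, 0.1), 20: (11, 20, 0.2), 30: (35, 25, 0.4)}
assert pick_target_count(example) == 20
```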
In an embodiment of the present invention, the step of configuring the number of objects of the set of objects to be inferred that corresponds to the target operation efficiency parameter as the target object number includes:
S41, generating a fitting parameter f for each set of objects to be inferred using the following formula:
f = [formula published as an image in the original document, defined in terms of the delay value L, the throughput T, and the processor utilization U]
where L is the delay value, T is the throughput, and U is the processor utilization;
in the embodiment of the present invention, a fitting parameter may be generated by using the delay value, the throughput, and the processor utilization, and the number of the target objects may be determined based on the fitting parameter.
In the embodiment of the present invention, the operation efficiency of the inference service model may be evaluated based on the delay value, the throughput, and the processor utilization, and the number of the target objects with the optimal operation efficiency may be determined, so that a fitting parameter may be generated based on the delay value, the throughput, and the processor utilization, and the fitting parameter may be used to evaluate the operation efficiency of the inference service model.
In the embodiment of the present invention, the inference service model may expect to obtain better throughput and appropriate processor utilization rate at a lower delay value, so that the delay value may be compared with the throughput and the processor utilization rate to obtain a calculation formula of the fitting parameter f.
S42, establishing a coordinate system and generating a fitting curve by taking the number of the objects in the object set to be inferred as an x axis and the fitting parameter f of the object set to be inferred as a y axis;
in the embodiment of the invention, in a plane coordinate system, curve fitting can be performed by taking the number of the objects in the object set to be inferred as the value of the X axis and the fitting parameter as the value of the Y axis, so as to obtain a fitting curve. The fitted curve may be used to represent a relationship between the fitted parameters and the number of objects.
S43, determining a first derivative of a point corresponding to the object set to be inferred in the fitting curve;
In the embodiment of the present invention, the first derivative at the point of the fitted curve corresponding to each set of objects to be inferred can be determined. The first derivative can be used to indicate the relationship between the delay value, the throughput, and the processor utilization: when the first derivative is negative, the growth rate of the delay value can be considered smaller than the throughput growth rate and the processor utilization growth rate; when it is positive, the growth rate of the delay value can be considered larger than them.
S44, when the absolute value of the first derivative is smaller than a preset threshold, determining the operation efficiency parameter of the set of objects to be inferred corresponding to that first derivative as the target operation efficiency parameter, and configuring the object count of that set as the target object number.
In the embodiment of the present invention, a preset threshold may be set. When the absolute value of the first derivative is smaller than the preset threshold, the growth rate of the delay value can be considered close to the throughput growth rate and the processor utilization growth rate. In that case, the corresponding number of objects yields a low delay value together with a high throughput and an appropriate processor utilization, and the operation efficiency of the inference service model is optimal. If the number of objects were increased further, the delay value would rise significantly while the throughput and processor utilization would not rise noticeably; if the number of objects were reduced, the throughput and processor utilization would drop significantly while the delay value would not drop noticeably. Therefore, the operation efficiency parameter of the set of objects to be inferred corresponding to that first derivative can be determined as the target operation efficiency parameter, and the object count of that set can be configured as the target object number. With the target object number, the inference service model obtains a high throughput and an appropriate processor utilization at a low delay value, which improves its operation efficiency.
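The formula for the fitting parameter f is published only as an image, so the sketch below assumes f = L / (T × U), a form consistent with the qualitative behaviour described above (f falls while the throughput and processor utilization outgrow the delay value, and rises once the delay value dominates). The polynomial fit, the numpy-based derivative, and the 0.01 threshold are likewise illustrative choices of this sketch rather than the patent's specification.

```python
import numpy as np

def pick_target_count_by_fit(params_by_count, threshold=0.01, poly_degree=3):
    """params_by_count maps object count -> (delay value, throughput, processor utilization)."""
    counts = np.array(sorted(params_by_count), dtype=float)
    # Assumed fitting parameter: f = L / (T * U); the patent gives f only as an image.
    f = np.array([params_by_count[int(n)][0]
                  / (params_by_count[int(n)][1] * params_by_count[int(n)][2])
                  for n in counts])
    coeffs = np.polyfit(counts, f, deg=min(poly_degree, len(counts) - 1))  # fitted curve (S42)
    derivative = np.polyder(coeffs)               # first derivative of the fitted curve (S43)
    for n in counts:
        # S44: the first object count whose |f'(n)| falls below the threshold is the target.
        if abs(np.polyval(derivative, n)) < threshold:
            return int(n)
    # Fallback: the count where the derivative's magnitude is smallest.
    return int(counts[np.argmin(np.abs(np.polyval(derivative, counts)))])
```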
Referring to fig. 2, a block diagram of an embodiment of an operating apparatus of an inference service model according to the present invention is shown, which may specifically include the following modules:
a first acquisition module, configured to acquire a plurality of preset sets of objects to be inferred, wherein different sets of objects to be inferred contain different numbers of objects;
a first input module, configured to sequentially input the sets of objects to be inferred to a preset inference service model, in order of increasing object count, starting with the set containing the fewest objects;
a second acquisition module, configured to acquire an operation efficiency parameter of the inference service model in the state of processing each set of objects to be inferred;
a configuration module, configured to configure the number of objects of the set of objects to be inferred that corresponds to a target operation efficiency parameter as the target object number, wherein the target operation efficiency parameter is the best of the acquired operation efficiency parameters;
and a second input module, configured to input a set of objects to be inferred containing the target object number to the inference service model.
In an embodiment of the present invention, the operation efficiency parameter includes: delay values, throughput, and processor utilization.
In one embodiment of the invention, the configuration module comprises:
a delay value acceleration calculation submodule, configured to calculate a delay value acceleration G_L between the delay values of two object sets to be inferred whose numbers of objects are adjacent in value, by using the following formula:
G_L = (L_2 - L_1) / L_1
wherein L_1 is the delay value corresponding to the object set to be inferred with fewer objects among the two object sets to be inferred, and L_2 is the delay value corresponding to the object set to be inferred with more objects;
a throughput speed-up calculation submodule, configured to calculate a throughput speed-up G_T between the throughputs of two object sets to be inferred whose numbers of objects are adjacent in value, by using the following formula:
G_T = (T_2 - T_1) / T_1
wherein T_1 is the throughput corresponding to the object set to be inferred with fewer objects among the two object sets to be inferred, and T_2 is the throughput corresponding to the object set to be inferred with more objects;
a processor utilization rate acceleration calculation submodule, configured to calculate a processor utilization rate acceleration G_U between the processor utilization rates of two object sets to be inferred whose numbers of objects are adjacent in value, by using the following formula:
G_U = (U_2 - U_1) / U_1
wherein U_1 is the processor utilization rate corresponding to the object set to be inferred with fewer objects among the two object sets to be inferred, and U_2 is the processor utilization rate corresponding to the object set to be inferred with more objects;
a comparison submodule, configured to compare, in sequence, the delay value acceleration, the throughput acceleration and the processor utilization rate acceleration corresponding to each pair of object sets to be inferred, and to determine whether the delay value acceleration is greater than the throughput acceleration and/or greater than the processor utilization rate acceleration, starting with the two object sets to be inferred that contain the fewest objects;
and a first configuration submodule, configured to, when the delay value acceleration is greater than the throughput acceleration and/or greater than the processor utilization rate acceleration, configure the operation efficiency parameter corresponding to the object set to be inferred with fewer objects, among the two object sets to be inferred for which those accelerations were calculated, as the target operation efficiency parameter, and to configure the number of objects of that object set as the target number of objects. A sketch of one possible reading of this selection rule follows.
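As a minimal sketch of this comparison chain (the relative-growth formula (x2 − x1) / x1 is an assumed reading of the acceleration formulas, which are reproduced only as images in the source, and every name below is hypothetical):

```python
def pick_target_by_growth(measurements):
    """Select the target object count by comparing growth between adjacent batch sizes.

    measurements: list of (object_count, delay_value, throughput, cpu_utilization)
    tuples, sorted by object_count in ascending order.
    """
    def growth(previous, current):
        # Relative increase between two adjacent measurements (assumed formula).
        return (current - previous) / previous

    for (n1, l1, t1, u1), (n2, l2, t2, u2) in zip(measurements, measurements[1:]):
        g_l = growth(l1, l2)  # delay value acceleration G_L
        g_t = growth(t1, t2)  # throughput acceleration G_T
        g_u = growth(u1, u2)  # processor utilization acceleration G_U

        # Once the delay value grows faster than the throughput or the CPU
        # utilization, the smaller object count of the pair is taken as the target.
        if g_l > g_t or g_l > g_u:
            return n1

    # No crossover within the tested range: fall back to the largest count.
    return measurements[-1][0]
```

For instance, if moving from 8 objects to 16 objects raises the delay value from 10 ms to 14 ms (G_L = 0.4) while the throughput only rises from 800 to 880 objects per second (G_T = 0.1), the delay value acceleration exceeds the throughput acceleration and 8 would be configured as the target number of objects.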
In one embodiment of the invention, the configuration module comprises:
the parameter generation submodule is used for generating a fitting parameter f corresponding to each object set to be inferred by adopting the following formula:
(the formula for f is reproduced only as an image in the original publication)
wherein, L is a delay value, T is throughput, and U is the utilization rate of the processor;
the curve generation submodule is used for establishing a coordinate system and generating a fitting curve by taking the object number of the object set to be inferred as an x axis and the fitting parameter f of the object set to be inferred as a y axis;
a derivative determining submodule for determining a first derivative of a point corresponding to the object set to be inferred in the fitting curve;
and the second configuration submodule is used for determining the operation efficiency parameter of the object set to be inferred corresponding to the first derivative as a target operation efficiency parameter when the absolute value of the first derivative is smaller than a preset threshold value, and configuring the number of the objects of the object set to be inferred as the number of the target objects.
The embodiment of the present invention further provides an electronic device, as shown in fig. 3, which includes a processor 301, a communication interface 302, a memory 303, and a communication bus 304, where the processor 301, the communication interface 302, and the memory 303 complete mutual communication through the communication bus 304,
a memory 303 for storing a computer program;
the processor 301, when executing the program stored in the memory 303, implements the following steps:
acquiring a plurality of preset object sets to be inferred, wherein the number of objects contained in different object sets to be inferred is different;
sequentially inputting the set of the objects to be inferred to a preset inference service model, wherein the input sequence to the inference service model is to firstly input the set of the objects to be inferred with the least number of the objects;
acquiring an operation efficiency parameter of the inference service model in a state of processing each object set to be inferred;
configuring the number of objects of an object set to be inferred corresponding to a target operation efficiency parameter into the number of target objects, wherein the target operation efficiency parameter is the optimal parameter in each operation efficiency parameter;
and inputting a set of objects to be inferred containing the target number of objects into the inference service model (one possible shape of the measurement loop behind these steps is sketched below).
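A minimal sketch of how this measurement loop might be driven follows; the callable service interface, the use of psutil for processor sampling, and the single-pass timing are all assumptions rather than the patented implementation:

```python
import time
import psutil  # assumed dependency for sampling processor utilization

def profile_inference_service(service, candidate_counts, sample_objects):
    """Feed object sets of increasing size to an inference service and record metrics.

    service:          callable that runs one inference pass over a list of objects
                      (an assumed interface, not the patented API).
    candidate_counts: the preset object counts to test.
    sample_objects:   pool of representative objects to draw test batches from.
    Returns a list of (object_count, delay_value, throughput, cpu_utilization)
    tuples, smallest object count first.
    """
    measurements = []
    for n in sorted(candidate_counts):           # smallest object set is input first
        batch = sample_objects[:n]
        psutil.cpu_percent(interval=None)        # reset the utilization counter
        start = time.perf_counter()
        service(batch)                           # one inference pass over the batch
        delay_value = time.perf_counter() - start
        cpu_utilization = psutil.cpu_percent(interval=None) / 100.0
        throughput = n / delay_value             # objects processed per second
        measurements.append((n, delay_value, throughput, cpu_utilization))
    return measurements
```

The resulting measurements can then be passed to a selection routine such as the growth-rate comparison or the fitted-curve derivative test sketched earlier.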
In an embodiment of the present invention, the operation efficiency parameter includes: delay values, throughput, and processor utilization.
In an embodiment of the present invention, configuring the number of objects in the object set to be inferred, which corresponds to the target operation efficiency parameter, into the target number of objects includes:
calculating a delay value acceleration G_L between the delay values of two object sets to be inferred whose numbers of objects are adjacent in value, by adopting the following formula:
G_L = (L_2 - L_1) / L_1
wherein L_1 is the delay value corresponding to the object set to be inferred with fewer objects among the two object sets to be inferred, and L_2 is the delay value corresponding to the object set to be inferred with more objects;
calculating a throughput speed-up G_T between the throughputs of two object sets to be inferred whose numbers of objects are adjacent in value, by adopting the following formula:
G_T = (T_2 - T_1) / T_1
wherein T_1 is the throughput corresponding to the object set to be inferred with fewer objects among the two object sets to be inferred, and T_2 is the throughput corresponding to the object set to be inferred with more objects;
calculating a processor utilization rate acceleration G_U between the processor utilization rates of two object sets to be inferred whose numbers of objects are adjacent in value, by adopting the following formula:
G_U = (U_2 - U_1) / U_1
wherein U_1 is the processor utilization rate corresponding to the object set to be inferred with fewer objects among the two object sets to be inferred, and U_2 is the processor utilization rate corresponding to the object set to be inferred with more objects;
comparing delay value acceleration rate, throughput acceleration rate and processor utilization rate acceleration rate corresponding to the two object sets to be inferred in sequence, and determining whether the delay value acceleration rate is greater than the throughput acceleration rate and/or whether the delay value acceleration rate is greater than the processor utilization rate acceleration rate; the comparison sequence is that two object sets to be inferred with the least number of objects are compared firstly;
when the delay value acceleration rate is larger than the throughput acceleration rate, and/or the delay value acceleration rate is larger than the processor utilization rate acceleration rate, configuring the operation efficiency parameter corresponding to the object set to be inferred with fewer objects in the two object sets to be inferred corresponding to the delay value acceleration rate, the throughput acceleration rate and the processor utilization rate acceleration rate as a target operation efficiency parameter, and configuring the object number of the object set to be inferred with fewer objects as a target object number.
In an embodiment of the present invention, configuring the number of objects in the object set to be inferred, which corresponds to the target operation efficiency parameter, as the number of target objects includes:
generating a fitting parameter f corresponding to each object set to be inferred by adopting the following formula:
(the formula for f is reproduced only as an image in the original publication)
wherein, L is a delay value, T is throughput, and U is the utilization rate of the processor;
establishing a coordinate system and generating a fitting curve by taking the object number of the object set to be inferred as an x axis and the fitting parameter f of the object set to be inferred as a y axis;
determining a first derivative of a point corresponding to the object set to be inferred in the fitting curve;
when the absolute value of the first-order derivative is smaller than a preset threshold value, determining the operation efficiency parameter of the object set to be inferred corresponding to the first-order derivative as a target operation efficiency parameter, and configuring the number of the objects of the object set to be inferred as the number of the target objects.
The communication bus mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic device and other equipment.
The Memory may include a Random Access Memory (RAM), or may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, which has instructions stored therein, which when executed on a computer, cause the computer to execute the method for operating an inference service as described in any of the above embodiments.
In a further embodiment provided by the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of operating an inference service as described in any of the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that incorporates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method of operating an inference service model, comprising:
acquiring a plurality of preset object sets to be inferred, wherein the number of objects contained in different object sets to be inferred is different;
sequentially inputting the set of the objects to be inferred to a preset inference service model, wherein the input sequence to the inference service model is to firstly input the set of the objects to be inferred with the least number of the objects;
acquiring an operation efficiency parameter of the inference service model in a state of processing each object set to be inferred;
configuring the number of objects of an object set to be inferred corresponding to a target operation efficiency parameter into the number of target objects, wherein the target operation efficiency parameter is the optimal parameter in each operation efficiency parameter;
and inputting a set of objects to be inferred comprising the number of the target objects into the inference service model.
2. The method of claim 1, wherein the operating efficiency parameter comprises: delay values, throughput, and processor utilization.
3. The method according to claim 2, wherein configuring the number of objects in the set of objects to be inferred corresponding to the target operating efficiency parameter as the number of target objects comprises:
calculating a delay value acceleration G_L between the delay values of two object sets to be inferred whose numbers of objects are adjacent in value, by adopting the following formula:
G_L = (L_2 - L_1) / L_1
wherein L_1 is the delay value corresponding to the object set to be inferred with fewer objects among the two object sets to be inferred, and L_2 is the delay value corresponding to the object set to be inferred with more objects;
calculating a throughput speed-up G_T between the throughputs of two object sets to be inferred whose numbers of objects are adjacent in value, by adopting the following formula:
G_T = (T_2 - T_1) / T_1
wherein T_1 is the throughput corresponding to the object set to be inferred with fewer objects among the two object sets to be inferred, and T_2 is the throughput corresponding to the object set to be inferred with more objects;
calculating a processor utilization rate acceleration G_U between the processor utilization rates of two object sets to be inferred whose numbers of objects are adjacent in value, by adopting the following formula:
G_U = (U_2 - U_1) / U_1
wherein U_1 is the processor utilization rate corresponding to the object set to be inferred with fewer objects among the two object sets to be inferred, and U_2 is the processor utilization rate corresponding to the object set to be inferred with more objects;
comparing delay value acceleration rate, throughput acceleration rate and processor utilization rate acceleration rate corresponding to the two object sets to be inferred in sequence, and determining whether the delay value acceleration rate is greater than the throughput acceleration rate and/or whether the delay value acceleration rate is greater than the processor utilization rate acceleration rate; the comparison sequence is that two object sets to be inferred with the least number of objects are compared firstly;
when the delay value acceleration rate is larger than the throughput acceleration rate, and/or the delay value acceleration rate is larger than the processor utilization rate acceleration rate, configuring the operation efficiency parameter corresponding to the object set to be inferred with fewer objects in the two object sets to be inferred corresponding to the delay value acceleration rate, the throughput acceleration rate and the processor utilization rate acceleration rate as a target operation efficiency parameter, and configuring the object number of the object set to be inferred with fewer objects as a target object number.
4. The method according to claim 2, wherein configuring the number of objects in the set of objects to be inferred corresponding to the target operating efficiency parameter as the number of target objects comprises:
generating a fitting parameter f corresponding to each object set to be inferred by adopting the following formula:
(the formula for f is reproduced only as an image in the original publication)
wherein, L is a delay value, T is throughput, and U is the utilization rate of the processor;
establishing a coordinate system and generating a fitting curve by taking the object number of the object set to be inferred as an x axis and the fitting parameter f of the object set to be inferred as a y axis;
determining a first derivative of a point corresponding to the object set to be inferred in the fitting curve;
when the absolute value of the first-order derivative is smaller than a preset threshold value, determining the operation efficiency parameter of the object set to be inferred corresponding to the first-order derivative as a target operation efficiency parameter, and configuring the number of the objects of the object set to be inferred as the number of the target objects.
5. An apparatus for operating an inference service model, comprising:
a first acquisition module, configured to acquire a plurality of preset object sets to be inferred, wherein the numbers of objects contained in different object sets to be inferred are different;
a first input module, configured to sequentially input the object sets to be inferred to a preset inference service model, wherein the object set to be inferred with the least number of objects is input to the inference service model first;
a second acquisition module, configured to acquire the operation efficiency parameter of the inference service model in a state of processing each object set to be inferred;
a configuration module, configured to configure the number of objects of the object set to be inferred corresponding to a target operation efficiency parameter as the number of target objects, wherein the target operation efficiency parameter is the optimal parameter among the operation efficiency parameters;
and a second input module, configured to input a set of objects to be inferred containing the target number of objects to the inference service model.
6. The apparatus of claim 5, wherein the operating efficiency parameter comprises: delay values, throughput, and processor utilization.
7. The apparatus of claim 6, wherein the configuration module comprises:
a delay value acceleration calculation submodule, configured to calculate a delay value acceleration G_L between the delay values of two object sets to be inferred whose numbers of objects are adjacent in value, by using the following formula:
G_L = (L_2 - L_1) / L_1
wherein L_1 is the delay value corresponding to the object set to be inferred with fewer objects among the two object sets to be inferred, and L_2 is the delay value corresponding to the object set to be inferred with more objects;
a throughput speed-up calculation submodule, configured to calculate a throughput speed-up G_T between the throughputs of two object sets to be inferred whose numbers of objects are adjacent in value, by using the following formula:
G_T = (T_2 - T_1) / T_1
wherein T_1 is the throughput corresponding to the object set to be inferred with fewer objects among the two object sets to be inferred, and T_2 is the throughput corresponding to the object set to be inferred with more objects;
a processor utilization rate acceleration calculation submodule, configured to calculate a processor utilization rate acceleration G_U between the processor utilization rates of two object sets to be inferred whose numbers of objects are adjacent in value, by using the following formula:
G_U = (U_2 - U_1) / U_1
wherein U_1 is the processor utilization rate corresponding to the object set to be inferred with fewer objects among the two object sets to be inferred, and U_2 is the processor utilization rate corresponding to the object set to be inferred with more objects;
the comparison submodule is used for comparing delay value acceleration, throughput acceleration and processor utilization rate acceleration corresponding to the two object sets to be inferred in sequence, and determining whether the delay value acceleration is greater than the throughput acceleration and/or whether the delay value acceleration is greater than the processor utilization rate acceleration; the comparison sequence is that two object sets to be inferred with the least number of objects are compared firstly;
and the first configuration submodule is used for configuring the operation efficiency parameter corresponding to the object set to be inferred with less object number in the two object sets to be inferred corresponding to the delay value acceleration rate, the throughput acceleration rate and the processor utilization acceleration rate as a target operation efficiency parameter and configuring the object number of the object set to be inferred with less object number as the target object number when the delay value acceleration rate is greater than the throughput acceleration rate and/or the delay value acceleration rate is greater than the processor utilization acceleration rate.
8. The apparatus of claim 6, wherein the configuration module comprises:
the parameter generation submodule is used for generating a fitting parameter f corresponding to each object set to be inferred by adopting the following formula:
(the formula for f is reproduced only as an image in the original publication)
wherein, L is a delay value, T is throughput, and U is the utilization rate of the processor;
the curve generation submodule is used for establishing a coordinate system and generating a fitting curve by taking the object number of the object set to be inferred as an x axis and the fitting parameter f of the object set to be inferred as a y axis;
a derivative determining submodule for determining a first derivative of a point corresponding to the object set to be inferred in the fitting curve;
and the second configuration submodule is used for determining the operation efficiency parameter of the object set to be inferred corresponding to the first derivative as a target operation efficiency parameter when the absolute value of the first derivative is smaller than a preset threshold value, and configuring the number of the objects of the object set to be inferred as the number of the target objects.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of operating the inference service according to any of claims 1-4 when executing a program stored in a memory.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method for operating an inference service according to any one of claims 1-4.
CN201911245117.5A 2019-12-06 2019-12-06 Operation method and device of reasoning service model Active CN111047042B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911245117.5A CN111047042B (en) 2019-12-06 2019-12-06 Operation method and device of reasoning service model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911245117.5A CN111047042B (en) 2019-12-06 2019-12-06 Operation method and device of reasoning service model

Publications (2)

Publication Number Publication Date
CN111047042A true CN111047042A (en) 2020-04-21
CN111047042B CN111047042B (en) 2024-05-28

Family

ID=70234999

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911245117.5A Active CN111047042B (en) 2019-12-06 2019-12-06 Operation method and device of reasoning service model

Country Status (1)

Country Link
CN (1) CN111047042B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180287914A1 (en) * 2017-03-30 2018-10-04 Wipro Limited System and method for management of services in a cloud environment
US10409569B1 (en) * 2017-10-31 2019-09-10 Snap Inc. Automatic software performance optimization
CN108173905A (en) * 2017-12-07 2018-06-15 北京奇艺世纪科技有限公司 A kind of resource allocation method, device and electronic equipment
US20190311220A1 (en) * 2018-04-09 2019-10-10 Diveplane Corporation Improvements To Computer Based Reasoning and Artificial Intellignence Systems
US20190294986A1 (en) * 2018-06-16 2019-09-26 Moshe Guttmann Presenting inference models based on interrelationships

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
JUSSI HANHIROVA et al.: "Latency and throughput characterization of convolutional neural networks for mobile computer vision", Proceedings of the 9th ACM Multimedia Systems Conference (ACM Press), 30 June 2018 (2018-06-30), pages 204-215 *
MSUP: "Video + PPT | iQIYI's practice of optimizing CPU-based deep learning inference services", pages 1-12, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/62218753> *
QIN ZHANG et al.: "A GPU Inference System Scheduling Algorithm with Asynchronous Data Transfer", 2019 IEEE International Parallel and Distributed Processing Symposium Workshops, 9 July 2019 (2019-07-09), pages 438-445 *
QIAO Yuanyuan et al.: "Resource modeling and performance prediction of MapReduce in a cloud computing environment", Journal of Beijing University of Posts and Telecommunications, vol. 37, 15 April 2014 (2014-04-15), pages 115-119 *
TANG Bin et al.: "Networked generalized predictive control with compensation for network-induced delay and packet loss", Control Theory & Applications, vol. 27, no. 7, 15 July 2010 (2010-07-15), pages 880-890 *
LI Qi: "FPGA-based modeling method for on-chip multiprocessors", China Doctoral Dissertations Full-text Database, Information Science and Technology series (monthly), no. 1, 15 January 2013 (2013-01-15), pages 137-6 *

Also Published As

Publication number Publication date
CN111047042B (en) 2024-05-28

Similar Documents

Publication Publication Date Title
CN110263921B (en) Method and device for training federated learning model
CN108173905B (en) Resource allocation method and device and electronic equipment
CN107305611B (en) Method and device for establishing model corresponding to malicious account and method and device for identifying malicious account
CN110942142B (en) Neural network training and face detection method, device, equipment and storage medium
CN109933610B (en) Data processing method, device, computer equipment and storage medium
CN111126581A (en) Data processing method and device and related products
CN109413694B (en) Small cell caching method and device based on content popularity prediction
CN113254472A (en) Parameter configuration method, device, equipment and readable storage medium
WO2020168448A1 (en) Sleep prediction method and apparatus, and storage medium and electronic device
CN114742604A (en) APP preference determination method and device, computer-readable storage medium and terminal
US20140244846A1 (en) Information processing apparatus, resource control method, and program
CN114168318A (en) Training method of storage release model, storage release method and equipment
CN110955390A (en) Data processing method and device and electronic equipment
WO2017097118A1 (en) Text classification processing method and apparatus
CN112187870B (en) Bandwidth smoothing method and device
CN113657249A (en) Training method, prediction method, device, electronic device, and storage medium
CN110633804B (en) Financial product incoming evaluation method, server and storage medium
CN112615795A (en) Flow control method and device, electronic equipment, storage medium and product
CN111047042B (en) Operation method and device of reasoning service model
CN108463813A (en) A kind of method and apparatus carrying out data processing
CN108921207B (en) Hyper-parameter determination method, device and equipment
CN114742035B (en) Text processing method and network model training method based on attention mechanism optimization
CN114706433B (en) Equipment control method and device and electronic equipment
CN110941489A (en) Method and device for scaling stream processing engine
WO2020164275A1 (en) Processing result prediction method and apparatus based on prediction model, and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant