CN117391624A

CN117391624A - Service execution method and device, storage medium and electronic equipment

Info

Publication number: CN117391624A
Application number: CN202311348012.9A
Authority: CN
Inventors: 唐波; 於光中; 毛尚勤; 谢乾龙; 王兴星
Original assignee: Beijing Sankuai Network Technology Co ltd
Current assignee: Beijing Sankuai Network Technology Co ltd
Priority date: 2023-10-17
Filing date: 2023-10-17
Publication date: 2024-01-12

Abstract

This specification discloses a service execution method and device, a storage medium and electronic equipment. In the service execution method provided in this specification, a current state of a target service is obtained; a target action for executing the target service in the current state is determined according to the current state and a predetermined service execution strategy, wherein the target action characterizes the resources required for executing the target service, the service execution strategy is obtained by optimizing a strategy to be optimized with an upper confidence value not greater than a constraint value and the estimated benefits maximized as the optimization target, the upper confidence value is obtained according to the mean value and the variance of the estimated resources, the estimated resources and the estimated benefits are obtained according to the actions corresponding to the states of the target service, the actions corresponding to the states are obtained according to the states and the strategy to be optimized, and the states are contained in a state space of the target service; and the target service is executed by adopting the target action.

Description

Service execution method and device, storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a service execution method, a device, a storage medium, and an electronic apparatus.

Background

In some business scenarios, the executor must spend a certain amount of resources to achieve a goal. When executing this type of service, the main concern is how to obtain the greatest benefit with limited resources. Problems of this class are generally referred to as constrained optimization problems.

For example, in a scenario where a platform delivers multimedia, a merchant may bid to have multimedia information delivered to users on the platform and obtain a corresponding benefit. For the platform, once the merchant's total bid amount for multimedia delivery has been fixed, how to obtain the maximum benefit for the merchant within that limited amount is a constrained optimization problem.

Currently, algorithms for such problems typically introduce Lagrangian multipliers to transform the original problem into a dual problem and then solve it. However, existing algorithms are very sensitive to the estimate of resource consumption: an inaccurate resource estimate leads to errors in the Lagrangian multiplier, and even minor errors can cause significant deviations in policy optimization.

Therefore, how to optimize the strategy more efficiently and stably in the constrained optimization problem to achieve the best service execution effect is a problem to be solved urgently.

Disclosure of Invention

The present disclosure provides a service execution method, apparatus, storage medium, and electronic device, so as to at least partially solve the foregoing problems in the prior art.

The technical scheme adopted in the specification is as follows:

the present specification provides a service execution method, including:

acquiring the current state of a target service;

determining a target action for executing the target service in the current state according to the current state and a predetermined service execution strategy, wherein the target action is used for representing resources required for executing the target service, the service execution strategy is obtained by optimizing a strategy to be optimized according to the mean value and the variance of the estimated resources, the estimated resources and the estimated benefits are obtained according to actions corresponding to states of the target service, the actions corresponding to the states are obtained according to the states and the strategy to be optimized, and the states are contained in a state space of the target service;

and executing the target service by adopting the target action.

Optionally, the target service is a multimedia delivery service, the state includes at least a user portrait and a merchant portrait, and the action represents the merchant's bid when delivering multimedia information to the user.

Optionally, determining the estimated resources and the estimated benefits according to the states and the actions corresponding to the states specifically includes:

inputting the states and their corresponding actions into a pre-trained analysis model to obtain the estimated resources and the estimated benefits output by the analysis model.

Optionally, there are at least two analysis models, and the analysis models differ in structure and/or parameters;

the method for determining the mean value and the variance of the estimated resources specifically comprises the following steps:

for each analysis model, inputting the states and their corresponding actions into the analysis model to obtain the independent estimated resources output by that analysis model;

and determining the average of the independent estimated resources as the average of the estimated resources, and determining the variance of the independent estimated resources as the variance of the estimated resources.

Optionally, pre-training the analysis model specifically includes:

acquiring a sample state and a sample action corresponding to the sample state, and acquiring a labeled resource and a labeled benefit of the sample state;

inputting the sample state and the sample action corresponding to the sample state into an analysis model to be trained, and obtaining the resources to be optimized and the benefits to be optimized output by the analysis model;

and training the analysis model with minimizing the difference between the resources to be optimized and the labeled resource, and minimizing the difference between the benefits to be optimized and the labeled benefit, as the optimization target.

Optionally, optimizing the strategy to be optimized with the upper confidence value not greater than the constraint value and the estimated benefits maximized as the optimization target specifically includes:

determining the difference between the upper confidence value and the constraint value as a dual difference;

initializing a Lagrangian multiplier, and adjusting the dual difference by the Lagrangian multiplier to obtain a dual resource;

and optimizing the strategy to be optimized and the Lagrangian multiplier with maximizing the difference between the estimated benefits and the dual resource as the optimization target.

Optionally, before optimizing the strategy to be optimized and the Lagrangian multiplier, the method further includes:

determining a dominant term according to the dual difference, wherein the dominant term is positively correlated with the square of the dual difference;

and optimizing the strategy to be optimized and the Lagrangian multiplier with maximizing the difference between the estimated benefits and the dual resource as the optimization target specifically includes:

optimizing the strategy to be optimized and the Lagrangian multiplier with maximizing the difference between the estimated benefits and the sum of the dual resource and the dominant term as the optimization target.
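The objective described in the two optional schemes above can be sketched as follows. This is only an illustration under stated assumptions: "adjusting" the dual difference is taken to mean multiplying it by the Lagrangian multiplier, and the dominant term is taken to be `0.5 * c * dual_diff ** 2`, where the coefficient `c` and the function name are hypothetical and do not come from the specification.

```python
def augmented_objective(est_benefit, upper_conf, constraint, lam, c=1.0):
    """Value to be maximized over the strategy (illustrative form only)."""
    # Dual difference: upper confidence value minus constraint value.
    dual_diff = upper_conf - constraint
    # Dual resource: assumed here to be the multiplier times the dual difference.
    dual_resource = lam * dual_diff
    # Dominant term: assumed quadratic form, so it grows with the squared difference.
    dominant_term = 0.5 * c * dual_diff ** 2
    return est_benefit - (dual_resource + dominant_term)


# Example with made-up numbers: benefit 10, upper confidence value 6, constraint 5.
value = augmented_objective(10.0, 6.0, 5.0, lam=2.0, c=1.0)
```

Setting `c=0` recovers the plain difference between the estimated benefits and the dual resource.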

The present specification provides a service execution device, including:

the acquisition module is used for acquiring the current state of the target service;

the determining module is used for determining a target action for executing the target service in the current state according to the current state and a predetermined service execution strategy, wherein the target action is used for representing resources required by executing the target service, the service execution strategy is obtained by optimizing a strategy to be optimized according to the mean value and the variance of the estimated resources, the estimated resources and the estimated benefits are obtained according to actions corresponding to states of the target service, the actions corresponding to the states are obtained according to the states and the strategy to be optimized, and the states are contained in a state space of the target service;

and the execution module is used for executing the target service by adopting the target action.

The present specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described service execution method.

The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above-mentioned service execution method when executing the program.

At least one of the technical schemes adopted in this specification can achieve the following beneficial effects:

in the service execution method provided in the present specification, a current state of a target service is obtained; determining a target action for executing the target service in the current state according to the current state and a predetermined service execution strategy, wherein the target action is used for representing resources required for executing the target service, the service execution strategy is obtained by optimizing a strategy to be optimized according to the mean value and the variance of the estimated resources, the estimated resources and the estimated benefits are obtained according to actions corresponding to states of the target service, the actions corresponding to the states are obtained according to the states and the strategy to be optimized, and the states are contained in a state space of the target service; and executing the target service by adopting the target action.

When the service execution method provided in this specification is used to execute a target service that poses a constrained optimization problem, the target service can be executed with the target action determined from the current state and the predetermined service execution strategy. Within the approach of converting the original problem into a dual problem by introducing a Lagrangian multiplier, the service execution strategy can be obtained through updating and optimization that combine the conservative strategy optimization and the local strategy convexification proposed by this method. Conservative strategy optimization addresses the underestimation of predicted resource consumption in the dual problem. Combined with local strategy convexification, which modifies the original objective by means of the augmented Lagrangian method so as to convexify the neighborhood of a locally optimal strategy, it gradually reduces the uncertainty of resource estimation in that region and ultimately improves the overall performance of the algorithm, yielding a better service execution strategy.

Drawings

The accompanying drawings are included to provide a further understanding of the specification. The exemplary embodiments of the present specification and their description serve to explain the specification and are not intended to limit it unduly. In the drawings:

Fig. 1 is a schematic flow chart of a service execution method in the present specification;

Fig. 2 is a schematic diagram of a service execution device provided in the present specification;

Fig. 3 is a schematic diagram of the electronic device corresponding to Fig. 1 provided in the present specification.

Detailed Description

For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present application based on the embodiments herein.

The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.

Fig. 1 is a flow chart of a service execution method in the present specification, specifically including the following steps:

S100: obtaining the current state of the target service.

All steps in the service execution method provided in the present specification may be implemented by any electronic device having a computing function, for example, a terminal, a server, or the like.

The method determines the target action according to the state of the target service and the service execution strategy, and executes the target service by adopting the target action. Accordingly, the current state of the target service may be determined first in this step. The target service can be any service in a scenario that poses a constrained optimization problem.

Many common services can serve as the target service in an embodiment. Taking a multimedia delivery service as an example, in a multimedia delivery scenario a merchant can pay a cost to deliver multimedia information to users through a platform, so as to attract users to purchase goods from the merchant and obtain corresponding benefits. Each single multimedia delivery costs the merchant a certain amount, namely the merchant's bid for the display slot. In general, a merchant only needs to set on the platform the target to be achieved by multimedia delivery, such as a return on investment (ROI), a click-through rate or a conversion rate, and the platform's multimedia delivery system can then plan the cost on the merchant's behalf over a subsequent preset period, that is, a delivery period, to complete the multimedia delivery. The upper limit of the cost that can be spent on multimedia delivery within a delivery period is set in advance by the merchant. For the platform, if the cost of delivering the multimedia information reaches the merchant's preset upper limit without reaching the merchant's expected target, the merchant's experience suffers and the merchant becomes dissatisfied, which the platform needs to avoid. Thus, how the platform can help merchants obtain the maximum benefit under limited resources is a constrained optimization problem.

As another example, taking a cloud service as the target service, in a cloud service scenario the platform may provide cloud computing resources to users to meet their computing needs in exchange for benefits. In this scenario, the total amount of resources that the platform can provide has an upper bound, and the computing requirement raised by a user each time is fixed. Therefore, how the platform obtains the maximum benefit for itself with limited cloud computing resources is a constrained optimization problem.

The state of the target service refers to information about the target service during its execution. Any target service may have a plurality of different states, and the information contained in a state differs from one target service to another. Still taking the multimedia delivery service as an example, the state of the multimedia delivery service may include, but is not limited to, time factors, user portraits, merchant portraits, and the like.

S102: determining a target action for executing the target service in the current state according to the current state and a predetermined service execution strategy, wherein the target action is used for representing resources required for executing the target service, the service execution strategy is obtained by optimizing a strategy to be optimized according to the mean value and the variance of the estimated resources, the estimated resources and the estimated benefits are obtained according to actions corresponding to states of the target service, the actions corresponding to the states are obtained according to the states and the strategy to be optimized, and the states are contained in a state space of the target service.

In this step, a target action for executing the target service may be determined according to the current state of the target service determined in step S100 and the service execution policy determined in advance, where the target action is used to characterize resources that need to be consumed when executing the target service; the service execution strategy is used for representing rules or modes of determining target actions when the target service is in different states. In other words, the service execution policy determines how much resources should be consumed to execute the target service in each state of the target service, respectively.

For example, assuming that the target service has five states S1, S2, S3, S4 and S5, the service execution strategy may be expressed as {S1: C1, S2: C2, S3: C3, S4: C4, S5: C5}, meaning that in state S1 resource C1 is consumed to execute the target service, in state S2 resource C2 is consumed to execute the target service, and so on.
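The mapping in this example can be sketched as a lookup table. The state names follow the example, while the numeric resource amounts and the function name `target_action` are purely illustrative assumptions; a real strategy need not be tabular.

```python
# Hypothetical tabular strategy: each state maps to the resource amount to consume.
# The amounts below stand in for C1..C5 and are made up for illustration.
policy = {"S1": 1.0, "S2": 2.5, "S3": 0.5, "S4": 3.0, "S5": 0.0}


def target_action(current_state: str) -> float:
    """Return the resource amount the strategy assigns to the current state."""
    return policy[current_state]


action = target_action("S2")  # resource consumed when the service is in state S2
```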

Taking the multimedia delivery scenario as an example, the slots in which the platform can deliver multimedia information to users are limited, so merchants often need to compete for delivery slots. For each delivery slot, the merchant willing to consume more resources, that is, the merchant with the higher bid, gets to deliver multimedia in that slot. In the competition stage, a single bid only represents the cost the merchant is willing to pay: only the merchant who wins the slot actually pays the corresponding cost, and merchants who do not win the slot pay nothing. The cost a merchant can spend in a delivery period is limited, and if every delivery is made with a high single bid, the merchant's target is likely to be missed because too few deliveries are made. A reasonable service execution strategy, that is, how to spend cost on multimedia delivery in each state, is therefore important. For example, if the preferences of the users active in a certain period match the multimedia information delivered by the merchant, a higher bid can be used to compete for the delivery slot; conversely, if the preferences of the active users do not match the merchant's multimedia information in that period, the merchant may compete with a lower bid or give up the competition.

Likewise, in the cloud service scenario, the platform may choose to consume different cloud computing resources when meeting the user's needs, resulting in different benefits. For this scenario, the business state may include, but is not limited to, user portraits, user requirements, resource information, and the like; the business execution policy is to determine how to consume computing resources in each state to meet the needs of the user.

Because the service execution strategy is so important in a constrained optimization problem, how to obtain a better service execution strategy is the key point of the method. In the service execution method provided in this specification, the service execution strategy may be determined as follows: obtaining the state space and the constraint value of the target service; determining the action corresponding to each state according to the states contained in the state space and the strategy to be optimized; determining the estimated resources and the estimated benefits according to the states and the actions corresponding to the states; determining an upper confidence value according to the mean value and the variance of the estimated resources; and optimizing the strategy to be optimized, with the upper confidence value not greater than the constraint value and the estimated benefits maximized as the optimization target, to obtain the service execution strategy.

Since the service execution policy is used to determine how much resources are consumed in each state of the target service to execute the target service, when determining the service execution policy, a state space of the target service needs to be acquired first. In the state space of the target service, all possible states of the target service are contained. The constraint value may be a constraint in executing the task, such as the total amount of resources, etc.

In the method, the service execution strategy is not generated in one step directly from scratch, but is finally determined after being continuously updated and optimized. Thus, in determining the service execution policy, there will be a policy to be optimized first. The service execution policy may be optimized for a plurality of times, that is, in the method, the process of determining the service execution policy may be performed for a plurality of times, and the policy to be optimized is continuously updated and optimized until a satisfactory service execution policy is finally obtained. When each round of executing the process, the policy to be optimized can be a service execution policy obtained after the previous round of executing, and in the initial round, the policy to be optimized can be generated by random initialization.

According to the states contained in the obtained state space and the strategy to be optimized, the action corresponding to each state in the current round of optimization, namely the resource required for executing the target service in each state, can be determined. On this basis, the estimated resources and the estimated benefits can be determined according to the states and the actions corresponding to the states. The estimated resources represent the total resources expected to be consumed after the target service of one preset period is completed, and the estimated benefits represent the total benefits expected to be obtained after the target service of one preset period is completed. In the process of optimizing the service execution strategy, the preset period can be expressed as a number of executions of the target service or as a length of time, and can be set as required. When the target service is actually executed, the period is generally set by the executor of the service; for example, in the multimedia delivery scenario, a merchant can set how many deliveries are to be made or how long delivery is to last.

Still taking the multimedia delivery service as an example, in this scenario the service execution strategy is the bidding strategy with which the platform bids intelligently on behalf of merchants during multimedia delivery. While the multimedia delivery system is not yet online and is in the tuning stage, its bidding strategy can be continuously optimized through testing. After each round of multimedia delivery, that is, each preset period, the bidding strategy can be optimized and adjusted once according to the final result. In the multimedia delivery of one preset period, the constraint value is the total cost the merchant has preset as spendable, the estimated resources are the total cost the merchant is expected to spend once all deliveries are completed, and the estimated benefits are the total benefits the merchant is expected to obtain once all deliveries are completed. In each round of multimedia delivery, the action corresponding to each state, namely the single bid in each state, can be determined according to the current bidding strategy, which yields the expected total cost and total benefits. The bidding strategy is then optimized and adjusted with the expected total cost not exceeding the total cost preset by the merchant and the expected total benefits maximized as the optimization target.

In the cloud service scenario, the service execution policy is a resource consumption policy of the platform when the user demand is satisfied. In a cloud service providing scene of a preset period, the constraint value is the total amount of cloud computing resources which can be provided by the platform, the estimated resources are the total amount of resources which are expected to be consumed by the platform after meeting the requirements of all users, and the estimated benefits are the total benefits expected to be obtained by the platform after meeting the requirements of all users. When the cloud service is provided, the cloud computing resource amount provided for the single user requirement in each state can be determined according to the current resource consumption strategy, so that the expected total resource consumption and the total income are obtained. And optimizing and adjusting the resource consumption strategy by taking the total resource consumption not exceeding the total amount of cloud computing resources available by the platform and the expected maximum total income as an optimization target.

There are various ways to determine the estimated resources and estimated benefits, and this description provides a specific embodiment for reference. Because the state of the target service is not fixed every time the target service is executed in a preset period, a neural network model can be adopted to learn the transition relation among the states in the target service scene, and the estimated resources and the estimated benefits are obtained by combining the actions corresponding to the states. Specifically, the actions of each state and the corresponding states can be input into a pre-trained analysis model, and estimated resources and estimated benefits output by the analysis model are obtained.

The structure of the analysis model can be set as required, provided that the model can determine the estimated resources and the estimated benefits; this specification places no particular limitation on it. In addition, depending on the structure or training mode of the analysis model adopted, the preset period can also be input into the analysis model, so that the analysis model predicts the estimated resources and the estimated benefits according to the states, their corresponding actions, and the preset period.

With the data determined above, a more specific constrained optimization problem can now be stated. Here S denotes the state space, S = {s₀, s₁, s₂, …}, and A denotes the action space, A = {a₀, a₁, a₂, …}, where s and a with the same subscript correspond to each other; π denotes the strategy to be optimized. The constrained optimization problem described above can be defined using the following formula:
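A hedged sketch of this formula, consistent with the surrounding definitions, is given below. The symbols $\hat{Q}_r$ (estimated benefits) and $\hat{Q}_c$ (estimated resources) are names assumed here for illustration, and taking the expectation over state-action pairs drawn from ρ is likewise an assumption about the exact form.

```latex
\max_{\pi}\; \mathbb{E}_{(s,a)\sim\rho}\!\left[\hat{Q}_r(s,a)\right]
\quad \text{s.t.} \quad
\mathbb{E}_{(s,a)\sim\rho}\!\left[\hat{Q}_c(s,a)\right] \le d
```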

The above formula is referred to as formula (1) in the remainder of this specification. Here ρ denotes the stationary distribution induced by the strategy to be optimized π, d denotes the constraint value, and the remaining two terms denote the estimated resources and the estimated benefits, respectively.

Problems of this type are typically solved by introducing a Lagrangian multiplier λ and converting the original problem into a dual problem, as follows:
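A hedged sketch of the dual form is given below. It assumes the standard Lagrangian construction, with $\hat{Q}_r$ and $\hat{Q}_c$ as illustrative names for the estimated benefits and the estimated resources, d the constraint value, and ρ the distribution induced by π; the exact notation of the original formula is not recoverable from this text.

```latex
\min_{\lambda \ge 0}\; \max_{\pi}\;
\mathbb{E}_{(s,a)\sim\rho}\!\left[\hat{Q}_r(s,a) - \lambda\left(\hat{Q}_c(s,a) - d\right)\right]
```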

The above formula is referred to as formula (2) in the remainder of this specification. Over multiple rounds of updating and optimization, the formula is solved by alternately optimizing the strategy to be optimized π and updating the Lagrangian multiplier λ, which yields the final service execution strategy.

Ordinarily, the strategy to be optimized would be optimized according to the formulas above, with the estimated resources not greater than the constraint value and the estimated benefits maximized as the optimization target. However, when a constrained optimization problem is solved by the dual method, the resources required for executing the target service are easily underestimated. When the estimated resources are too low, the feasibility boundary of the strategy to be optimized extends beyond the feasible space during optimization, which produces an overly aggressive service execution strategy and ultimately leads to constraint violations. To address this, the method uses an upper confidence bound on the resources to encourage overestimation of the resources, so that the strategy to be optimized has a conservative feasibility boundary; this shrinks the feasible space of the strategy to be optimized and improves constraint satisfaction.

In this method, the upper confidence value of the resource can be determined according to the mean value and the variance of the estimated resources. When the dual form of the constrained optimization problem is solved using formula (2) and the Lagrangian multiplier is greater than 0, the strategy to be optimized is updated in a way that minimizes the estimated resource consumption. Assume that, each time the target service is executed, the estimate of the consumed resources carries zero-mean Gaussian noise. The zero-mean property is not preserved under the minimization operation, so the estimated resources end up smaller than the resources actually consumed.
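The downward bias described here can be illustrated with a small simulation. It is a sketch under assumed numbers (ten candidate actions with the same true cost of 1.0 and noise of standard deviation 0.5), not data from the specification: each noisy estimate is unbiased on its own, but the minimum over candidates is not.

```python
import numpy as np

rng = np.random.default_rng(0)
true_cost = 1.0                      # assumed: identical true cost for every candidate
n_actions, n_trials = 10, 20_000     # assumed sizes for the simulation

# Each candidate's estimate = true cost + zero-mean Gaussian noise.
estimates = true_cost + rng.normal(0.0, 0.5, size=(n_trials, n_actions))

# A single estimate is unbiased on average ...
mean_single = estimates[:, 0].mean()
# ... but picking the smallest estimate systematically underestimates the cost.
mean_of_min = estimates.min(axis=1).mean()

print(f"single estimate: {mean_single:.3f}, min over candidates: {mean_of_min:.3f}")
```

The gap between the two printed averages is the underestimation that the upper confidence value is meant to offset.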

This method proposes conservative strategy optimization to solve the above problem. The upper confidence value of the resource is used in place of the estimated resources. The upper confidence value may be calculated according to the following formula:
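A hedged sketch of such a formula is given below, built from the mean value and the variance of N estimated resources as the text describes. The symbol names are assumed here for illustration, and whether the spread term is the variance itself or its square root cannot be recovered from this text; the variance is used because the text names it.

```latex
\hat{Q}_c^{\mathrm{UCB}}
= \bar{Q}_c + k \cdot \frac{1}{N}\sum_{i=1}^{N}\left(\hat{Q}_c^{\,i} - \bar{Q}_c\right)^{2},
\qquad
\bar{Q}_c = \frac{1}{N}\sum_{i=1}^{N}\hat{Q}_c^{\,i}
```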

The above formula is referred to as formula (3) in the remainder of this specification. Its terms are the upper confidence value, the number N of estimated resources determined, the i-th estimated resource determined, and the influence weight k of the variance.

Clearly, this approach requires a plurality of estimated resources to be determined. There are various ways to do so, and this specification provides a specific embodiment for reference. In this embodiment, following the idea of determining the estimated resources and the estimated benefits with an analysis model, a plurality of analysis models can be used to predict a plurality of independent estimated resources. Specifically, for each analysis model, the states and their corresponding actions can be input into the analysis model to obtain the independent estimated resources output by that analysis model; the mean of the independent estimated resources is determined as the mean value of the estimated resources, and the variance of the independent estimated resources is determined as the variance of the estimated resources. Analysis models with different structures and/or parameters can be obtained through different designs or different training, so that they produce different prediction results.
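The ensemble step can be sketched as follows. The function name, the sample estimates and the choice of population variance are assumptions made for illustration; the specification only states that the upper confidence value is determined from the mean value and the variance with a weight k.

```python
import numpy as np


def upper_confidence_value(independent_estimates, k=1.0):
    """Mean of the independent estimated resources plus k times their variance."""
    est = np.asarray(independent_estimates, dtype=float)
    mean = est.mean()
    variance = est.var()  # population variance over the N estimates (assumed choice)
    return mean + k * variance


# Hypothetical outputs of N = 4 analysis models for the same states and actions.
estimates = [9.0, 11.0, 10.0, 10.0]
ucb = upper_confidence_value(estimates, k=2.0)
```

When all models agree, the variance is zero and the upper confidence value equals the plain estimate; the more the models disagree, the more conservative the value becomes.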

Each analysis model may be trained in advance. Specifically, a sample state and a sample action corresponding to the sample state can be obtained, together with a labeling resource and a labeling benefit of the sample state; the sample state and the sample action corresponding to the sample state are input into the analysis model to be trained to obtain the resource to be optimized and the benefit to be optimized output by the analysis model; and the analysis model is trained with the optimization target of minimizing the difference between the resource to be optimized and the labeling resource and minimizing the difference between the benefit to be optimized and the labeling benefit.

The labeling resource and the labeling benefit are the resource actually consumed and the benefit actually obtained after the target service is executed, within a preset period, using the sample state and the sample action corresponding to the sample state. The sample state, the corresponding sample action, and the labeling resource and labeling benefit of the sample state can be obtained from historical execution data of the target service. Where there are multiple analysis models, each can be trained separately in the manner described above. When analysis models with the same structure exist, they can be trained with different training samples to obtain analysis models with different parameters.
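The training procedure above can be sketched as follows, under stated assumptions. The analysis model here is a plain linear least-squares model, and the different training samples are drawn by resampling the historical data with replacement; neither choice comes from this specification. The names `train_analysis_model`, `train_ensemble`, and `predict` are hypothetical, and `features` stands for sample states concatenated with their sample actions.

```python
import numpy as np

def _with_bias(features):
    # Append a constant column so the linear model can learn an offset.
    return np.hstack([features, np.ones((len(features), 1))])

def train_analysis_model(features, labeled_resource, labeled_benefit):
    """Fit one model by minimizing the squared difference to both labels."""
    targets = np.column_stack([labeled_resource, labeled_benefit])
    weights, *_ = np.linalg.lstsq(_with_bias(features), targets, rcond=None)
    return weights  # shape: (n_features + 1, 2)

def train_ensemble(features, labeled_resource, labeled_benefit, n_models=5, seed=0):
    """Train same-structure models on different samples to get different parameters."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(features), size=len(features))  # resample with replacement
        models.append(
            train_analysis_model(features[idx], labeled_resource[idx], labeled_benefit[idx])
        )
    return models

def predict(weights, features):
    """Return (estimated resource, estimated benefit) for each row of features."""
    out = _with_bias(features) @ weights
    return out[:, 0], out[:, 1]
```

Each trained model then yields one independent estimated resource per state and action, which is what the mean and variance are computed over.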

After the upper confidence value of the resource is obtained, it can replace the estimated resource in the constrained optimization problem; that is, the strategy to be optimized is optimized with the optimization target that the upper confidence value is not greater than the constraint value and the estimated benefit is maximized. It should be noted that, when the dual problem is solved, the Lagrangian multiplier also needs to be updated while the strategy to be optimized is optimized.

When the dual problem corresponding to the constrained optimization problem is solved, that is, when the strategy to be optimized is optimized, the difference between the upper confidence value and the constraint value may be determined as the dual difference; a Lagrangian multiplier is initialized, and the dual difference is adjusted by the Lagrangian multiplier to obtain the dual resource; and the strategy to be optimized and the Lagrangian multiplier are optimized with the optimization target of maximizing the difference between the estimated benefit and the dual resource. This can be expressed by the following formula:

The above formula is referred to as formula (4) in the remainder of this specification. Formula (4) is obtained by replacing the estimated resource in formula (2) with the determined upper confidence value.
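The quantities named in this step can be sketched as follows. This is an illustration under assumptions, not the formula of this specification: the dual resource is taken to be the product of the Lagrangian multiplier and the dual difference, and the multiplier is updated by a projected gradient step that keeps it non-negative. The function names and `step_size` are hypothetical.

```python
def dual_objective(estimated_benefit, upper_confidence, constraint, multiplier):
    """Estimated benefit minus the dual resource."""
    dual_difference = upper_confidence - constraint   # > 0 means the constraint is violated
    dual_resource = multiplier * dual_difference      # dual difference adjusted by the multiplier
    return estimated_benefit - dual_resource

def update_multiplier(multiplier, upper_confidence, constraint, step_size=0.1):
    """Assumed update rule: raise the multiplier on violation, lower it otherwise,
    and project back onto the non-negative half-line."""
    return max(0.0, multiplier + step_size * (upper_confidence - constraint))
```

In such a scheme the strategy to be optimized is adjusted to increase `dual_objective`, while the multiplier is adjusted in alternation with it, so a persistent violation makes resource consumption progressively more costly.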

Furthermore, in the process of optimizing the strategy to be optimized, the dual method is very sensitive to the resource estimate: even a small error caused by inaccurate resource estimation can make subsequent updates deviate substantially, or even drive the optimization in the completely opposite direction. For example, when the real resource consumption is below the constraint value but the estimated resource exceeds it, the Lagrangian multiplier is updated in exactly the opposite direction to the correct one and becomes a misleading penalty term in the original objective.
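The example in the preceding paragraph can be made concrete with a few numbers. The values and the gradient-style step `multiplier_step` are hypothetical and serve only to show the sign flip.

```python
def multiplier_step(resource, constraint, step_size=0.1):
    """Signed change applied to the Lagrangian multiplier (assumed gradient-style rule)."""
    return step_size * (resource - constraint)

constraint = 100.0
true_step = multiplier_step(95.0, constraint)        # real consumption is within the constraint
estimated_step = multiplier_step(104.0, constraint)  # the estimate wrongly exceeds the constraint

# The true signal would decrease the multiplier; the erroneous estimate increases it.
wrong_direction = true_step < 0 < estimated_step
```

Here `wrong_direction` is true: an estimation error of less than ten percent is enough to reverse the direction of the multiplier update.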

This method solves the above problem by local policy convexification. In this method, the original objective is modified by an augmented Lagrangian method so that the neighborhood of the locally optimal strategy becomes convex. By correcting the policy gradient, local policy convexification can stabilize policy learning in a local region, thereby gradually reducing the uncertainty of resource estimation in that region.

Specifically, before the strategy to be optimized and the Lagrangian multiplier are optimized, a dominant term may be determined from the dual difference, the dominant term being positively correlated with the square of the dual difference. The strategy to be optimized and the Lagrangian multiplier may then be optimized with the optimization target of maximizing the difference between the estimated benefit and the sum of the dual resource and the dominant term. This can be expressed by the following formula:

The above formula is referred to as formula (5) in the remainder of this specification, where c is a hyperparameter greater than zero. When the strategy to be optimized is updated according to formula (5), for a sufficiently large c the objective function is dominated by the quadratic penalty term, and therefore the neighborhood of the boundary solution is convex. Here, the boundary solution is understood as a solution in π.
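The objective with the dominant term can be sketched as follows. This is an assumption-laden illustration and not formula (5) itself: the dominant term is taken to be (c / 2) times the squared dual difference, which is the usual augmented Lagrangian form, and the function name is hypothetical.

```python
def augmented_objective(estimated_benefit, upper_confidence, constraint, multiplier, c):
    """Estimated benefit minus the sum of the dual resource and the dominant term."""
    if c <= 0:
        raise ValueError("c must be a hyperparameter greater than zero")
    dual_difference = upper_confidence - constraint
    dual_resource = multiplier * dual_difference
    dominant_term = 0.5 * c * dual_difference ** 2   # assumed factor 1/2; grows with the square
    return estimated_benefit - (dual_resource + dominant_term)
```

Because the dominant term is quadratic, the objective falls off on both sides of the point where the dual difference is zero once c is large, which is what concentrates the optimization around the boundary.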

S104: executing the target service by adopting the target action.

After the target action is obtained in step S102, the target service can be executed in this step using the obtained target action.

For example, in the multimedia delivery service, the target action is the merchant's bid for the current multimedia delivery, and the merchant can use the obtained bid to compete for the multimedia delivery slot. When the competition succeeds, the corresponding cost is consumed to place the multimedia information in the slot; when the competition fails, no cost is consumed.

For another example, in a cloud service scenario, the target action is the cloud computing resources consumed by the platform when meeting one user demand. Unlike the multimedia delivery scenario, there is no competitive relationship in the cloud service scenario, and the platform necessarily consumes the corresponding computing resources when meeting user demands.
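The difference between the two scenarios can be sketched as follows. The pricing rule is an assumption made only for illustration (the winner pays its own bid); this specification does not state how the cost of a successful bid is determined. Both function names are hypothetical.

```python
def execute_bid(bid, highest_competing_bid):
    """Multimedia delivery: cost is consumed only when the competition succeeds.

    Assumption: the winner pays its own bid.
    """
    won = bid > highest_competing_bid
    cost = bid if won else 0.0
    return won, cost

def execute_cloud_request(required_compute):
    """Cloud service: there is no competition, so the resources are always consumed."""
    return True, required_compute
```

In both cases the returned cost is the resource actually consumed, which is what the constraint value bounds.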

In this method, conservative policy optimization may be used in combination with local policy convexification. Local policy convexification stabilizes the strategy to be optimized and concentrates it near a locally optimal solution, so that the collected samples are concentrated in the region matching the distribution of the locally optimal strategy, which in turn eliminates the epistemic uncertainty in the locally convexified region. As the uncertainty decreases, the independent estimated resources obtained by the analysis models become more accurate, and the disagreement among them, which gives rise to the epistemic uncertainty, also decreases; that is, the variance of the estimated resources gradually approaches zero. In this way, the conservativeness of the resource estimate caused by inconsistent predictions is also reduced, so that the conservatively estimated boundary is gradually pushed toward the real boundary.

The above is a service execution method provided in the present specification, and based on the same concept, the present specification further provides a corresponding service execution device, as shown in fig. 2.

Fig. 2 is a schematic diagram of a service execution device provided in the present specification, which specifically includes:

an obtaining module 200, configured to obtain a current state of a target service;

a determining module 202, configured to determine, according to the current state and a predetermined service execution strategy, a target action for executing the target service in the current state, where the target action is used to characterize a resource required for executing the target service; the service execution strategy is obtained by optimizing a strategy to be optimized with the optimization target that an upper confidence value is not greater than a constraint value and the estimated benefit is maximized; the upper confidence value is obtained according to a mean and a variance of the estimated resources; the estimated resources and the estimated benefits are obtained according to the states of the target service and the actions corresponding to the states; the actions corresponding to the states are obtained according to the states and the strategy to be optimized; and the states are included in a state space of the target service;

and the executing module 204 is configured to execute the target service by using the target action.

Optionally, the apparatus further includes a pre-determining module 206, specifically configured to input the states and the actions corresponding to the states into a pre-trained analysis model, so as to obtain the estimated resources and estimated benefits output by the analysis model.

Optionally, the pre-determining module 206 is specifically configured to input, for each analysis model, the states and the actions corresponding to the states into the analysis model, so as to obtain the independent estimated resource output by the analysis model; and to determine the mean of the independent estimated resources as the mean of the estimated resources, and the variance of the independent estimated resources as the variance of the estimated resources.

Optionally, the pre-determining module 206 is specifically configured to obtain a sample state and a sample action corresponding to the sample state, and obtain a labeling resource and a labeling benefit of the sample state; input the sample state and the sample action corresponding to the sample state into an analysis model to be trained, to obtain the resource to be optimized and the benefit to be optimized output by the analysis model; and train the analysis model with the optimization target of minimizing the difference between the resource to be optimized and the labeling resource and minimizing the difference between the benefit to be optimized and the labeling benefit.

Optionally, the pre-determining module 206 is specifically configured to determine the difference between the upper confidence value and the constraint value as a dual difference; initialize a Lagrangian multiplier, and adjust the dual difference by the Lagrangian multiplier to obtain a dual resource; and optimize the strategy to be optimized and the Lagrangian multiplier with the optimization target of maximizing the difference between the estimated benefit and the dual resource.

Optionally, the pre-determining module 206 is specifically configured to determine a dominant term according to the dual difference, where the dominant term is positively correlated with the square of the dual difference; and optimize the strategy to be optimized and the Lagrangian multiplier with the optimization target of maximizing the difference between the estimated benefit and the sum of the dual resource and the dominant term.
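The division of the device into modules can be sketched as a small class. This is one possible arrangement under stated assumptions, not the structure of fig. 2: the class name, field names, and the use of plain callables for the state source, the optimized strategy, and the service backend are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ServiceExecutionDevice:
    read_state: Callable[[], Any]        # used by the obtaining module (200)
    strategy: Callable[[Any], float]     # optimized service execution strategy, used by module 202
    run_service: Callable[[float], Any]  # used by the executing module (204)

    def obtain(self) -> Any:
        """Obtain the current state of the target service."""
        return self.read_state()

    def determine(self, state: Any) -> float:
        """Determine the target action for the current state."""
        return self.strategy(state)

    def execute(self, action: float) -> Any:
        """Execute the target service by adopting the target action."""
        return self.run_service(action)

    def step(self) -> Any:
        return self.execute(self.determine(self.obtain()))

# Hypothetical wiring: a fixed state, a strategy that scales it, a backend that records the spend.
device = ServiceExecutionDevice(lambda: 2.0, lambda s: s * 1.5, lambda a: {"spent": a})
result = device.step()
```

The strategy is passed in already optimized, so the work attributed to the pre-determining module happens before this object is constructed.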

The present specification also provides a computer-readable storage medium storing a computer program operable to execute the service execution method provided in fig. 1 described above.

The present specification also provides a schematic structural diagram of the electronic device, shown in fig. 3. At the hardware level, as shown in fig. 3, the electronic device includes a processor, an internal bus, a network interface, a memory, and a nonvolatile storage, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the nonvolatile storage into the memory and then runs it to implement the service execution method described in fig. 1. Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded from the present description; that is, the execution subject of the processing flow is not limited to logic units, but may also be hardware or logic devices.

In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs a digital system to "integrate" it onto a PLD, without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing a logic method flow can be readily obtained by simply describing the method flow in one of the above hardware description languages and programming it into an integrated circuit.

The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller achieves the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing various functions may also be regarded as structures within the hardware component. Alternatively, the means for performing the various functions may be regarded as both software modules implementing the method and structures within the hardware component.

The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or nonvolatile memory, for example read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media, such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be referred to across embodiments, and each embodiment focuses on its differences from the others. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively brief, and the relevant details can be found in the corresponding description of the method embodiments.

The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present application.

Claims

1. A service execution method, comprising:

acquiring the current state of a target service;

and executing the target service by adopting the target action.

2. The method of claim 1, wherein the target service is a multimedia delivery service, the state includes at least a user representation and a merchant representation, and the action characterizes a bid of the merchant when delivering multimedia information to the user.

3. The method of claim 1, wherein determining the estimated resources and the estimated benefits based on the states and the actions corresponding to the states, comprises:

4. A method according to claim 3, wherein there are at least two analysis models, each of which differs in structure and/or parameters;

5. A method according to claim 3, characterized in that the pre-training of the analysis model comprises in particular:

6. The method of claim 1, wherein optimizing the policy to be optimized with the upper confidence value not greater than the constraint value and the estimated gain maximum as an optimization objective, specifically comprises:

7. The method of claim 6, wherein prior to optimizing the strategy to be optimized and the lagrangian multiplier, the method further comprises:

8. A service execution apparatus, comprising:

9. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any of the preceding claims 1-7.

10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of the preceding claims 1-7 when executing the program.