CN109636515A

CN109636515A - Intelligent agent bidding method and device for electricity retailers

Info

Publication number: CN109636515A
Application number: CN201811448384.8A
Authority: CN
Inventors: 王高琴; 张鹏程; 魏路平; 肖艳炜; 王颖; 张凯锋; 郑亚先; 史新红; 朱炳铨; 郭艳敏; 邵平; 程海花; 龙苏岩; 陈爱林; 徐骏; 吕建虎; 叶飞; 曾丹; 黄春波; 杨辰星
Original assignee: Southeast University; State Grid Zhejiang Electric Power Co Ltd; China Electric Power Research Institute Co Ltd CEPRI; State Grid Shanghai Electric Power Co Ltd
Current assignee: Southeast University; State Grid Zhejiang Electric Power Co Ltd; China Electric Power Research Institute Co Ltd CEPRI; State Grid Shanghai Electric Power Co Ltd
Priority date: 2018-11-30
Filing date: 2018-11-30
Publication date: 2019-04-16

Abstract

The present invention relates to an intelligent agent bidding method and device for electricity retailers. The method comprises: S1. selecting a bidding strategy according to the selection probability of each bidding strategy in a bidding strategy set; S2. updating the selection probability of each bidding strategy according to the bidding revenue corresponding to the selected bidding strategy and the propensity coefficients of the unselected bidding strategies; S3. obtaining the optimal bidding strategy according to the updated selection probability of each bidding strategy. The technical solution provided by the invention can reflect the bidding decision behavior of different types of electricity retail companies in centralized bidding transactions, and thereby captures the decision preferences of different electricity retailers in a real market.

Description

An intelligent agent bidding method and device for electricity retailers

Technical field

The present invention relates to the field of electricity markets, and in particular to an intelligent agent bidding method and device for electricity retailers.

Background

With the advance of electricity market reform, competition on the retail side is being gradually opened up on top of generation-side competition, and a large number of electricity retail companies are entering the market. As new market participants, their bidding behavior poses new challenges for risk management, market mechanism design, and trading rule formulation across the electricity market, and these challenges will become more pronounced in the future. To carry out electricity market simulation research, the first problem to solve is how to effectively simulate the complex bidding decision behavior of electricity retailers in the market, that is, to establish an agent-based bidding decision model for electricity retailers that captures the effect of their bidding behavior on market operation.

In recent years, agent-based modeling has focused mainly on simulating the bidding behavior of generation-side market members. Market members are modeled as computer agents with a certain learning and decision-making capability; for a particular market, a bidding decision simulation model is built according to the market rules, and simulation experiments are used to assess the operating state of the market and test the reasonableness of the market rules. By comparison, agent strategies for the bidding behavior of retail-side market members have received less study. Existing work mainly studies, for a specific market mode, bidding decision methods based on forecast electricity prices and Monte Carlo stochastic optimization. These methods rely on a large number of assumptions and adapt poorly to changes in factors such as the bidding objectives of market members and the market mode.

Summary of the invention

In view of the deficiencies of the prior art, the purpose of the invention is to describe the bidding behavior of electricity retailers with differentiated objectives in a real market. The invention provides an intelligent agent bidding method and device for electricity retailers that can reflect the bidding decision behavior of different types of electricity retail companies in centralized bidding transactions and simulate the actual market as closely as possible.

The purpose of the invention is achieved by the following technical solution:

An intelligent agent bidding method for electricity retailers, the improvement being that the method comprises:

S1. selecting a bidding strategy according to the selection probability of each bidding strategy in a bidding strategy set;

S2. updating the selection probability of each bidding strategy according to the bidding revenue corresponding to the selected bidding strategy and the propensity coefficients of the unselected bidding strategies;

S3. obtaining the optimal bidding strategy according to the updated selection probability of each bidding strategy.

Preferably, in step S1, the initial selection probability of each bidding strategy in the bidding strategy set is determined by the following formula:

In the above formula, p₁(s) is the initial selection probability of the s-th bidding strategy in the bidding strategy set, s ∈ [1, M], and M is the total number of bidding strategies.

Preferably, step S1 comprises:

selecting a bidding strategy from the bidding strategy set using a roulette-wheel algorithm, according to the selection probability of each bidding strategy in the bidding strategy set.
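The roulette-wheel selection of step S1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `roulette_select` is hypothetical, and the uniform initial probabilities are an assumption, since the source formula for p₁(s) is not reproduced in this text.

```python
import random
from itertools import accumulate


def roulette_select(probabilities, rng=random):
    """Return the index of one strategy drawn in proportion to its probability."""
    cumulative = list(accumulate(probabilities))
    # Scale by the running total so small rounding errors in the inputs do not matter.
    pick = rng.random() * cumulative[-1]
    for index, edge in enumerate(cumulative):
        if pick < edge:
            return index
    return len(probabilities) - 1  # guard against floating-point edge cases


M = 4
initial = [1.0 / M] * M  # ASSUMPTION: uniform start; the source formula is not shown
chosen = roulette_select(initial, random.Random(0))
```

Each strategy occupies a slice of the wheel whose width equals its selection probability, so strategies with a probability of zero are never drawn.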

Preferably, step S2 comprises:

determining the bidding revenue corresponding to the selected bidding strategy according to the operation target values of the electricity retailer corresponding to that bidding strategy;

updating the propensity coefficient of the selected bidding strategy according to the bidding revenue;

updating the propensity coefficients of the unselected bidding strategies according to the forgetting factor of the unselected bidding strategies;

updating the selection probability of each bidding strategy according to its propensity coefficient.

Further, determining the bidding revenue corresponding to the selected bidding strategy according to the operation target values of the electricity retailer corresponding to that bidding strategy comprises:

determining the bidding revenue corresponding to the bidding strategy from the operation target values of the electricity retailer corresponding to the selected bidding strategy by the following formula:

In the above formula, R is the bidding revenue corresponding to the bidding strategy, α_n is the n-th operation target value of the electricity retailer corresponding to the selected bidding strategy, θ_n is the weight of that n-th operation target value, β_n is the conversion coefficient of that n-th operation target value, and n ∈ [1, 9].
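As an illustration of how R might combine the nine operation target values, the sketch below assumes a weighted sum R = Σ θ_n · β_n · α_n. That form is an assumption inferred from the symbol definitions, because the source formula is not reproduced in this text; the function name and the numbers are hypothetical.

```python
def bidding_revenue(alpha, theta, beta):
    """Combine operation target values into one revenue figure R.

    ASSUMPTION: R is the weighted sum of theta_n * beta_n * alpha_n.
    """
    if not (len(alpha) == len(theta) == len(beta)):
        raise ValueError("alpha, theta and beta must have the same length")
    return sum(t * b * a for a, t, b in zip(alpha, theta, beta))


# Hypothetical retailer that cares only about targets 1 and 6.
alpha = [1200.0, 0.0, 0.0, 0.0, 0.0, 300.0, 0.0, 0.0, 0.0]
theta = [0.7, 0.0, 0.0, 0.0, 0.0, 0.3, 0.0, 0.0, 0.0]
beta = [1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0]
R = bidding_revenue(alpha, theta, beta)
```

Under this reading, the weights θ_n encode which objectives a given type of retailer pursues, and the conversion coefficients β_n bring quantities with different units (money, electricity quantity) onto a common scale.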

Specifically, the first operation target value α₁ of the electricity retailer corresponding to the bidding strategy is determined by the following formula:

α₁ = max[(p_sell * q_sell) - (p_clear * q_clear)]

The second operation target value α₂ of the electricity retailer corresponding to the bidding strategy is determined by the following formula:

α₂ = max[q_load * (p_set - p_clear)]

The third operation target value α₃ of the electricity retailer corresponding to the bidding strategy is determined by the following formula:

α₃ = max(q_clear * p_clear)

The fourth operation target value α₄ of the electricity retailer corresponding to the bidding strategy is determined by the following formula:

α₄ = max[(q_sell * p_sell) - Δ_penalty]

The fifth operation target value α₅ of the electricity retailer corresponding to the bidding strategy is determined by the following formula:

α₅ = max[(q_sell * p_sell) - Δ_penalty']

The sixth operation target value α₆ of the electricity retailer corresponding to the bidding strategy is determined by the following formula:

α₆ = max(q_clear)

The seventh operation target value α₇ of the electricity retailer corresponding to the bidding strategy is determined by the following formula:

α₇ = max(q_clear - Δ_penalty)

The eighth operation target value α₈ of the electricity retailer corresponding to the bidding strategy is determined by the following formula:

α₈ = max(q_clear - Δ_penalty')

The ninth operation target value α₉ of the electricity retailer corresponding to the bidding strategy is determined by the following formula:

α₉ = max(q_clear - q_clear')

In the above formulas, p_sell and q_sell are respectively the price and the electricity quantity of the electricity sales contracts signed between the electricity retailer and its users; p_clear is the clearing price corresponding to the bidding strategy; q_clear is the winning (cleared) electricity quantity corresponding to the bidding strategy; p_set is the catalog electricity price; q_load is the forecast load electricity quantity; Δ_penalty is the penalty term applied when the actual loss exceeds the acceptable loss amount; Δ_penalty' is the penalty term applied when basic profitability cannot be guaranteed; and q_clear' is the traded electricity quantity of competitors.

The penalty term Δ_penalty applied when the actual loss exceeds the acceptable loss amount is determined by the following formula:

The penalty term Δ_penalty' applied when basic profitability cannot be guaranteed is determined by the following formula:

In the above formulas, δ_penalty is the penalty factor and π_loss is the acceptable loss amount.
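The penalty-free operation targets can be evaluated for a single market outcome as in the sketch below. The `MarketOutcome` class, the function name, and the numbers are hypothetical. Targets 4, 5, 7 and 8 are left out because the penalty formulas are not reproduced in this text. The sketch also assumes that the max operator denotes the objective being pursued, so that for one given outcome the target value is simply the inner expression.

```python
from dataclasses import dataclass


@dataclass
class MarketOutcome:
    """Quantities named in the source; the class itself is hypothetical."""
    p_sell: float         # retail contract price
    q_sell: float         # retail contract electricity quantity
    p_clear: float        # clearing price for this bidding strategy
    q_clear: float        # cleared (winning) quantity for this bidding strategy
    p_set: float          # catalog electricity price
    q_load: float         # forecast load electricity quantity
    q_clear_rival: float  # competitors' traded quantity (q_clear' in the text)


def operating_targets(o: MarketOutcome) -> dict:
    """Evaluate the objectives that need no penalty term, keyed by index n."""
    return {
        1: o.p_sell * o.q_sell - o.p_clear * o.q_clear,  # retail margin
        2: o.q_load * (o.p_set - o.p_clear),             # saving versus catalog price
        3: o.q_clear * o.p_clear,                        # cleared transaction value
        6: o.q_clear,                                    # cleared quantity
        9: o.q_clear - o.q_clear_rival,                  # quantity lead over competitors
    }


outcome = MarketOutcome(p_sell=0.60, q_sell=1000.0, p_clear=0.40, q_clear=900.0,
                        p_set=0.65, q_load=1000.0, q_clear_rival=700.0)
targets = operating_targets(outcome)
```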

Further, updating the propensity coefficient of the selected bidding strategy according to the bidding revenue comprises:

updating the propensity coefficient of the selected bidding strategy by the following formula:

q_{t+1}(m) = (1 - r) q_t(m) + (1 - e) R

In the above formula, q_{t+1}(m) is the propensity coefficient of the selected bidding strategy in the bidding strategy set at iteration t+1, r is the forgetting factor, e is an empirical parameter, q_t(m) is the propensity coefficient of the m-th bidding strategy in the bidding strategy set at iteration t, t ∈ [1, T], T is the total number of iterations, m ∈ [1, M], and M is the total number of bidding strategies.
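The update of the selected strategy translates directly into code. In the sketch below the function name is hypothetical and the values of r, e and R are illustrative only; the initial propensity of 6000 is the value stated later in the embodiment.

```python
def update_selected(q_m, revenue, r, e):
    """Propensity of the selected strategy: q_{t+1}(m) = (1 - r) q_t(m) + (1 - e) R."""
    return (1.0 - r) * q_m + (1.0 - e) * revenue


# Illustrative parameter values; the source does not state r or e.
q_next = update_selected(q_m=6000.0, revenue=1020.0, r=0.04, e=0.97)
```

The forgetting factor r discounts the accumulated propensity, while (1 - e) controls how much of the latest revenue is credited to the strategy that earned it.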

Further, updating the propensity coefficients of the unselected bidding strategies according to the forgetting factor of the unselected bidding strategies comprises:

updating the propensity coefficient of each unselected bidding strategy by the following formula:

In the above formula, x ∈ [1, M] and x ≠ m, m ∈ [1, M], M is the total number of bidding strategies, m is the selected bidding strategy, and x is an unselected bidding strategy; q_{t+1}(x) is the propensity coefficient of the x-th bidding strategy in the bidding strategy set at iteration t+1, and q_t(x) is the propensity coefficient of the x-th bidding strategy in the bidding strategy set at iteration t.
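Because the update formula for unselected strategies is not reproduced in this text, the sketch below borrows the corresponding rule from the modified Roth-Erev reinforcement-learning algorithm, q_{t+1}(x) = (1 - r) q_t(x) + q_t(x) · e / (M - 1). This is an assumption chosen for consistency with the update of the selected strategy, not a confirmed part of the patent; the function name and parameter values are hypothetical.

```python
def update_unselected(q_x, r, e, num_strategies):
    """ASSUMED rule: (1 - r) q_t(x) + q_t(x) * e / (M - 1)."""
    if num_strategies < 2:
        raise ValueError("at least two strategies are needed")
    return (1.0 - r) * q_x + q_x * e / (num_strategies - 1)


# Illustrative values only.
q_next_unselected = update_unselected(q_x=6000.0, r=0.04, e=0.97, num_strategies=4)
```

Under this assumed rule the spillover term grows as M shrinks, so with few strategies and a large e the unselected propensities can increase rather than decay; parameter values need to be chosen with the size of the strategy set in mind.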

Further, updating the selection probability of each bidding strategy according to its propensity coefficient comprises:

determining the selection probability p_{t+1}(s) of the s-th bidding strategy in the bidding strategy set at iteration t+1 by the following formula:

In the above formula, s ∈ [1, M], M is the total number of bidding strategies, q_{t+1}(s) is the propensity coefficient of the s-th bidding strategy in the bidding strategy set at iteration t+1, and c is the cooling coefficient.

The cooling coefficient c is determined by the following formula:

In the above formula, q_t(s) is the propensity coefficient of the s-th bidding strategy in the bidding strategy set at iteration t, and ε is a real number greater than 0.
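One common way to turn propensities into probabilities with a cooling coefficient is the Gibbs-Boltzmann (softmax) form p(s) = exp(q(s)/c) / Σ_j exp(q(j)/c). The sketch below uses that form as an assumption, since neither the probability formula nor the formula for c is reproduced in this text; c is therefore passed in as a plain parameter and the function name is hypothetical.

```python
import math


def selection_probabilities(propensities, c):
    """ASSUMED Gibbs-Boltzmann form: exp(q(s) / c) normalized over all strategies."""
    if c <= 0:
        raise ValueError("cooling coefficient c must be positive")
    # Subtracting the largest propensity avoids overflow in exp without
    # changing the resulting probabilities.
    peak = max(propensities)
    weights = [math.exp((q - peak) / c) for q in propensities]
    total = sum(weights)
    return [w / total for w in weights]


probs = selection_probabilities([6000.0, 6000.0, 6500.0], c=500.0)
```

A smaller c sharpens the distribution toward the strategy with the highest propensity, and a larger c keeps selection closer to uniform.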

Preferably, step S3 comprises:

if a bidding strategy in the bidding strategy set has a selection probability greater than 0.99, that bidding strategy is the optimal bidding strategy; otherwise, returning to step S1.
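Steps S1 to S3 can be put together as the loop sketched below. It is illustrative only: the unselected-strategy update and the Gibbs-Boltzmann probability form are assumptions (the source formulas are not reproduced in this text), the cooling coefficient is held constant, and `run_bidding_agent`, `revenue_of`, the parameter values, and the iteration cap are hypothetical. Only the 0.99 threshold and the initial propensity of 6000 come from the source.

```python
import math
import random


def run_bidding_agent(revenue_of, num_strategies, q0=6000.0, r=0.1, e=0.2,
                      c=500.0, threshold=0.99, max_iterations=10_000, seed=0):
    """Iterate S1-S3 until one strategy's probability exceeds the threshold."""
    rng = random.Random(seed)
    q = [q0] * num_strategies
    p = [1.0 / num_strategies] * num_strategies  # ASSUMED uniform start
    for iteration in range(1, max_iterations + 1):
        # S1: roulette-wheel draw from the current probabilities.
        m = rng.choices(range(num_strategies), weights=p)[0]
        # S2: credit the selected strategy with its revenue; the rule for the
        # other strategies is an ASSUMPTION (modified Roth-Erev spillover).
        reward = revenue_of(m)
        q = [(1 - r) * q_s + ((1 - e) * reward if s == m
                              else q_s * e / (num_strategies - 1))
             for s, q_s in enumerate(q)]
        # ASSUMED Gibbs-Boltzmann probabilities with a fixed cooling coefficient.
        peak = max(q)
        weights = [math.exp((q_s - peak) / c) for q_s in q]
        total = sum(weights)
        p = [w / total for w in weights]
        # S3: stop once one strategy dominates.
        best = max(range(num_strategies), key=p.__getitem__)
        if p[best] > threshold:
            return best, p, iteration
    return None, p, max_iterations  # cap reached without convergence


revenues = [2000.0, 9000.0, 5000.0]  # hypothetical revenue per strategy
best, final_p, iterations = run_bidding_agent(lambda s: revenues[s], len(revenues))
```

The iteration cap is a safeguard added for the sketch; the source describes only the return to step S1 until the 0.99 threshold is met. Which strategy the loop settles on depends on the random draws and on the parameter choices, so a run of this sketch should not be read as evidence about any particular market.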

An intelligent agent bidding device for electricity retailers, the improvement being that the device comprises:

a selecting unit, configured to select a bidding strategy according to the selection probability of each bidding strategy in a bidding strategy set;

an updating unit, configured to update the selection probability of each bidding strategy according to the bidding revenue corresponding to the selected bidding strategy and the propensity coefficients of the unselected bidding strategies;

an acquiring unit, configured to obtain the optimal bidding strategy according to the updated selection probability of each bidding strategy.

Compared with the closest prior art, the invention has the following beneficial effects:

In the technical solution provided by the invention, a bidding strategy is selected according to the selection probability of each bidding strategy in the bidding strategy set; the selection probability of each bidding strategy is updated according to the bidding revenue corresponding to the selected bidding strategy and the propensity coefficients of the unselected bidding strategies; and the optimal bidding strategy is obtained according to the updated selection probabilities. The solution can reflect the bidding decision behavior of different types of electricity retail companies in centralized bidding transactions, and thereby captures the decision preferences of different electricity retailers in a real market.

Brief description of the drawings

Fig. 1 is a flowchart of an intelligent agent bidding method for electricity retailers in an embodiment of the present invention;

Fig. 2 is a structural schematic diagram of an intelligent agent bidding device for electricity retailers in an embodiment of the present invention.

Detailed description of the embodiments

Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

To make the objects, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

The present invention provides an intelligent agent bidding method for electricity retailers. As shown in Fig. 1, the method comprises:

101. selecting a bidding strategy according to the selection probability of each bidding strategy in a bidding strategy set;

102. updating the selection probability of each bidding strategy according to the bidding revenue corresponding to the selected bidding strategy and the propensity coefficients of the unselected bidding strategies;

103. obtaining the optimal bidding strategy according to the updated selection probability of each bidding strategy.

Further, in step 101, the initial selection probability of each bidding strategy in the bidding strategy set is determined by the following formula:

Further, step 101 comprises:

Further, step 102 comprises:

Specifically, determining the bidding revenue corresponding to the selected bidding strategy according to the operation target values of the electricity retailer corresponding to that bidding strategy comprises:

α₁ = max[(p_sell * q_sell) - (p_clear * q_clear)]

α₂ = max[q_load * (p_set - p_clear)]

α₃ = max(q_clear * p_clear)

α₄ = max[(q_sell * p_sell) - Δ_penalty]

α₅ = max[(q_sell * p_sell) - Δ_penalty']

α₆ = max(q_clear)

α₇ = max(q_clear - Δ_penalty)

α₈ = max(q_clear - Δ_penalty')

α₉ = max(q_clear - q_clear')

Specifically, updating the propensity coefficient of the selected bidding strategy according to the bidding revenue comprises:

q_{t+1}(m) = (1 - r) q_t(m) + (1 - e) R

Specifically, updating the propensity coefficients of the unselected bidding strategies according to the forgetting factor of the unselected bidding strategies comprises:

Here, when t = 1, q_t(m) or q_t(x) is the initial propensity coefficient of each bidding strategy in the bidding strategy set. The initial propensity coefficient is set to 6000; it is an empirical parameter chosen so that every strategy has a positive selection probability at the start, which supports global convergence.

Specifically, updating the selection probability of each bidding strategy according to its propensity coefficient comprises:

In the above formula, s ∈ [1, M], where s may be either m or x, and M is the total number of bidding strategies; q_{t+1}(s) is the propensity coefficient of the s-th bidding strategy in the bidding strategy set at iteration t+1, and c is the cooling coefficient.

The cooling coefficient c is determined by the following formula:

Further, step 103 comprises:

if a bidding strategy in the bidding strategy set has a selection probability greater than 0.99, that bidding strategy is the optimal bidding strategy; otherwise, returning to step 101.

The present invention also provides an intelligent agent bidding device for electricity retailers. As shown in Fig. 2, the device comprises:

Further, in the selecting unit, the initial selection probability of each bidding strategy in the bidding strategy set is determined by the following formula:

Further, the selecting unit is configured to:

Further, the updating unit comprises:

a determining module, configured to determine the bidding revenue corresponding to the selected bidding strategy according to the operation target values of the electricity retailer corresponding to that bidding strategy;

a first update module, configured to update the propensity coefficient of the selected bidding strategy according to the bidding revenue;

a second update module, configured to update the propensity coefficients of the unselected bidding strategies according to the forgetting factor of the unselected bidding strategies;

a third update module, configured to update the selection probability of each bidding strategy according to its propensity coefficient.

Specifically, the determining module is configured to determine the bidding revenue corresponding to the bidding strategy from the operation target values of the electricity retailer corresponding to the selected bidding strategy by the following formula:

Specifically, the determining module further comprises:

a first determining submodule, configured to determine the first operation target value α₁ of the electricity retailer corresponding to the bidding strategy by the following formula:

α₁ = max[(p_sell * q_sell) - (p_clear * q_clear)]

a second determining submodule, configured to determine the second operation target value α₂ of the electricity retailer corresponding to the bidding strategy by the following formula:

α₂ = max[q_load * (p_set - p_clear)]

a third determining submodule, configured to determine the third operation target value α₃ of the electricity retailer corresponding to the bidding strategy by the following formula:

α₃ = max(q_clear * p_clear)

a fourth determining submodule, configured to determine the fourth operation target value α₄ of the electricity retailer corresponding to the bidding strategy by the following formula:

α₄ = max[(q_sell * p_sell) - Δ_penalty]

a fifth determining submodule, configured to determine the fifth operation target value α₅ of the electricity retailer corresponding to the bidding strategy by the following formula:

α₅ = max[(q_sell * p_sell) - Δ_penalty']

a sixth determining submodule, configured to determine the sixth operation target value α₆ of the electricity retailer corresponding to the bidding strategy by the following formula:

α₆ = max(q_clear)

a seventh determining submodule, configured to determine the seventh operation target value α₇ of the electricity retailer corresponding to the bidding strategy by the following formula:

α₇ = max(q_clear - Δ_penalty)

an eighth determining submodule, configured to determine the eighth operation target value α₈ of the electricity retailer corresponding to the bidding strategy by the following formula:

α₈ = max(q_clear - Δ_penalty')

a ninth determining submodule, configured to determine the ninth operation target value α₉ of the electricity retailer corresponding to the bidding strategy by the following formula:

α₉ = max(q_clear - q_clear')

a tenth determining submodule, configured to determine the penalty term Δ_penalty applied when the actual loss exceeds the acceptable loss amount by the following formula:

an eleventh determining submodule, configured to determine the penalty term Δ_penalty' applied when basic profitability cannot be guaranteed by the following formula:

Specifically, the first update module is configured to update the propensity coefficient of the selected bidding strategy by the following formula:

q_{t+1}(m) = (1 - r) q_t(m) + (1 - e) R

Specifically, the second update module is configured to update the propensity coefficients of the unselected bidding strategies by the following formula:

Specifically, the third update module is configured to determine the selection probability p_{t+1}(s) of the s-th bidding strategy in the bidding strategy set at iteration t+1 by the following formula:

The third update module further comprises:

a twelfth determining submodule, configured to determine the cooling coefficient c by the following formula:

Further, the acquiring unit is configured such that, if a bidding strategy in the bidding strategy set has a selection probability greater than 0.99, that bidding strategy is the optimal bidding strategy; otherwise, control returns to the selecting unit.

Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.

The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the specific embodiments of the invention may still be modified or equivalently replaced, and any modification or equivalent replacement that does not depart from the spirit and scope of the invention shall be covered by the claims of the present invention.

Claims

1. An intelligent agent bidding method for electricity retailers, characterized in that the method comprises:

S1. selecting a bidding strategy according to the selection probability of each bidding strategy in a bidding strategy set;

S2. updating the selection probability of each bidding strategy according to the bidding revenue corresponding to the selected bidding strategy and the propensity coefficients of the unselected bidding strategies;

S3. obtaining the optimal bidding strategy according to the updated selection probability of each bidding strategy.

2. The method according to claim 1, wherein, in step S1, the initial selection probability of each bidding strategy in the bidding strategy set is determined by the following formula:

In the above formula, p₁(s) is the initial selection probability of the s-th bidding strategy in the bidding strategy set, s ∈ [1, M], and M is the total number of bidding strategies.

3. The method of claim 1, wherein the step S1 comprises:

According to the selection probability of each bidding strategy in the bidding strategy set, a roulette algorithm is used to select a bidding strategy from the bidding strategy set.

4. The method of claim 1, wherein step S2 comprises:

determining the bidding revenue corresponding to the selected bidding strategy according to the operation target values of the electricity retailer corresponding to that bidding strategy;

updating the propensity coefficient of the selected bidding strategy according to the bidding revenue;

updating the propensity coefficients of the unselected bidding strategies according to the forgetting factor of the unselected bidding strategies;

updating the selection probability of each bidding strategy according to its propensity coefficient.

5. The method according to claim 4, wherein determining the bidding revenue corresponding to the selected bidding strategy according to the operation target values of the electricity retailer corresponding to that bidding strategy comprises:

determining the bidding revenue corresponding to the bidding strategy from the operation target values of the electricity retailer corresponding to the selected bidding strategy by the following formula:

In the above formula, R is the bidding revenue corresponding to the bidding strategy, α_n is the n-th operation target value of the electricity retailer corresponding to the selected bidding strategy, θ_n is the weight of that n-th operation target value, β_n is the conversion coefficient of that n-th operation target value, and n ∈ [1, 9].

6. The method according to claim 5, wherein the first operation target value α₁ of the electricity retailer corresponding to the bidding strategy is determined by the following formula:

α₁ = max[(p_sell * q_sell) - (p_clear * q_clear)]

The second operation target value α₂ of the electricity retailer corresponding to the bidding strategy is determined by the following formula:

α₂ = max[q_load * (p_set - p_clear)]

The third operation target value α₃ of the electricity retailer corresponding to the bidding strategy is determined by the following formula:

α₃ = max(q_clear * p_clear)

The fourth operation target value α₄ of the electricity retailer corresponding to the bidding strategy is determined by the following formula:

α₄ = max[(q_sell * p_sell) - Δ_penalty]

The fifth operation target value α₅ of the electricity retailer corresponding to the bidding strategy is determined by the following formula:

α₅ = max[(q_sell * p_sell) - Δ_penalty']

The sixth operation target value α₆ of the electricity retailer corresponding to the bidding strategy is determined by the following formula:

α₆ = max(q_clear)

The seventh operation target value α₇ of the electricity retailer corresponding to the bidding strategy is determined by the following formula:

α₇ = max(q_clear - Δ_penalty)

The eighth operation target value α₈ of the electricity retailer corresponding to the bidding strategy is determined by the following formula:

α₈ = max(q_clear - Δ_penalty')

The ninth operation target value α₉ of the electricity retailer corresponding to the bidding strategy is determined by the following formula:

α₉ = max(q_clear - q_clear')

In the above formulas, p_sell and q_sell are respectively the price and the electricity quantity of the electricity sales contracts signed between the electricity retailer and its users; p_clear is the clearing price corresponding to the bidding strategy; q_clear is the winning (cleared) electricity quantity corresponding to the bidding strategy; p_set is the catalog electricity price; q_load is the forecast load electricity quantity; Δ_penalty is the penalty term applied when the actual loss exceeds the acceptable loss amount; Δ_penalty' is the penalty term applied when basic profitability cannot be guaranteed; and q_clear' is the traded electricity quantity of competitors.

The penalty term Δ_penalty applied when the actual loss exceeds the acceptable loss amount is determined by the following formula:

The penalty term Δ_penalty' applied when basic profitability cannot be guaranteed is determined by the following formula:

In the above formulas, δ_penalty is the penalty factor and π_loss is the acceptable loss amount.

7. The method of claim 4, wherein updating the propensity coefficient of the selected bidding strategy according to the bidding revenue comprises:

updating the propensity coefficient of the selected bidding strategy by the following formula:

q_{t+1}(m) = (1 - r) q_t(m) + (1 - e) R

In the above formula, q_{t+1}(m) is the propensity coefficient of the selected bidding strategy in the bidding strategy set at iteration t+1, r is the forgetting factor, e is an empirical parameter, q_t(m) is the propensity coefficient of the m-th bidding strategy in the bidding strategy set at iteration t, t ∈ [1, T], T is the total number of iterations, m ∈ [1, M], and M is the total number of bidding strategies.

8. The method of claim 4, wherein updating the propensity coefficients of the unselected bidding strategies according to the forgetting factor of the unselected bidding strategies comprises:

updating the propensity coefficient of each unselected bidding strategy by the following formula:

In the above formula, x ∈ [1, M] and x ≠ m, m ∈ [1, M], m is the selected bidding strategy, x is an unselected bidding strategy, and M is the total number of bidding strategies; q_{t+1}(x) is the propensity coefficient of the x-th bidding strategy in the bidding strategy set at iteration t+1, and q_t(x) is the propensity coefficient of the x-th bidding strategy in the bidding strategy set at iteration t.

9. The method of claim 4, wherein updating the selection probability of each bidding strategy according to its propensity coefficient comprises:

determining the selection probability p_{t+1}(s) of the s-th bidding strategy in the bidding strategy set at iteration t+1 by the following formula:

In the above formula, s ∈ [1, M], M is the total number of bidding strategies, q_{t+1}(s) is the propensity coefficient of the s-th bidding strategy in the bidding strategy set at iteration t+1, and c is the cooling coefficient.

The cooling coefficient c is determined by the following formula:

In the above formula, q_t(s) is the propensity coefficient of the s-th bidding strategy in the bidding strategy set at iteration t, and ε is a real number greater than 0.

10. The method of claim 1, wherein the step S3 comprises:

If the selection probability of a bidding strategy in the bidding strategy set is greater than 0.99, the bidding strategy is the optimal bidding strategy; otherwise, return to step S1.

11. An intelligent agent bidding device for electricity retailers, characterized in that the device comprises:

a selecting unit, configured to select a bidding strategy according to the selection probability of each bidding strategy in a bidding strategy set;

an updating unit, configured to update the selection probability of each bidding strategy according to the bidding revenue corresponding to the selected bidding strategy and the propensity coefficients of the unselected bidding strategies;

an acquiring unit, configured to obtain the optimal bidding strategy according to the updated selection probability of each bidding strategy.