CN115270008A

CN115270008A - Maximum-influence blogger searching method and system, storage medium and terminal

Info

Publication number: CN115270008A
Application number: CN202211196398.1A
Authority: CN
Inventors: 张九龙; 寇纲; 章宇; 肖辉; 肖峰
Original assignee: Southwestern University Of Finance And Economics
Current assignee: Southwestern University Of Finance And Economics
Priority date: 2022-09-29
Filing date: 2022-09-29
Publication date: 2022-11-01
Anticipated expiration: 2042-09-29
Also published as: CN115270008B

Abstract

The invention discloses a method and a system for finding the maximum-influence blogger, a storage medium and a terminal. The method comprises the following steps: S1, obtaining microblog social network data, the data including the bloggers, the propagation weights between bloggers and the propagation thresholds of the bloggers; S2, abstracting the microblog social network into a directed acyclic graph G = (V, E) and obtaining the information of the graph, where V represents the set of nodes and each node corresponds to one individual in the microblog social network.

By drawing on robust optimization theory, the method takes into account the uncertainty of the propagation weights in the social network, which better matches real-world conditions. In addition, a C&CG algorithm is designed on the basis of the proposed integer programming model, which improves generalization to large-scale social networks.

Description

Maximum-influence blogger searching method and system, storage medium and terminal

Technical Field

The invention belongs to the technical field of network propagation and dynamics, relates to techniques for solving the influence maximization problem in social networks, and in particular relates to a maximum-influence blogger searching method and system, a storage medium and a terminal.

Background

Over the past decade, the development of the mobile internet has driven the rise of many online social platforms, which has greatly promoted everyday communication and information dissemination among the public. At the same time, the users of these platforms form huge social networks. Selecting as few seed users as possible so that, through word-of-mouth effects, they influence as many other users as possible is a problem of both theoretical and economic value in social network analysis, known as the influence maximization problem. A limited number of nodes are initially selected from the social network as the seed set, and an entity such as a piece of information then propagates outward from the seed nodes along the edges of the network. Under a given propagation model, the goal is for as many individuals in the network as possible to have received the entity when propagation ends. The total number of individuals eventually influenced by the seed set (including the seed set itself) is defined as the influence spread of the seed set.

Take viral marketing as an example. Suppose a manufacturer develops a new product and intends to publicize it by issuing product promotions, coupons and the like, expecting consumers who buy the product to recommend it to their friends and relatives. Because the manufacturer's promotion budget is limited, a subset of consumers must be selected for targeted marketing at the early stage. At present, the share of internet advertising among all marketing channels is rising sharply. On microblogs, for example, a manufacturer can select a blogger matching its product category, place an advertisement (text, pictures, videos and the like) on the blogger's homepage with product information and a purchase link, and pay a fee. Field experiments show that, compared with the fans of a generalist blogger (one not focused on a single product category), the fans of a specialist blogger are fewer in number but pay more attention to the blogger's posts and convert to purchases more readily. How to select bloggers effectively and increase the value of advertising is therefore a marketing problem well worth studying.

Because the path along which an entity propagates through the network is random, the influence spread of a seed set is very hard to compute. Even for a seed set containing a single node propagating on a directed acyclic graph, computing the influence spread exactly is extremely difficult. In fact, computing the influence spread exactly is a #P-hard problem, and finding the optimal seed set is an NP-hard problem.

The prior art falls into two categories: greedy approximation algorithms based on Monte Carlo simulation, which exploit the submodularity and monotonicity of the influence spread function, together with their improvements; and heuristic algorithms that exploit specific properties of the propagation model.

The prior art has the following drawbacks:

(1) The Monte Carlo greedy approximation algorithm can, in theory, guarantee a lower bound on its approximation ratio relative to the optimal influence spread. However, the Monte Carlo method must repeatedly simulate the propagation of each candidate node through the network, count the total number of activated nodes at the end of each run, and average these counts to approximate the influence spread. Obtaining a high-quality solution requires thousands of simulations, which leads to long computation times and high cost.

(2) Although heuristic algorithms replace Monte Carlo simulation and solve faster, they cannot obtain the globally optimal seed set. Moreover, to guarantee solution accuracy, methods such as reverse influence sampling must sample a sufficiently large number of reverse reachable sets to achieve the same approximation-ratio guarantee as the Monte Carlo greedy algorithm.
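To make the cost described in drawback (1) concrete, here is a minimal Python sketch of the classical Monte Carlo greedy algorithm under the linear threshold model. It assumes the classical setting in which thresholds are redrawn uniformly at random in every simulation (this is prior art, not the method of the invention). The function names `simulate_lt`, `estimate_spread` and `greedy_seeds` and the input format (`in_weights[j]` maps each in-neighbour i of j to its weight) are illustrative assumptions.

```python
import random


def simulate_lt(in_weights, nodes, seeds, rng):
    """Run one random LT cascade and return the number of active nodes.

    Thresholds are redrawn uniformly from (0, 1] on every run, as in the
    classical prior-art setting; in_weights[j] maps in-neighbour i -> w_ij.
    """
    theta = {v: 1.0 - rng.random() for v in nodes}  # uniform on (0, 1]
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in nodes:
            if v in active:
                continue
            incoming = sum(w for u, w in in_weights.get(v, {}).items() if u in active)
            if incoming >= theta[v]:
                active.add(v)
                changed = True
    return len(active)


def estimate_spread(in_weights, nodes, seeds, runs, rng):
    """Monte Carlo estimate: average cascade size over `runs` simulations."""
    return sum(simulate_lt(in_weights, nodes, seeds, rng) for _ in range(runs)) / runs


def greedy_seeds(in_weights, nodes, k, runs=1000, seed=0):
    """Greedy hill climbing: repeatedly add the node with the best estimated spread.

    The work grows as k * |V| * runs cascade simulations.
    """
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        candidates = [v for v in nodes if v not in chosen]
        best = max(candidates,
                   key=lambda v: estimate_spread(in_weights, nodes, chosen + [v], runs, rng))
        chosen.append(best)
    return chosen
```

Each greedy round evaluates every remaining candidate with `runs` full simulations. On a real microblog graph with many nodes and thousands of runs per estimate, this quickly becomes the bottleneck the drawback describes.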

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a method for searching a blogger with the maximum influence.

In order to achieve the purpose, the invention adopts the technical scheme that:

the method for searching the blogger with the maximum influence is characterized by comprising the following steps of:

S1, obtaining microblog social network data, the data including the bloggers, the propagation weights between bloggers and the propagation thresholds of the bloggers;

S2, abstracting the microblog social network into a directed acyclic graph G = (V, E) and obtaining the information of the directed acyclic graph, where V represents the set of nodes, each node corresponding to one individual in the microblog social network, E represents the set of directed edges between pairs of nodes, the number of nodes in the network is |V| and the number of edges is |E|;

S3, establishing a two-stage robust optimization model;

S4, inputting the information of the directed acyclic graph G into the two-stage robust optimization model for calculation;

S5, the two-stage robust optimization model returns the result data.

Preferably, in step S2, the information of the directed acyclic graph G includes the nodes, the propagation weight on each edge, the activation threshold of each node and the connection relationships of the edges.
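A minimal sketch of how the graph information of step S2 might be held in memory follows, assuming plain Python dictionaries keyed by node and by edge. The class `SocialDAG` and its methods are hypothetical. The class checks that the abstracted graph really is acyclic using Kahn's topological sort. It also checks the linear-threshold assumption that each threshold and each node's total incoming weight do not exceed 1.

```python
from collections import deque


class SocialDAG:
    """Hypothetical container for the graph information listed in step S2.

    thresholds: {node j: activation threshold of j}
    weights:    {(i, j): propagation weight of directed edge i -> j}
    """

    def __init__(self, thresholds, weights):
        self.thresholds = dict(thresholds)
        self.weights = dict(weights)
        self.in_neighbors = {v: [] for v in self.thresholds}
        self.out_neighbors = {v: [] for v in self.thresholds}
        for i, j in self.weights:
            self.out_neighbors[i].append(j)
            self.in_neighbors[j].append(i)

    @property
    def n(self):
        return len(self.thresholds)  # |V|

    @property
    def m(self):
        return len(self.weights)  # |E|

    def check_lt_assumptions(self, tol=1e-12):
        """Each threshold and each node's total incoming weight must be <= 1."""
        for j, theta in self.thresholds.items():
            incoming = sum(self.weights[(i, j)] for i in self.in_neighbors[j])
            if theta > 1 + tol or incoming > 1 + tol:
                return False
        return True

    def topological_order(self):
        """Kahn's algorithm; raises ValueError if the graph is not acyclic."""
        indegree = {v: len(self.in_neighbors[v]) for v in self.thresholds}
        queue = deque(v for v, d in indegree.items() if d == 0)
        order = []
        while queue:
            v = queue.popleft()
            order.append(v)
            for u in self.out_neighbors[v]:
                indegree[u] -= 1
                if indegree[u] == 0:
                    queue.append(u)
        if len(order) != self.n:
            raise ValueError("graph contains a directed cycle")
        return order
```

A topological order is also convenient later: on a DAG, every in-neighbour of a node appears before it in this order.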

Preferably, in step S3, a virtual node s is introduced into the directed acyclic graph G to obtain an augmented graph whose node set is V ∪ {s} and whose edge set is E ∪ {(s, j) : j ∈ V}, where j is a node; the propagation weight of each edge (i, j) is assumed to lie within a given interval, and external stimuli are introduced, so as to establish the two-stage robust optimization model

(1)

where the first term represents the first-stage decision, namely the cost of screening the seed set;

the unit cost of selecting node j into the seed set is given for every node j;

the first-stage variable of node j is a 0-1 variable: a value of 1 indicates that node j is selected into the seed set, and a value of 0 indicates that it is not;

B represents the number of nodes in the seed set;

the second term represents the second-stage decision, namely the difference between the cost of the external stimuli and the profit from activated nodes;

the second-stage decision variables are the node state variables, the edge state variables and the external stimuli;

for each node j, the model uses the external stimulus introduced at node j, the unit cost of introducing an external stimulus at node j, and the unit profit of node j once it is activated during propagation;

the state variable of node j is a 0-1 variable: a value of 1 indicates that node j is activated, and a value of 0 indicates that it is not;

an edge (i, j) is an outgoing edge of node i and an incoming edge of node j;

the uncertainty set is a budget uncertainty set, whose budget is the allowable upper limit on the perturbation of the propagation weights;

each edge (i, j) has a corresponding propagation weight;

each node has an activation threshold;

a constant represents a number greater than 10;

and each edge has a perturbed propagation weight.
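Because formula (1) itself is not reproduced in this text, the following is only an exhaustive-search sketch of a simplified reading of its second stage. It is useful for checking tiny instances by hand. It assumes three things: the recourse chooses which nodes end up active; each non-seed active node j is topped up by an external stimulus equal to its threshold shortfall; and the objective is stimulus cost minus activation profit. The names `recourse`, `two_stage_value`, `c`, `f` and `p` are hypothetical, and B is treated as an upper limit on the seed set size.

```python
from itertools import combinations


def recourse(seeds, nodes, in_w, theta, f, p):
    """Exhaustive second stage for fixed seeds and fixed (perturbed) weights.

    Assumed, simplified reading: any node set A containing the seeds may end
    up active; a non-seed node j in A needs a stimulus
    e_j = max(0, theta_j - total weight from its in-neighbours in A), paid at
    unit cost f_j, and every active node earns unit profit p_j. Restricting
    to a DAG rules out two nodes activating each other for free.
    Returns (stimulus cost - profit, active set, stimuli) at the minimum.
    """
    others = [v for v in nodes if v not in seeds]
    best = None
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            active = set(seeds) | set(extra)
            stimulus = {}
            for j in extra:
                incoming = sum(w for i, w in in_w.get(j, {}).items() if i in active)
                stimulus[j] = max(0.0, theta[j] - incoming)
            value = (sum(f[j] * e for j, e in stimulus.items())
                     - sum(p[j] for j in active))
            if best is None or value < best[0]:
                best = (value, active, stimulus)
    return best


def two_stage_value(seeds, c, B, nodes, in_w, theta, f, p):
    """First-stage seed cost plus the second-stage value for one weight scenario."""
    if len(seeds) > B:
        raise ValueError("seed set exceeds the size limit B")
    return sum(c[j] for j in seeds) + recourse(seeds, nodes, in_w, theta, f, p)[0]
```

The enumeration visits every subset of non-seed nodes, so it is exponential in |V|. It can serve only as a reference for validating a proper integer programming formulation on very small graphs.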

Preferably, in step S4, the result data include the optimal seed set, the total number of activated nodes, the total cost and the external stimuli.

Preferably, the model is solved with a C&CG (column-and-constraint generation) algorithm framework comprising an outer-layer C&CG algorithm and an inner-layer C&CG algorithm;

the outer-layer C&CG algorithm determines the optimal seed set;

the inner-layer C&CG algorithm computes the worst-case scenario.

Preferably, the outer-layer C&CG algorithm comprises the outer-layer master problem

(2)

where an intermediate variable is introduced, l denotes the iteration index of the outer-layer C&CG algorithm and L denotes the upper limit on its number of iterations, and where, for the l-th iteration, the model records the external stimulus introduced by node j, the state variable of node j, the state variable of edge (i, j) and the value of the uncertain parameters;

and the outer-layer subproblem

(3).

Preferably, the outer-layer C&CG algorithm comprises the following steps:

Step S3a1, initialization: set the upper bound and the lower bound of problem (1), the iteration counter and the convergence criterion;

Step S3a2, solve the outer-layer master problem (2);

Step S3a3, determine whether the outer-layer master problem (2) has a solution: if not, the original problem (1) has no robust feasible solution and the result is returned; if so, proceed to the next step;

Step S3a4, obtain the optimal solution and update the upper bound of problem (1);

Step S3a5, fix the first-stage decision variables and solve the outer-layer subproblem (3);

Step S3a6, obtain the worst-case scenario and the corresponding subproblem objective value, and update the bound of problem (1);

Step S3a7, check whether the convergence criterion is met: if not, add the new variables and the constraints (2c)-(2h) to the master problem and return to step S3a2; if so, proceed to the next step;

Step S3a8, return the optimal seed set.
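The outer loop can be illustrated with a generic column-and-constraint generation sketch over finite candidate and scenario sets. Here both the master problem and the subproblem are solved by enumeration rather than by an integer programming solver. This is a toy under stated assumptions, not the patent's formulation: `outer_ccg`, its arguments and the choice of initial scenario are hypothetical. The sketch uses the textbook bound convention for a minimization problem, which may differ from the bound names used in steps S3a4 and S3a6 as translated.

```python
def outer_ccg(candidates, scenarios, first_cost, recourse, eps=1e-9, max_iter=100):
    """C&CG for min_x first_cost(x) + max_u recourse(x, u) over finite sets.

    The master problem only sees the scenarios generated so far, so it is a
    relaxation and gives a lower bound; evaluating the master's x against its
    own worst scenario gives a feasible value and hence an upper bound.
    """
    lower, upper = float("-inf"), float("inf")
    generated = [scenarios[0]]  # hypothetical initial scenario
    best_x = None
    for iteration in range(1, max_iter + 1):
        # Master problem (cf. S3a2): optimise x against the generated scenarios only.
        def master_obj(x):
            return first_cost(x) + max(recourse(x, u) for u in generated)
        x = min(candidates, key=master_obj)
        lower = master_obj(x)
        # Subproblem (cf. S3a5-S3a6): worst-case scenario for the fixed x.
        u_worst = max(scenarios, key=lambda u: recourse(x, u))
        value = first_cost(x) + recourse(x, u_worst)
        if value < upper:
            upper, best_x = value, x
        # Convergence test (cf. S3a7); otherwise add the scenario and repeat.
        if upper - lower <= eps:
            return best_x, upper, iteration
        generated.append(u_worst)
    return best_x, upper, max_iter
```

For example, `outer_ccg([0, 1, 2], [0, 1, 2], lambda x: x, lambda x, u: (u - x) ** 2)` returns `(1, 2, 2)`: x = 1 with value 2 after two iterations. In the setting of the invention, the candidates would be seed sets of at most B nodes. `recourse(x, u)` would be the second-stage value for seed set x under scenario u, and the subproblem step is where the inner-layer algorithm would be called.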

Preferably, the inner-layer C&CG algorithm comprises the following: for the outer-layer subproblem (3), the equivalent problem (4) is obtained by enumeration,

(4)

where an intermediate variable is introduced, k denotes the iteration index of the inner-layer C&CG algorithm and K denotes the upper limit on its number of iterations, and where, for the k-th iteration, the model records the external stimulus introduced by node j, the state variable of node j and the state variable of edge (i, j), together with a deterministic parameter derived from the outer-layer master problem;

the equivalent problem (5) is then obtained by applying the strong duality theorem,

(5)

where new variables are introduced.

Preferably, the inner-layer C&CG algorithm comprises the inner-layer master problem

(6)

where k = q indicates that the inner-layer C&CG algorithm converges at the q-th iteration, and new variables are introduced;

and the inner-layer subproblem

(7)

in which the uncertain parameters are held fixed.

Preferably, the inner-layer C&CG algorithm comprises the following steps:

Step S3b1, initialize the first-stage decision variables, set the upper bound and the lower bound of problem (3), set the iteration counter k = 0, and set the convergence criterion;

Step S3b2, solve the inner-layer master problem (6);

Step S3b3, obtain the worst-case scenario and its corresponding objective function value, and update the bound;

Step S3b4, fix the uncertain parameters and solve the inner-layer subproblem (7);

Step S3b5, obtain the optimal solution and update the upper bound of problem (7);

Step S3b6, check whether the convergence criterion is satisfied: if not, add the new variables and the constraints (6c)-(6h) to the master problem and return to step S3b2; if so, proceed to the next step;

Step S3b7, return the uncertain parameters.
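The inner loop, which solves the max-min problem (3) with the first-stage decision fixed, can be sketched in the same enumerative style. The master problem picks the scenario that is worst against the recourse decisions generated so far, and the subproblem computes the best recourse for that scenario. The function name and the toy objective are hypothetical. The strong-duality reformulation (5) is replaced here by plain enumeration, which is only viable for tiny instances.

```python
def inner_ccg(scenarios, recourse_options, q, eps=1e-9, max_iter=100):
    """C&CG for max_u min_y q(u, y) over finite sets (first stage already fixed).

    The master problem lets the adversary play against only the recourse
    decisions generated so far, which overestimates the inner minimum and
    gives an upper bound; solving the true inner minimum for the chosen
    scenario gives an achievable value and hence a lower bound.
    """
    lower, upper = float("-inf"), float("inf")
    generated = [recourse_options[0]]  # hypothetical initial recourse column
    worst_u = None
    for iteration in range(1, max_iter + 1):
        # Master problem (cf. S3b2-S3b3): worst scenario vs. generated recourses.
        u = max(scenarios, key=lambda s: min(q(s, y) for y in generated))
        upper = min(q(u, y) for y in generated)
        # Subproblem (cf. S3b4-S3b5): best recourse for the fixed scenario u.
        y_best = min(recourse_options, key=lambda y: q(u, y))
        value = q(u, y_best)
        if value > lower:
            lower, worst_u = value, u
        # Convergence test (cf. S3b6); otherwise add the new column and repeat.
        if upper - lower <= eps:
            return worst_u, lower, iteration
        generated.append(y_best)
    return worst_u, lower, max_iter
```

For instance, `inner_ccg([0, 1, 2], [0, 1, 2], lambda u, y: (u - y) ** 2 + y)` returns `(2, 2, 2)`: the worst scenario is u = 2, the worst-case value is 2, and it is found in two iterations.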

A maximum-influence blogger searching system, characterized in that it is configured to execute the maximum-influence blogger searching method.

A readable storage medium, characterized in that it stores a computer program whose execution implements the maximum-influence blogger searching method.

A terminal, comprising:

a memory for storing executable program code;

a processor;

wherein the processor is coupled with the memory;

the processor calls the executable program code stored in the memory to execute the maximum influence blogger searching method.

The invention provides a method and a system for finding the maximum-influence blogger, a storage medium and a terminal, with the following advantages:

(1) In the problem of maximization of the influence of the social network, the method is based on a classical linear threshold model, and the propagation weight is assumed to be located in an uncertain set of a certain interval, so that uncertainty is introduced, and the method is more consistent with the current situation that parameters cannot be accurately estimated in a real scene;

(2) The invention designs a two-layer optimization framework and constructs an integer programming model. Based on a nested column-and-constraint generation algorithm, the optimal seed set can be solved exactly, and because the algorithm needs fewer iterations, it remains scalable on large-scale social networks;

(3) Because uncertainty is introduced into the network parameters, the seed nodes obtained by the invention are more robust. The effect of advertising on social platforms can therefore be simulated effectively, which provides pricing guidance.

Drawings

FIG. 1 illustrates an outer C & CG flow diagram of the two-stage robust optimization model of the present invention;

FIG. 2 illustrates an inner C & CG flow diagram of the two-stage robust optimization model of the present invention;

FIG. 3 illustrates an overall flow diagram of the two-stage robust optimization model of the present invention;

FIG. 4 shows a social network propagation diagram of 9 nodes.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1-4, the embodiments of the present invention are as follows:

Example 1:

The method for finding the maximum-influence blogger comprises the following steps:

S1, obtaining microblog social network data, the data including the bloggers, the propagation weights between bloggers and the propagation thresholds of the bloggers;

S2, abstracting the microblog social network into a directed acyclic graph G = (V, E) and obtaining the information of the graph, where V represents the set of nodes, each node corresponding to one individual in the microblog social network, E represents the set of directed edges between pairs of nodes, the number of nodes in the network is |V| and the number of edges is |E|;

S3, establishing a two-stage robust optimization model;

S4, inputting the information of the directed acyclic graph G into the two-stage robust optimization model for calculation;

S5, the two-stage robust optimization model returns the result data.

In this embodiment, the microblog social network is abstracted into a directed acyclic graph G = (V, E), and the information of the graph is obtained. V represents the set of nodes; each node corresponds to one individual in the microblog social network and has two possible states, active and inactive. The active state means that the node has received the propagated entity (information, a product, a technology and the like), and the inactive state means that it has not.

E represents the set of directed edges between pairs of nodes. Each directed edge indicates that influence between the pair is directional: the influence of node i on node j differs from the influence of node j on node i. Each edge (i, j) carries a propagation weight $w_{ij}$, which expresses the strength of the influence of the preceding node on the next node during propagation. The number of nodes in the network is |V| and the number of edges is |E|. For each node j in V, its in-neighbor set $N_{in}(j) = \{ i : (i, j) \in E \}$ contains all upstream nodes pointing to j.

Let $S_t$ denote the set of active nodes at time t, where $S_0$ is the initial seed set. The seed nodes influence other nodes during diffusion, so that the entity propagates through the social network. If the set of active nodes does not change from some time t to the next time t + 1, i.e. $S_{t+1} = S_t$, propagation ends. The invention is based on the linear threshold propagation model: for every node $j \in V$, the sum of the propagation weights from its in-neighbors and its threshold $\theta_j$ are each assumed not to exceed 1, i.e. $\sum_{i \in N_{in}(j)} w_{ij} \le 1$ and $0 \le \theta_j \le 1$.

At any time t, an inactive node j is successfully activated if the sum of the propagation weights from its active in-neighbors is at least its threshold:

$$\sum_{i \in N_{in}(j) \cap S_t} w_{ij} \ge \theta_j .$$
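The diffusion rule above can be sketched as a synchronous simulation that records S_0, S_1, ... until the active set stops changing. This sketch assumes fixed, known thresholds, as in the deterministic description here. The function name `lt_cascade` and the dictionary-based inputs are illustrative.

```python
def lt_cascade(in_neighbors, weights, theta, seeds):
    """Synchronous linear-threshold diffusion with fixed thresholds.

    in_neighbors[j] lists N_in(j), weights[(i, j)] is w_ij and theta[j] is
    theta_j. Returns the history [S_0, S_1, ..., S_T], stopping at the first
    T with S_{T+1} = S_T.
    """
    history = [frozenset(seeds)]
    while True:
        current = history[-1]
        newly_active = {
            j for j in theta
            if j not in current
            and sum(weights[(i, j)] for i in in_neighbors.get(j, ()) if i in current) >= theta[j]
        }
        if not newly_active:  # S_{t+1} = S_t: propagation ends
            return history
        history.append(current | newly_active)
```

With edges 1→2 (0.6), 1→3 (0.2), 2→3 (0.5), thresholds 0.3, 0.5, 0.6 and seed set {1}, node 2 activates at t = 1. Node 3 activates at t = 2, once its active in-weight reaches 0.2 + 0.5 = 0.7.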

Example 2:

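In step S2, the information of the directed acyclic graph G includes the nodes, the propagation weight on each edge, the activation threshold of each node and the connection relationships of the edges.

In step S3, a virtual node s is introduced into the directed acyclic graph G to obtain an augmented graph whose node set is V ∪ {s} and whose edge set is E ∪ {(s, j) : j ∈ V}, where j is a node; the propagation weight of each edge (i, j) is assumed to lie within a given interval, and external stimuli are introduced, so as to establish the two-stage robust optimization model

(1)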

where the first term represents the first-stage decision, namely the cost of screening the seed set;

the unit cost of selecting node j into the seed set is given for every node j;

the first-stage variable of node j is a 0-1 variable: a value of 1 indicates that node j is selected into the seed set, and a value of 0 indicates that it is not;

B represents the number of nodes in the seed set;

the second term represents the second-stage decision, namely the difference between the cost of the external stimuli and the profit from activated nodes;

the second-stage decision variables are the node state variables, the edge state variables and the external stimuli;

for each node j, the model uses the external stimulus introduced at node j, the unit cost of introducing an external stimulus at node j, and the unit profit of node j once it is activated during propagation;

the state variable of node j is a 0-1 variable: a value of 1 indicates that node j is activated, and a value of 0 indicates that it is not;

an edge (i, j) is an outgoing edge of node i and an incoming edge of node j;

the uncertainty set is a budget uncertainty set, whose budget is the allowable upper limit on the perturbation of the propagation weights;

each edge (i, j) has a corresponding propagation weight;

each node has an activation threshold;

a constant represents a number greater than 10;

and each edge has a perturbed propagation weight.

In the internet era, the rise of social platforms has accelerated the spread of entities such as information. The social networks formed between individuals benefit commercial activities such as viral marketing, but they also let rumors spread more easily, which can cause social unrest. Finding the key influential nodes in a social network is therefore urgent. In practice, researchers usually assume that the parameters of the influence propagation model are known. For the linear threshold model, the key parameters (the propagation weights and thresholds) are estimated from social network data by some method, and these estimates are inaccurate. Because the propagation weights and thresholds are the input data of the model used to find the best blogger, estimation errors are present from the start, and the optimal solution computed by the model is of limited reference value.

Researchers usually ignore this inaccuracy and feed deterministic propagation weights and thresholds into the model, which therefore becomes a deterministic model. A deterministic model simplifies the problem to some extent, but its assumptions are very strong, so its results can differ substantially from the actual effect in a real social network. Moreover, most existing methods are approximate: the seed set found by an approximation algorithm is only guaranteed to activate at least 0.63 times (that is, 1 - 1/e times) as many nodes as the optimal seed set. Clearly, the "optimal solution" of an approximation algorithm is not optimal in the true sense.

Consider a cosmetics manufacturer marketing a new perfume that wants as many customers as possible to see it. If the uncertainty of real data is ignored and the best blogger for promoting the perfume is found by an approximate method, the solution can contain large errors. A likely outcome is that the chosen blogger maximizes influence spread but is a generalist blogger, so the purchase conversion rate of the perfume after the advertisement is placed is low. The fee the merchant paid the blogger cannot be recovered, conversions still fail to materialize after some time, and the large amount invested in the blogger becomes a heavy loss for the merchant. For many small and medium-sized enterprises in particular, an unprofitable blogger advertisement can put the company at risk of closing outright. In real life, more than 70% of merchants cannot bear even a low-probability, high-impact risk, so it is necessary to improve the solution accuracy of the model.

In this embodiment, to account for the uncertainty of the input data, the model introduces perturbed propagation weights and external stimuli: the propagation weight of each edge (i, j) is assumed to lie within an interval, and a two-stage robust optimization model is established. Solving this model narrows the fluctuation range of the final optimal solution, which makes the model more stable. The final optimal solution returned by the model is robust, and its objective function value represents the net gain achievable in the worst case.

The perturbation of the propagation weights can be adjusted according to the social network data, for example according to the reliability of the data and the range of the true propagation weights. In one embodiment, in step S3, the perturbation of the propagation weight of each edge is proportional to its nominal propagation weight, i.e. perturbation = G × nominal weight, where G is a proportionality coefficient taking values in the interval [0, 1].
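A small sketch of the proportional perturbation and of a budget-limited worst case for a fixed seed set follows. For illustration only, it assumes that at most an integer number `gamma` of edges may be perturbed, each downward by the full amount G × w_ij (the direction that hurts spread). This is a common Bertsimas and Sim style budget set rather than the exact set used in formula (1). All function names are hypothetical.

```python
from itertools import combinations


def perturbation_sizes(weights, G):
    """Perturbation of each edge, proportional to its nominal weight (G in [0, 1])."""
    if not 0.0 <= G <= 1.0:
        raise ValueError("G must lie in [0, 1]")
    return {e: G * w for e, w in weights.items()}


def budget_scenarios(weights, hat_w, gamma):
    """Yield weight dicts in which at most `gamma` edges drop to w - hat_w.

    Assumption: integer budget and downward-only perturbation.
    """
    edges = list(weights)
    for r in range(min(gamma, len(edges)) + 1):
        for hit in combinations(edges, r):
            yield {e: weights[e] - (hat_w[e] if e in hit else 0.0) for e in edges}


def lt_spread(nodes, weights, theta, seeds):
    """Deterministic LT cascade with fixed thresholds; returns the final active count."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for j in nodes:
            if j in active:
                continue
            incoming = sum(w for (i, k), w in weights.items() if k == j and i in active)
            if incoming >= theta[j]:
                active.add(j)
                changed = True
    return len(active)


def worst_case_spread(nodes, weights, theta, seeds, G, gamma):
    """Smallest spread of `seeds` over all scenarios in the assumed budget set."""
    hat_w = perturbation_sizes(weights, G)
    return min(lt_spread(nodes, s, theta, seeds)
               for s in budget_scenarios(weights, hat_w, gamma))
```

On a chain 1→2→3 with weights 0.6 and thresholds 0.5, seed {1} reaches all three nodes with no budget. With G = 0.5 and a budget of one edge, perturbing edge (1, 2) leaves only the seed active. This mirrors the observation in the experiments that a larger perturbation limit activates fewer nodes.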

Compared with heuristic algorithms, the invention obtains better solutions at the cost of only a little extra computation time. The optimal seed set can reduce sunk costs, and solution accuracy improves without large fluctuations in solution time.

Example 3:

In step S4, the result data include the optimal seed set, the total number of activated nodes, the total cost and the external stimuli.

In this embodiment, taking a cosmetics manufacturer promoting a new perfume as an example, the optimal seed set helps the manufacturer select bloggers from the social network built on a microblog platform and place advertisements (video, text, voice and the like) on their homepages. The bloggers' fans then see the advertisements and as many of them as possible make a purchase, which improves the purchase conversion of the advertising.

Introducing external stimuli lowers node thresholds. In practice, an external stimulus lowers fans' psychological price expectations and increases their willingness to buy; it can take the form of discount coupons, promotions and the like. For example, a manufacturer can issue a special discount link or coupon to motivate fans who are unwilling to buy at the original price, increasing the number of buyers. Without an external stimulus, a customer who sees the blogger's promotion may still be unwilling to buy at the original price, and the manufacturer loses that customer. If the customer is given a discount coupon, the probability of purchase rises greatly. After buying, the customer promotes the product on their own blog and their fans buy in turn, which maximizes the propagation influence, and the final accumulated profit far exceeds the cost of issuing the coupons.
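To illustrate how an external stimulus interacts with thresholds in the augmented graph, here is a sketch under an explicitly assumed reading. The virtual node s is treated as always active, and a stimulus e_j bought for node j lowers its effective threshold to theta_j - e_j. The fixed weight on the edges leaving s is not preserved in this text, so `virtual_weight` is a placeholder. All names are hypothetical.

```python
def augment_with_virtual_node(nodes, weights, virtual="s", virtual_weight=0.0):
    """Add a virtual node s and an edge (s, j) for every original node j.

    `virtual_weight` stands in for the fixed weight on edges leaving s;
    0.0 is only a placeholder value.
    """
    if virtual in nodes:
        raise ValueError("virtual node name collides with an existing node")
    aug_nodes = list(nodes) + [virtual]
    aug_weights = dict(weights)
    for j in nodes:
        aug_weights[(virtual, j)] = virtual_weight
    return aug_nodes, aug_weights


def is_activated(j, active, weights, theta, stimulus):
    """LT activation test in which the stimulus e_j lowers theta_j.

    Assumption: j activates when its active in-weight >= theta_j - e_j.
    """
    incoming = sum(w for (i, k), w in weights.items() if k == j and i in active)
    return incoming >= theta[j] - stimulus.get(j, 0.0)


def stimulus_needed(j, active, weights, theta):
    """Smallest stimulus e_j that activates j given its active in-neighbours."""
    incoming = sum(w for (i, k), w in weights.items() if k == j and i in active)
    return max(0.0, theta[j] - incoming)
```

In the coupon example, `stimulus_needed` plays the role of the smallest discount that turns a hesitant fan into a buyer. The fan's purchase then adds weight toward their own followers' thresholds.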

In this embodiment, the data returned by the two-stage robust optimization model include the optimal seed set, the total number of activated nodes, the total cost and the external stimuli. The perturbed propagation weights, the amount of perturbation and the external stimuli can be adjusted to tune the model output to the merchant's needs.

Example 4:

The model is solved with the C&CG (column-and-constraint generation) algorithm, where the outer-layer C&CG algorithm determines the optimal seed set and the inner-layer C&CG algorithm computes the worst-case scenario.

The outer-layer C&CG algorithm comprises the outer-layer master problem

(2)

where an intermediate variable is introduced, l denotes the iteration index of the outer-layer C&CG algorithm and L denotes the upper limit on its number of iterations, and where, for the l-th iteration, the model records the external stimulus introduced by node j, the state variable of node j, the state variable of edge (i, j) and the value of the uncertain parameters;

and the outer-layer subproblem

(3).

The outer-layer C&CG algorithm comprises the following steps:

Step S3a1, initialization: set the upper bound and the lower bound of problem (1), the iteration counter and the convergence criterion;

Step S3a2, solve the outer-layer master problem (2);

Step S3a3, determine whether the outer-layer master problem (2) has a solution: if not, the original problem (1) has no robust feasible solution and the result is returned; if so, proceed to the next step;

Step S3a4, obtain the optimal solution and update the upper bound of problem (1);

Step S3a5, fix the first-stage decision variables and solve the outer-layer subproblem (3);

Step S3a6, obtain the worst-case scenario and the corresponding subproblem objective value, and update the bound of problem (1);

Step S3a7, check whether the convergence criterion is met: if not, add the new variables and the constraints (2c)-(2h) to the master problem and return to step S3a2; if so, proceed to the next step;

Step S3a8, return the optimal seed set.

The outer C & CG flow chart of the two-stage robust optimization model of the invention is shown in FIG. 1.

The inner-layer C&CG algorithm comprises the following: for the outer-layer subproblem (3), the equivalent problem (4) is obtained by enumeration,

(4)

where an intermediate variable is introduced, k denotes the iteration index of the inner-layer C&CG algorithm and K denotes the upper limit on its number of iterations, and where, for the k-th iteration, the model records the external stimulus introduced by node j, the state variable of node j and the state variable of edge (i, j), together with a deterministic parameter derived from the outer-layer master problem;

the equivalent problem (5) is then obtained by applying the strong duality theorem,

(5)

where new variables are introduced.

The inner-layer C&CG algorithm comprises the inner-layer master problem

(6)

where k = q indicates that the inner-layer C&CG algorithm converges at the q-th iteration, and new variables are introduced;

and the inner-layer subproblem

(7)

in which the uncertain parameters are held fixed.

The inner-layer C&CG algorithm comprises the following steps:

Step S3b1, initialize the first-stage decision variables, set the upper bound and the lower bound of problem (3), set the iteration counter k = 0, and set the convergence criterion;

Step S3b2, solve the inner-layer master problem (6);

Step S3b3, obtain the worst-case scenario and its corresponding objective function value, and update the bound;

Step S3b4, fix the uncertain parameters and solve the inner-layer subproblem (7);

Step S3b5, obtain the optimal solution and update the upper bound of problem (7);

Step S3b6, check whether the convergence criterion is satisfied: if not, add the new variables and the constraints (6c)-(6h) to the master problem and return to step S3b2; if so, proceed to the next step;

Step S3b7, return the uncertain parameters.

The inner-layer C&CG flowchart of the two-stage robust optimization model of the invention is shown in FIG. 2, and the overall flowchart of the model is shown in FIG. 3. In FIG. 3, the inner dashed box contains the inner-layer C&CG procedure, and the region between the inner and outer dashed boxes contains the outer-layer C&CG procedure. An embodiment of social network propagation with 9 nodes is shown in FIG. 4, where each ellipse represents a node of the social network; the first row of numbers in each ellipse is the node number and the second row is the node threshold.

For the 9-node social network in the figure, a virtual node s is introduced to obtain the augmented graph. Solid arrows indicate directed edges between pairs of nodes, and the number beside each edge is its propagation weight; the smallest-font number beside each node is its threshold. Dashed arrows indicate the directed edges from the virtual node s to the 9 nodes, and the propagation weight of these edges is fixed. In the numerical experiments, the unit cost of selecting a seed node and the unit cost of the external stimulus are fixed; because these two values differ by one order of magnitude, lowering a node's threshold by introducing external stimuli is costly. The unit profit of an activated node and the magnitude of the propagation weight perturbation are likewise fixed.

Table 1 shows the experimental results of propagation in the 9-node social network under the two-stage robust optimization model. A sensitivity analysis was carried out, comparing the solutions of the model under different parameter combinations. In the table, one column gives the total number of activated nodes, another the external stimulus introduced, and TC the corresponding minimum total cost under the model. For each allowable perturbation upper limit of the propagation weights, as the seed set size B increases, TC gradually stabilizes and the required external stimulus decreases to 0; once all nodes have been activated, further increasing the seed set size is pointless. Meanwhile, for the same seed set size B, as the allowable perturbation upper limit of the propagation weights increases, the total number of activated nodes decreases and TC increases.

As the upper limit on propagation-weight perturbation increases, uncertainty grows and nodes become harder to activate. Increasing the seed-set size B is meant to activate more nodes, so the total net negative benefit TC should become smaller. However, TC does not keep decreasing with B: as B increases, TC first decreases and then levels off. Meanwhile, the external stimulus (i.e. the reliance on discounts and other promotional activities) drops from a positive value to 0, meaning that the minimum TC can be reached without introducing any external stimulus. Moreover, once the allowable perturbation upper limit of the propagation weights rises beyond a certain level, the total number of activated nodes no longer decreases. This indicates that the nodes that remain activated still yield a net benefit under the model's worst-case scenario. The model of the invention can therefore identify the maximum influence bloggers under the worst case and return the corresponding minimum total cost. A merchant can thus predict the worst-case influence and the corresponding cost in advance, which helps in setting a budget and provides guidance for pricing.
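The effect of the perturbation upper limit can be reproduced on a toy graph. The sketch below assumes a discrete budget uncertainty set: at most `gamma` edges may each drop by their full deviation from the nominal weight. This is a simplified, Bertsimas-Sim style budget chosen for illustration, not the set used in the text. The worst case is found by brute-force enumeration, which is practical only for tiny graphs. All names and values are hypothetical.

```python
from graphlib import TopologicalSorter
from itertools import combinations


def spread(edges, thresholds, seeds):
    """Deterministic linear-threshold spread on a DAG (single topological pass)."""
    preds = {v: [] for v in thresholds}
    for i, j in edges:
        preds[j].append(i)
    active = set(seeds)
    for j in TopologicalSorter(preds).static_order():
        if j not in active and sum(edges[i, j] for i in preds[j] if i in active) >= thresholds[j]:
            active.add(j)
    return active


def worst_case_spread(edges, deviation, thresholds, seeds, gamma):
    """Fewest activated nodes when at most `gamma` edges drop by their full deviation.

    The discrete budget set is an illustrative assumption; the enumeration
    grows combinatorially and is only meant for toy graphs.
    """
    worst = len(spread(edges, thresholds, seeds))  # no edge perturbed
    for k in range(1, gamma + 1):
        for hit in combinations(edges, k):
            perturbed = {e: w - deviation[e] if e in hit else w for e, w in edges.items()}
            worst = min(worst, len(spread(perturbed, thresholds, seeds)))
    return worst


# Hypothetical data; all values are exactly representable in binary.
edges = {("a", "b"): 0.75, ("a", "c"): 0.5, ("b", "c"): 0.5, ("c", "d"): 0.75}
deviation = {e: 0.25 for e in edges}
thresholds = {"a": 0.5, "b": 0.5, "c": 0.75, "d": 0.5}

for gamma in range(5):
    print(gamma, worst_case_spread(edges, deviation, thresholds, {"a"}, gamma))
```

The worst-case number of activated nodes can never increase with `gamma`, because a larger budget only enlarges the set of admissible perturbations. On this toy graph it drops from 4 to 2 and then stays there, mirroring the plateau described above.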

For the social-network influence maximization problem, the method builds on the classical linear threshold model. It assumes that each propagation weight lies in an interval uncertainty set, which introduces uncertainty and better reflects the fact that parameters cannot be estimated precisely in real scenarios. The invention designs a two-layer optimization framework and constructs an integer programming model. The optimal seed set is solved exactly with a tailored column-and-constraint generation (C&CG) algorithm. Because the algorithm needs few iterations, it remains applicable to large-scale social networks; because uncertainty is introduced into the network parameters, the resulting seed nodes are more robust. The method can therefore effectively simulate the effect of advertising on social platforms and provide pricing guidance.

Current research generally assumes that the parameters of the influence propagation model are known with certainty. In the linear threshold propagation model, however, the key parameters, namely the propagation weights, are estimated from social-network data by specific methods, so the resulting parameters are inaccurate. Ignoring this issue lets a deterministic model reduce the complexity of the problem to some extent. However, it relies on strong assumptions and performs only moderately in practice.

The invention therefore introduces robust optimization theory to address the uncertainty of social-network parameters, focusing mainly on the uncertainty of the propagation weights. This better matches the real range of influence diffusion and is the main advantage of the method over the prior art. Under this assumption the model of the invention is more complex and difficult to solve with classical algorithms. A nested column-and-constraint generation (nested C&CG) algorithm is therefore proposed specifically for it. The algorithm solves the integer programming problem exactly, provided that the second-stage problem satisfies the extended relatively complete recourse property. Although the computational complexity of the method depends on the network size, the algorithm needs only a small number of iterations, so the method still generalizes to large-scale networks.

A maximum influence blogger searching system, characterized in that it can execute the maximum influence blogger searching method.

A readable storage medium, characterized in that

it is used for storing a designated computer program, the execution of which implements the maximum influence blogger searching method.

A terminal, comprising:

a memory for storing executable program code;

a processor;

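wherein the processor is coupled with the memory and calls the executable program code stored in the memory to execute the maximum influence blogger searching method.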

In the description of the embodiments of the present invention, it should be understood that the terms "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "center", "top", "bottom", "inner", "outer", and the like indicate orientations or positional relationships.

In the description of the embodiments of the present invention, it should be understood that "-" and "to" denote a range between two values, and the range includes its endpoints. For example, "A-B" denotes a range greater than or equal to A and less than or equal to B, and "A to B" likewise denotes a range greater than or equal to A and less than or equal to B.

In the description of the embodiments of the present invention, the term "and/or" merely describes an association between related objects and indicates that three relationships may exist. For example, "A and/or B" may mean that A exists alone, that A and B exist simultaneously, or that B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following objects.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

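1. A maximum influence blogger searching method, characterized by comprising the following steps:

step S1, acquiring microblog social network data, the data comprising the bloggers, the propagation weights between bloggers, and the propagation thresholds of the bloggers;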

step S2, abstracting the microblog social network into a directed acyclic graph G = (V, E) and obtaining the information of the directed acyclic graph, wherein V represents the set of nodes, each node corresponding to one individual in the microblog social network, E represents the set of directed edges, and the number of edges is |E|;

step S3, establishing a two-stage robust optimization model;

step S4, processing the directed acyclic graph

step S5, returning the result data of the two-stage robust optimization model.
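Step S2 requires the microblog network to be represented as a directed acyclic graph. The text does not state how this property is ensured, so the sketch below only checks it. It uses Kahn's algorithm, which repeatedly removes nodes of in-degree zero. The function name `is_dag` and the example edges are hypothetical.

```python
from collections import deque


def is_dag(nodes, edges):
    """Return True if the directed graph (nodes, edges) contains no cycle.

    Kahn's algorithm: repeatedly remove nodes of in-degree zero; all nodes
    are removed exactly when the graph is acyclic.
    """
    indegree = {v: 0 for v in nodes}
    successors = {v: [] for v in nodes}
    for i, j in edges:
        successors[i].append(j)
        indegree[j] += 1
    queue = deque(v for v in indegree if indegree[v] == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    return removed == len(indegree)


# Hypothetical edges: (i, j) means information can pass from blogger i to blogger j.
nodes = ["u1", "u2", "u3", "u4"]
edges = [("u1", "u2"), ("u1", "u3"), ("u2", "u4"), ("u3", "u4")]
print(is_dag(nodes, edges), len(nodes), len(edges))
print(is_dag(nodes, edges + [("u4", "u1")]))
```

Adding the edge from u4 back to u1 closes a cycle, so the second check fails.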

2. The method of claim 1, wherein in step S2,

directed acyclic graph

3. The maximum influence blogger searching method according to claim 2, wherein in step S3,

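a virtual node s is introduced into the directed acyclic graph G to obtain a new augmented graph G' = (V', E'),

with the new node set V' = V ∪ {s},

and the new edge set E' = E ∪ {(s, j) : j ∈ V},

where j is a node;

the propagation weight of each edge (i, j) is assumed to lie within a given interval;

external excitation is introduced; and

the two-stage robust optimization model is established:

(1)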

wherein,

represents the first-stage decision, i.e. the cost of screening the seed set;

wherein,

represents the unit cost of node j entering the selected seed set;

wherein,

is a 0-1 variable:

a value of 1 indicates that node j is selected into the seed set,

and a value of 0 indicates that node j is not selected into the seed set;

wherein B represents the size of the seed set (the number of seed nodes);

wherein,

represents the second-stage decision,

wherein,

represents the state variable of a node,

represents the state variable of an edge,

represents the external stimulus;

wherein,

represents the external stimulus introduced at node j,

represents the unit cost of introducing an external stimulus at node j;

wherein,

represents the state variable of node j:

a value of 1 indicates that node j is activated,

and a value of 0 indicates that node j is not activated;

wherein the edge

wherein,

represents the budget uncertainty set,

represents the allowable perturbation upper limit of the propagation weights;

wherein,

represents the propagation weight corresponding to edge (i, j);

wherein,

is the node threshold,

represents a number greater than 10;

wherein,

represents the perturbed propagation weight.
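The budget uncertainty set is not written out in this text. For illustration only, assume a form common in robust optimization:

- each weight moves within its interval as w_ij + u_ij * d_ij with |u_ij| <= 1;
- the total scaled deviation satisfies sum |u_ij| <= Gamma, with Gamma playing the role of the allowable perturbation upper limit.

The hypothetical helper `in_budget_set` checks whether a perturbed weight vector belongs to such a set.

```python
def in_budget_set(nominal, deviation, perturbed, gamma, tol=1e-9):
    """Check membership in an assumed budgeted interval uncertainty set.

    Assumed form (illustrative, not taken from the text):
        perturbed[e] = nominal[e] + u[e] * deviation[e],
        |u[e]| <= 1 for every edge e, and sum(|u[e]|) <= gamma.
    """
    total = 0.0
    for e, w in nominal.items():
        shift = perturbed[e] - w
        if deviation[e] == 0:
            # An edge without an interval must keep its nominal weight.
            if abs(shift) > tol:
                return False
            continue
        u = shift / deviation[e]
        if abs(u) > 1 + tol:        # outside the edge's interval
            return False
        total += abs(u)
    return total <= gamma + tol     # budget on the total scaled deviation


nominal = {("a", "b"): 0.5, ("b", "c"): 0.5}
deviation = {("a", "b"): 0.25, ("b", "c"): 0.25}
print(in_budget_set(nominal, deviation, {("a", "b"): 0.25, ("b", "c"): 0.5}, gamma=1))
print(in_budget_set(nominal, deviation, {("a", "b"): 0.25, ("b", "c"): 0.25}, gamma=1))
```

The second call is rejected. Both edges sit at the end of their intervals, so the total scaled deviation is 2, while gamma is 1.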

4. The method of claim 3, wherein in step S4,

5. The maximum influence blogger searching method according to claim 4, wherein the maximum influence blogger is searched for

with a nested C&CG algorithm,

wherein the outer-layer C&CG algorithm is used to determine the optimal seed set;

wherein the inner-layer C&CG algorithm is used to calculate the worst-case scenario.

6. The maximum influence blogger searching method according to claim 5, wherein the outer-layer C&CG algorithm comprises

the outer-layer master problem:

(2)

wherein

is an intermediate variable, l denotes the iteration index of the outer-layer C&CG algorithm, and L denotes the upper limit on the number of iterations of the outer-layer C&CG algorithm;

wherein

represents the external stimulus introduced at node j in the l-th iteration,

represents the state variable of node j in the l-th iteration,

represents the state variable of edge

in the l-th iteration, and

represents the uncertain parameter in the l-th iteration;

and the outer-layer subproblem:

(3).

7. The maximum influence blogger searching method according to claim 5, wherein the outer-layer C&CG algorithm comprises the following steps:

step S3a1, initialization: setting the upper bound of problem (1) to

the lower bound of problem (1) to

the iteration counter to

and the convergence tolerance to

step S3a2, solving the outer-layer master problem (2);

step S3a4, obtaining the optimal solution

and updating the upper bound of problem (1):

step S3a5, fixing the first-stage decision variables

and solving the outer-layer subproblem (3);

step S3a6, obtaining the worst-case scenario

and the corresponding subproblem objective value

and updating

step S3a7, judging whether the convergence condition

is satisfied; if it is not, adding the variables

step S3a8, returning the optimal seed set.
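The bound-and-cut loop of the outer-layer steps can be sketched on a toy problem:

- the first-stage choices and the scenarios are finite lists;
- the second-stage optimum Q(x, u) is given by a function;
- enumeration stands in for the integer-programming solvers.

The function name `ccg` and all data are hypothetical. The sketch is posed as a minimization. The master problem over the generated scenarios therefore gives a lower bound, and evaluating the worst case for the master's solution gives an upper bound. If problem (1) is posed with the opposite sense, the roles of the two bounds are exchanged.

```python
import math


def ccg(first_stage, cost, recourse, scenarios, eps=1e-9, max_iter=100):
    """Toy C&CG for min over x of [cost(x) + max over u of Q(x, u)], finite sets.

    first_stage: candidate first-stage decisions x (e.g. seed sets)
    cost:        x -> first-stage cost
    recourse:    (x, u) -> optimal second-stage cost Q(x, u); enumeration
                 stands in for solving the second-stage integer program
    scenarios:   candidate uncertainty realisations u
    Returns (best x, lower bound, upper bound).
    """
    lb, ub = -math.inf, math.inf
    cuts = [scenarios[0]]  # generated scenarios; each adds one recourse copy
    best_x = None

    def master_value(x):
        # Master objective: cost(x) + eta with eta >= Q(x, u_l) for generated u_l.
        return cost(x) + max(recourse(x, u) for u in cuts)

    for _ in range(max_iter):
        x_star = min(first_stage, key=master_value)
        lb = max(lb, master_value(x_star))        # master is a relaxation
        u_star = max(scenarios, key=lambda u: recourse(x_star, u))
        value = cost(x_star) + recourse(x_star, u_star)
        if value < ub:                             # true objective of x_star
            ub, best_x = value, x_star
        if ub - lb <= eps:
            break
        cuts.append(u_star)                        # add the worst scenario
    return best_x, lb, ub


result = ccg([0, 1, 2, 3], lambda x: 2 * x, lambda x, u: 3 * max(0, u - x), [0, 1, 2, 3])
print(result)
```

With these data the loop stops after two iterations. It selects x = 3 with both bounds equal to 6, which matches exhaustive enumeration.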

8. The maximum influence blogger searching method according to claim 5, wherein the inner-layer C&CG algorithm comprises:

for the outer-layer subproblem (3), obtaining through

the equivalent problem (4):

(4)

wherein,

wherein,

represents the external stimulus introduced at node j in the k-th iteration,

represents the state variable of node j in the k-th iteration,

represents the state variable of edge

in the k-th iteration, and

represents the deterministic parameter obtained from the outer-layer master problem;

obtaining the equivalent problem (5) by applying the strong duality theorem:

(5)

wherein

is a newly added variable.
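The strong-duality step can be illustrated on a generic linear recourse problem. Assume, purely for illustration, that once the first-stage decision x and the integer second-stage variables are fixed, the remaining continuous variables y solve an LP with constraint matrix A, right-hand side b - Ex - Fu and cost vector d. All of these symbols are hypothetical. LP strong duality then replaces the inner minimization by its dual, so the max-min problem becomes a single maximization:

$$
\max_{u\in\mathcal{U}}\ \min_{y\ge 0}\ \{\, d^{\top}y : Ay \ge b - Ex - Fu \,\}
\;=\;
\max_{u\in\mathcal{U},\ \pi\ge 0}\ \{\, (b - Ex - Fu)^{\top}\pi : A^{\top}\pi \le d \,\}
$$

The equality holds whenever the inner LP is feasible and bounded for every u in the uncertainty set. The term (Fu)^T π is bilinear in u and π and must be handled separately, for example by big-M linearization when the relevant components of u are binary.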

9. The maximum influence blogger searching method according to claim 5, wherein the inner-layer C&CG algorithm comprises

the inner-layer master problem:

(6)

wherein the quantity in (6) is a newly added variable;

and the inner-layer subproblem:

(7)

wherein

are uncertain parameters.

10. The maximum influence blogger searching method according to claim 5, wherein the inner-layer C&CG algorithm comprises the following steps:

step S3b1, initialization: initializing the first-stage decision variables

setting the upper bound of problem (3) to

and the lower bound of problem (3) to

with iteration counter k = 0 and convergence tolerance

step S3b2, solving the inner-layer master problem (6);

step S3b3, obtaining the worst-case scenario

and its corresponding objective function value

and updating

step S3b4, fixing the uncertain parameters

and solving the inner-layer subproblem (7);

step S3b5, obtaining the optimal solution

and updating the upper bound of problem (7)

step S3b6, judging whether the convergence condition

is satisfied; if it is not, adding new variables

step S3b7, returning the uncertain parameters.
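The inner-layer steps follow the same pattern one level down. The toy sketch below solves max over u of min over y of value(y, u) for finite lists of scenarios and recourse decisions:

- the master problem keeps only the recourse decisions generated so far and therefore gives an upper bound;
- the subproblem computes the best recourse for the master's scenario and gives a lower bound.

All names and data are hypothetical, and enumeration stands in for solvers. The sketch assumes every recourse decision is feasible for every scenario, a simplification of the extended relatively complete recourse property mentioned in the description.

```python
import math


def inner_ccg(scenarios, recourse_set, value, eps=1e-9, max_iter=100):
    """Toy C&CG for max over u of min over y of value(y, u), finite sets.

    Assumes every y is feasible for every u; enumeration replaces solvers.
    Returns (worst-case scenario, lower bound, upper bound).
    """
    lb, ub = -math.inf, math.inf
    generated = [recourse_set[0]]  # recourse decisions added to the master
    worst_u = None

    def master_value(u):
        # Master: eta <= value(y_k, u) for each generated y_k.
        return min(value(y, u) for y in generated)

    for _ in range(max_iter):
        u_star = max(scenarios, key=master_value)
        ub = min(ub, master_value(u_star))     # fewer y's: relaxation, upper bound
        y_star = min(recourse_set, key=lambda y: value(y, u_star))
        sub = value(y_star, u_star)            # exact value at u_star: lower bound
        if sub > lb:
            lb, worst_u = sub, u_star
        if ub - lb <= eps:
            break
        generated.append(y_star)               # add the new recourse column
    return worst_u, lb, ub


result = inner_ccg([0, 1, 2, 3], [0, 1, 2, 3], lambda y, u: 2 * abs(y - u) + y)
print(result)
```

For these data the worst-case scenario is u = 3 with value 3, which agrees with direct enumeration of the max-min.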

11. A maximum influence blogger searching system, characterized in that

it can execute the maximum influence blogger searching method according to any one of claims 1 to 10.

12. A readable storage medium, characterized in that

it is used for storing a specific computer program, the execution of which implements the maximum influence blogger searching method according to any one of claims 1 to 10.

13. A terminal, comprising:

a memory for storing executable program code;

a processor;

wherein the processor is coupled with the memory;

the processor calls the executable program code stored in the memory to execute the maximum influence blogger searching method according to any one of claims 1 to 10.