CN113573264A - Pricing processing method and device of 5G slice based on deep reinforcement learning - Google Patents

Pricing processing method and device of 5G slice based on deep reinforcement learning

Info

Publication number
CN113573264A
CN113573264A (application CN202010352035.7A)
Authority
CN
China
Prior art keywords
slice
pricing
data
user side
use state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010352035.7A
Other languages
Chinese (zh)
Inventor
邢彪
郑屹峰
张卷卷
陈维新
章淑敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Zhejiang Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Zhejiang Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Zhejiang Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202010352035.7A
Publication of CN113573264A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W4/00Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/24Accounting or billing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0206Price or cost determination based on market factors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/02Arrangements for optimising operational condition
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/06Testing, supervising or monitoring using simulated traffic


Abstract

The invention discloses a pricing processing method and a pricing processing device of a 5G slice based on deep reinforcement learning, wherein the method comprises the following steps: acquiring historical user side slice use state data and corresponding slice pricing adjustment action data; training historical user side slice use state data and corresponding slice pricing adjustment action data based on a deep reinforcement learning algorithm to obtain a slice pricing model; acquiring the use state data of the user side slice to be processed, and inputting the use state data of the user side slice to be processed into a slice pricing model for calculation to obtain a slice pricing adjustment action result; and providing the slice pricing adjustment action result to the charging center so that the charging center can execute the corresponding pricing adjustment action. By the method, flexibility, rationality and accuracy of network slice pricing can be improved, and slice differentiated pricing can be achieved more effectively.

Description

Pricing processing method and device of 5G slice based on deep reinforcement learning
Technical Field
The invention relates to the technical field of data processing, in particular to a pricing processing method and device of a 5G slice based on deep reinforcement learning.
Background
A Network Slice is one of the main usage modes of a 5G network. A network slice is an end-to-end logical function together with the physical or virtual resources it requires, including the access network, transport network, core network, etc., and can be regarded as a virtualized "private network" within the 5G network. Network slices are constructed on a unified network function virtualization infrastructure, achieving low-cost and efficient operation. Network slicing technology enables logical isolation of the communication network, allowing network elements and functions to be configured and reused in each network slice to meet the needs of specific industry applications. The implementation of a network slice is considered to include end-to-end functionality of the core network and the radio access network, and each slice may have its own network architecture, engineering mechanisms and network configuration.
The network slice can provide customized service for each user, and further provides differentiated pricing, and an effective pricing strategy can not only promote the increase of user quantity and income, but also improve the use efficiency of the network.
However, in the course of implementing the present invention, the inventors found that in the prior art slices are usually priced manually; the pricing strategy is not flexible enough, cannot adapt itself to changes in the environment, and is not suited to differentiated slice pricing.
Disclosure of Invention
In view of the above, the present invention is proposed to provide a method and apparatus for pricing a 5G slice based on deep reinforcement learning that overcomes or at least partially solves the above mentioned problems.
According to one aspect of the invention, a method for pricing a 5G slice based on deep reinforcement learning is provided, which comprises the following steps:
acquiring historical user side slice use state data and corresponding slice pricing adjustment action data;
training historical user side slice use state data and corresponding slice pricing adjustment action data based on a deep reinforcement learning algorithm to obtain a slice pricing model;
acquiring the use state data of the user side slice to be processed, and inputting the use state data of the user side slice to be processed into a slice pricing model for calculation to obtain a slice pricing adjustment action result;
and providing the slice pricing adjustment action result to the charging center so that the charging center can execute the corresponding pricing adjustment action.
Optionally, the method further comprises:
acquiring the use state data of the user side slice after pricing adjustment; acquiring the slice income data after pricing adjustment from the charging center;
calculating according to the user side slice use state data after pricing adjustment and the slice income data after pricing adjustment to obtain a return value;
and feeding back the return value to the slice pricing model so that the slice pricing model can carry out tuning treatment according to the return value.
Optionally, after acquiring the historical user-side slice usage status data, the method further comprises:
carrying out normalization processing on the historical user side slice use state data;
and performing conversion processing on the normalized historical user side slice use state data.
Optionally, providing the slice pricing adjustment action result to the billing center further comprises:
and sending the slice pricing adjustment action result to a network slice management function module so that the network slice management function module can judge whether the slice pricing adjustment action result is effective, and if so, sending a pricing adjustment instruction to a charging center.
Optionally, the user-side slice usage status data specifically includes one or more of the following:
user service level agreement requirements, user usage duration.
According to another aspect of the present invention, there is provided a device for pricing a 5G slice based on deep reinforcement learning, including:
the data acquisition module is suitable for acquiring historical user side slice use state data and corresponding slice pricing adjustment action data;
the model training module is suitable for training historical user side slice use state data and corresponding slice pricing adjustment action data based on a deep reinforcement learning algorithm to obtain a slice pricing model;
the data processing module is suitable for acquiring the using state data of the user side slice to be processed, and inputting the using state data of the user side slice to be processed into the slice pricing model for calculation to obtain a slice pricing adjustment action result;
and the data transmission module is suitable for providing the slice pricing adjustment action result to the charging center so that the charging center can execute the corresponding pricing adjustment action.
Optionally, the data acquisition module is further adapted to: acquiring the use state data of the user side slice after pricing adjustment; acquiring the slice income data after pricing adjustment from the charging center;
the data processing module is further adapted to: calculating according to the user side slice use state data after pricing adjustment and the slice income data after pricing adjustment to obtain a return value;
the data transmission module is further adapted to: and feeding back the return value to the slice pricing model so that the slice pricing model can carry out tuning treatment according to the return value.
Optionally, the data processing module further comprises:
carrying out normalization processing on the historical user side slice use state data;
and performing conversion processing on the normalized historical user side slice use state data.
Optionally, the data transmission module is further adapted to:
and sending the slice pricing adjustment action result to a network slice management function module so that the network slice management function module can judge whether the slice pricing adjustment action result is effective, and if so, sending the slice pricing adjustment action result to a charging center.
Optionally, the user-side slice usage status data specifically includes one or more of the following:
user service level agreement requirements, user usage duration.
According to yet another aspect of the present invention, there is provided a computing device comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the pricing processing method of the 5G slice based on the deep reinforcement learning.
According to still another aspect of the present invention, there is provided a computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the above-mentioned deep reinforcement learning-based pricing processing method for 5G slices.
According to the pricing processing method and device of the 5G slice based on the deep reinforcement learning, historical user side slice use state data and corresponding slice pricing adjustment action data are obtained; training historical user side slice use state data and corresponding slice pricing adjustment action data based on a deep reinforcement learning algorithm to obtain a slice pricing model; acquiring the use state data of the user side slice to be processed, and inputting the use state data of the user side slice to be processed into a slice pricing model for calculation to obtain a slice pricing adjustment action result; and providing the slice pricing adjustment action result to the charging center so that the charging center can execute the corresponding pricing adjustment action. The method utilizes the advantages of deep reinforcement learning in processing high-dimensional states and discrete actions, trains according to the user side slice use states in historical time periods and the selected corresponding slice pricing adjusting actions, obtains a slice pricing model, and further determines the optimal slice instance price adjusting actions according to the user side slice use states to be processed, so that the flexibility, the reasonability and the accuracy of network slice pricing are improved, slice differentiated pricing is more effectively realized, and further the user quantity is promoted to be improved, the slice income is improved, and the use efficiency of a network is improved.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart of a method for pricing 5G slices based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for pricing 5G slices based on deep reinforcement learning according to another embodiment of the present invention;
FIG. 3 is a schematic diagram of a network architecture of a 5G network slice in an embodiment of the present invention;
FIG. 4 is a diagram illustrating a deep reinforcement learning model in an embodiment of the invention;
FIG. 5 is a flow diagram illustrating a method for pricing a 5G slice in one embodiment of the invention;
FIG. 6 is a schematic structural diagram of a device for pricing 5G slices based on deep reinforcement learning according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Fig. 3 is a schematic diagram illustrating a network architecture of a 5G network slice in an embodiment of the present invention. The CSMF (Communication Service Management Function) is responsible for handling subscription and processing of users' communication service requirements, converting the communication service requirements of operator/third-party clients into requirements on network slices, and sending those network slice requirements (e.g., requests to create, terminate or modify a network slice instance) to the NSMF through its interface with the NSMF.
The NSMF (Network Slice Management Function) is responsible for receiving Network Slice requirements sent by the CSMF, managing life cycle, performance, faults and the like of the Network Slice examples, arranging the composition of the Network Slice examples, decomposing the requirements of the Network Slice examples into the requirements of each Network Slice subnet example or Network Function, and sending Network Slice subnet example Management requests to each NSSMF.
The NSSMF (Network Slice Subnet Management Function) receives a Network Slice Subnet deployment requirement issued by the NSMF, manages a Network Slice Subnet instance, organizes the composition of the Network Slice Subnet instance, maps the SLA requirement of the Network Slice Subnet into a QoS (Quality of Service) requirement of a Network Service, and issues a deployment request of the Network Service to the NFVO system of the ETSI NFV domain.
In the embodiment of the invention, a slice pricing model is trained by means of deep reinforcement learning. Reinforcement learning comprises three elements: state, action and reward. The agent takes actions according to the current state and, after obtaining the corresponding reward, improves its actions, so that it takes a better action the next time it reaches the same state. By training the reinforcement learning model, the model can fully learn the rules of a complex external environment, take correct actions in different environments, and obtain a higher cumulative return in long-term interaction.
Q-Learning is a value-based reinforcement learning algorithm, i.e., the focus is on training an evaluator (critic). Q denotes Q(s, a): the expected return obtained by taking action a (a ∈ A) in state s (s ∈ S) at a given moment. The environment feeds back a corresponding reward r according to the agent's action, so the main idea of the algorithm is to build a Q-table indexed by state and action to store Q values, and then select the action that yields the greatest return according to those Q values. Although the Q-table method is easy to implement, training an agent with a Q-table becomes very time-consuming as the state and action spaces grow more complex, so the embodiment of the present invention adopts a deep neural network as the function approximator for estimating the Q value; this deep reinforcement learning method is called DQN.
DQN (Deep Q-Network) is a combination of Q-Learning and deep learning, i.e., it learns from data using neural networks. DQN does not record Q values in a Q-table; instead it represents the value function with a deep neural network to predict Q values, and learns the optimal action policy by continuously updating the network. Within DQN there are two neural networks: target_net, a network with relatively fixed parameters used to obtain the target Q value (Q_target), and eval_net, used to obtain the evaluated Q value (Q_eval). The Q value is updated according to the following rule:
Q(s_t, a_t) ← Q(s_t, a_t) + α · [r_t + γ · max_a Q_target(s_{t+1}, a) − Q(s_t, a_t)]
where s_t denotes the state at time t, a_t denotes the slice pricing adjustment action selected at time t, and after pricing adjustment action a_t is taken, the state transitions from s_t to s_{t+1}.
R is the weighted sum of the reward values earned by all actions from the current state until some future state. The goal of DQN is to learn a policy that maximizes the discounted cumulative reward over T time steps:

R_t = Σ_{t'=t..T} γ^(t'−t) · r_{t'}

where γ ∈ [0, 1] is the discount factor.
the Q function may be defined as the expectation of a reduced cumulative reward based on the current state and the selected action, all subsequent actions being made according to policy π:
Qπ(s,a)=∑s,a[R]
the final goal of learning is to find a strategy that maximizes the Q function:
Q*(s, a) = max_π Q_π(s, a)
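The value-based idea described above can be illustrated with a tabular Q-learning sketch (not from the patent; the state/action sizes, learning rate and discount factor are assumed values chosen only for illustration). DQN replaces the table below with a deep neural network:

```python
import numpy as np

# Illustrative tabular Q-learning update; states, actions and the
# hyperparameters alpha/gamma are hypothetical, not from the patent.
n_states, n_actions = 4, 3
Q = np.zeros((n_states, n_actions))

alpha, gamma = 0.1, 0.9  # learning rate and discount factor (assumed)

def q_update(s, a, r, s_next):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

q_update(s=0, a=1, r=1.0, s_next=2)
print(round(Q[0, 1], 3))  # 0.1: the first update moves Q(0,1) toward the target 1.0
```

Repeated updates of this form are what lets the critic's Q estimate converge toward the expected return.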
fig. 1 shows a flowchart of a method for pricing a 5G slice based on deep reinforcement learning according to an embodiment of the present invention, and as shown in fig. 1, the method includes the following steps:
step S110, obtaining historical user side slice using state data and corresponding slice pricing adjustment action data.
Historical user-side slice usage status data is obtained from the NSMF (network slice management function), and pricing adjustment actions corresponding to the selected slice instances are obtained.
The user side slice use state data includes SLA (Service level Agreement) requirements, user use amount, and user use duration. The SLA includes security/privacy, visibility/manageability, reliability/availability, as well as specific traffic characteristics (traffic type, air interface requirements, customized network functions, etc.) and corresponding performance indicators (delay, throughput, packet loss, call drop, etc.). User SLA requirements include: latency (e.g., less than 5ms), throughput, packet loss, dropped call rate, reliability (e.g., 99.999%), service scope, user size, isolation (e.g., strong, medium, weak), security (e.g., strong, medium, weak), access mode, max TP/site (e.g., 5Gbps), etc.
The user side slice use state data also comprises user use amount, namely the number of slice flow or service request used by the user; the user usage duration, i.e. the duration the user uses the slice instance, is also included.
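The user side slice usage state described above can be sketched as a simple data container (all field names, types and encodings here are illustrative assumptions, not defined by the patent):

```python
from dataclasses import dataclass

# Hypothetical container for the user side slice usage state; the fields
# mirror the SLA requirements, usage amount and usage duration above.
@dataclass
class SliceUsageState:
    latency_ms: float        # SLA: latency requirement, e.g. < 5 ms
    reliability: float       # SLA: e.g. 0.99999
    isolation: int           # SLA: strong/medium/weak encoded as 2/1/0 (assumed)
    usage_amount: float      # slice traffic or number of service requests
    usage_duration_h: float  # how long the user has used the slice instance

    def to_features(self):
        """Flatten to a numeric feature vector for the pricing model."""
        return [self.latency_ms, self.reliability, float(self.isolation),
                self.usage_amount, self.usage_duration_h]

s = SliceUsageState(4.0, 0.99999, 2, 120.5, 36.0)
print(len(s.to_features()))  # 5
```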
The slice pricing adjustment action belongs to a discrete action space. In this embodiment it is divided into 201 discrete actions: the price is increased or decreased by m% relative to the original price, where m is an integer between -100 and 100.
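The 201-action discrete space can be sketched as an index-to-percentage mapping (the function names and the 0..200 indexing convention are illustrative assumptions):

```python
# Map a discrete action index (0..200) to a price adjustment of m percent,
# with m ranging over the integers -100..100 as described above.
N_ACTIONS = 201

def action_to_adjustment(action_index: int) -> int:
    """Return m, the percentage change applied to the original price."""
    if not 0 <= action_index < N_ACTIONS:
        raise ValueError("action index out of range")
    return action_index - 100

def adjusted_price(original_price: float, action_index: int) -> float:
    m = action_to_adjustment(action_index)
    return original_price * (1 + m / 100)

print(adjusted_price(100.0, 100))  # 100.0: m = 0, price unchanged
print(adjusted_price(100.0, 150))  # 150.0: m = +50
```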
And step S120, training historical user side slice use state data and corresponding slice pricing adjustment action data based on a depth reinforcement learning algorithm to obtain a slice pricing model.
And taking the historical user side slice use state data and the corresponding slice pricing adjustment action data as training samples, and training by adopting a deep reinforcement learning algorithm to obtain a slice pricing model.
And step S130, acquiring the using state data of the user side slice to be processed, inputting the using state data of the user side slice to be processed into the slice pricing model for calculation, and obtaining the slice pricing adjustment action result.
After the slice pricing model is obtained through training, the user side slice using state data to be processed are obtained from the NSMF and input into the slice pricing model for calculation, and the slice pricing model outputs a slice pricing adjusting action to obtain a slice pricing adjusting action result.
Step S140, the slice pricing adjustment action result is sent to the charging center, so that the charging center can execute the corresponding pricing adjustment action.
And finally, issuing the slice pricing adjustment action result to a charging center, and executing a corresponding pricing adjustment action by the charging center.
According to the pricing processing method of the 5G slice based on the deep reinforcement learning, provided by the embodiment of the invention, by utilizing the advantages of the deep reinforcement learning in processing a high-dimensional state and discrete actions, training is carried out according to the user side slice using state in a historical time period and the selected corresponding slice pricing adjusting action to obtain a slice pricing model, and then the optimal slice instance price adjusting action is determined according to the user side slice using state to be processed, so that the flexibility, the reasonability and the accuracy of network slice pricing are improved, the slice differentiated pricing is more effectively realized, the user quantity is promoted to be increased, the slice income is increased, and the using efficiency of a network is improved.
Fig. 2 is a flowchart illustrating a method for pricing a 5G slice based on deep reinforcement learning according to another embodiment of the present invention, and as shown in fig. 2, the method includes the following steps:
step S210, obtaining historical user side slice using state data and corresponding slice pricing adjustment action data.
Historical user-side slice usage status is obtained from the NSMF, and corresponding selected slice instance pricing adjustment actions are obtained. The user side slice use state data comprises user service level agreement requirements, user use amount and user use duration.
The slice pricing adjustment action belongs to a discrete action space. In this embodiment it is divided into 201 discrete actions: the price is increased or decreased by m% relative to the original price, where m is an integer between -100 and 100.
Step S220 is to normalize the historical user-side slice usage state data and convert the normalized historical user-side slice usage state data.
After the historical user side slice usage state data is acquired, the data is normalized and converted. The conversion processing converts the user side slice usage state data into a machine-recognizable form, specifically: non-numerical requirement attributes are converted into numerical form, and all attributes are standardized. The calculation is done separately for each dimension: for each attribute (e.g., each column), the mean is subtracted from the data and the result is divided by the standard deviation. Standardizing the data speeds up the convergence of the slice pricing model and improves its accuracy.
The acquired data set is then divided into training data and test data, for example taking 80% of the entire data set as training data and the remaining 20% as test data. The model is trained on the training set and evaluated and validated on the test set.
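The per-attribute standardization and 80/20 split can be sketched as follows (the feature matrix here is randomly generated stand-in data, not from the patent):

```python
import numpy as np

# Hypothetical slice-usage feature matrix: rows are samples, columns are
# numeric attributes (e.g. usage amount, usage duration); values are made up.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))

# Standardize per attribute (column): subtract the mean, divide by the std.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 80/20 split into training and test data, as described above.
split = int(0.8 * len(X_std))
X_train, X_test = X_std[:split], X_std[split:]
print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
```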
And step S230, training the historical user side slice use state data subjected to normalization processing and conversion processing and corresponding slice pricing adjustment action data based on a depth reinforcement learning algorithm to obtain a slice pricing model.
In the embodiment of the invention, a deep learning framework is used to build a DQN-based deep reinforcement learning model, and a critic composed of a deep neural network is constructed to estimate the Q function used to evaluate slice instance pricing adjustment actions. The historical user side slice usage state data s (comprising user SLA requirements, user usage amount and user usage duration) and the corresponding selected slice pricing adjustment action a are input, and the value Q(s, a) of selecting that action is output. The critic's output is compared with the target Q value Q_target(s_i, a_i) to calculate the error, and the error signal is fed back to the deep neural network, so that the accuracy of the model gradually improves and a pricing adjustment action that maximizes the revenue of the slice instance is selected. The error is calculated as follows:
error = (Q(s_i, a_i) − Q_target(s_i, a_i))^2 = (Q(s_i, a_i) − (r_i + max_a Q_target(s_{i+1}, a)))^2
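The squared error above can be computed for a batch as follows (the Q values are numeric stand-ins for the outputs of eval_net and target_net, not a real model):

```python
import numpy as np

# Squared TD error as in the formula above, computed for a batch.
def squared_td_error(q_eval, r, q_target_next):
    """error_i = (q_eval_i - (r_i + max_a q_target(s_{i+1}, a)))**2"""
    target = r + q_target_next.max(axis=1)
    return (q_eval - target) ** 2

q_eval = np.array([1.0, 0.5])          # Q(s_i, a_i) from eval_net
r = np.array([0.2, 0.1])               # rewards r_i
q_target_next = np.array([[0.4, 0.6],  # Q_target(s_{i+1}, a), one column per action
                          [0.3, 0.2]])
print(squared_td_error(q_eval, r, q_target_next))  # [0.04 0.01]
```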
Fig. 4 shows a schematic diagram of a deep reinforcement learning model in an embodiment of the present invention. Input layer 1 receives the current user side slice usage state (s) of each network slice; input layer 1 then passes through two fully connected layers (Dense) with 128 and 64 neurons respectively, both with "relu" activation functions;
input layer 2 receives the corresponding slice instance pricing adjustment action. An input layer 2 passes through two layers of fully connected layers (Dense), 32 neurons and 16 neurons are respectively set, and activation functions are 'relu';
the action and state branches are then combined through a merge layer, followed by two fully connected layers (Dense) with 64 and 32 neurons respectively, both with "relu" activation functions; a dropout layer is placed after these two fully connected layers, with the drop probability set to 0.2, so that input neurons are randomly disconnected with 20% probability at each parameter update during training to prevent overfitting;
the output layer consists of 1 fully connected neuron and outputs a Q value for judging the slice example pricing adjustment action executed in the use state of the user side slice.
The training data is randomly extracted from a replay memory that records, for each state, the action taken, the reward and the resulting state (s, a, r, s'). The replay memory is limited in size, so that when it is full, the next piece of data overwrites the oldest data in the memory. In an embodiment of the invention, experience replay is used to save the data of all stages into one replay memory. When the neural network is trained, it is updated from small random batches instead of only the latest transition, which removes the correlation between samples and greatly improves the stability of the system.
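A minimal replay memory matching the description above (the capacity and batch size shown are the values stated elsewhere in this embodiment; the class name is illustrative):

```python
import random
from collections import deque

# Fixed-capacity replay memory: once full, the oldest transition is
# overwritten first, and random mini-batches are drawn for training.
class ReplayMemory:
    def __init__(self, capacity=50000):
        self.buffer = deque(maxlen=capacity)  # a full deque drops its oldest entry

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size=32):
        return random.sample(self.buffer, batch_size)

memory = ReplayMemory(capacity=3)
for t in range(5):           # store 5 transitions into a capacity-3 memory
    memory.store(t, 0, 0.0, t + 1)
print(len(memory.buffer))    # 3: the two oldest transitions were overwritten
print(memory.buffer[0][0])   # 2: the transition from t=2 is now the oldest
```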
Preferably, in order to avoid limiting action selection and to enrich data collection, a greedy algorithm (ε-greedy) is introduced to select the pricing adjustment action: an action is selected at random with probability ε, and the best known action is selected with probability 1 − ε. As learning continues and deepens, the value of ε gradually decreases, and the learning mode shifts from full exploration to exploitation.
In the embodiment of the invention, the specific training process is as follows:
Initialize the Q function with random weights and set the target Q function Q_target = Q. At each time step t of each round:
(1) given the current user-side slice usage state s_t, select a slice instance pricing adjustment action a_t based on the epsilon-greedy algorithm;
(2) calculate the reward r_t and observe the new user-side slice usage state s_(t+1);
(3) store the transition (s_t, a_t, r_t, s_(t+1)) at time t into the replay buffer;
(4) sample a batch of transitions (s_i, a_i, r_i, s_(i+1)) from the replay buffer;
(5) calculate the target value y = r_i + γ·max_a Q_target(s_(i+1), a);
(6) update the parameters of the Q-function neural network so that Q(s_i, a_i) approaches the target value y;
(7) assign the updated Q-function neural network weights to the target network: Q_target = Q.
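Update steps (4)-(7) above can be sketched in simplified tabular form, with a lookup table standing in for the Q neural network (the discount factor and learning rate are assumed values, not given in the patent):

```python
GAMMA = 0.9   # discount factor γ (assumed value)
ALPHA = 0.1   # learning rate for this simplified tabular stand-in

def dqn_style_update(q, q_target, batch, actions):
    """Steps (4)-(6): for each sampled transition (s_i, a_i, r_i, s_i+1),
    move Q(s_i, a_i) toward the target y = r_i + γ·max_a Q_target(s_i+1, a)."""
    for s, a, r, s_next in batch:
        y = r + GAMMA * max(q_target.get((s_next, a2), 0.0) for a2 in actions)
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + ALPHA * (y - old)

def sync_target(q, q_target):
    """Step (7): copy the updated Q weights to the target network."""
    q_target.clear()
    q_target.update(q)
```

In the actual embodiment the table is replaced by the neural network described earlier, and step (6) becomes a gradient step on the MSE between Q(s_i, a_i) and y.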
Finally, when the trained model converges and the offline training is finished, the computed neural network weights are exported, thereby obtaining the slice pricing model.
In a specific implementation, the model is trained for 1000 rounds (epochs = 1000), the batch size is set to 32 (batch_size = 32), and the replay buffer size is set to 50000. The mean squared error (MSE) is selected as the loss function, i.e. the objective function (loss = 'mse'), and the Adam optimizer is selected as the gradient descent optimization algorithm to improve on the learning speed of conventional gradient descent (optimizer = 'adam'). Through gradient descent the neural network finds the weights that minimize the objective function; the training error decreases gradually as the number of training rounds increases, and the model gradually converges.
And S240, acquiring the using state data of the user side slice to be processed, inputting the using state data of the user side slice to be processed into the slice pricing model for calculation, and obtaining a slice pricing adjustment action result.
After the slice pricing model is obtained through training, the user side slice using state data to be processed are obtained from the NSMF and input into the slice pricing model for calculation, and the slice pricing model outputs a slice pricing adjusting action to obtain a slice pricing adjusting action result.
And step S250, sending the slice pricing adjustment action result to the network slice management function module so that the network slice management function module can judge whether the slice pricing adjustment action result is effective, and if so, sending a pricing adjustment instruction to the charging center so that the charging center can execute the corresponding pricing adjustment action.
The slice pricing adjustment action result is sent to the NSMF, which judges whether a pricing adjustment action needs to be triggered. Specifically, it judges whether the slice pricing adjustment action result is zero; if so, the result is invalid, otherwise it is valid. If the slice pricing adjustment action result is valid, it is sent to the charging center, and the charging center executes the corresponding pricing adjustment action.
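The NSMF-side validity check described above reduces to a sketch like the following (function name is an illustrative placeholder):

```python
def pricing_adjustment_is_valid(action_result):
    """Per the text, a zero adjustment result means "no change" and is
    treated as invalid; only a non-zero result triggers a pricing
    adjustment instruction to the charging center."""
    return action_result != 0
```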
And step S260, acquiring the user-side slice use state data after pricing adjustment, and acquiring the slice income data after pricing adjustment from the charging center.
Step S270, calculating according to the user side slice use state data after pricing adjustment and the slice income data after pricing adjustment to obtain a return value; and feeding back the return value to the slice pricing model so that the slice pricing model can carry out tuning treatment according to the return value.
The user-side slice usage state data after the pricing adjustment is obtained, and the slice revenue value after the pricing adjustment is obtained from the charging center. A return value is calculated from these two, fed back to the slice pricing model, and the pricing model is tuned according to it. If the revenue value is higher than the revenue value in the previous time period, the return value is positive; otherwise, the return value is negative.
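A minimal sketch of this return (reward) rule; the patent only fixes the sign, so using the revenue difference as the magnitude is an assumption:

```python
def return_value(prev_revenue, new_revenue):
    """Positive return when the slice revenue rose relative to the
    previous time period, negative when it fell. Magnitude (the raw
    difference) is an illustrative choice, not specified in the text."""
    return new_revenue - prev_revenue
```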
In specific implementation, the method of this embodiment may be executed according to a predetermined time period, for example, in the process of training the model, the user-side slice usage state data in a plurality of historical periods and the slice pricing action data selected in each period are used as training data; when the slice pricing adjustment action is formulated, the user side slice using state data in the current period is acquired and input into the slice pricing model for calculation, and the corresponding slice pricing adjustment action is output.
Fig. 5 is a schematic flow chart illustrating a pricing processing method for a 5G slice in an embodiment of the present invention, and as shown in fig. 5, the flow includes:
step 1, inputting the use state of the user side slice into a network slice pricing model based on DQN obtained by pre-training in each period.
And step 2, sending the pricing adjustment result output by the pricing model to the NSMF network slice management function so that the NSMF can judge whether the pricing adjustment operation needs to be triggered.
And 3, if the NSMF judges that the pricing adjustment operation needs to be triggered, sending a pricing adjustment instruction to the charging center.
And 4, the charging center implements a pricing adjustment action according to the received pricing adjustment instruction so as to charge the slice user.
And 5, the slice user sends the user side slice use state data after pricing adjustment to a return function.
And step 6, the return function feeds back the return value of the action to the DQN-based network slice pricing model through calculation so that the network slice pricing model can conduct optimization according to the return value.
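The periodic flow of Fig. 5 can be sketched end to end as one orchestration function; all callables and names here are illustrative stand-ins for the real model, NSMF, and charging-center interfaces:

```python
def run_pricing_period(get_slice_state, pricing_model, charging_center, prev_revenue):
    """One period of the Fig. 5 flow:
    1. feed the user-side slice usage state into the DQN pricing model;
    2-3. apply the NSMF-style validity gate (zero adjustment = no-op);
    4. apply a valid adjustment at the charging center, which reports revenue;
    5-6. compute the return value from the revenue change, for feedback
         into the pricing model."""
    state = get_slice_state()
    adjustment = pricing_model(state)
    if adjustment != 0:
        new_revenue = charging_center(adjustment)  # pricing adjustment instruction
    else:
        new_revenue = prev_revenue                 # invalid result: nothing triggered
    reward = new_revenue - prev_revenue            # sign convention from the text
    return reward, new_revenue
```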
According to the deep-reinforcement-learning-based pricing processing method for 5G slices provided by the embodiment of the invention, the strength of DQN in handling high-dimensional states and discrete actions is exploited: a slice pricing model is trained from the user-side slice usage states in a historical time period and the correspondingly selected slice pricing adjustment actions. Based on this pricing model, the optimal slice instance price adjustment action is determined, so that the pricing adjustment action that maximizes the slice instance revenue value can be selected according to the current user-side slice usage state. This improves the flexibility, rationality, and accuracy of network slice pricing and realizes differentiated slice pricing more effectively. Moreover, by calculating the return value brought by the adjusted slice pricing and tuning the slice pricing model according to this return value, the model is driven to output better pricing adjustment actions.
Fig. 6 is a schematic structural diagram of a deep reinforcement learning-based 5G slice pricing processing apparatus provided by an embodiment of the present invention, and as shown in fig. 6, the apparatus includes:
the data acquisition module 61 is suitable for acquiring historical user side slice use state data and corresponding slice pricing adjustment action data;
the model training module 62 is adapted to train historical user side slice use state data and corresponding slice pricing adjustment action data based on a deep reinforcement learning algorithm to obtain a slice pricing model;
the data processing module 63 is adapted to obtain the use state data of the user side slice to be processed, and input the use state data of the user side slice to be processed into the slice pricing model for calculation to obtain a slice pricing adjustment action result;
and the data transmission module 64 is suitable for providing the slice pricing adjustment action result to the charging center so that the charging center can execute the corresponding pricing adjustment action.
In an alternative manner, the data acquisition module 61 is further adapted to: acquiring the use state data of the user side slice after pricing adjustment; acquiring the slice income data after pricing adjustment from the charging center;
the data processing module 63 is further adapted to: calculating according to the user side slice use state data after pricing adjustment and the slice income data after pricing adjustment to obtain a return value;
the data transmission module 64 is further adapted to: and feeding back the return value to the slice pricing model so that the slice pricing model can carry out tuning treatment according to the return value.
In an optional manner, the data processing module 63 further includes:
carrying out normalization processing on the historical user side slice use state data;
and performing conversion processing on the normalized historical user side slice use state data.
In an alternative manner, the data transmission module 64 is further adapted to:
and sending the slice pricing adjustment action result to a network slice management function module so that the network slice management function module can judge whether the slice pricing adjustment action result is effective, and if so, sending the slice pricing adjustment action result to a charging center.
In an alternative approach, the user-side slice usage status data specifically includes one or more of the following:
user service level agreement requirements, user usage duration.
Embodiments of the present invention provide a non-volatile computer storage medium, where at least one executable instruction is stored in the computer storage medium, and the computer executable instruction may execute the method for pricing a 5G slice based on deep reinforcement learning in any of the above method embodiments.
The executable instructions may be specifically configured to cause the processor to:
acquiring historical user side slice use state data and corresponding slice pricing adjustment action data;
training historical user side slice use state data and corresponding slice pricing adjustment action data based on a deep reinforcement learning algorithm to obtain a slice pricing model;
acquiring the use state data of the user side slice to be processed, and inputting the use state data of the user side slice to be processed into a slice pricing model for calculation to obtain a slice pricing adjustment action result;
and providing the slice pricing adjustment action result to the charging center so that the charging center can execute the corresponding pricing adjustment action.
In an alternative, the executable instructions cause the processor to:
acquiring the use state data of the user side slice after pricing adjustment; acquiring the slice income data after pricing adjustment from the charging center;
calculating according to the user side slice use state data after pricing adjustment and the slice income data after pricing adjustment to obtain a return value;
and feeding back the return value to the slice pricing model so that the slice pricing model can carry out tuning treatment according to the return value.
In an alternative, the executable instructions cause the processor to: after acquiring historical user side slice use state data, carrying out normalization processing on the historical user side slice use state data; and performing conversion processing on the normalized historical user side slice use state data.
In an alternative, the executable instructions cause the processor to:
and sending the slice pricing adjustment action result to a network slice management function module so that the network slice management function module can judge whether the slice pricing adjustment action result is effective, and if so, sending a pricing adjustment instruction to a charging center.
In an alternative, the user-side slice usage state data specifically includes one or more of the following: user service level agreement requirements, user usage duration.
Fig. 7 is a schematic structural diagram of an embodiment of a computing device according to the present invention, and a specific embodiment of the present invention does not limit a specific implementation of the computing device.
As shown in fig. 7, the computing device may include: a processor (processor)702, a Communications Interface 704, a memory 706, and a communication bus 708.
Wherein: the processor 702, communication interface 704, and memory 706 communicate with each other via a communication bus 708. A communication interface 704 for communicating with network elements of other devices, such as clients or other servers. The processor 702, configured to execute the program 710, may specifically execute relevant steps in the above-described method for pricing a 5G slice based on deep reinforcement learning for a computing device.
In particular, the program 710 may include program code that includes computer operating instructions.
The processor 702 may be a central processing unit CPU, or an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement an embodiment of the present invention. The computing device includes one or more processors, which may be the same type of processor, such as one or more CPUs; or may be different types of processors such as one or more CPUs and one or more ASICs.
The memory 706 stores a program 710. The memory 706 may comprise high-speed RAM memory, and may also include non-volatile memory, such as at least one disk memory.
The program 710 may specifically be used to cause the processor 702 to perform the following operations:
acquiring historical user side slice use state data and corresponding slice pricing adjustment action data;
training historical user side slice use state data and corresponding slice pricing adjustment action data based on a deep reinforcement learning algorithm to obtain a slice pricing model;
acquiring the use state data of the user side slice to be processed, and inputting the use state data of the user side slice to be processed into a slice pricing model for calculation to obtain a slice pricing adjustment action result;
and providing the slice pricing adjustment action result to the charging center so that the charging center can execute the corresponding pricing adjustment action.
In an alternative, the program 710 causes the processor 702 to:
acquiring the use state data of the user side slice after pricing adjustment; acquiring the slice income data after pricing adjustment from the charging center;
calculating according to the user side slice use state data after pricing adjustment and the slice income data after pricing adjustment to obtain a return value;
and feeding back the return value to the slice pricing model so that the slice pricing model can carry out tuning treatment according to the return value.
In an alternative, the program 710 causes the processor 702 to: after acquiring historical user side slice use state data, carrying out normalization processing on the historical user side slice use state data; and performing conversion processing on the normalized historical user side slice use state data.
In an alternative, the program 710 causes the processor 702 to: and sending the slice pricing adjustment action result to a network slice management function module so that the network slice management function module can judge whether the slice pricing adjustment action result is effective, and if so, sending a pricing adjustment instruction to a charging center.
In an alternative approach, the user-side slice usage status data specifically includes one or more of the following: user service level agreement requirements, user usage duration.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the embodiments of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specified otherwise.

Claims (10)

1. A pricing processing method of a 5G slice based on deep reinforcement learning comprises the following steps:
acquiring historical user side slice use state data and corresponding slice pricing adjustment action data;
training the historical user side slice use state data and corresponding slice pricing adjustment action data based on a deep reinforcement learning algorithm to obtain a slice pricing model;
acquiring user side slice use state data to be processed, and inputting the user side slice use state data to be processed into the slice pricing model for calculation to obtain a slice pricing adjustment action result;
and providing the slice pricing adjustment action result to a charging center so that the charging center can execute a corresponding pricing adjustment action.
2. The method of claim 1, further comprising:
acquiring the use state data of the user side slice after pricing adjustment; acquiring the slice income data after pricing adjustment from the charging center;
calculating according to the user side slice use state data after pricing adjustment and the slice income data after pricing adjustment to obtain a return value;
and feeding back the return value to the slice pricing model so that the slice pricing model can carry out tuning treatment according to the return value.
3. The method of claim 1, wherein after obtaining historical user-side slice usage status data, the method further comprises:
carrying out normalization processing on the historical user side slice use state data;
and performing conversion processing on the normalized historical user side slice use state data.
4. The method of claim 1, wherein providing the slice pricing adjustment action result to a billing center further comprises:
and sending the slice pricing adjustment action result to a network slice management function module so that the network slice management function module can judge whether the slice pricing adjustment action result is effective, and if so, sending a pricing adjustment instruction to a charging center.
5. The method according to claim 1, wherein the user-side slice usage status data specifically comprises one or more of:
user service level agreement requirements, user usage duration.
6. A deep reinforcement learning-based pricing processing device for 5G slices, comprising:
the data acquisition module is suitable for acquiring historical user side slice use state data and corresponding slice pricing adjustment action data;
the model training module is suitable for training the historical user side slice use state data and the corresponding slice pricing adjustment action data based on a deep reinforcement learning algorithm to obtain a slice pricing model;
the data processing module is suitable for acquiring the using state data of the user side slice to be processed, and inputting the using state data of the user side slice to be processed into the slice pricing model for calculation to obtain a slice pricing adjustment action result;
and the data transmission module is suitable for providing the slice pricing adjustment action result to a charging center so that the charging center can execute the corresponding pricing adjustment action.
7. The apparatus of claim 6, wherein the data acquisition module is further adapted to: acquiring the use state data of the user side slice after pricing adjustment; acquiring the slice income data after pricing adjustment from the charging center;
the data processing module is further adapted to: calculating according to the user side slice use state data after pricing adjustment and the slice income data after pricing adjustment to obtain a return value;
the data transmission module is further adapted to: and feeding back the return value to the slice pricing model so that the slice pricing model can carry out tuning treatment according to the return value.
8. The apparatus of claim 6, wherein the data processing module further comprises:
carrying out normalization processing on the historical user side slice use state data;
and performing conversion processing on the normalized historical user side slice use state data.
9. A computing device, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the pricing processing method of the 5G slice based on the deep reinforcement learning of any one of claims 1-5.
10. A computer storage medium having stored therein at least one executable instruction to cause a processor to perform operations corresponding to the deep reinforcement learning based 5G slice pricing processing method of any of claims 1-5.