CN112446501B - Method, device and system for acquiring cache allocation model in real network environment

Method, device and system for acquiring cache allocation model in real network environment

Info

Publication number
CN112446501B
CN112446501B (application number CN202011197526.5A)
Authority
CN
China
Prior art keywords
reinforcement learning
loadable
learning model
model
switch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011197526.5A
Other languages
Chinese (zh)
Other versions
CN112446501A (en)
Inventor
王文东
崔勇
阙喜戎
龚向阳
单安童
王莫为
黄思江
彭德平
成晓雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202011197526.5A
Publication of CN112446501A
Application granted
Publication of CN112446501B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 49/00 Packet switching elements
    • H04L 49/10 Packet switching elements characterised by the switching fabric construction
    • H04L 49/103 Packet switching elements characterised by the switching fabric construction using a shared central buffer; using a shared memory

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiment of the invention provides a method, a device and a system for acquiring a cache allocation model in a real network environment. The method loads a loadable reinforcement learning model in a switch; when data transmission triggers a preset allocation condition, the loadable reinforcement learning model is used to obtain a cache threshold of the switch and perform cache allocation; the reinforcement learning model to be trained is trained with the training data generated by the current trigger, and the resulting loadable reinforcement learning model is obtained and saved; if the number of saved loadable reinforcement learning models does not meet a preset number condition, the method returns to the step of loading a loadable reinforcement learning model in the switch; otherwise, a loadable reinforcement learning model whose corresponding reward value meets a preset reward value condition is selected from the saved loadable reinforcement learning models as the cache allocation model. The scheme can improve the efficiency of acquiring the cache allocation model.

Description

Method, device and system for acquiring cache allocation model in real network environment
Technical Field
The present invention relates to the field of network cache allocation technologies, and in particular, to a method, an apparatus, and a system for obtaining a reinforcement learning cache allocation model in a real network environment.
Background
Allocation of the shared buffer in a switch is used to optimize the efficiency of data transmission in the network. The goal of shared buffer allocation is to dynamically allocate the shared buffer of the switch so that data exceeding the transmission capacity of a switch port is temporarily stored in the shared buffer to the greatest possible extent; this reduces retransmissions of data forwarded by the switch, shortens the completion time of data transmission, and improves the efficiency of data transmission in the network.
In the related art, shared buffer allocation may be performed based on deep reinforcement learning. Specifically, statistical information may be generated by simulation in a simulator such as NS-3 (a discrete event simulator). The statistical information indicates the data transmission conditions of the switch, such as the amount of data transmitted by the switch and the packet loss rate. The obtained statistical information is used to train a reinforcement learning model, and the trained reinforcement learning model is loaded on the switch as the cache allocation model. The switch can then input its current statistical information into the cache allocation model to obtain a cache threshold for the shared buffer, and allocate the shared buffer using that threshold.
However, since a simulator generally cannot faithfully reflect the real network situation, a cache allocation model obtained with a simulator is likely to be insufficiently accurate. In addition, to ensure consistency and accuracy, the simulator usually executes steps serially; for example, while simulating data transmission it performs only the data transmission step and no other step can proceed, so the training process frequently stalls for a long time waiting for the previous step to finish, and training efficiency is low.
Disclosure of Invention
The embodiment of the invention aims to provide a method, a device and a system for obtaining a reinforcement learning cache allocation model in a real network environment, so as to improve the accuracy of the cache allocation model obtained from training data and the efficiency of obtaining the cache allocation model. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a method for acquiring a cache allocation model in a real network environment, where the method includes:
loading a loadable reinforcement learning model in the switch; the switch is used for realizing data transmission between the server and the client;
when the data transmission triggers a preset allocation condition, the loadable reinforcement learning model is utilized to acquire a cache threshold value of the switch, and cache allocation is carried out;
storing the statistical information and the buffer threshold used by the buffer allocation corresponding to the current trigger, together with the reward value corresponding to the reinforcement learning model used by a historical trigger, to obtain the training data generated by the current trigger; the reward value corresponding to the reinforcement learning model used by the historical trigger is a value obtained based on the packet loss rate and throughput of the data transmission in the time interval from the historical trigger to the current trigger;
training the reinforcement learning model to be trained with the training data generated by the current trigger, and obtaining and saving the loadable reinforcement learning model;
and if the number of saved loadable reinforcement learning models does not meet the preset number condition, returning to the step of loading a loadable reinforcement learning model in the switch; otherwise, selecting, from the saved loadable reinforcement learning models, a loadable reinforcement learning model whose corresponding reward value meets a preset reward value condition as the cache allocation model.
In a second aspect, an embodiment of the present invention provides an apparatus for acquiring a cache allocation model in a real network environment, where the apparatus includes:
The model loading module is used for loading the loadable reinforcement learning model in the switch; the switch is used for realizing data transmission between the server and the client;
the buffer memory allocation module is used for acquiring a buffer memory threshold value of the switch by utilizing the loadable reinforcement learning model and performing buffer memory allocation when the data transmission triggers a preset allocation condition;
the data acquisition module is used for storing the statistical information and the buffer threshold used by the buffer allocation corresponding to the current trigger, together with the reward value corresponding to the reinforcement learning model used by a historical trigger, and obtaining the training data generated by the current trigger; the reward value corresponding to the reinforcement learning model used by the historical trigger is a value obtained based on the packet loss rate and throughput of the data transmission in the time interval from the historical trigger to the current trigger;
the model acquisition module is used for training the reinforcement learning model to be trained with the training data generated by the current trigger, and obtaining and saving the loadable reinforcement learning model; and if the number of saved loadable reinforcement learning models does not meet the preset number condition, returning to the step of loading a loadable reinforcement learning model in the switch; otherwise, selecting, from the saved loadable reinforcement learning models, a loadable reinforcement learning model whose corresponding reward value meets a preset reward value condition as the cache allocation model.
In a third aspect, an embodiment of the present invention provides a system for acquiring a cache allocation model in a real network environment, where the system includes: the system comprises a switch, a server, a client and a model acquisition agent;
the switch is used for realizing data transmission between the server and the client;
the model acquisition agent is used for loading a loadable reinforcement learning model in the switch; when the data transmission triggers a preset allocation condition, using the loadable reinforcement learning model to obtain a cache threshold of the switch and performing cache allocation; storing the statistical information and the buffer threshold used by the buffer allocation corresponding to the current trigger, together with the reward value corresponding to the reinforcement learning model used by a historical trigger, to obtain the training data generated by the current trigger; the reward value corresponding to the reinforcement learning model used by the historical trigger is a value obtained based on the packet loss rate and throughput of the data transmission in the time interval from the historical trigger to the current trigger; training the reinforcement learning model to be trained with the training data generated by the current trigger, and obtaining and saving the loadable reinforcement learning model; and if the number of saved loadable reinforcement learning models does not meet the preset number condition, returning to the step of loading a loadable reinforcement learning model in the switch; otherwise, selecting, from the saved loadable reinforcement learning models, a loadable reinforcement learning model whose corresponding reward value meets a preset reward value condition as the cache allocation model.
The embodiment of the invention has the beneficial effects that:
in the scheme provided by the embodiment of the invention, the training data for obtaining the cache allocation model is obtained from the data transmission that the switch performs between the server and the client, and this data transmission among the switch, the server and the client constitutes a real network environment. Therefore, compared with the traditional approach of obtaining training data in a simulator, the collected training data reflects the real network conditions more accurately, which improves the accuracy of the cache allocation model obtained with that training data. Moreover, while the loaded loadable reinforcement learning model performs buffer allocation and training data is collected, the stored training data can simultaneously be used to train the reinforcement learning model; this amounts to executing training data collection and model training asynchronously, decoupling the two so that model training does not have to wait for each round of training data collection to finish. Therefore, compared with the traditional approach of obtaining training data in a simulator, the scheme reduces stalls and waiting during training and improves the efficiency of obtaining the cache allocation model.
Of course, it is not necessary for any one product or method of practicing the invention to achieve all of the advantages set forth above at the same time.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are necessary for the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention and that other embodiments may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a method for obtaining a cache allocation model in a real network environment according to an embodiment of the present invention;
fig. 2 is an exemplary diagram of an application scenario of a method for obtaining a cache allocation model in a real network environment according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an acquisition system of a cache allocation model in a real network environment according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating an exemplary architecture of an acquisition system for a cache allocation model in a real network environment according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an obtaining device of a cache allocation model in a real network environment according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, the method for obtaining the cache allocation model in the real network environment according to an embodiment of the present invention may include the following steps:
s101, loading a loadable reinforcement learning model in the switch.
The switch is used for realizing data transmission between the server and the client.
In a specific application, the loadable reinforcement learning model is a reinforcement learning model used for performing cache allocation. When it is loaded for the first time, the loadable reinforcement learning model is an untrained reinforcement learning model; on subsequent loads it is a trained reinforcement learning model. Moreover, the loadable reinforcement learning model may be loaded on multiple occasions, for example, when the switch completes one data transmission, when a preset period of time elapses, or when the number of training iterations of the reinforcement learning model to be trained reaches a number threshold.
S102, when the data transmission triggers a preset allocation condition, a loadable reinforcement learning model is utilized to acquire a buffer threshold of the switch, and buffer allocation is carried out.
In an alternative embodiment, the preset allocation conditions may specifically include:
in data transmission, the difference value between the data volume received by any port of the switch and the data volume sent by the port is larger than a difference threshold value, or the data volume lost by any port of the switch is larger than a preset loss threshold value.
In a specific application, when the switch transmits data packets, the amount of data received by any port of the switch equals the number of packets entering that port, the amount of data sent by the port equals the number of packets leaving that port, and the amount of data lost by any port equals the number of packets dropped by that port. When the data transmission triggers the preset allocation condition, it indicates that the amount of data received by the switch is greater than the amount of data sent by the switch, so the received data needs to be buffered, and cache allocation is therefore needed.
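As an informal illustration only, not part of the claimed method, the trigger check described above might look as follows; the counter names and threshold values are hypothetical assumptions.

```python
# Hypothetical sketch of the preset allocation condition check described above.
# Counter names and threshold values are illustrative assumptions.

DIFF_THRESHOLD = 1000   # allowed gap (in packets) between received and sent
LOSS_THRESHOLD = 10     # allowed dropped packets per port

def allocation_triggered(port_stats):
    """port_stats: list of per-port counter dicts collected from the switch."""
    for port in port_stats:
        backlog = port["rx_packets"] - port["tx_packets"]
        if backlog > DIFF_THRESHOLD:
            return True   # the port receives faster than it can send
        if port["dropped_packets"] > LOSS_THRESHOLD:
            return True   # the port is already losing data
    return False
```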
Obtaining the buffer threshold of the switch with the loadable reinforcement learning model and performing buffer allocation may specifically include: acquiring the statistical information of the data transmission performed by the switch at the time of the current trigger; inputting the acquired statistical information into the loadable reinforcement learning model to obtain the cache threshold of the switch; and using the cache threshold to cap the maximum available buffer of each port of the switch, thereby realizing cache allocation. The statistical information indicates the data transmission conditions of the switch, for example, the amount of data transmitted by the switch and the packet loss rate.
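A minimal sketch of this decision step, assuming a hypothetical switch and model interface (none of these calls are defined by the embodiment), could look like this:

```python
# Hypothetical sketch of step S102: statistics in, cache threshold out,
# per-port maximum buffer capped with the threshold. The switch and model
# methods used here are assumptions for illustration only.
def allocate_buffer(switch, model):
    # 1. Collect statistics describing the current transmission state
    #    (e.g. per-port traffic volume, packet loss rate).
    stats = switch.collect_port_statistics()
    # 2. Feed the statistics to the loaded reinforcement learning model
    #    to obtain a cache threshold.
    threshold = model.predict(stats)
    # 3. Use the threshold to cap the maximum available buffer of each port.
    for port in switch.ports:
        switch.set_max_buffer(port, threshold)
    return stats, threshold
```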
S103, storing the statistical information and the cache threshold used by the cache allocation corresponding to the current trigger, together with the reward value corresponding to the reinforcement learning model used by the historical trigger, and obtaining the training data generated by the current trigger.
The reward value corresponding to the reinforcement learning model utilized by the historical trigger is a value obtained based on the packet loss rate and throughput generated by data transmission in the time interval from the historical trigger to the current trigger.
In specific applications, there may be multiple historical triggers. For example, the historical triggers may be the N triggers preceding the current trigger, where N is an integer greater than or equal to 1; for instance, the historical trigger may be the trigger immediately before the current trigger, or the seven triggers before the current trigger, and so on. Any trigger before the current trigger can serve as a historical trigger, which is not limited in this embodiment. In addition, for ease of understanding and reasonable layout, the manner in which the reward value is obtained is described in detail below in the form of an optional embodiment.
S104, training the reinforcement learning model to be trained with the training data generated by the current trigger, and obtaining and saving the loadable reinforcement learning model.
The reinforcement learning model to be trained may take various forms. Illustratively, it may be an untrained reinforcement learning model, e.g., the reinforcement learning model that is loaded initially. Alternatively, it may be a reinforcement learning model whose number of training iterations has reached a specified number, and so on. Because more training iterations do not necessarily yield a more accurate reinforcement learning model, multiple reinforcement learning models with different numbers of training iterations can be obtained through training. Accordingly, the reinforcement learning model to be trained may be different models trained different numbers of times, for example, a reinforcement learning model trained 10 times, or one trained 20 times, and so on.
In addition, for easy understanding and reasonable layout, a specific manner of training the reinforcement learning model to be trained by using the training data generated by the present trigger to obtain and store the loadable reinforcement learning model is described in detail in the form of an alternative embodiment.
S105, if the number of saved loadable reinforcement learning models does not meet the preset number condition, returning to the step of loading a loadable reinforcement learning model in the switch; otherwise, selecting, from the saved loadable reinforcement learning models, a loadable reinforcement learning model whose corresponding reward value meets a preset reward value condition as the cache allocation model.
The preset number condition may take various forms. By way of example, it may include: the number of reinforcement learning models is greater than a number threshold, or the number of reinforcement learning models is equal to a number threshold. The preset number condition ensures that multiple loadable reinforcement learning models are saved, so that loadable reinforcement learning models with different numbers of training iterations are obtained; this helps improve the accuracy of the cache allocation model selected from the saved loadable reinforcement learning models. Thus, if the number of saved loadable reinforcement learning models does not meet the preset number condition, the step of loading a loadable reinforcement learning model in the switch may be executed again in order to save more loadable reinforcement learning models. In addition, for ease of understanding and reasonable layout, the specific manner of selecting the cache allocation model is described in detail below in the form of an optional embodiment.
In the scheme provided by the embodiment of the invention, the training data for obtaining the cache allocation model is obtained from the data transmission that the switch performs between the server and the client, and this data transmission among the switch, the server and the client constitutes a real network environment. Therefore, compared with the traditional approach of obtaining training data in a simulator, the collected training data reflects the real network conditions more accurately, which improves the accuracy of the cache allocation model obtained with that training data. Moreover, while the loaded loadable reinforcement learning model performs buffer allocation and training data is collected, the stored training data can simultaneously be used to train the reinforcement learning model; this amounts to executing training data collection and model training asynchronously, decoupling the two so that model training does not have to wait for each round of training data collection to finish. Therefore, compared with the traditional approach of obtaining training data in a simulator, the scheme reduces stalls and waiting during training and improves the efficiency of obtaining the cache allocation model.
In an optional embodiment, storing the statistical information and the buffer threshold used by the buffer allocation corresponding to the current trigger, together with the reward value corresponding to the reinforcement learning model used by the historical trigger, may specifically include the following steps:
collecting statistical information generated by all ports of the switch when data transmission triggers preset allocation conditions;
obtaining the buffer threshold used by the buffer allocation corresponding to the current trigger and the reward value corresponding to the historical trigger;
and assembling and storing the collected statistical information, the obtained cache threshold and the obtained reward value to obtain the training data generated by the current trigger.
In a specific application, the statistical information generated by all ports of the switch reflects the data transmission performed by the switch, for example, the amount of data transmitted by the switch, the packet loss rate, and other information. The statistical information generated by all ports of the switch is therefore equivalent to the internal state information of the switch and can serve as the state of the reinforcement learning model used by the current trigger. The buffer threshold used by the buffer allocation corresponding to the current trigger is the output of the reinforcement learning model used by the current trigger and can serve as its decision (action). The reward value reflects the effect of the reinforcement learning model's decision and thus embodies its performance. Therefore, the reward value corresponding to the reinforcement learning model used by the historical trigger can be a value obtained based on the packet loss rate and throughput of the data transmission in the time interval from the historical trigger to the current trigger; correspondingly, the reward corresponding to the reinforcement learning model used by the current trigger is obtained at the next trigger.
Thus, the collected statistical information, the obtained cache threshold and the obtained reward value respectively reflect the state, decision and performance of the reinforcement learning model, so the training data generated by the current trigger can be assembled from them. The specific way of assembling and storing them to obtain the training data generated by the current trigger can vary. For example, they may be concatenated into one piece of data, which is the training data generated by the current trigger; alternatively, they may be organized into an array, which is the training data generated by the current trigger.
In addition, the training data can be stored in a database, which makes it convenient to retrieve during subsequent training and improves the efficiency of cache allocation. For example, the database may be a Redis (Remote Dictionary Server) database, an open-source, network-capable, in-memory or persistent log-based key-value database. Moreover, to ensure that the buffer threshold output by each reinforcement learning model is matched with the reward value of that model, each piece of statistical information, buffer threshold and reward value can be tagged with an identifier of the corresponding reinforcement learning model, and the identifier can persist throughout the process of acquiring the cache allocation model to prevent data from being mixed up.
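A minimal sketch of assembling one training sample and storing it in Redis is given below, under the assumption that samples are appended to a Redis list and tagged with a model identifier; the key names and field layout are illustrative, not prescribed by the embodiment.

```python
# Hypothetical sketch: assemble (state, action, reward) into one training
# sample and store it in Redis. Key names and field layout are assumptions.
import json
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def store_training_sample(model_tag, stats, threshold, reward):
    sample = {
        "model": model_tag,          # identifier of the model that produced the threshold
        "state": list(stats),        # statistical information (state)
        "action": float(threshold),  # cache threshold (decision)
        "reward": float(reward),     # delayed reward from the historical trigger
    }
    r.rpush("training_data", json.dumps(sample))  # append to the training-data list
```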
In an alternative embodiment, the prize values may be obtained by:
acquiring a time interval between the current trigger and the historical trigger;
counting the packet loss rate and throughput of the data transmission within the time interval;
and taking the packet loss rate and the throughput as the rewarding value.
In a specific application, the time point of each trigger can be recorded, and the difference between the time point of the current trigger and the time point of the historical trigger is then calculated to obtain the time interval between them. The throughput of the data transmission is measured over the period from the historical trigger to the current trigger, that is, the amount of data sent and received by the switch within that time interval. Where the data is organized as a queue, the amount of data may be the length of the queue. The packet loss rate and throughput reflect the effect of performing buffer allocation according to the buffer threshold used by the historical trigger, and thus the performance of the reinforcement learning model corresponding to that buffer threshold, so they can be used as the reward value.
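A minimal sketch of the reward computation follows, assuming the reward combines throughput and packet loss rate linearly; the specific combination and the switch counter interface are illustrative assumptions, since the embodiment only requires that the reward be derived from both quantities.

```python
# Hypothetical sketch of computing the reward for the model used at the
# historical trigger, based on packet loss rate and throughput over the
# interval between the historical trigger and the current trigger.
import time

def compute_reward(prev_trigger, switch, alpha=1.0, beta=1.0):
    interval = time.time() - prev_trigger["timestamp"]
    sent = switch.bytes_sent() - prev_trigger["bytes_sent"]
    recv = switch.packets_received() - prev_trigger["packets_received"]
    lost = switch.packets_lost() - prev_trigger["packets_lost"]
    throughput = sent / interval if interval > 0 else 0.0
    loss_rate = lost / recv if recv > 0 else 0.0
    # Reward higher throughput, penalize packet loss (weights are assumptions).
    return alpha * throughput - beta * loss_rate
```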
In an optional implementation manner, the training data generated by the triggering is used to train the reinforcement learning model to be trained, and the loadable reinforcement learning model is obtained and stored, which specifically includes the following steps:
Acquiring training data generated by the triggering;
inputting training data generated by the triggering into a reinforcement learning model to be trained, and training the reinforcement learning model to be trained;
when the number of times training data has been input to the reinforcement learning model to be trained equals a preset number of times, the reinforcement learning model obtained from the most recent training input is saved and used as the loadable reinforcement learning model.
In a specific application, when the training data is stored in a database, the training data generated by the current trigger may be read from the database. The timing for acquiring the training data generated by the current trigger can be as follows: the database is queried continuously or periodically for new training data; when new training data arrives, it is read from the database and input into the reinforcement learning model to be trained.
When the number of times training data has been input to the reinforcement learning model to be trained equals the preset number of times, it indicates that the model has been trained with the training data generated by the preset number of data transmission triggers. Therefore, the reinforcement learning model obtained from the most recent training input is saved at this point and used as a loadable reinforcement learning model for the switch to use in buffer allocation; in this way the reinforcement learning model is continuously and iteratively updated.
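A minimal sketch of such a training service is given below, assuming training samples are read from a Redis list and a loadable model is saved after a preset number of training inputs; all names, the polling interval and the save format are illustrative assumptions.

```python
# Hypothetical sketch of the training service: poll the database for new
# training data, train the model to be trained, and periodically save a
# loadable reinforcement learning model.
import json
import time
import redis

PRESET_INPUTS = 10   # save a loadable model after every 10 training inputs

def training_service(model, r=redis.Redis()):
    inputs, version = 0, 0
    while True:
        raw = r.lpop("training_data")        # query the database for new training data
        if raw is None:
            time.sleep(0.1)                  # nothing new yet; poll again later
            continue
        model.train_step(json.loads(raw))    # train with this trigger's data
        inputs += 1
        if inputs % PRESET_INPUTS == 0:
            version += 1
            model.save(f"loadable_model_{version}.pt")  # saved loadable model
```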
In the method, device and system, the training data generated by each trigger is input into the reinforcement learning model to be trained, so that each round of training can use training data different from that used in the previous round, which helps improve the training effect of the reinforcement learning model.
In an optional embodiment, from the plurality of stored loadable reinforcement learning models, a loadable reinforcement learning model with a corresponding reward value satisfying a preset reward value condition is selected as the cache allocation model, and specifically the method may include the following steps:
sorting the stored plurality of loadable reinforcement learning models in order from large to small according to the reward value corresponding to each loadable reinforcement learning model;
the loadable reinforcement learning models in the first specified number of positions are used as the cache allocation model.
In a specific application, a model test script may be launched to execute the present optional embodiment. Because the reward value corresponding to a reinforcement learning model reflects that model's performance, sorting the saved loadable reinforcement learning models by their corresponding reward values from large to small orders the trained models from best to worst performance. Thus, the loadable reinforcement learning models in the first specified number of positions are the relatively better-performing models among the saved loadable reinforcement learning models and can serve as the cache allocation model. The specified number can be set according to the specific application scenario and training experience; for example, it may be 1, 2, 3, and so on.
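A minimal sketch of this selection step, assuming each saved model is tracked as a (path, reward value) pair (this bookkeeping is an illustrative assumption):

```python
# Hypothetical sketch: sort saved loadable models by reward value from large
# to small and keep the first `specified_number` as cache allocation models.
def select_cache_allocation_models(saved_models, specified_number=1):
    """saved_models: list of (model_path, reward_value) pairs."""
    ranked = sorted(saved_models, key=lambda m: m[1], reverse=True)
    return [path for path, _ in ranked[:specified_number]]
```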
To facilitate understanding, an application scenario of the method for obtaining a cache allocation model in a real network environment provided by the foregoing embodiments and optional embodiments is described by way of example. As shown in fig. 2, the switch in the embodiment of the present invention may perform multiple data transmissions, so that multiple triggers and the corresponding training data are generated in parallel, improving the efficiency of obtaining the cache allocation model. Specifically, multiple servers, each with its corresponding clients, can be used to perform multiple data transmissions, and the switch accordingly carries these transmissions. Each data transmission corresponds to one running environment, so multiple environments can be run, for example environment E1, environment E2, …, environment EN. Moreover, since the same switch is used, the same loadable reinforcement learning model can be used to obtain the buffer thresholds needed for buffer allocation during the operation of each environment, that is, to provide the reinforcement learning model's decision service to every environment. On this basis, the training data collection and storage service can be implemented through the training-data acquisition steps described above for the embodiment of fig. 1 and its optional embodiments, with the acquired training data stored in a database. The training service, that is, the step of obtaining the loadable reinforcement learning model in the embodiment of fig. 1 and its optional embodiments, obtains the loadable reinforcement learning model, saves it, and uses it to implement the decision service of the reinforcement learning model. This cycle repeats until the number of saved loadable reinforcement learning models meets the preset number condition, at which point the cache allocation model is determined from the saved loadable reinforcement learning models.
As shown in fig. 3, in an embodiment of the present invention, a structure of an acquisition system for a cache allocation model in a real network environment may include: a switch 301, a server 302, a client 303, and a model acquisition agent 304;
the switch 301 is configured to implement data transmission between the server 302 and the client 303;
the model acquisition agent 304 is configured to load a loadable reinforcement learning model in the switch 301; when the data transmission triggers a preset allocation condition, use the loadable reinforcement learning model to obtain a cache threshold of the switch and perform cache allocation; store the statistical information and the buffer threshold used by the buffer allocation corresponding to the current trigger, together with the reward value corresponding to the reinforcement learning model used by a historical trigger, to obtain the training data generated by the current trigger, where the reward value corresponding to the reinforcement learning model used by the historical trigger is a value obtained based on the packet loss rate and throughput of the data transmission in the time interval from the historical trigger to the current trigger; train the reinforcement learning model to be trained with the training data generated by the current trigger, and obtain and save the loadable reinforcement learning model; and if the number of saved loadable reinforcement learning models does not meet the preset number condition, return to the step of loading a loadable reinforcement learning model in the switch; otherwise, select, from the saved loadable reinforcement learning models, a loadable reinforcement learning model whose corresponding reward value meets a preset reward value condition as the cache allocation model.
In an alternative embodiment, the model acquisition agent 304 is configured to:
collecting statistical information generated by all ports of the switch 301 when the data transmission triggers a preset allocation condition;
obtaining a buffer threshold value utilized by buffer allocation corresponding to the current trigger and the reward value utilized by the historical trigger;
and assembling and storing the collected statistical information, the acquired cache threshold value and the acquired reward value to obtain the training data generated by the triggering.
In the scheme provided by the embodiment of the invention, the training data for obtaining the cache allocation model is obtained from the data transmission that the switch performs between the server and the client, and this data transmission among the switch, the server and the client constitutes a real network environment. Therefore, compared with the traditional approach of obtaining training data in a simulator, the collected training data reflects the real network conditions more accurately, which improves the accuracy of the cache allocation model obtained with that training data. Moreover, while the loaded loadable reinforcement learning model performs buffer allocation and training data is collected, the stored training data can simultaneously be used to train the reinforcement learning model; this amounts to executing training data collection and model training asynchronously, decoupling the two so that model training does not have to wait for each round of training data collection to finish. Therefore, compared with the traditional approach of obtaining training data in a simulator, the scheme reduces stalls and waiting during training and improves the efficiency of obtaining the cache allocation model.
For ease of understanding, the structure of an acquisition system for a cache allocation model in a real network environment according to an embodiment of the present invention is described below by way of example, as illustrated in fig. 4. The system may comprise: a switch data plane, terminals h1, h2, h3 and h4, a data collection module, a Redis database, and a training service. The switch data plane corresponds to the switch 301 in the embodiment of fig. 3. The terminals h1, h2, h3 and h4 correspond to the server 302 and the client 303 in the embodiment of fig. 3; one server may correspond to one client, or one server may correspond to multiple clients. The data collection module, Redis database and training service correspond to the model acquisition agent 304 in the embodiment of fig. 3.
In a specific application, DPDK (Data Plane Development Kit) can be used to implement a switch at layer 2 (the data link layer) of the OSI model (Open Systems Interconnection reference model), which can be referred to as a layer-2 switch. DPDK runs on a Linux system and provides a set of libraries and drivers for fast packet processing; it can greatly improve data processing performance and throughput, and thus the efficiency of data plane applications. The terminals h1, h2, h3 and h4 may generate random data to be transmitted, that is, generate traffic, according to different cumulative distribution functions. When the switch in the system starts, a loadable reinforcement learning model can be loaded, that is, the model is loaded. After the server and the client start, the client initiates a data request to the server, and the server returns data of the size requested by the client. When the data returned by the server is forwarded to the client by the switch, if the preset allocation condition is triggered, the switch collects statistical information, inputs it into the loaded reinforcement learning model, obtains the decision information returned by the model, namely the cache threshold, then performs cache allocation with that threshold, and completes the data transmission corresponding to this trigger, namely the data transmission requested by the client, according to the cache allocation. During this cache allocation and the completion of the transmission corresponding to the trigger, the data collection module collects training data and stores it in the Redis database. When new training data appears in the Redis database, the training service reads it from the Redis database to train the reinforcement learning model to be trained and generates a model; the generated model is the loadable reinforcement learning model. In addition, the training service keeps acquiring new training data for training, so the trained model is updated regularly.
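A minimal sketch of the client-server traffic generation described above follows, assuming a simple TCP exchange in which the client requests a number of bytes and the server returns them; the addresses, ports and request-size distribution are illustrative assumptions.

```python
# Hypothetical sketch of traffic generation between a terminal acting as a
# server and one acting as a client; addresses, ports and the request-size
# distribution are assumptions for illustration only.
import random
import socket

def run_server(listen_addr=("0.0.0.0", 9000)):
    with socket.create_server(listen_addr) as srv:
        while True:
            conn, _ = srv.accept()
            with conn:
                size = int.from_bytes(conn.recv(8), "big")  # requested data size
                conn.sendall(b"\0" * size)                  # return that much data

def run_client(server_addr=("10.0.0.1", 9000)):
    size = random.randint(1_000, 1_000_000)   # stand-in for sampling a size distribution
    with socket.create_connection(server_addr) as s:
        s.sendall(size.to_bytes(8, "big"))    # request `size` bytes from the server
        received = 0
        while received < size:
            chunk = s.recv(65536)
            if not chunk:
                break
            received += len(chunk)
```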
Corresponding to the embodiment of the method, the embodiment of the invention also provides a device for acquiring the cache allocation model in the real network environment.
As shown in fig. 5, the structure of an acquisition device for a buffer allocation model in a real network environment according to an embodiment of the present invention includes:
a model loading module 501 for loading a loadable reinforcement learning model in the switch; the switch is used for realizing data transmission between the server and the client;
the buffer allocation module 502 is configured to acquire a buffer threshold of the switch by using the loadable reinforcement learning model when the data transmission triggers a preset allocation condition, and perform buffer allocation;
a data obtaining module 503, configured to store the statistical information and the buffer threshold used by the buffer allocation corresponding to the current trigger, together with the reward value corresponding to the reinforcement learning model used by a historical trigger, and obtain the training data generated by the current trigger; the reward value corresponding to the reinforcement learning model used by the historical trigger is a value obtained based on the packet loss rate and throughput of the data transmission in the time interval from the historical trigger to the current trigger;
The model obtaining module 504 is configured to train the reinforcement learning model to be trained with the training data generated by the current trigger, and obtain and save a loadable reinforcement learning model; and if the number of saved loadable reinforcement learning models does not meet the preset number condition, return to the step of loading a loadable reinforcement learning model in the switch; otherwise, select, from the saved loadable reinforcement learning models, a loadable reinforcement learning model whose corresponding reward value meets a preset reward value condition as the cache allocation model.
Optionally, the data acquisition module 503 is specifically configured to:
collecting statistical information generated by all ports of the switch when the data transmission triggers a preset distribution condition;
obtaining a buffer threshold value utilized by buffer allocation corresponding to the current trigger and the reward value utilized by the historical trigger;
and assembling and storing the collected statistical information, the acquired cache threshold value and the acquired reward value to obtain the training data generated by the triggering.
The embodiment of the invention also provides an electronic device, as shown in fig. 6, which comprises a processor 601, a communication interface 602, a memory 603 and a communication bus 604, wherein the processor 601, the communication interface 602 and the memory 603 complete communication with each other through the communication bus 604,
A memory 603 for storing a computer program;
the processor 601 is configured to execute the program stored in the memory 603, and implement the following steps:
loading a loadable reinforcement learning model in the switch; the switch is used for realizing data transmission between the server and the client;
when the data transmission triggers a preset allocation condition, the loadable reinforcement learning model is utilized to acquire a cache threshold value of the switch, and cache allocation is carried out;
storing the statistical information and the buffer threshold used by the buffer allocation corresponding to the current trigger, together with the reward value corresponding to the reinforcement learning model used by a historical trigger, to obtain the training data generated by the current trigger; the reward value corresponding to the reinforcement learning model used by the historical trigger is a value obtained based on the packet loss rate and throughput of the data transmission in the time interval from the historical trigger to the current trigger;
training the reinforcement learning model to be trained with the training data generated by the current trigger, and obtaining and saving the loadable reinforcement learning model; and if the number of saved loadable reinforcement learning models does not meet the preset number condition, returning to the step of loading a loadable reinforcement learning model in the switch; otherwise, selecting, from the saved loadable reinforcement learning models, a loadable reinforcement learning model whose corresponding reward value meets a preset reward value condition as the cache allocation model.
In the scheme provided by the embodiment of the invention, the training data for obtaining the cache allocation model is obtained from the data transmission that the switch performs between the server and the client, and this data transmission among the switch, the server and the client constitutes a real network environment. Therefore, compared with the traditional approach of obtaining training data in a simulator, the collected training data reflects the real network conditions more accurately, which improves the accuracy of the cache allocation model obtained with that training data. Moreover, while the loaded loadable reinforcement learning model performs buffer allocation and training data is collected, the stored training data can simultaneously be used to train the reinforcement learning model; this amounts to executing training data collection and model training asynchronously, decoupling the two so that model training does not have to wait for each round of training data collection to finish. Therefore, compared with the traditional approach of obtaining training data in a simulator, the scheme reduces stalls and waiting during training and improves the efficiency of obtaining the cache allocation model.
The communication bus mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The communication bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The Memory may include random access Memory (Random Access Memory, RAM) or may include Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; but also digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
In yet another embodiment of the present invention, there is further provided a computer readable storage medium having a computer program stored therein, where the computer program, when executed by a processor, implements the steps of any of the above methods for obtaining a cache allocation model in a real network environment.
In yet another embodiment of the present invention, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform any of the above methods for obtaining a cache allocation model in a real network environment.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present invention, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, digital Subscriber Line (DSL)), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for apparatus and system embodiments, the description is relatively simple, as it is substantially similar to method embodiments, with reference to the description of method embodiments in part.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (7)

1. The method for obtaining the cache allocation model in the real network environment is characterized by comprising the following steps:
loading a loadable reinforcement learning model in the switch; the switch is used for realizing data transmission between the server and the client;
when the data transmission triggers a preset allocation condition, the loadable reinforcement learning model is utilized to acquire a cache threshold value of the switch, and cache allocation is carried out;
storing the statistical information and the buffer threshold used by the buffer allocation corresponding to the current trigger, together with the reward value corresponding to the reinforcement learning model used by a historical trigger, to obtain the training data generated by the current trigger; the reward value corresponding to the reinforcement learning model used by the historical trigger is a value obtained based on the packet loss rate and throughput of the data transmission in the time interval from the historical trigger to the current trigger;
training the reinforcement learning model to be trained with the training data generated by the current trigger, and obtaining and saving the loadable reinforcement learning model;
if the number of stored loadable reinforcement learning models does not meet the preset number condition, returning to the step of loading a loadable reinforcement learning model in the switch; otherwise, selecting, from the stored multiple loadable reinforcement learning models, a loadable reinforcement learning model whose corresponding reward value meets a preset reward value condition as the cache allocation model;
wherein obtaining the training data generated by the current trigger includes the following steps:
collecting statistical information generated by all ports of the switch when the data transmission triggers a preset distribution condition;
obtaining a buffer threshold value utilized by buffer allocation corresponding to the current trigger and the reward value utilized by the historical trigger;
and assembling and storing the collected statistical information, the acquired cache threshold value and the acquired reward value to obtain the training data generated by the triggering.
2. The method of claim 1, wherein the prize value is obtained by:
acquiring a time interval between the current trigger and the historical trigger;
counting the packet loss rate and throughput of the data transmission in the time interval;
and taking the packet loss rate and the throughput as the rewarding value.
3. The method according to claim 1, wherein training the reinforcement learning model to be trained by using the training data generated by the current trigger, and obtaining and saving the loadable reinforcement learning model, comprises:
acquiring the training data generated by the current trigger;
inputting the training data generated by the current trigger into the reinforcement learning model to be trained, and training the reinforcement learning model to be trained;
when the number of times training data has been input into the reinforcement learning model to be trained reaches a preset number of times, saving the reinforcement learning model obtained from the most recent training as the loadable reinforcement learning model.
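A minimal sketch of the checkpointing behaviour in claim 3 follows; model.update, model.snapshot, and sample_source.next are assumed interfaces, not a real library API.

```python
def train_until_checkpoint(model, sample_source, preset_times: int):
    """Feed training samples into the model to be trained; after a preset number of
    updates, return the latest trained model as the loadable reinforcement learning model."""
    for _ in range(preset_times):
        sample = sample_source.next()   # training data generated by a trigger
        model.update(sample)            # one training step on the model to be trained
    return model.snapshot()             # saved as the loadable reinforcement learning model
```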
4. The method of claim 1, wherein the preset allocation condition comprises:
during the data transmission, the difference between the amount of data received by any port of the switch and the amount of data sent by that port is larger than a difference threshold, or the amount of data lost by any port of the switch is larger than a preset loss threshold.
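The preset allocation condition of claim 4 can be expressed as a simple per-port check, as in the sketch below; the attribute names bytes_received, bytes_sent, and bytes_dropped are assumptions.

```python
def allocation_condition_triggered(port_stats, diff_threshold: int, loss_threshold: int) -> bool:
    """Return True if, for any port, the received-minus-sent data volume exceeds the
    difference threshold, or the lost data volume exceeds the preset loss threshold."""
    for stats in port_stats:
        backlog = stats.bytes_received - stats.bytes_sent
        if backlog > diff_threshold or stats.bytes_dropped > loss_threshold:
            return True
    return False
```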
5. The method according to claim 1, wherein selecting, from the saved loadable reinforcement learning models, a loadable reinforcement learning model whose corresponding reward value satisfies the preset reward value condition as the cache allocation model comprises:
sorting the saved loadable reinforcement learning models in descending order of the reward value corresponding to each loadable reinforcement learning model;
and taking the loadable reinforcement learning models ranked in the top specified number of positions as the cache allocation model.
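The selection step of claim 5 amounts to a sort by reward followed by taking the top entries; a minimal sketch, assuming each saved model is paired with its reward value:

```python
def select_cache_allocation_models(saved_models, rewards, top_k: int = 1):
    """Sort saved loadable reinforcement learning models by their corresponding reward
    values in descending order and keep the first top_k as cache allocation models."""
    ranked = sorted(zip(saved_models, rewards), key=lambda pair: pair[1], reverse=True)
    return [model for model, _ in ranked[:top_k]]
```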
6. An apparatus for obtaining a cache allocation model in a real network environment, the apparatus comprising:
a model loading module, configured to load a loadable reinforcement learning model into a switch, wherein the switch is used for data transmission between a server and a client;
a cache allocation module, configured to acquire a cache threshold value of the switch by using the loadable reinforcement learning model and perform cache allocation when the data transmission triggers a preset allocation condition;
a data acquisition module, configured to store statistical information and the cache threshold value used by the cache allocation corresponding to the current trigger, together with a reward value corresponding to the reinforcement learning model used by a historical trigger, to obtain training data generated by the current trigger; wherein the reward value corresponding to the reinforcement learning model used by the historical trigger is a value derived from the packet loss rate and throughput generated by the data transmission in the time interval from the historical trigger to the current trigger;
a model acquisition module, configured to train a reinforcement learning model to be trained by using the training data generated by the current trigger, and to obtain and save the loadable reinforcement learning model; and, if the number of saved loadable reinforcement learning models does not meet a preset number condition, to return to loading the loadable reinforcement learning model into the switch, otherwise to select, from the saved loadable reinforcement learning models, a loadable reinforcement learning model whose corresponding reward value satisfies a preset reward value condition as the cache allocation model;
wherein the data acquisition module is specifically configured to:
collect statistical information generated by all ports of the switch when the data transmission triggers the preset allocation condition;
obtain the cache threshold value used by the cache allocation corresponding to the current trigger and the reward value used by the historical trigger;
and assemble and store the collected statistical information, the obtained cache threshold value, and the obtained reward value to obtain the training data generated by the current trigger.
7. A system for obtaining a cache allocation model in a real network environment, the system comprising: a switch, a server, a client, and a model acquisition agent;
wherein the switch is used for data transmission between the server and the client;
the model acquisition agent is configured to: load a loadable reinforcement learning model into the switch; when the data transmission triggers a preset allocation condition, acquire a cache threshold value of the switch by using the loadable reinforcement learning model and perform cache allocation; store statistical information and the cache threshold value used by the cache allocation corresponding to the current trigger, together with a reward value corresponding to the reinforcement learning model used by a historical trigger, to obtain training data generated by the current trigger, wherein the reward value corresponding to the reinforcement learning model used by the historical trigger is a value derived from the packet loss rate and throughput generated by the data transmission in the time interval from the historical trigger to the current trigger; train a reinforcement learning model to be trained by using the training data generated by the current trigger, and obtain and save the loadable reinforcement learning model; and, if the number of saved loadable reinforcement learning models does not meet a preset number condition, return to loading the loadable reinforcement learning model into the switch, otherwise select, from the saved loadable reinforcement learning models, a loadable reinforcement learning model whose corresponding reward value satisfies a preset reward value condition as the cache allocation model;
the model acquisition agent is specifically configured to:
collect statistical information generated by all ports of the switch when the data transmission triggers the preset allocation condition;
obtain the cache threshold value used by the cache allocation corresponding to the current trigger and the reward value used by the historical trigger;
and assemble and store the collected statistical information, the obtained cache threshold value, and the obtained reward value to obtain the training data generated by the current trigger.
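Putting the pieces together, the sketch below shows how a model acquisition agent could drive the loop described in claims 1 and 7: load a model into the switch, run one training round per trigger cycle, and stop once enough loadable models have been saved. The switch and agent objects and their methods are assumed interfaces, not elements of the claimed system.

```python
def acquire_cache_allocation_model(switch, agent, target_model_count: int, top_k: int = 1):
    """End-to-end sketch of the acquisition flow: train and save loadable models until the
    preset number condition is met, then keep the model(s) with the highest reward values."""
    saved = []  # list of (loadable model, reward value) pairs
    while len(saved) < target_model_count:                # preset number condition
        agent.load_into_switch(switch)                     # load the loadable RL model
        model, reward = agent.run_training_round(switch)   # allocate caches, collect data, train
        saved.append((model, reward))
    saved.sort(key=lambda pair: pair[1], reverse=True)     # preset reward value condition: top-k
    return [model for model, _ in saved[:top_k]]
```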
CN202011197526.5A 2020-10-30 2020-10-30 Method, device and system for acquiring cache allocation model in real network environment Active CN112446501B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011197526.5A CN112446501B (en) 2020-10-30 2020-10-30 Method, device and system for acquiring cache allocation model in real network environment

Publications (2)

Publication Number Publication Date
CN112446501A CN112446501A (en) 2021-03-05
CN112446501B true CN112446501B (en) 2023-04-21

Family

ID=74735552

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011197526.5A Active CN112446501B (en) 2020-10-30 2020-10-30 Method, device and system for acquiring cache allocation model in real network environment

Country Status (1)

Country Link
CN (1) CN112446501B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115297067B (en) * 2022-04-29 2024-04-26 华为技术有限公司 Shared cache management method and device
CN115484568B (en) * 2022-08-12 2024-08-20 北京邮电大学 Method, device, electronic equipment and medium for transmitting cache data

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017000673A1 (en) * 2015-06-29 2017-01-05 深圳市中兴微电子技术有限公司 Shared cache allocation method and apparatus and computer storage medium
CN109474696A (en) * 2018-12-10 2019-03-15 北京邮电大学 A kind of network service method, device, electronic equipment and readable storage medium storing program for executing
CN111159063A (en) * 2019-12-25 2020-05-15 大连理工大学 Cache allocation method for multi-layer Sketch network measurement

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Hybrid cache energy consumption optimization and evaluation based on reinforcement learning; Fan Hao et al.; Journal of Computer Research and Development; 2020-06-07 (Issue 06); full text *
Shan Antong. Design and implementation of a verification platform for an intelligent buffer management mechanism in high-speed switches. China Master's Theses Full-text Database, Information Science and Technology. 2022. *


Similar Documents

Publication Publication Date Title
CN112446501B (en) Method, device and system for acquiring cache allocation model in real network environment
CN107239339B (en) System performance optimization parameter determination method, system performance optimization method and device
KR102377628B1 (en) Apparatus and Method for Managing Performance about Artificial Intelligent Service
CN107944000B (en) Flight freight rate updating method and device, electronic equipment and storage medium
CN112988679B (en) Log acquisition control method and device, storage medium and server
US9588799B1 (en) Managing test services in a distributed production service environment
CN111507541A (en) Goods quantity prediction model construction method, goods quantity measurement device and electronic equipment
CN111756802B (en) Method and system for scheduling data stream tasks on NUMA platform
Obal Ii et al. State-space support for path-based reward variables
CN115171771A (en) Solid state disk testing method, device, equipment and storage medium
CN109992408B (en) Resource allocation method, device, electronic equipment and storage medium
CN111538459B (en) Method and device for determining data inclination, electronic equipment and readable storage medium
CN116701191A (en) Optimization method, device, equipment, storage medium and program product for quantization loop
CN116737084A (en) Queue statistics method and device, electronic equipment and storage medium
CN109522565A (en) A kind of verification method, device and computer readable storage medium
CN111598466B (en) Method and system for instrument resource cloud management and configuration
CN113010852A (en) Data metering statistical method and device, electronic equipment and storage medium
CN113760989A (en) Method, device and equipment for processing unbounded stream data and storage medium
CN111352825B (en) Data interface testing method and device and server
CN117744547B (en) Method and device for predicting circuit device resources, electronic equipment and storage medium
US8943177B1 (en) Modifying a computer program configuration based on variable-bin histograms
CN108829592B (en) Method and device for verifying quick access register and table entry and verification equipment
CN111274118B (en) Application optimization processing method, device and system
US11886725B2 (en) Accelerating decision tree inferences
CN111506524B (en) Method and device for eliminating and preloading data pages in database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant