CN114125746B

CN114125746B - Dynamic CoAP mode selection method and device based on UCB

Info

Publication number: CN114125746B
Application number: CN202111375458.1A
Authority: CN
Inventors: 杨会轩; 李欣; 刘金会
Original assignee: Beijing Huaqing Future Energy Technology Research Institute Co ltd; Shandong Huake Information Technology Co ltd
Current assignee: Beijing Huaqing Future Energy Technology Research Institute Co ltd; Shandong Huake Information Technology Co ltd
Priority date: 2021-11-19
Filing date: 2021-11-19
Publication date: 2022-08-16
Anticipated expiration: 2041-11-19
Also published as: CN114125746A

Abstract

The application relates to a UCB-based dynamic CoAP mode selection method and device. The method comprises: acquiring the data transmission mode types of the CoAP protocol in the current data transmission scenario, and determining the optimal data transmission mode of each data packet during data transmission based on an MAB transmission model. Because the MAB transmission model is an online learning model, the combination of data transmission modes to use in the next transmission can be determined in real time from the transmission performance of the data packets already transmitted, so that data transmission delay and packet loss rate are balanced overall and the overall performance of the system is improved.

Description

Dynamic CoAP mode selection method and device based on UCB

Technical Field

The application relates to the technical field of Internet of Things data transmission, and in particular to a dynamic Constrained Application Protocol (CoAP) mode selection method and device based on the Upper Confidence Bound (UCB) algorithm.

Background

With the application and development of Internet of Things technology in power distribution networks, information interaction between cloud and edge and between edge and end devices is key to realizing plug-and-play equipment, wide interconnection among devices, big data processing, and related business processes, and the communication protocol is the carrier of that information interaction. CoAP is an application protocol for resource-constrained devices in the Internet of Things. By using smaller data packets and a shorter transmission period, it alleviates problems such as network congestion and excessive transmission delay for high-concurrency services in the power distribution Internet of Things. It supports DTLS encryption and point table configuration through device self-description, and thereby supports comprehensive sensing, interconnection, lightweight and efficient operation of the power distribution network, and reliable operation of power distribution services. The CoAP protocol has two data transmission modes: a non-acknowledged mode and an acknowledged mode. The non-acknowledged mode has no data retransmission mechanism, and the sending end does not track whether the receiving end has lost packets, so this mode has lower delay but a higher packet loss rate. The acknowledged mode has a data retransmission mechanism: within the maximum retransmission limit, the sending end decides whether to retransmit according to the packet loss result fed back by the receiving end, so the packet loss rate is low, but transmission delay and node energy consumption inevitably increase.
The non-acknowledged and acknowledged modes of the CoAP protocol thus differ in the packet loss rate and transmission delay they deliver. In the prior art, changes in the transmission channel cannot be fully known during data transmission, and data is generally transmitted using a single fixed data transmission mode, so the packet loss rate and the transmission delay inevitably cannot both meet the transmission requirements.

Disclosure of Invention

In order to overcome the problem that the packet loss rate and the transmission delay cannot simultaneously meet the transmission requirement in the related technology at least to a certain extent, the application provides a dynamic CoAP mode selection method and device based on UCB.

The scheme of the application is as follows:

according to a first aspect of the embodiments of the present application, a dynamic CoAP mode selection method based on UCB is provided, including:

acquiring the data transmission mode type of a CoAP protocol in a current data transmission scene;

determining, based on a multi-armed bandit (MAB) transmission model, the optimal data transmission mode of each data packet during data transmission.

Preferably, in an implementation manner of the present application, the method further includes:

building the MAB transmission model, and taking each data transmission mode as each arm of the MAB transmission model;

the input of the MAB transmission model is the reward obtained after transmission of the previous data packet is completed, or the initial index of each data transmission mode; wherein the reward represents the preference estimate of the previous data packet for each data transmission mode type;

and the output item in the MAB transmission model is reward obtained after the transmission of the current data packet is finished.

Preferably, in an implementation manner of the present application, the determining an optimal data transmission mode of each CoAP protocol data packet in a data transmission process based on the MAB transmission model includes:

determining the optimal data transmission mode of the current data packet according to the reward obtained after the transmission of the previous data packet is finished;

transmitting the current data packet according to the optimal data transmission mode of the current data packet, and determining the packet loss rate and the transmission delay in the transmission process of the current data packet;

obtaining rewards of the current data packet to each data transmission mode according to the packet loss rate and the transmission delay in the transmission process of the current data packet;

and using the reward of the current data packet to each data transmission mode as an input item of the MAB transmission model when the next cycle is carried out.

Preferably, in an implementable manner of the present application, the data transmission mode types include: a non-acknowledged mode and an acknowledged mode;

the non-acknowledged mode has no data retransmission mechanism, and the acknowledged mode has a data retransmission mechanism.

Preferably, in an implementation manner of the present application, the determining a packet loss ratio and a transmission delay in a current data packet transmission process includes:

counting the number of small data packets contained in the current data packet;

acquiring channel gain of each small data packet in the current data packet during transmission;

when the channel gain during the small data packet transmission is not greater than a preset threshold value, determining that packet loss occurs in the small data packet;

if the data transmission mode of the current data packet is a non-confirmation mode, counting packet loss results according to the number of the small data packets with packet loss;

if the data transmission mode of the current data packet is a confirmation mode, counting the retransmission times of the small data packets with lost packets, and counting packet loss results according to the number of the small data packets with lost packets and the retransmission times of the small data packets with lost packets;

and obtaining the packet loss rate according to the number of small data packets contained in the current data packet and the packet loss result.

if the data transmission mode of the current data packet is a non-confirmation mode, calculating the transmission delay of the current data packet according to the number of the small data packets and the transmission delay of the small data packets;

if the data transmission mode of the current data packet is a confirmation mode, counting the retransmission times of each small data packet, and calculating the transmission delay of the current data packet according to the number of the small data packets, the retransmission times of each small data packet and the transmission delay of the small data packet.

Preferably, in an implementation manner of the present application, obtaining a reward of the current data packet to each data transmission mode according to a packet loss rate and a transmission delay in a transmission process of the current data packet includes:

counting the selection times of each data transmission mode in the current data transmission process;

and obtaining the reward of the current data packet to each data transmission mode according to the reciprocal of the weighted sum of the packet loss rate and the transmission delay and the selection times of each data transmission mode in the current data transmission process.

Preferably, in an implementation manner of the present application, the determining an optimal data transmission mode for each data packet in the data transmission process includes:

and determining the data transmission mode with the maximum preference estimation value as the optimal data transmission mode of the current data packet according to the reward of the current data packet to each data transmission mode.

Preferably, in an implementable manner of the present application, the MAB transmission model is learned based on a UCB algorithm.

According to a second aspect of the embodiments of the present application, there is provided a dynamic CoAP mode selection device based on UCB, including:

the acquisition module is used for acquiring the data transmission mode type of the CoAP protocol in the current data transmission scene;

and the determining module is used for determining the optimal data transmission mode of each data packet in the data transmission process based on the MAB transmission model.

The technical solution provided by the application may have the following beneficial effects. The UCB-based dynamic CoAP mode selection method of the application comprises: acquiring the data transmission mode types of the CoAP protocol in the current data transmission scenario, and determining the optimal data transmission mode of each data packet during data transmission based on the MAB transmission model. Because the MAB transmission model is an online learning model, the combination of data transmission modes to use in the next transmission can be determined in real time from the transmission performance of the data packets already transmitted, so that data transmission delay and packet loss rate are balanced overall and the overall performance of the system is improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.

Fig. 1 is a schematic flowchart of a dynamic CoAP mode selection method based on UCB according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of a dynamic CoAP mode selection method based on UCB according to another embodiment of the present application;

Fig. 3 is a schematic diagram illustrating the transmission process of the i-th large packet in different modes according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a scenario for a simulation experiment provided in an embodiment of the present application;

FIG. 5 is a diagram illustrating simulation results provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of another simulation experiment result provided by an embodiment of the present application;

Fig. 7 is a schematic structural diagram of a dynamic CoAP mode selection device based on UCB according to an embodiment of the present application.

Reference numerals are as follows: an acquisition module-21; determination module-22.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.

A dynamic CoAP mode selection method based on UCB, referring to fig. 1, includes:

S11: acquiring the data transmission mode types of the CoAP protocol in the current data transmission scenario;

The data transmission mode types of the CoAP protocol generally include a non-acknowledged mode and an acknowledged mode. The non-acknowledged mode has no data retransmission mechanism, and the sending end does not track whether the receiving end has lost packets, so this mode has lower delay but a higher packet loss rate. The acknowledged mode has a data retransmission mechanism: within the maximum retransmission limit, the sending end decides whether to retransmit according to the packet loss result fed back by the receiving end, so the packet loss rate is low, but transmission delay and node energy consumption inevitably increase.

S12: determining the optimal data transmission mode of each data packet during data transmission based on the MAB transmission model.

The MAB transmission model is essentially a multi-armed bandit (MAB) model, an online learning model used to solve the multi-armed bandit problem in reinforcement learning. In this embodiment, each arm of the MAB transmission model corresponds to one data transmission mode.

Because the MAB transmission model is an online learning model, in this embodiment its input is the reward obtained after transmission of the previous data packet is completed, or the initial index of each data transmission mode, where the reward represents the preference estimate of the previous data packet for each data transmission mode type. Its output is the reward obtained after transmission of the current data packet is completed, so that a loop is formed.

Based on the MAB transmission model, determining an optimal data transmission mode for each CoAP protocol data packet in the data transmission process, with reference to fig. 2, specifically includes:

S121: determining the optimal data transmission mode of the current data packet according to the reward obtained after transmission of the previous data packet is completed;

S122: transmitting the current data packet in its optimal data transmission mode, and determining the packet loss rate and the transmission delay during transmission of the current data packet; this includes:

counting the number of small data packets contained in the current data packet;

when the channel gain during the transmission of the small data packets is not greater than a preset threshold value, determining that packet loss occurs in the small data packets;

if the data transmission mode of the current data packet is the confirmation mode, counting the retransmission times of the small data packets with lost packets, and counting packet loss results according to the number of the small data packets with lost packets and the retransmission times of the small data packets with lost packets;

and obtaining the packet loss rate according to the number of the small data packets contained in the current data packet and the packet loss result.

and if the data transmission mode of the current data packet is the confirmation mode, counting the retransmission times of each small data packet, and calculating the transmission delay of the current data packet according to the number of the small data packets, the retransmission times of each small data packet and the transmission delay of the small data packets.

In this embodiment, one CoAP protocol data packet is defined as one large data packet (hereinafter, a large packet). The total number of large packets to be transmitted is $I$, and each large packet is divided into $J$ small data packets (hereinafter, small packets). As shown in Fig. 3, a transmission delay is defined for the $i$-th large packet in mode $m$ ($i \in \{1, \dots, I\}$), and for the $j$-th small packet of the $i$-th large packet in mode $m$ ($j \in \{1, \dots, J\}$).

The data packet transmission modes are divided into mode one and mode two. Mode one is the non-acknowledged mode, denoted $m = 1$: the receiving end feeds back the packet loss result to the transmitting end once, after transmission of each large packet is completed. Mode two is the acknowledged mode, denoted $m = 2$: the receiving end feeds back a result to the transmitting end after transmission of each small packet is completed, and if the small packet is lost, it must be retransmitted (a retransmission is also counted as a small-packet transmission process).

Within any one large packet, the selected transmission mode does not change; between large packets, the mode to switch to is determined by the UCB algorithm in the MAB transmission model.

An indicator variable $x_{i,m}$ is introduced: $x_{i,m} = 1$ means that the $i$-th large packet selects mode $m$, and $x_{i,m} = 0$ means that it does not. It is assumed that the channel state remains unchanged during the transmission of any one small packet, but changes randomly between small-packet transmissions. In particular, when mode two is used, each retransmission of a small packet is also counted as a small-packet transmission process, so the channel state also changes randomly between retransmissions.

In this embodiment, to simplify the model, a channel gain is defined for the $n$-th retransmission of the $j$-th small packet of the $i$-th large packet.

When the channel gain $g_{i,j}$ is greater than the threshold $G_{th}$, no packet loss occurs; otherwise, packet loss occurs.
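Written compactly, with a binary loss indicator $y_{i,j}$ (notation introduced here for illustration only, not taken from the source), the threshold rule reads:

```latex
y_{i,j} =
\begin{cases}
0, & g_{i,j} > G_{th} \quad \text{(no packet loss)} \\
1, & g_{i,j} \le G_{th} \quad \text{(packet loss)}
\end{cases}
```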

When the data packet uses transmission mode one, retransmission is not considered, so the number of lost small packets can be counted directly.

When the data packet uses transmission mode two, retransmission is considered, so whether packet loss occurs is recorded separately for the $n$-th retransmission of the $j$-th small packet of the $i$-th large packet. The packet loss result of the $j$-th small packet of the $i$-th large packet is then determined from these per-attempt results over at most $N$ retransmissions, where $N$ represents the maximum number of retransmissions.
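The mode-two rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `small_packet_outcome`, the representation of the channel as a list of per-attempt gains, and the convention that attempt 0 is the initial transmission are hypothetical and not taken from the source. It also assumes that a small packet counts as lost only if the initial transmission and all permitted retransmissions fail.

```python
from typing import List, Tuple


def small_packet_outcome(gains: List[float], threshold: float,
                         max_retx: int) -> Tuple[bool, int]:
    """Return (lost, retransmissions_used) for one small packet in mode two.

    gains[0] is the channel gain of the initial transmission and gains[n]
    that of the n-th retransmission (assumed layout). An attempt fails when
    its gain is not greater than the threshold.
    """
    if len(gains) < max_retx + 1:
        raise ValueError("need one gain per possible attempt")
    for attempt in range(max_retx + 1):
        if gains[attempt] > threshold:
            return False, attempt  # delivered after `attempt` retransmissions
    return True, max_retx  # every attempt failed


# The first attempt fails (0.2 <= 0.5), the first retransmission succeeds.
print(small_packet_outcome([0.2, 0.9, 0.1, 0.1], threshold=0.5, max_retx=3))  # (False, 1)
```

Returning the retransmission count alongside the loss flag keeps both quantities available for the later packet loss rate and delay calculations.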

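In summary, the total number of lost small packets of the $i$-th large packet is the sum of the packet loss results of its $J$ small packets, and the packet loss rate is this total divided by $J$.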
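As a hedged sketch of this calculation, the snippet below computes the packet loss rate of one large packet from per-small-packet loss flags, and derives those flags for mode one from a single gain per small packet. The function names and the list-based data layout are assumptions made for illustration.

```python
from typing import List, Sequence


def packet_loss_rate(lost_flags: Sequence[bool]) -> float:
    """Fraction of small packets in one large packet that were finally lost."""
    if not lost_flags:
        raise ValueError("a large packet must contain at least one small packet")
    return sum(lost_flags) / len(lost_flags)


def mode_one_lost_flags(gains: Sequence[float], threshold: float) -> List[bool]:
    """Mode one: a single attempt per small packet, no retransmission."""
    return [g <= threshold for g in gains]


flags = mode_one_lost_flags([0.9, 0.3, 0.7, 0.5], threshold=0.5)
print(flags, packet_loss_rate(flags))  # [False, True, False, True] 0.5
```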

When data collected by the sensing terminals of the power distribution Internet of Things is transmitted, transmission delay is incurred. In transmission mode one, a lost small packet is not retransmitted, so the transmission delay of one small packet is that of a single transmission.

In transmission mode two, a lost small packet is retransmitted until the receiving end receives it successfully or the maximum number of retransmissions $N$ is exceeded, so the transmission delay of one small packet also depends on its number of retransmissions.

The transmission delay of the $i$-th large packet is thus obtained from the transmission delays of its small packets.
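The delay calculation can be illustrated with a small Python sketch. It assumes, purely for illustration, that every transmission attempt of a small packet takes the same time `t_small` and that feedback time is negligible; neither the name `large_packet_delay` nor these simplifications come from the source.

```python
from typing import Sequence


def large_packet_delay(retransmissions: Sequence[int], t_small: float,
                       acknowledged: bool) -> float:
    """Delay of one large packet as the sum of its small-packet delays.

    retransmissions[j] is the number of retransmissions used by small packet j.
    In the non-acknowledged mode the counts are ignored, because each small
    packet is sent exactly once.
    """
    if acknowledged:
        return sum((1 + r) * t_small for r in retransmissions)
    return len(retransmissions) * t_small


retx = [0, 1, 3]
print(large_packet_delay(retx, t_small=2.0, acknowledged=False))  # 6.0
print(large_packet_delay(retx, t_small=2.0, acknowledged=True))   # 14.0
```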

S123: obtaining the reward of the current data packet for each data transmission mode according to the packet loss rate and the transmission delay during transmission of the current data packet; this includes:

When the method of this embodiment is implemented, the two transmission modes are first traversed: an initial index is assigned to each mode, and in the first round each mode is selected and used for transmission once to obtain its first-round reward. In the second round, the data transmission mode with the largest preference estimate is selected for transmission, and the reward is updated. Each subsequent round is decided in the same way: based on the rewards obtained after the previous round of transmission, the mode with the largest preference estimate is selected, the reward is updated, and so on. The reward depends on the packet loss rate and the number of retransmissions: the lower the packet loss rate and the fewer the retransmissions, the larger the reward. In this way a low packet loss rate is achieved with few retransmissions, and the optimal transmission mode for the current time slot is selected dynamically.

When the $i$-th large packet is to be transmitted, the device's preference estimate for mode $m$ is computed, where $k_{i-1,m}$ denotes the number of times mode $m$ has been selected before the $i$-th large packet transmission.

The mode with the largest preference estimate is selected.

The reward value and the number of times mode $m$ has been selected are then updated.

The state update formula is as follows: after the $i$-th large packet has been transmitted, the number of times mode $m$ has been selected is

$k_{i,m} = k_{i-1,m} + x_{i,m}$.

When $i = I + 1$, learning terminates, i.e., the overall data transmission process is complete.
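The text above does not reproduce the preference-estimate formula. As an illustration only, the standard UCB1 index is one common form that is consistent with the description. The symbols $\hat{\theta}_{i,m}$ (preference estimate), $\bar{r}_{i,m}$ (running mean reward of mode $m$) and $r_i$ (reward of the $i$-th large packet) are hypothetical notation, and the expressions assume $k_{i-1,m} \ge 1$, i.e., that the initial traversal of all modes has already taken place:

```latex
\begin{aligned}
\hat{\theta}_{i,m} &= \bar{r}_{i-1,m} + \sqrt{\frac{2 \ln i}{k_{i-1,m}}},
&\qquad m_i^{*} &= \arg\max_{m} \hat{\theta}_{i,m}, \\
k_{i,m} &= k_{i-1,m} + x_{i,m},
&\qquad \bar{r}_{i,m} &= \bar{r}_{i-1,m} + \frac{x_{i,m}}{k_{i,m}} \left( r_i - \bar{r}_{i-1,m} \right).
\end{aligned}
```

The square-root term shrinks as a mode is selected more often, so rarely tried modes keep being explored while the mean-reward term favors modes that have performed well.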

S124: and using the reward of the current data packet to each data transmission mode as an input item of the MAB transmission model when the next loop is performed.
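Steps S121 to S124 can be sketched as a single loop. The sketch below is not the patented implementation: it swaps in the standard UCB1 index as the preference estimate, and the function name `run_mode_selection` and the `transmit` callback (which sends one large packet in a given mode and returns its reward) are hypothetical. Mode indices 0 and 1 stand for $m = 1$ and $m = 2$.

```python
import math
import random
from typing import Callable, List


def run_mode_selection(num_large_packets: int, num_modes: int,
                       transmit: Callable[[int], float]) -> List[int]:
    """Run the S121-S124 loop and return how often each mode was chosen."""
    counts = [0] * num_modes          # k_{i,m}: times mode m has been selected
    mean_reward = [0.0] * num_modes   # running mean reward per mode
    for i in range(1, num_large_packets + 1):
        untried = [m for m in range(num_modes) if counts[m] == 0]
        if untried:
            mode = untried[0]         # initial traversal: try every mode once
        else:
            # S121: choose the mode with the largest UCB1 index
            mode = max(range(num_modes),
                       key=lambda m: mean_reward[m]
                       + math.sqrt(2.0 * math.log(i) / counts[m]))
        reward = transmit(mode)       # S122 and S123: send, then score
        counts[mode] += 1             # S124: feed the result into the next round
        mean_reward[mode] += (reward - mean_reward[mode]) / counts[mode]
    return counts


if __name__ == "__main__":
    rng = random.Random(0)

    def toy_transmit(mode: int) -> float:
        # Toy environment: mode 1 pays slightly more on average than mode 0.
        return rng.uniform(0.0, 1.0) + (0.2 if mode == 1 else 0.0)

    print(run_mode_selection(800, 2, toy_transmit))
```

Passing the transmission step in as a callback keeps the selection logic independent of how packet loss rate and delay are measured or simulated.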

In some embodiments of the UCB-based dynamic CoAP mode selection method, the MAB transmission model is learned based on the UCB algorithm.

This embodiment provides a UCB-based dynamic CoAP protocol mode selection algorithm. The algorithm solves for the optimal strategy by learning how state and action influence the preference estimates. The system mainly comprises an agent, environment, state, action, and reward, analyzed as follows:

(1) the CoAP protocol data packet acts as the agent in the algorithm; it can sense the transmission channel information and performance feedback of the power distribution Internet of Things in real time, and judges whether packet loss occurs by comparing the channel gain with a set threshold;

(2) the state refers to the CoAP protocol mode currently selected by each big data packet;

(3) the environment refers to an objective environment state which cannot be adjusted by the CoAP protocol data packet at will, such as a channel signal-to-noise ratio, a channel gain and the like;

(4) the action means that the CoAP protocol data packet selects the transmission mode of the next packet according to a certain strategy under the condition of the current state;

(5) the reward is the preference estimate of the data packet for a mode, defined in this embodiment as the reciprocal of the weighted sum of the delay and the packet loss rate; if this weighted sum is very high, the reward of the action is a very small value.
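The reward definition in item (5) can be sketched as a short Python function. The weights `w_delay` and `w_loss` and their default values are assumptions, since the source does not state them, and the delay is assumed to be normalized to a scale comparable with the packet loss rate.

```python
def reward(delay: float, loss_rate: float,
           w_delay: float = 0.5, w_loss: float = 0.5) -> float:
    """Reciprocal of the weighted sum of delay and packet loss rate."""
    cost = w_delay * delay + w_loss * loss_rate
    if cost <= 0.0:
        raise ValueError("weighted cost must be positive")
    return 1.0 / cost


print(reward(delay=1.0, loss_rate=1.0))  # 1.0
print(reward(delay=3.0, loss_rate=1.0))  # 0.5
```

A larger weighted cost yields a smaller reward, which is the behavior the item describes.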

To illustrate the advantages of the method provided in this embodiment, a simulation experiment is performed using the scenario of Fig. 4 as an example. The total number of large data packets is set to 800, and each large data packet is divided into 10 small data packets for transmission. The channel state is unchanged during the transmission of any one small data packet and changes randomly between small-packet transmissions. The channel state is represented by the channel gain, and the maximum number of retransmissions in the acknowledged mode is set to 3.
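A simulation along these lines might be set up as in the following sketch. Only the three counts (800 large packets, 10 small packets each, at most 3 retransmissions) come from the description above. The threshold value, the per-attempt delay, the exponential distribution of the channel gain, and all names are assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SimConfig:
    num_large_packets: int = 800   # from the described experiment
    small_per_large: int = 10      # from the described experiment
    max_retx: int = 3              # from the described experiment
    gain_threshold: float = 0.5    # assumed value
    t_small: float = 1.0           # assumed delay of one transmission attempt


def send_large_packet(cfg: SimConfig, acknowledged: bool,
                      rng: random.Random) -> Tuple[float, float]:
    """Simulate one large packet and return (packet_loss_rate, delay)."""
    lost = 0
    attempts = 0
    tries_allowed = 1 + cfg.max_retx if acknowledged else 1
    for _ in range(cfg.small_per_large):
        delivered = False
        for _ in range(tries_allowed):
            attempts += 1
            gain = rng.expovariate(1.0)  # fresh random channel per attempt (assumed law)
            if gain > cfg.gain_threshold:
                delivered = True
                break
        if not delivered:
            lost += 1
    return lost / cfg.small_per_large, attempts * cfg.t_small


cfg = SimConfig()
rng = random.Random(1)
print("non-acknowledged:", send_large_packet(cfg, False, rng))
print("acknowledged:    ", send_large_packet(cfg, True, rng))
```

Drawing a new gain for every attempt mirrors the stated assumption that the channel is constant within one small-packet transmission and changes randomly between transmissions, including retransmissions.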

Compared with the two fixed data transmission modes of the CoAP protocol, namely the non-acknowledged mode and the acknowledged mode, Fig. 5 shows that the proposed UCB-based dynamic CoAP protocol mode selection algorithm achieves the lowest weighted sum of transmission delay and packet loss rate. Fig. 5 plots the optimization objective against the number of selections. Because the UCB algorithm explores first, it is not necessarily optimal at the beginning. As the number of selections increases, the algorithm takes both delay and packet loss rate into account and, without knowing the channel state, dynamically selects the better mode under different channel states, so that the weighted sum of transmission delay and packet loss rate gradually decreases and performance is clearly better than that of a single CoAP data transmission mode. The simulation results show that using the UCB algorithm to dynamically select the CoAP transmission mode effectively balances delay against transmission reliability, so as to meet the differentiated quality-of-service requirements of distribution network services.

As shown in Fig. 6, when the packet loss threshold is low, the acknowledged mode performs better. As the threshold increases, the number of retransmissions in the acknowledged mode increases and the delay grows, so the weighted sum of transmission delay and packet loss rate trends upward and the non-acknowledged mode performs better. The simulation results show that the proposed algorithm is clearly superior to a single CoAP data transmission mode within a certain range of packet loss thresholds, and that when the threshold is very large or very small, the proposed algorithm tends to select whichever mode performs better at that threshold and so maintains good performance. That is, overall transmission performance is optimized by dynamically switching the CoAP transmission mode to balance delay and packet loss rate.

The dynamic CoAP mode selection method based on UCB in this embodiment can determine a data transmission mode combination strategy in the next transmission process according to the transmission performance of the data packet that has been transmitted in the data transmission process in real time, thereby achieving the balance between the data transmission delay and the packet loss rate as a whole and improving the overall performance of the system.

A dynamic CoAP mode selection apparatus based on UCB, referring to fig. 7, comprising:

an obtaining module 21, configured to obtain a data transmission mode type of a CoAP protocol in a current data transmission scenario;

and a determining module 22, configured to determine an optimal data transmission mode for each data packet in the data transmission process based on the MAB transmission model.

It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.

It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present application, the meaning of "a plurality" means at least two unless otherwise specified.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.

It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.

In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.

In the description of the present specification, reference to the description of "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims

1. A dynamic CoAP mode selection method based on UCB is characterized by comprising the following steps:

building an MAB transmission model, and taking each data transmission mode as each arm of the MAB transmission model; the input items of the MAB transmission model are rewards obtained after the transmission of the last data packet is finished or initial indexes of all data transmission modes; wherein the reward represents a preference estimation value of the previous data packet for each data transmission mode class; the output item in the MAB transmission model is reward obtained after the transmission of the current data packet is finished;

determining an optimal data transmission mode of each data packet in the data transmission process based on the MAB transmission model;

wherein the data transmission mode categories include: a non-acknowledged mode and an acknowledged mode;

the non-acknowledged mode has no data retransmission mechanism, and the acknowledged mode has a data retransmission mechanism;

the determining the optimal data transmission mode of each CoAP protocol data packet in the data transmission process based on the MAB transmission model comprises the following steps:

using the reward of the current data packet to each data transmission mode as an input item of the MAB transmission model when the next circulation is carried out;

the determining the packet loss rate and the transmission delay in the current data packet transmission process includes:

counting the number of small data packets contained in the current data packet;

acquiring channel gain when each small data packet in the current data packet is transmitted;

obtaining the packet loss rate according to the number of small data packets contained in the current data packet and the packet loss result;

and if the data transmission mode of the current data packet is a confirmation mode, counting the retransmission times of each small data packet, and calculating the transmission delay of the current data packet according to the number of the small data packets, the retransmission times of each small data packet and the transmission delay of the small data packet.

2. The method according to claim 1, wherein obtaining rewards of the current data packet for each data transmission mode according to a packet loss rate and a transmission delay in the transmission process of the current data packet comprises:

3. The method of claim 2, wherein determining the optimal data transmission mode for each data packet during data transmission comprises:

4. The method of claim 1, wherein the MAB transmission model is learned based on a UCB algorithm.

5. A dynamic CoAP mode selection device based on UCB, comprising:

the determining module is used for determining the optimal data transmission mode of each data packet in the data transmission process based on the MAB transmission model;

wherein the MAB transmission model takes each of the data transmission modes as each arm of the MAB transmission model; the input items of the MAB transmission model are rewards obtained after the transmission of the last data packet is finished or initial indexes of all data transmission modes; wherein the reward represents a preference estimation value of the previous data packet for each data transmission mode class; the output item in the MAB transmission model is reward obtained after the transmission of the current data packet is finished;

the data transmission mode categories include: a non-acknowledged mode and an acknowledged mode;

counting the number of small data packets contained in the current data packet;