CN114499842A - QKD network key resource pre-allocation method based on reinforcement learning - Google Patents

QKD network key resource pre-allocation method based on reinforcement learning

Info

Publication number
CN114499842A
CN114499842A
Authority
CN
China
Prior art keywords
key
request
pool
network
resources
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111679797.9A
Other languages
Chinese (zh)
Other versions
CN114499842B (en)
Inventor
郭邦红 (Guo Banghong)
董博文 (Dong Bowen)
胡敏 (Hu Min)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Yukopod Technology Development Co ltd
Original Assignee
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Normal University filed Critical South China Normal University
Priority to CN202111679797.9A
Publication of CN114499842A
Application granted
Publication of CN114499842B
Legal status: Active
Anticipated expiration


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08: Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0816: Key establishment, i.e. cryptographic processes or cryptographic protocols whereby a shared secret becomes available to two or more parties, for subsequent use
    • H04L9/0852: Quantum cryptography
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00: Routing or path finding of packets in data switching networks
    • H04L45/12: Shortest path evaluation
    • H04L45/123: Evaluation of link metrics
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00: Routing or path finding of packets in data switching networks
    • H04L45/22: Alternate routing
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00: Routing or path finding of packets in data switching networks
    • H04L45/24: Multipath

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Electromagnetism (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a QKD network key resource pre-allocation method based on reinforcement learning that accelerates quantum key allocation, improves the key allocation success rate, reduces key pool maintenance cost, and reduces key resource waste. In a multi-user concurrent quantum key request scenario, key resources in the public key pool are pre-allocated into a fixed number of sub-key pools to form a resource set. A sub-key pool matching a key resource request can immediately allocate its key resources to the corresponding requester, greatly reducing the queuing time of requests. Meanwhile, a reinforcement learning method is used to predict the quantity of pre-allocated key resources, which improves key resource matching, raises the allocation success rate, and reduces key resource waste.

Description

QKD network key resource pre-allocation method based on reinforcement learning
Technical Field
The invention relates to the technical field of quantum communication and quantum networks, in particular to a QKD network key resource pre-allocation method based on reinforcement learning.
Background
With the development of quantum computing technology, traditional cryptosystems based on computational complexity are under threat. For the RSA encryption algorithm, whose security rests on the hardness of prime factorization, Shor's algorithm exploits a quantum computer's ability to compute in parallel over quantum superposition states and can complete the factorization in polynomial time, an exponential speedup over classical algorithms that threatens the RSA cryptosystem.
Quantum Key Distribution (QKD) is based on the Heisenberg uncertainty principle and the quantum no-cloning theorem, and its security is theoretically unconditional. In recent years, quantum key distribution technology has developed rapidly; point-to-point QKD is approaching maturity and is about to enter a large-scale commercial stage.
In the QKD network architecture, quantum keys are generated by key generation devices and stored in a quantum key pool, where they are managed and distributed by a key management module. When quantum secure communication is carried out between multiple user pairs Alice1 and Bob1, Alice2 and Bob2, …, Alicen and Bobn, and their quantum key requests arrive at the same time, the key pool, being a shared resource, cannot perform key allocation for multiple requests in parallel, so the users must wait in a queue for key resource allocation.
A prior invention patent (CN107086908B) proposes creating a sub-key pool per user to address the inability to respond quickly under a large number of concurrent key requests from multiple users. However, creating and maintaining a sub-key pool for every user and performing the associated resource allocation and reclamation consume a large amount of resources. Moreover, the number of users cannot be estimated in advance, so the key resources and sub-key pools are difficult to scale effectively.
Disclosure of Invention
The invention provides a QKD network key resource pre-allocation method based on reinforcement learning, which improves key resource matching, raises the allocation success rate, and reduces key resource waste.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
a QKD network key resource pre-allocation method based on reinforcement learning comprises the following steps:
s1: the control layer receives a quantum key request of a user;
s2: carrying out routing selection according to the user request;
s3: judging whether each link on the selected path has a sub-key pool matched with the user key request, if so, returning the key resource to the requester; if not, go to step S4;
s4: judging whether the key resources in the current public key pool can meet the request, if so, distributing the key resources for the user from the public key pool; otherwise, executing step S5;
s5: placing the request in a blocking queue waiting for key resources; if the request waiting time exceeds a threshold value, deleting the request from the queue and returning distribution failure information;
s6: the application layer receives the feedback information and stores the feedback information into a reinforcement learning library;
s7: and randomly extracting a plurality of records from the learning library, inputting them into the reinforcement learning neural network for training, storing the model and applying the pre-allocation strategy.
Further, in step S1, the user request message includes: user identity ID, quantum key request quantity, source node and destination node.
Further, in step S2, the routing algorithm is: the user request contains the number of requested keys and the source and destination node information; n shortest paths are selected by the K shortest path algorithm (KSP), the priorities D1, D2, …, Dn of the paths are calculated from the key resources in the key pool of each link along each path and from the path length, and the path with the highest priority is selected.
Further, the specific process of step S2 is:
s21: selecting n shortest paths as alternative routes according to a KSP algorithm;
s22: judging whether each link of the n candidate paths has a sub-key pool matching the number of keys requested by the user; if at least one path satisfies this, executing step S23; otherwise, executing S24;
s23: selecting the shortest path from the plurality of routes meeting the request matching condition as the route of the request;
s24: for each path selected by the KSP algorithm, taking the minimum key resource quantity K among the public key pools of the links on the path as the key resource of the path; calculating the priority D = K/L of each path, where L is the path length; and selecting the path with the highest priority as the result of the routing.
Further, the specific process of step S7 is:
s71: acquiring historical information from a learning library, wherein the historical information comprises: the number of keys in the public key pool and each sub-key pool, the link key generation rate, the current time and the network gain;
s72: converting the network information into a vector, inputting the vector into a reinforcement learning neural network, and obtaining output through the neural network;
s73: and allocating a certain number of key resources from the public key pool to be placed in the sub-key pool according to the output of the neural network, namely a pre-allocation strategy.
S74: and carrying out key distribution according to the user request, and feeding back corresponding information and network benefits according to different distribution results.
Further, the network information in step S72 includes: the number of keys in the public key pool and each sub-key pool, the link key generation rate, and the current time.
Further, in step S73, the pre-allocation policy includes: observing the states of the public key pool and the sub-key pools; when the key resources in the public key pool exceed a threshold and an empty sub-key pool exists, inputting the state information of the public key pool and the sub-key pools, the link key generation rate and the time into the reinforcement learning neural network to obtain k outputs; and selecting one of the k outputs as the pre-allocation quantity, dividing that number of key blocks from the public key pool, and placing them into an empty sub-key pool.
Further, the reinforcement learning neural network comprises an input layer, a hidden layer and an output layer, wherein: the input layer receives network state information, including: the number of keys in the public key pool and each sub-key pool, the link key generation rate, and the current time.
Further, the neural network input is represented by a vector of the form [p0, p1, p2, …, pm, q, t], where p0 is the number of key resources in the public key pool; p1, p2, …, pm are the numbers of key resources in the m sub-key pools; q is the link key generation rate; and t is the current time.
Further, the reward of the neural network is the network gain of the allocation results of user key requests over a period of time; for a user key request, the specific scheme is as follows:
there is a sub-key pool matching the user key request, and its key resources can be directly allocated to the requester: the network gain is 2;
no sub-key pool matching the user key request exists, and the corresponding key resources must be allocated to the user from the public key pool: the network gain is 1;
the key resources in the public key pool cannot satisfy the key request, the request is blocked, and key resources are allocated from the public key pool after a wait of t: the network gain is 1/t ∈ (0, 1);
the waiting time exceeds the threshold after the request is blocked, and the request fails: the network gain is 0.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention can accelerate the quantum key distribution speed, improve the success rate of key distribution, reduce the maintenance cost of the key pool and reduce the waste of key resources; under the multi-user concurrent quantum key application scene, key resources in the public key pool are pre-distributed and are placed into a fixed number of sub-key pools to form a resource set. The sub-key pool matched with the key resource request can immediately distribute the key resources therein to the corresponding requesting party, thereby greatly reducing the queuing time of the request; meanwhile, the number of the pre-distributed key resources is predicted by adopting a reinforcement learning method, so that the matching degree of the key resources is improved, the distribution success rate can be improved, and the waste of the key resources is reduced.
Drawings
Fig. 1 is a structural diagram of a QKD network key resource pre-allocation system in embodiment 1;
FIG. 2 is a flow chart of the method of the present invention;
FIG. 3 is a flow diagram of the route distribution of a QKD network key request provided by the present invention;
FIG. 4 is a schematic diagram of a QKD network topology provided by the present invention;
FIG. 5 is a flowchart illustrating a key pre-allocation strategy for a reinforcement learning neural network according to an embodiment of the present invention;
fig. 6 is a diagram of a reinforcement learning neural network structure according to the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
As shown in fig. 1, this embodiment provides a QKD multi-user key resource pre-allocation system based on reinforcement learning, which includes a physical link layer, a control layer and an application layer; the layers are connected through unified control interfaces.
The physical link layer includes: the user nodes and the links between them, which carry classical and quantum information; the quantum key generation devices; and the quantum key pools, which store the key sequences generated by the quantum key generation devices and comprise a public key pool and sub-key pools.
The key sequence in the public key pool can be divided or appended to. When a sub-key pool is empty, a key sequence can be divided from the public key pool and added to it; when the keys in a sub-key pool go unallocated for a long time, its key sequence can be reclaimed. To reduce management overhead, no further operations are allowed on the key sequence in a sub-key pool.
Preferably, in order to further simplify the system, reduce resource overhead and speed up allocation, the key sequence in the key pool is divided into key blocks of fixed length, and user key requests and key resource allocations are made in units of key blocks.
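As an illustration, this pool structure and its block operations can be sketched as follows (a minimal sketch only; the class name, the 32-byte block length, and the reclaim behavior are assumptions, not the patented implementation):

    from collections import deque

    BLOCK_BYTES = 32  # assumed fixed key-block length

    class KeyPools:
        """Public key pool plus a fixed number of sub-key pools, in key-block units."""
        def __init__(self, m_subpools: int):
            self.public = deque()                            # key blocks in the public key pool
            self.sub = [deque() for _ in range(m_subpools)]  # fixed number of sub-key pools

        def add_generated(self, raw: bytes):
            # Divide a newly generated key sequence into fixed-length blocks
            # (a tail shorter than one block is dropped in this sketch).
            for i in range(0, len(raw) - len(raw) % BLOCK_BYTES, BLOCK_BYTES):
                self.public.append(raw[i:i + BLOCK_BYTES])

        def preallocate(self, idx: int, k_blocks: int) -> bool:
            # Divide k key blocks from the public pool into an *empty* sub-key pool.
            if self.sub[idx] or len(self.public) < k_blocks:
                return False
            for _ in range(k_blocks):
                self.sub[idx].append(self.public.popleft())
            return True

        def reclaim(self, idx: int):
            # Return the blocks of a long-idle sub-key pool to the public pool.
            while self.sub[idx]:
                self.public.append(self.sub[idx].popleft())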
The control layer includes: a routing module, which receives user requests and selects the key transmission path; a state observation module, which observes the current key quantities in the public key pool and sub-key pools of each link and reports them to the application layer; a key allocation module, which allocates key resources from a sub-key pool or the public key pool to the requester; and a queuing and scheduling module, which holds blocked requests in a queue and, when the key resources in the public key pool become sufficient, dequeues a request and allocates key resources of the corresponding size.
And the application layer receives the user request, sends an instruction to the control layer, receives feedback information, and updates the reinforcement learning neural network and the key resource pre-allocation strategy.
As shown in fig. 2, the system applies a method for allocating QKD network key resources based on reinforcement learning, which includes the following steps:
s1: the control layer receives a quantum key request of a user;
When the user needs to obtain a quantum key, a quantum key request is sent to the operator of the QKD network. Communication between the user and the operator can be carried over various transports, such as TCP or UDP, and the user's key request must contain: the user identity ID, the quantum key request quantity, the source node and the destination node;
Preferably, in order to further simplify the system, reduce resource overhead and speed up allocation, the key sequence in the key pool is divided into key blocks of fixed length, and user key requests and key resource allocations are made in units of key blocks.
S2: carrying out routing selection according to the user request;
based on quantum unclonable law, quantum key transmission usually adopts a hop-by-hop encryption forwarding method, and forwarding times need to be reduced as much as possible to save key resources. Routing typically uses a shortest path algorithm or a K shortest path algorithm (KSP) to select one or more paths. If the key request in the network is in an unbalanced state for a long time, the resource of part of the links is wasted, and the part of the links are congested for a long time. Routing is carried out by combining the key resources in each link key pool and the path length, so that the request load of the whole network can be balanced.
The above routing process includes the following steps S21 to S24, further described with reference to figs. 3 and 4. FIG. 4 is a network topology diagram comprising 6 nodes and 7 links, where the label on each link means "link length; number of keys in the public key pool".
S21: selecting n shortest paths as candidate routes according to the KSP algorithm. In this embodiment, the source node and the destination node are node a and node f, and n is 2, so path 1: a → b → c → e → f (path length 8) and path 2: a → b → e → f (path length 7) are selected.
S22: judging whether every link of each of the n candidate paths has a sub-key pool matching the number of keys requested by the user. If any path satisfies this condition, step S23 is executed; otherwise, S24 is executed.
S23: and selecting the shortest path from the plurality of routes meeting the request matching condition as the route of the request.
S24: for each path selected by the KSP algorithm, the minimum key resource quantity K among the public key pools of the links on the path is taken as the key resource of the path; the priority of each path is calculated as D = K/L (where L is the path length), and the path with the highest priority is selected as the result of this routing.
In this embodiment, K = 4 for path 1 (bottleneck link b → c) and K = 3 for path 2 (bottleneck link b → e); the priority of path 1 is D1 = 4/8 = 1/2 and that of path 2 is D2 = 3/7. Since D1 > D2, path 1 is selected as the route for this request.
This method jointly considers the match between each link's sub-key pools and the request, the amount of key resources in the public key pools, and the path length, thereby reducing queuing delay and blocking probability and balancing the load across the whole network.
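For concreteness, the S24 priority computation on the FIG. 4 example can be sketched as follows (the bottleneck key counts and the path lengths come from the embodiment above; the remaining per-link values are illustrative assumptions):

    # Each path is a list of (link, link_length, keys_in_public_pool).
    # Bottleneck keys (4 and 3) and total lengths (8 and 7) match the example;
    # the other per-link values are assumptions for illustration.
    path1 = [("a-b", 2, 6), ("b-c", 2, 4), ("c-e", 2, 5), ("e-f", 2, 7)]  # length 8
    path2 = [("a-b", 2, 6), ("b-e", 3, 3), ("e-f", 2, 7)]                 # length 7

    def priority(path):
        K = min(keys for _, _, keys in path)      # scarcest public-pool resource on the path
        L = sum(length for _, length, _ in path)  # path length
        return K / L                              # D = K / L

    candidates = {"path 1": path1, "path 2": path2}
    best = max(candidates, key=lambda name: priority(candidates[name]))
    # priority of path 1 is 4/8 = 0.5, of path 2 is 3/7 ≈ 0.43, so path 1 is chosen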
S3: judging whether each link on the selected path has a sub-key pool matched with the user key request, if so, returning the key resource to the requester; if not, go to step S4;
s4: judging whether the key resources in the current public key pool can meet the request, if so, distributing the key resources for the user from the public key pool; otherwise, executing step S5;
s5: placing the request in a blocking queue waiting for key resources; if the request latency exceeds a threshold, the request is removed from the queue and assignment failure information is returned.
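A sketch of this S3 to S5 decision flow for a single link follows (the exact-size match in S3 and the timeout bookkeeping are assumptions consistent with the steps above):

    import time

    def allocate(request_blocks, sub_pools, public_pool, blocked, timeout_s=5.0):
        # S3: a sub-key pool whose size matches the request serves it immediately
        #     (an exact match on the number of key blocks is assumed here).
        for pool in sub_pools:
            if len(pool) == request_blocks:
                return [pool.popleft() for _ in range(request_blocks)]
        # S4: otherwise satisfy the request from the public key pool if possible.
        if len(public_pool) >= request_blocks:
            return [public_pool.popleft() for _ in range(request_blocks)]
        # S5: otherwise block the request; the scheduler retries until the deadline,
        #     after which the request is removed and a failure is reported.
        blocked.append((request_blocks, time.monotonic() + timeout_s))
        return None  # None signals "queued"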
S6: and returning the feedback information and the quantum key to the user. The feedback information includes: the number of keys in the public key pool and each sub-key pool, the link key generation rate, the current time, and the network gain. And storing the feedback information into a learning library.
S7: and randomly extracting a plurality of records from the learning library, inputting them into the reinforcement learning neural network for training, storing the model and applying the pre-allocation strategy.
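The learning library and the random sampling in S6 and S7 amount to an experience replay buffer; a minimal sketch (the buffer and batch sizes are assumptions):

    import random
    from collections import deque

    replay = deque(maxlen=100_000)  # the "reinforcement learning library"

    def store_feedback(state, action, reward, next_state):
        # S6: each allocation outcome is stored as one experience record.
        replay.append((state, action, reward, next_state))

    def sample_batch(batch_size=32):
        # S7: records are drawn at random for one training step of the network.
        return random.sample(replay, min(batch_size, len(replay)))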
Preferably, the training procedure of the reinforcement learning neural network and the QKD network key pre-allocation strategy flow of the present invention, shown in fig. 5, specifically include:
s71: acquiring current state information, including: the number of keys in the public key pool and each sub-key pool, the link key generation rate, the current time, and the network gain.
The reason for considering the link key generation rate and the current time is that these two factors have an impact on the allocation policy, including:
link key generation rate: to balance cost and demand in a complex network, the key generation rates for each link tend to be different. The system should preferentially assign key requests to links with higher key generation rates, while being more aggressive in key pre-assignment behavior. Because the key generation rate is faster, the request latency is shorter when congested.
The current time: although the requests of each user are random, the overall request density and key requirements may change regularly over time. The system should adjust the pre-allocation policy based on the current time, such as making more aggressive pre-allocation of keys during periods of higher request frequency, resulting in lower overall user latency.
S72: and converting the information into a vector, inputting the vector into the reinforcement learning neural network, and obtaining output through the network.
Preferably, the network information includes: the number of keys in the public key pool and each sub-key pool, the link key generation rate, and the current time.
S73: according to the output of the neural network, allocating a certain number of key resources from the public key pool into a sub-key pool; this is the pre-allocation strategy. The output of the neural network has different meanings under different reinforcement learning methods: for example, in the DQN (Deep Q-Network) method the k outputs are the estimated returns of the corresponding actions, while in the Policy Gradients method the k outputs are the probabilities of taking each action. According to the output form, the selected number k of key blocks is divided from the public key pool and placed into an empty sub-key pool.
S74: carrying out key allocation according to the user request, and feeding back the corresponding information and network gain according to the allocation result. The allocation results include: a sub-key pool matching the user key request exists, and the corresponding key resources are allocated to the requester; no matching sub-key pool exists, and the corresponding key resources are allocated to the user from the public key pool; the key resources in the public key pool cannot satisfy the request, and the request is blocked; the request waiting time exceeds the threshold, and the request fails. For each allocation result, the corresponding network gain is calculated.
To further illustrate the relationship between the reinforcement learning neural network and the key pre-allocation strategy, as shown in fig. 6, the reinforcement learning neural network comprises an input layer, a hidden layer and an output layer, wherein:
the input layer receives network state information, including: the number of keys in the public key pool and each sub-key pool, the link key generation rate, and the current time.
Preferably, the neural network input is represented by a vector of the form [p0, p1, p2, …, pm, q, t], where p0 is the number of key resources in the public key pool; p1, p2, …, pm are the numbers of key resources in the m sub-key pools; q is the link key generation rate; and t is the current time.
The number of hidden layers and the number of neurons in each layer can affect the training speed and the training effect, and can be set according to the complexity of a specific network.
The output layer contains k neurons (corresponding to actions 1, 2, 3, …, k). The output of the neural network has different meanings under different reinforcement learning methods: for example, in the DQN (Deep Q-Network) method the k outputs are the estimated returns of the corresponding actions, while in the Policy Gradients method the k outputs are the probabilities of taking each action. According to the output form, the selected number k of key resources is divided from the public key pool and placed into an empty sub-key pool using a first-fit algorithm.
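As a sketch of such a network and of action selection under the DQN reading of the outputs (PyTorch, the layer widths, and the values of m and k are all assumptions; the patent fixes none of them):

    import torch
    import torch.nn as nn

    m, k = 8, 16  # illustrative: m sub-key pools, k candidate pre-allocation sizes

    # The input vector [p0, p1, ..., pm, q, t] has m + 3 entries.
    qnet = nn.Sequential(
        nn.Linear(m + 3, 64), nn.ReLU(),  # hidden-layer width is a free choice
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, k),                 # one output per candidate action 1..k
    )

    def choose_preallocation(p, q, t):
        # DQN-style selection: the argmax output gives the number of key blocks.
        state = torch.tensor(p + [q, t], dtype=torch.float32)
        with torch.no_grad():
            q_values = qnet(state)         # estimated return of each action
        return int(q_values.argmax()) + 1  # actions correspond to 1..k key blocks

    # Example call: p0 = 40 public-pool blocks plus eight sub-pool counts,
    # link rate q = 0.8, normalized time of day t = 0.35 (all values illustrative).
    blocks = choose_preallocation([40, 0, 4, 0, 8, 0, 0, 2, 6], 0.8, 0.35)

Under the Policy Gradients reading, the same k outputs would instead be passed through a softmax and sampled as action probabilities.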
Preferably, for the reinforcement learning method of this embodiment, the reward of the neural network is the network gain of the allocation results of user key requests over a period of time; for a user key request, the specific scheme is as follows (a reward-function sketch follows the list):
there is a sub-key pool matching the user key request, and its key resources can be directly allocated to the requester: the network gain is 2;
no sub-key pool matching the user key request exists, and the corresponding key resources must be allocated to the user from the public key pool: the network gain is 1;
the key resources in the public key pool cannot satisfy the key request, the request is blocked, and key resources are allocated from the public key pool after a wait of t: the network gain is 1/t ∈ (0, 1);
the waiting time exceeds the threshold after the request is blocked, and the request fails: the network gain is 0.
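A sketch of this reward scheme as a function (the outcome labels are assumptions; the gain values are those listed above):

    def network_gain(result: str, wait_t: float = 0.0) -> float:
        if result == "subpool_hit":     # a matching sub-key pool served the request directly
            return 2.0
        if result == "public_pool":     # allocated from the public key pool
            return 1.0
        if result == "blocked_served":  # blocked, then served after waiting t
            # 1/t ∈ (0, 1) as listed above, which presumes t exceeds one time unit
            return 1.0 / wait_t
        return 0.0                      # waiting time exceeded the threshold: request failed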
By maximizing the reward (network gain) through training, the reinforcement learning neural network can meet the performance requirements of the QKD network: a higher request success rate, shorter request waiting time, and higher key resource utilization.
Example 2
As shown in fig. 2, a method for pre-allocating QKD network key resources based on reinforcement learning includes the following steps:
s1: the control layer receives a quantum key request of a user;
s2: carrying out routing selection according to the user request;
s3: judging whether each link on the selected path has a sub-key pool matched with the user key request, and if yes, returning the key resource to the requester; if not, go to step S4;
s4: judging whether the key resources in the current public key pool can meet the request, if so, distributing the key resources for the user from the public key pool; otherwise, executing step S5;
s5: placing the request in a blocking queue waiting for key resources; if the request waiting time exceeds a threshold value, deleting the request from the queue and returning distribution failure information;
s6: the application layer receives the feedback information and stores the feedback information into a reinforcement learning library;
s7: and randomly extracting a plurality of records from the learning library, inputting them into the reinforcement learning neural network for training, storing the model and applying the pre-allocation strategy.
Example 3
As shown in fig. 2, a method for allocating QKD network key resources based on reinforcement learning includes the following steps:
s1: the control layer receives a quantum key request of a user;
When the user needs to obtain a quantum key, a quantum key request is sent to the operator of the QKD network. Communication between the user and the operator can be carried over various transports, such as TCP or UDP, and the user's key request must contain: the user identity ID, the quantum key request quantity, the source node and the destination node;
Preferably, in order to further simplify the system, reduce resource overhead and speed up allocation, the key sequence in the key pool is divided into key blocks of fixed length, and user key requests and key resource allocations are made in units of key blocks.
S2: carrying out routing selection according to the user request;
based on quantum unclonable law, quantum key transmission usually adopts a hop-by-hop encryption forwarding method, and forwarding times need to be reduced as much as possible to save key resources. Routing typically uses a shortest path algorithm or a K shortest path algorithm (KSP) to select one or more paths. If the key request in the network is in an unbalanced state for a long time, the resource of part of the links is wasted, and the part of the links are congested for a long time. Routing in conjunction with the key resources and path lengths in each link QKP may balance the request load across the network.
The above routing process includes the following steps S21 to S24, further described with reference to figs. 3 and 4. FIG. 4 is a network topology diagram comprising 6 nodes and 7 links, where the label on each link means "link length; number of keys in the public key pool".
S21: selecting n shortest paths as candidate routes according to the KSP algorithm. In this embodiment, the source node and the destination node are node a and node f, and n is 2, so path 1: a → b → c → e → f (path length 8) and path 2: a → b → e → f (path length 7) are selected.
S22: judging whether every link of each of the n candidate paths has a sub-key pool matching the number of keys requested by the user. If any path satisfies this condition, step S23 is executed; otherwise, S24 is executed.
S23: and selecting the shortest path from the plurality of routes meeting the request matching condition as the route of the request.
S24: for each path selected by the KSP algorithm, the minimum key resource quantity K among the public key pools of the links on the path is taken as the key resource of the path; the priority of each path is calculated as D = K/L (where L is the path length), and the path with the highest priority is selected as the result of this routing.
In this embodiment, K = 4 for path 1 (bottleneck link b → c) and K = 3 for path 2 (bottleneck link b → e); the priority of path 1 is D1 = 4/8 = 1/2 and that of path 2 is D2 = 3/7. Since D1 > D2, path 1 is selected as the route for this request.
This method jointly considers the match between each link's sub-key pools and the request, the amount of key resources in the public key pools, and the path length, thereby reducing queuing delay and blocking probability and balancing the load across the whole network.
S3: judging whether each link on the selected path has a sub-key pool matched with the user key request, if so, returning the key resource to the requester; if not, go to step S4;
s4: judging whether the key resources in the current public key pool can meet the request, if so, distributing the key resources for the user from the public key pool; otherwise, executing step S5;
s5: placing the request in a blocking queue waiting for key resources; if the request latency exceeds a threshold, the request is removed from the queue and assignment failure information is returned.
S6: and returning the feedback information and the quantum key to the user. The feedback information includes: the number of keys in the public key pool and each sub-key pool, the link key generation rate, the current time, and the network gain. And storing the feedback information into a learning library.
S7: and randomly extracting a plurality of records from the learning library, inputting them into the reinforcement learning neural network for training, storing the model and applying the pre-allocation strategy.
Preferably, the training procedure of the reinforcement learning neural network and the QKD network key pre-allocation strategy flow of the present invention, shown in fig. 5, specifically include:
s71: acquiring current state information, including: the number of keys in the public key pool and each sub-key pool, the link key generation rate, the current time, and the network gain.
The reason for considering the link key generation rate and the current time is that these two factors have an impact on the allocation policy, including:
link key generation rate: in order to balance cost and demand in a complex network, the key generation rate of each link is often different. The system should preferentially assign key requests to links with higher key generation rates, while being more aggressive in key pre-assignment behavior. Because the key generation rate is faster, the request latency is shorter when congested.
The current time: although the requests of each user are random, the overall request density and key requirements may change regularly over time. The system should adjust the pre-allocation policy based on the current time, such as making more aggressive pre-allocation of keys during periods of higher request frequency, resulting in lower overall user latency.
S72: and converting the information into a vector, inputting the vector into the reinforcement learning neural network, and obtaining output through the network.
Preferably, the network information includes: the number of keys in the public key pool and each sub-key pool, the link key generation rate, and the current time.
S73: according to the output of the neural network, allocating a certain number of key resources from the public key pool into a sub-key pool; this is the pre-allocation strategy. The output of the neural network has different meanings under different reinforcement learning methods: for example, in the DQN (Deep Q-Network) method the k outputs are the estimated returns of the corresponding actions, while in the Policy Gradients method the k outputs are the probabilities of taking each action. According to the output form, the selected number k of key blocks is divided from the public key pool and placed into an empty sub-key pool.
S74: carrying out key allocation according to the user request, and feeding back the corresponding information and network gain according to the allocation result. The allocation results include: a sub-key pool matching the user key request exists, and the corresponding key resources are allocated to the requester; no matching sub-key pool exists, and the corresponding key resources are allocated to the user from the public key pool; the key resources in the public key pool cannot satisfy the request, and the request is blocked; the request waiting time exceeds the threshold, and the request fails. For each allocation result, the corresponding network gain is calculated.
To further illustrate the relationship between the reinforcement learning neural network and the key pre-allocation strategy, as shown in fig. 6, the reinforcement learning neural network comprises an input layer, a hidden layer and an output layer, wherein:
the input layer receives network state information, including: the number of keys in the public key pool and each sub-key pool, the link key generation rate, and the current time.
Preferably, the neural network input is represented by a vector of the form [p0, p1, p2, …, pm, q, t], where p0 is the number of key resources in the public key pool; p1, p2, …, pm are the numbers of key resources in the m sub-key pools; q is the link key generation rate; and t is the current time.
The number of hidden layers and the number of neurons in each layer can affect the training speed and the training effect, and can be set according to the complexity of a specific network.
The output layer contains k neurons (corresponding to actions 1, 2, 3, …, k). The output of the neural network has different meanings under different reinforcement learning methods: for example, in the DQN (Deep Q-Network) method the k outputs are the estimated returns of the corresponding actions, while in the Policy Gradients method the k outputs are the probabilities of taking each action. According to the output form, the selected number k of key resources is divided from the public key pool and placed into an empty sub-key pool using a first-fit algorithm.
Preferably, for the reinforcement learning method of this embodiment, the reward of the neural network is the network gain of the allocation results of user key requests over a period of time; for a user key request, the specific scheme is as follows:
there is a sub-key pool matching the user key request, and its key resources can be directly allocated to the requester: the network gain is 2;
no sub-key pool matching the user key request exists, and the corresponding key resources must be allocated to the user from the public key pool: the network gain is 1;
the key resources in the public key pool cannot satisfy the key request, the request is blocked, and key resources are allocated from the public key pool after a wait of t: the network gain is 1/t ∈ (0, 1);
the waiting time exceeds the threshold after the request is blocked, and the request fails: the network gain is 0.
By maximizing the reward (network gain) through training, the reinforcement learning neural network can meet the performance requirements of the QKD network: a higher request success rate, shorter request waiting time, and higher key resource utilization.
The same or similar reference numerals correspond to the same or similar parts;
the positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (10)

1. A QKD network key resource pre-allocation method based on reinforcement learning is characterized by comprising the following steps:
s1: the control layer receives a quantum key request of a user;
s2: carrying out routing selection according to the user request;
s3: judging whether each link on the selected path has a sub-key pool matched with the user key request, if so, returning the key resource to the requester; if not, go to step S4;
s4: judging whether the key resources in the current public key pool can meet the request, if so, distributing the key resources for the user from the public key pool; otherwise, executing step S5;
s5: placing the request in a blocking queue waiting for key resources; if the request waiting time exceeds a threshold value, deleting the request from the queue and returning distribution failure information;
s6: the application layer receives the feedback information and stores the feedback information into a reinforcement learning library;
s7: and randomly extracting a plurality of records from the learning library, inputting them into the reinforcement learning neural network for training, storing the model and applying the pre-allocation strategy.
2. The method for pre-distributing QKD network key resources based on reinforcement learning according to claim 1, wherein in step S1, the user request message includes: user identity ID, quantum key request quantity, source node and destination node.
3. The method for pre-distributing QKD network key resources based on reinforcement learning according to claim 2, wherein in step S2 the routing algorithm is: the user request contains the number of requested keys and the source and destination node information; n shortest paths are selected by the K shortest path algorithm (KSP), the priorities D1, D2, …, Dn of the paths are calculated from the key resources in the key pool of each link along each path and from the path length, and the path with the highest priority is selected.
4. The method for pre-allocating QKD network key resources based on reinforcement learning of claim 3, characterized in that the concrete process of said step S2 is:
s21: selecting n shortest paths as alternative routes according to a KSP algorithm;
s22: judging whether each link of the n candidate paths has a sub-key pool matching the number of keys requested by the user; if at least one path satisfies this, executing step S23; otherwise, executing S24;
s23: selecting the shortest path from the plurality of routes meeting the request matching condition as the route of the request;
s24: for each path selected by the KSP algorithm, taking the minimum key resource quantity K among the public key pools of the links on the path as the key resource of the path; calculating the priority D = K/L of each path, where L is the path length; and selecting the path with the highest priority as the result of the routing.
5. The method for pre-allocating QKD network key resources based on reinforcement learning of claim 4, characterized in that the concrete process of the step S7 is as follows:
s71: acquiring historical information from a learning library, wherein the historical information comprises: the number of keys in the public key pool and each sub-key pool, the link key generation rate, the current time and the network gain;
s72: converting the network information into a vector, inputting the vector into a reinforcement learning neural network, and obtaining output through the network;
s73: distributing a certain amount of key resources from the public key pool to be placed into the sub-key pool according to the output of the neural network, namely a pre-distribution strategy;
s74: and carrying out key distribution according to the user request, and feeding back corresponding information and network benefits according to different distribution results.
6. The method for pre-allocating QKD network key resources based on reinforcement learning of claim 5, wherein the network information in the step S72 includes: the number of keys in the public key pool and each sub-key pool, the link key generation rate, and the current time.
7. The reinforcement learning-based QKD network key resource pre-allocation method according to claim 6, wherein in step S73 the pre-allocation policy includes: observing the states of the public key pool and the sub-key pools; when the key resources in the public key pool exceed a threshold and an empty sub-key pool exists, inputting the state information of the public key pool and the sub-key pools, the link key generation rate and the time into the reinforcement learning neural network to obtain k outputs; and selecting one of the k outputs as the pre-allocation quantity, dividing that number of key blocks from the public key pool, and placing them into an empty sub-key pool.
8. The method of claim 7, wherein the reinforcement learning neural network comprises an input layer, a hidden layer and an output layer, and wherein: the input layer receives network state information, including: the number of keys in the public key pool and each sub-key pool, the link key generation rate, and the current time.
9. The method of claim 8, wherein the neural network input is represented by a vector of the form [p0, p1, p2, …, pm, q, t], where p0 is the number of key resources in the public key pool; p1, p2, …, pm are the numbers of key resources in the m sub-key pools; q is the link key generation rate; and t is the current time.
10. The reinforcement learning-based QKD network key resource pre-allocation method according to claim 9, wherein the reward for the neural network is a network gain of the allocation result of the user key request over a period of time; for a user key request, the specific strategy is as follows:
there is a sub-key pool matching the user key request, and its key resources can be directly allocated to the requester: the network gain is 2;
no sub-key pool matching the user key request exists, and the corresponding key resources must be allocated to the user from the public key pool: the network gain is 1;
the key resources in the public key pool cannot satisfy the key request, the request is blocked, and key resources are allocated from the public key pool after a wait of t: the network gain is 1/t ∈ (0, 1);
the latency exceeds a threshold after the request is blocked, the request fails: the network gain is 0.
CN202111679797.9A 2021-12-31 2021-12-31 QKD network key resource pre-allocation method based on reinforcement learning Active CN114499842B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111679797.9A CN114499842B (en) 2021-12-31 2021-12-31 QKD network key resource pre-allocation method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111679797.9A CN114499842B (en) 2021-12-31 2021-12-31 QKD network key resource pre-allocation method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN114499842A true CN114499842A (en) 2022-05-13
CN114499842B CN114499842B (en) 2023-06-30

Family

ID=81509888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111679797.9A Active CN114499842B (en) 2021-12-31 2021-12-31 QKD network key resource pre-allocation method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN114499842B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115694815A (en) * 2023-01-03 2023-02-03 国网天津市电力公司电力科学研究院 Communication encryption method and device for power distribution terminal
CN117176345A (en) * 2023-10-31 2023-12-05 中电信量子科技有限公司 Quantum cryptography network key relay dynamic routing method, device and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190149327A1 (en) * 2017-11-14 2019-05-16 Alibaba Group Holding Limited Method and system for quantum key distribution and data processing
US20210067331A1 (en) * 2016-11-28 2021-03-04 Quantumctek (Guangdong) Co., Ltd. Method for issuing quantum key chip, application method, issuing platform and system
CN112769550A (en) * 2020-12-29 2021-05-07 中天通信技术有限公司 Load balancing quantum key resource distribution system facing data center
CN113179514A (en) * 2021-03-25 2021-07-27 北京邮电大学 Quantum key distribution method and related equipment in relay coexistence scene

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210067331A1 (en) * 2016-11-28 2021-03-04 Quantumctek (Guangdong) Co., Ltd. Method for issuing quantum key chip, application method, issuing platform and system
US20190149327A1 (en) * 2017-11-14 2019-05-16 Alibaba Group Holding Limited Method and system for quantum key distribution and data processing
CN112769550A (en) * 2020-12-29 2021-05-07 中天通信技术有限公司 Load balancing quantum key resource distribution system facing data center
CN113179514A (en) * 2021-03-25 2021-07-27 北京邮电大学 Quantum key distribution method and related equipment in relay coexistence scene

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115694815A (en) * 2023-01-03 2023-02-03 国网天津市电力公司电力科学研究院 Communication encryption method and device for power distribution terminal
CN115694815B (en) * 2023-01-03 2023-03-28 国网天津市电力公司电力科学研究院 Communication encryption method and device for power distribution terminal
CN117176345A (en) * 2023-10-31 2023-12-05 中电信量子科技有限公司 Quantum cryptography network key relay dynamic routing method, device and system
CN117176345B (en) * 2023-10-31 2024-01-09 中电信量子科技有限公司 Quantum cryptography network key relay dynamic routing method, device and system

Also Published As

Publication number Publication date
CN114499842B (en) 2023-06-30

Similar Documents

Publication Publication Date Title
Chen et al. Exploring fog computing-based adaptive vehicular data scheduling policies through a compositional formal method—PEPA
CN114499842B (en) QKD network key resource pre-allocation method based on reinforcement learning
Liu et al. Asynchronous deep reinforcement learning for collaborative task computing and on-demand resource allocation in vehicular edge computing
Cao et al. Multi-tenant provisioning for quantum key distribution networks with heuristics and reinforcement learning: A comparative study
CN108076158B (en) Minimum load route selection method and system based on naive Bayes classifier
CN106411749A (en) Path selection method for software defined network based on Q learning
CN112769550B (en) Load balancing quantum key resource distribution system facing data center
CN112737776B (en) Data center-oriented quantum key resource allocation method for load balancing
Syed et al. Design of resources allocation in 6G cybertwin technology using the fuzzy neuro model in healthcare systems
CN106537824A (en) Method and apparatus for reducing response time in information-centric networks
Chen et al. A heuristic remote entanglement distribution algorithm on memory-limited quantum paths
WO2022143987A1 (en) Tree model training method, apparatus and system
Kamran et al. DECO: Joint computation scheduling, caching, and communication in data-intensive computing networks
Chen et al. Q-DDCA: Decentralized dynamic congestion avoid routing in large-scale quantum networks
Moreira et al. Task allocation framework for software-defined fog v-RAN
Pham Optimizing service function chaining migration with explicit dynamic path
Ikeda et al. Performance evaluation of an intelligent CAC and routing framework for multimedia applications in broadband networks
Yi et al. Cost and security-aware resource allocation in optical data center networks
Wong et al. A Century-Long Challenge in Teletraffic Theory: Blocking Probability Evaluation for Overflow Loss Systems with Mutual Overflow
Li et al. Distributed deep learning inference acceleration using seamless collaboration in edge computing
CN114666805B (en) Optical network planning method and system suitable for multi-granularity service
Chen et al. A quantum key distribution routing scheme for hybrid-trusted QKD network system
Dai et al. The capacity region of entanglement switching: Stability and zero latency
CN113708982B (en) Service function chain deployment method and system based on group learning
CN109474464A (en) A kind of fast network update method based on OpenNF mechanism

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240102

Address after: 510663 room A105, first floor, ladder a, No. 11, panglv Road, Science City, Guangzhou high tech Industrial Development Zone, Guangdong Province

Patentee after: Guangdong Yukopod Technology Development Co.,Ltd.

Address before: 510898 No. 55, Zhongshan Avenue West, Tianhe District, Guangzhou, Guangdong

Patentee before: SOUTH CHINA NORMAL University

TR01 Transfer of patent right