CN114499842A - QKD network key resource pre-allocation method based on reinforcement learning - Google Patents

QKD network key resource pre-allocation method based on reinforcement learning

Info

Publication number
CN114499842A
CN114499842A
Authority
CN
China
Prior art keywords
key
request
pool
network
resources
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111679797.9A
Other languages
Chinese (zh)
Other versions
CN114499842B (en)
Inventor
郭邦红 (Guo Banghong)
董博文 (Dong Bowen)
胡敏 (Hu Min)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Yukopod Technology Development Co ltd
Original Assignee
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Normal University filed Critical South China Normal University
Priority to CN202111679797.9A
Publication of CN114499842A
Application granted
Publication of CN114499842B
Legal status: Active
Anticipated expiration


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08: Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0816: Key establishment, i.e. cryptographic processes or cryptographic protocols whereby a shared secret becomes available to two or more parties, for subsequent use
    • H04L9/0852: Quantum cryptography
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00: Routing or path finding of packets in data switching networks
    • H04L45/12: Shortest path evaluation
    • H04L45/123: Evaluation of link metrics
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00: Routing or path finding of packets in data switching networks
    • H04L45/22: Alternate routing
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00: Routing or path finding of packets in data switching networks
    • H04L45/24: Multipath

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Electromagnetism (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a QKD network key resource pre-allocation method based on reinforcement learning that accelerates quantum key allocation, improves the key allocation success rate, reduces key pool maintenance cost, and reduces key resource waste. In a multi-user concurrent quantum key request scenario, key resources in the public key pool are pre-allocated into a fixed number of sub-key pools to form a resource set. A sub-key pool matching a key resource request can immediately allocate its key resources to the corresponding requester, greatly reducing the queuing time of requests. Meanwhile, a reinforcement learning method is used to predict the quantity of pre-allocated key resources, which improves key resource matching, raises the allocation success rate, and reduces key resource waste.

Description

QKD network key resource pre-allocation method based on reinforcement learning
Technical Field
The invention relates to the technical field of quantum communication and quantum networks, in particular to a QKD network key resource pre-allocation method based on reinforcement learning.
Background
With the development of quantum computing technology, traditional cryptosystems based on computational complexity are under threat. For the RSA encryption algorithm, whose security rests on the hardness of prime factorization, Shor's algorithm exploits a quantum computer's ability to compute in parallel over quantum superposition states and can complete the factorization in polynomial time, an exponential speedup over classical algorithms that threatens the RSA cryptosystem.
Quantum Key Distribution (QKD) is based on the Heisenberg uncertainty principle and the quantum no-cloning theorem, and its security is theoretically unconditional. In recent years, quantum key distribution technology has developed rapidly; point-to-point QKD is approaching maturity and is about to enter a large-scale commercial stage.
In the QKD network architecture, quantum keys are generated by key generation devices and stored in a quantum key pool, where they are managed and distributed by a key management module. When quantum secure communication is carried out between multiple user pairs Alice1 and Bob1, Alice2 and Bob2, …, Alicen and Bobn, and their quantum key requests arrive at the same time, the key pool, being a shared resource, cannot perform key allocation for multiple requests in parallel, so the users must wait in a queue for key resource allocation.
A prior invention patent (CN107086908B) proposes creating a sub-key pool per user to address the inability to respond quickly under a large number of concurrent key requests from multiple users. However, creating and maintaining a sub-key pool for every user and performing the associated resource allocation and reclamation consume a large amount of resources. Moreover, the number of users cannot be estimated in advance, so the key resources and sub-key pools are difficult to scale effectively.
Disclosure of Invention
The invention provides a QKD network key resource pre-allocation method based on reinforcement learning, which improves key resource matching, raises the allocation success rate, and reduces key resource waste.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
a QKD network key resource pre-allocation method based on reinforcement learning comprises the following steps:
s1: the control layer receives a quantum key request of a user;
s2: carrying out routing selection according to the user request;
s3: judging whether each link on the selected path has a sub-key pool matched with the user key request, if so, returning the key resource to the requester; if not, go to step S4;
s4: judging whether the key resources in the current public key pool can meet the request, if so, distributing the key resources for the user from the public key pool; otherwise, executing step S5;
s5: placing the request in a blocking queue waiting for key resources; if the request waiting time exceeds a threshold value, deleting the request from the queue and returning distribution failure information;
s6: the application layer receives the feedback information and stores the feedback information into a reinforcement learning library;
s7: and randomly extracting a plurality of records from the learning library, inputting them into the reinforcement learning neural network for training, storing the model and applying the pre-allocation strategy.
Further, in step S1, the user request message includes: user identity ID, quantum key request quantity, source node and destination node.
Further, in step S2, the routing algorithm is: the user request contains the number of requested keys and the source and destination node information; n shortest paths are selected by the K shortest path algorithm (KSP), the priorities D1, D2, …, Dn of the paths are calculated from the key resources in the key pool of each link along each path and from the path length, and the path with the highest priority is selected.
Further, the specific process of step S2 is:
s21: selecting n shortest paths as alternative routes according to a KSP algorithm;
s22: judging whether each link of the n candidate paths has a sub-key pool matching the number of keys requested by the user; if at least one path satisfies this, executing step S23; otherwise, executing S24;
s23: selecting the shortest path from the plurality of routes meeting the request matching condition as the route of the request;
s24: for each path selected by the KSP algorithm, taking the minimum key resource quantity K among the public key pools of the links on the path as the key resource of the path; calculating the priority D = K/L of each path, where L is the path length; and selecting the path with the highest priority as the result of the routing.
Further, the specific process of step S7 is:
s71: acquiring historical information from a learning library, wherein the historical information comprises: the number of keys in the public key pool and each sub-key pool, the link key generation rate, the current time and the network gain;
s72: converting the network information into a vector, inputting the vector into a reinforcement learning neural network, and obtaining output through the neural network;
s73: and allocating a certain number of key resources from the public key pool to be placed in the sub-key pool according to the output of the neural network, namely a pre-allocation strategy.
S74: and carrying out key distribution according to the user request, and feeding back corresponding information and network benefits according to different distribution results.
Further, the network information in step S72 includes: the number of keys in the public key pool and each sub-key pool, the link key generation rate, and the current time.
Further, in step S73, the pre-allocation policy includes: observing the states of the public key pool and the sub-key pools; when the key resources in the public key pool exceed a threshold and an empty sub-key pool exists, inputting the state information of the public key pool and the sub-key pools, the link key generation rate and the time into the reinforcement learning neural network to obtain k outputs; and selecting one of the k outputs as the pre-allocation quantity, dividing that number of key blocks from the public key pool, and placing them into an empty sub-key pool.
Further, the reinforcement learning neural network comprises an input layer, a hidden layer and an output layer, wherein: the input layer receives network state information, including: the number of keys in the public key pool and each sub-key pool, the link key generation rate, and the current time.
Further, the neural network input is represented by a vector of the form [p0, p1, p2, …, pm, q, t], where p0 is the number of key resources in the public key pool; p1, p2, …, pm are the numbers of key resources in the m sub-key pools; q is the link key generation rate; and t is the current time.
Further, the reward of the neural network is the network gain of the allocation results of user key requests over a period of time; for a user key request, the specific scheme is as follows:
there is a sub-key pool matching the user key request, and its key resources can be directly allocated to the requester: the network gain is 2;
no sub-key pool matching the user key request exists, and the corresponding key resources must be allocated to the user from the public key pool: the network gain is 1;
the key resources in the public key pool cannot satisfy the key request, the request is blocked, and key resources are allocated from the public key pool after a wait of t: the network gain is 1/t ∈ (0, 1);
the waiting time exceeds the threshold after the request is blocked, and the request fails: the network gain is 0.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention can accelerate the quantum key distribution speed, improve the success rate of key distribution, reduce the maintenance cost of the key pool and reduce the waste of key resources; under the multi-user concurrent quantum key application scene, key resources in the public key pool are pre-distributed and are placed into a fixed number of sub-key pools to form a resource set. The sub-key pool matched with the key resource request can immediately distribute the key resources therein to the corresponding requesting party, thereby greatly reducing the queuing time of the request; meanwhile, the number of the pre-distributed key resources is predicted by adopting a reinforcement learning method, so that the matching degree of the key resources is improved, the distribution success rate can be improved, and the waste of the key resources is reduced.
Drawings
Fig. 1 is a structural diagram of a QKD network key resource pre-allocation system in embodiment 1;
FIG. 2 is a flow chart of the method of the present invention;
FIG. 3 is a flow diagram of the route distribution of a QKD network key request provided by the present invention;
FIG. 4 is a schematic diagram of a QKD network topology provided by the present invention;
FIG. 5 is a flowchart illustrating a key pre-allocation strategy for a reinforcement learning neural network according to an embodiment of the present invention;
fig. 6 is a diagram of a reinforcement learning neural network structure according to the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
As shown in fig. 1, this embodiment provides a QKD multi-user key resource pre-allocation system based on reinforcement learning, which includes a physical link layer, a control layer and an application layer; the layers are connected through unified control interfaces.
The physical link layer includes: the user nodes and the links between them, which carry classical and quantum information; the quantum key generation devices; and the quantum key pools, which store the key sequences generated by the quantum key generation devices and comprise a public key pool and sub-key pools.
The key sequence in the public key pool can be divided or appended to. When a sub-key pool is empty, a key sequence can be divided from the public key pool and added to it; when the keys in a sub-key pool go unallocated for a long time, its key sequence can be reclaimed. To reduce management overhead, no further operations are allowed on the key sequence in a sub-key pool.
Preferably, in order to further simplify the system, reduce resource overhead and speed up allocation, the key sequence in the key pool is divided into key blocks of fixed length, and user key requests and key resource allocations are made in units of key blocks.
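As an illustration, this pool structure and its block operations can be sketched as follows (a minimal sketch only; the class name, the 32-byte block length, and the reclaim behavior are assumptions, not the patented implementation):

    from collections import deque

    BLOCK_BYTES = 32  # assumed fixed key-block length

    class KeyPools:
        """Public key pool plus a fixed number of sub-key pools, in key-block units."""
        def __init__(self, m_subpools: int):
            self.public = deque()                            # key blocks in the public key pool
            self.sub = [deque() for _ in range(m_subpools)]  # fixed number of sub-key pools

        def add_generated(self, raw: bytes):
            # Divide a newly generated key sequence into fixed-length blocks
            # (a tail shorter than one block is dropped in this sketch).
            for i in range(0, len(raw) - len(raw) % BLOCK_BYTES, BLOCK_BYTES):
                self.public.append(raw[i:i + BLOCK_BYTES])

        def preallocate(self, idx: int, k_blocks: int) -> bool:
            # Divide k key blocks from the public pool into an *empty* sub-key pool.
            if self.sub[idx] or len(self.public) < k_blocks:
                return False
            for _ in range(k_blocks):
                self.sub[idx].append(self.public.popleft())
            return True

        def reclaim(self, idx: int):
            # Return the blocks of a long-idle sub-key pool to the public pool.
            while self.sub[idx]:
                self.public.append(self.sub[idx].popleft())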
The control layer includes: a routing module, which receives user requests and selects the key transmission path; a state observation module, which observes the current key quantities in the public key pool and sub-key pools of each link and reports them to the application layer; a key allocation module, which allocates key resources from a sub-key pool or the public key pool to the requester; and a queuing and scheduling module, which holds blocked requests in a queue and, when the key resources in the public key pool become sufficient, dequeues a request and allocates key resources of the corresponding size.
And the application layer receives the user request, sends an instruction to the control layer, receives feedback information, and updates the reinforcement learning neural network and the key resource pre-allocation strategy.
As shown in fig. 2, the system applies a method for allocating QKD network key resources based on reinforcement learning, which includes the following steps:
s1: the control layer receives a quantum key request of a user;
When the user needs to obtain a quantum key, a quantum key request is sent to the operator of the QKD network. Communication between the user and the operator can be carried over various transports, such as TCP or UDP, and the user's key request must contain: the user identity ID, the quantum key request quantity, the source node and the destination node;
Preferably, in order to further simplify the system, reduce resource overhead and speed up allocation, the key sequence in the key pool is divided into key blocks of fixed length, and user key requests and key resource allocations are made in units of key blocks.
S2: carrying out routing selection according to the user request;
based on quantum unclonable law, quantum key transmission usually adopts a hop-by-hop encryption forwarding method, and forwarding times need to be reduced as much as possible to save key resources. Routing typically uses a shortest path algorithm or a K shortest path algorithm (KSP) to select one or more paths. If the key request in the network is in an unbalanced state for a long time, the resource of part of the links is wasted, and the part of the links are congested for a long time. Routing is carried out by combining the key resources in each link key pool and the path length, so that the request load of the whole network can be balanced.
The above routing process includes the following steps S21 to S24, further described with reference to figs. 3 and 4. FIG. 4 is a network topology diagram comprising 6 nodes and 7 links, where the label on each link means "link length; number of keys in the public key pool".
S21: selecting n shortest paths as candidate routes according to the KSP algorithm. In this embodiment, the source node and the destination node are node a and node f, and n is 2, so path 1: a → b → c → e → f (path length 8) and path 2: a → b → e → f (path length 7) are selected.
S22: judging whether every link of each of the n candidate paths has a sub-key pool matching the number of keys requested by the user. If any path satisfies this condition, step S23 is executed; otherwise, S24 is executed.
S23: and selecting the shortest path from the plurality of routes meeting the request matching condition as the route of the request.
S24: for each path selected by the KSP algorithm, the minimum key resource quantity K among the public key pools of the links on the path is taken as the key resource of the path; the priority of each path is calculated as D = K/L (where L is the path length), and the path with the highest priority is selected as the result of this routing.
In this embodiment, K = 4 for path 1 (bottleneck link b → c) and K = 3 for path 2 (bottleneck link b → e); the priority of path 1 is D1 = 4/8 = 1/2 and that of path 2 is D2 = 3/7. Since D1 > D2, path 1 is selected as the route for this request.
This method jointly considers the match between each link's sub-key pools and the request, the amount of key resources in the public key pools, and the path length, thereby reducing queuing delay and blocking probability and balancing the load across the whole network.
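For concreteness, the S24 priority computation on the FIG. 4 example can be sketched as follows (the bottleneck key counts and the path lengths come from the embodiment above; the remaining per-link values are illustrative assumptions):

    # Each path is a list of (link, link_length, keys_in_public_pool).
    # Bottleneck keys (4 and 3) and total lengths (8 and 7) match the example;
    # the other per-link values are assumptions for illustration.
    path1 = [("a-b", 2, 6), ("b-c", 2, 4), ("c-e", 2, 5), ("e-f", 2, 7)]  # length 8
    path2 = [("a-b", 2, 6), ("b-e", 3, 3), ("e-f", 2, 7)]                 # length 7

    def priority(path):
        K = min(keys for _, _, keys in path)      # scarcest public-pool resource on the path
        L = sum(length for _, length, _ in path)  # path length
        return K / L                              # D = K / L

    candidates = {"path 1": path1, "path 2": path2}
    best = max(candidates, key=lambda name: priority(candidates[name]))
    # priority of path 1 is 4/8 = 0.5, of path 2 is 3/7 ≈ 0.43, so path 1 is chosen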
S3: judging whether each link on the selected path has a sub-key pool matched with the user key request, if so, returning the key resource to the requester; if not, go to step S4;
s4: judging whether the key resources in the current public key pool can meet the request, if so, distributing the key resources for the user from the public key pool; otherwise, executing step S5;
s5: placing the request in a blocking queue waiting for key resources; if the request latency exceeds a threshold, the request is removed from the queue and assignment failure information is returned.
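A sketch of this S3 to S5 decision flow for a single link follows (the exact-size match in S3 and the timeout bookkeeping are assumptions consistent with the steps above):

    import time

    def allocate(request_blocks, sub_pools, public_pool, blocked, timeout_s=5.0):
        # S3: a sub-key pool whose size matches the request serves it immediately
        #     (an exact match on the number of key blocks is assumed here).
        for pool in sub_pools:
            if len(pool) == request_blocks:
                return [pool.popleft() for _ in range(request_blocks)]
        # S4: otherwise satisfy the request from the public key pool if possible.
        if len(public_pool) >= request_blocks:
            return [public_pool.popleft() for _ in range(request_blocks)]
        # S5: otherwise block the request; the scheduler retries until the deadline,
        #     after which the request is removed and a failure is reported.
        blocked.append((request_blocks, time.monotonic() + timeout_s))
        return None  # None signals "queued"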
S6: and returning the feedback information and the quantum key to the user. The feedback information includes: the number of keys in the public key pool and each sub-key pool, the link key generation rate, the current time, and the network gain. And storing the feedback information into a learning library.
S7: and randomly extracting a plurality of records from the learning library, inputting them into the reinforcement learning neural network for training, storing the model and applying the pre-allocation strategy.
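The learning library and the random sampling in S6 and S7 amount to an experience replay buffer; a minimal sketch (the buffer and batch sizes are assumptions):

    import random
    from collections import deque

    replay = deque(maxlen=100_000)  # the "reinforcement learning library"

    def store_feedback(state, action, reward, next_state):
        # S6: each allocation outcome is stored as one experience record.
        replay.append((state, action, reward, next_state))

    def sample_batch(batch_size=32):
        # S7: records are drawn at random for one training step of the network.
        return random.sample(replay, min(batch_size, len(replay)))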
Preferably, the training procedure of the reinforcement learning neural network and the QKD network key pre-allocation strategy flow of the present invention, shown in fig. 5, specifically include:
s71: acquiring current state information, including: the number of keys in the public key pool and each sub-key pool, the link key generation rate, the current time, and the network gain.
The reason for considering the link key generation rate and the current time is that these two factors have an impact on the allocation policy, including:
link key generation rate: to balance cost and demand in a complex network, the key generation rates for each link tend to be different. The system should preferentially assign key requests to links with higher key generation rates, while being more aggressive in key pre-assignment behavior. Because the key generation rate is faster, the request latency is shorter when congested.
The current time: although the requests of each user are random, the overall request density and key requirements may change regularly over time. The system should adjust the pre-allocation policy based on the current time, such as making more aggressive pre-allocation of keys during periods of higher request frequency, resulting in lower overall user latency.
S72: and converting the information into a vector, inputting the vector into the reinforcement learning neural network, and obtaining output through the network.
Preferably, the network information includes: the number of keys in the public key pool and each sub-key pool, the link key generation rate, and the current time.
S73: according to the output of the neural network, allocating a certain number of key resources from the public key pool into a sub-key pool; this is the pre-allocation strategy. The output of the neural network has different meanings under different reinforcement learning methods: for example, in the DQN (Deep Q-Network) method the k outputs are the estimated returns of the corresponding actions, while in the Policy Gradients method the k outputs are the probabilities of taking each action. According to the output form, the selected number k of key blocks is divided from the public key pool and placed into an empty sub-key pool.
S74: carrying out key allocation according to the user request, and feeding back the corresponding information and network gain according to the allocation result. The allocation results include: a sub-key pool matching the user key request exists, and the corresponding key resources are allocated to the requester; no matching sub-key pool exists, and the corresponding key resources are allocated to the user from the public key pool; the key resources in the public key pool cannot satisfy the request, and the request is blocked; the request waiting time exceeds the threshold, and the request fails. For each allocation result, the corresponding network gain is calculated.
To further illustrate the relationship between the reinforcement learning neural network and the key pre-allocation strategy, as shown in fig. 6, the reinforcement learning neural network comprises an input layer, a hidden layer and an output layer, wherein:
the input layer receives network state information, including: the number of keys in the public key pool and each sub-key pool, the link key generation rate, and the current time.
Preferably, the neural network input is represented by a vector of the form [p0, p1, p2, …, pm, q, t], where p0 is the number of key resources in the public key pool; p1, p2, …, pm are the numbers of key resources in the m sub-key pools; q is the link key generation rate; and t is the current time.
The number of hidden layers and the number of neurons in each layer can affect the training speed and the training effect, and can be set according to the complexity of a specific network.
The output layer contains k neurons (corresponding to actions 1, 2, 3, …, k). The output of the neural network has different meanings under different reinforcement learning methods: for example, in the DQN (Deep Q-Network) method the k outputs are the estimated returns of the corresponding actions, while in the Policy Gradients method the k outputs are the probabilities of taking each action. According to the output form, the selected number k of key resources is divided from the public key pool and placed into an empty sub-key pool using a first-fit algorithm.
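As a sketch of such a network and of action selection under the DQN reading of the outputs (PyTorch, the layer widths, and the values of m and k are all assumptions; the patent fixes none of them):

    import torch
    import torch.nn as nn

    m, k = 8, 16  # illustrative: m sub-key pools, k candidate pre-allocation sizes

    # The input vector [p0, p1, ..., pm, q, t] has m + 3 entries.
    qnet = nn.Sequential(
        nn.Linear(m + 3, 64), nn.ReLU(),  # hidden-layer width is a free choice
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, k),                 # one output per candidate action 1..k
    )

    def choose_preallocation(p, q, t):
        # DQN-style selection: the argmax output gives the number of key blocks.
        state = torch.tensor(p + [q, t], dtype=torch.float32)
        with torch.no_grad():
            q_values = qnet(state)         # estimated return of each action
        return int(q_values.argmax()) + 1  # actions correspond to 1..k key blocks

    # Example call: p0 = 40 public-pool blocks plus eight sub-pool counts,
    # link rate q = 0.8, normalized time of day t = 0.35 (all values illustrative).
    blocks = choose_preallocation([40, 0, 4, 0, 8, 0, 0, 2, 6], 0.8, 0.35)

Under the Policy Gradients reading, the same k outputs would instead be passed through a softmax and sampled as action probabilities.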
Preferably, for the reinforcement learning method of this embodiment, the reward of the neural network is the network gain of the allocation results of user key requests over a period of time; for a user key request, the specific scheme is as follows (a reward-function sketch follows the list):
there is a sub-key pool matching the user key request, and its key resources can be directly allocated to the requester: the network gain is 2;
no sub-key pool matching the user key request exists, and the corresponding key resources must be allocated to the user from the public key pool: the network gain is 1;
the key resources in the public key pool cannot satisfy the key request, the request is blocked, and key resources are allocated from the public key pool after a wait of t: the network gain is 1/t ∈ (0, 1);
the waiting time exceeds the threshold after the request is blocked, and the request fails: the network gain is 0.
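A sketch of this reward scheme as a function (the outcome labels are assumptions; the gain values are those listed above):

    def network_gain(result: str, wait_t: float = 0.0) -> float:
        if result == "subpool_hit":     # a matching sub-key pool served the request directly
            return 2.0
        if result == "public_pool":     # allocated from the public key pool
            return 1.0
        if result == "blocked_served":  # blocked, then served after waiting t
            # 1/t ∈ (0, 1) as listed above, which presumes t exceeds one time unit
            return 1.0 / wait_t
        return 0.0                      # waiting time exceeded the threshold: request failed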
By maximizing the reward (network gain) through training, the reinforcement learning neural network can meet the performance requirements of the QKD network: a higher request success rate, shorter request waiting time, and higher key resource utilization.
Example 2
As shown in fig. 2, a method for pre-allocating QKD network key resources based on reinforcement learning includes the following steps:
s1: the control layer receives a quantum key request of a user;
s2: carrying out routing selection according to the user request;
s3: judging whether each link on the selected path has a sub-key pool matched with the user key request, and if yes, returning the key resource to the requester; if not, go to step S4;
s4: judging whether the key resources in the current public key pool can meet the request, if so, distributing the key resources for the user from the public key pool; otherwise, executing step S5;
s5: placing the request in a blocking queue waiting for key resources; if the request waiting time exceeds a threshold value, deleting the request from the queue and returning distribution failure information;
s6: the application layer receives the feedback information and stores the feedback information into a reinforcement learning library;
s7: and randomly extracting a plurality of records from the learning library, inputting them into the reinforcement learning neural network for training, storing the model and applying the pre-allocation strategy.
Example 3
As shown in fig. 2, a method for allocating QKD network key resources based on reinforcement learning includes the following steps:
s1: the control layer receives a quantum key request of a user;
When the user needs to obtain a quantum key, a quantum key request is sent to the operator of the QKD network. Communication between the user and the operator can be carried over various transports, such as TCP or UDP, and the user's key request must contain: the user identity ID, the quantum key request quantity, the source node and the destination node;
Preferably, in order to further simplify the system, reduce resource overhead and speed up allocation, the key sequence in the key pool is divided into key blocks of fixed length, and user key requests and key resource allocations are made in units of key blocks.
S2: carrying out routing selection according to the user request;
based on quantum unclonable law, quantum key transmission usually adopts a hop-by-hop encryption forwarding method, and forwarding times need to be reduced as much as possible to save key resources. Routing typically uses a shortest path algorithm or a K shortest path algorithm (KSP) to select one or more paths. If the key request in the network is in an unbalanced state for a long time, the resource of part of the links is wasted, and the part of the links are congested for a long time. Routing in conjunction with the key resources and path lengths in each link QKP may balance the request load across the network.
The above routing process includes the following steps S21 to S24, further described with reference to figs. 3 and 4. FIG. 4 is a network topology diagram comprising 6 nodes and 7 links, where the label on each link means "link length; number of keys in the public key pool".
S21: selecting n shortest paths as candidate routes according to the KSP algorithm. In this embodiment, the source node and the destination node are node a and node f, and n is 2, so path 1: a → b → c → e → f (path length 8) and path 2: a → b → e → f (path length 7) are selected.
S22: judging whether every link of each of the n candidate paths has a sub-key pool matching the number of keys requested by the user. If any path satisfies this condition, step S23 is executed; otherwise, S24 is executed.
S23: and selecting the shortest path from the plurality of routes meeting the request matching condition as the route of the request.
S24: for each path selected by the KSP algorithm, the minimum key resource quantity K among the public key pools of the links on the path is taken as the key resource of the path; the priority of each path is calculated as D = K/L (where L is the path length), and the path with the highest priority is selected as the result of this routing.
In this embodiment, K = 4 for path 1 (bottleneck link b → c) and K = 3 for path 2 (bottleneck link b → e); the priority of path 1 is D1 = 4/8 = 1/2 and that of path 2 is D2 = 3/7. Since D1 > D2, path 1 is selected as the route for this request.
This method jointly considers the match between each link's sub-key pools and the request, the amount of key resources in the public key pools, and the path length, thereby reducing queuing delay and blocking probability and balancing the load across the whole network.
S3: judging whether each link on the selected path has a sub-key pool matched with the user key request, if so, returning the key resource to the requester; if not, go to step S4;
s4: judging whether the key resources in the current public key pool can meet the request, if so, distributing the key resources for the user from the public key pool; otherwise, executing step S5;
s5: placing the request in a blocking queue waiting for key resources; if the request latency exceeds a threshold, the request is removed from the queue and assignment failure information is returned.
S6: and returning the feedback information and the quantum key to the user. The feedback information includes: the number of keys in the public key pool and each sub-key pool, the link key generation rate, the current time, and the network gain. And storing the feedback information into a learning library.
S7: and randomly extracting a plurality of records from the learning library, inputting them into the reinforcement learning neural network for training, storing the model and applying the pre-allocation strategy.
Preferably, the training procedure of the reinforcement learning neural network and the QKD network key pre-allocation strategy flow of the present invention, shown in fig. 5, specifically include:
s71: acquiring current state information, including: the number of keys in the public key pool and each sub-key pool, the link key generation rate, the current time, and the network gain.
The reason for considering the link key generation rate and the current time is that these two factors have an impact on the allocation policy, including:
link key generation rate: in order to balance cost and demand in a complex network, the key generation rate of each link is often different. The system should preferentially assign key requests to links with higher key generation rates, while being more aggressive in key pre-assignment behavior. Because the key generation rate is faster, the request latency is shorter when congested.
The current time: although the requests of each user are random, the overall request density and key requirements may change regularly over time. The system should adjust the pre-allocation policy based on the current time, such as making more aggressive pre-allocation of keys during periods of higher request frequency, resulting in lower overall user latency.
S72: and converting the information into a vector, inputting the vector into the reinforcement learning neural network, and obtaining output through the network.
Preferably, the network information includes: the number of keys in the public key pool and each sub-key pool, the link key generation rate, and the current time.
S73: according to the output of the neural network, allocating a certain number of key resources from the public key pool into a sub-key pool; this is the pre-allocation strategy. The output of the neural network has different meanings under different reinforcement learning methods: for example, in the DQN (Deep Q-Network) method the k outputs are the estimated returns of the corresponding actions, while in the Policy Gradients method the k outputs are the probabilities of taking each action. According to the output form, the selected number k of key blocks is divided from the public key pool and placed into an empty sub-key pool.
S74: carrying out key allocation according to the user request, and feeding back the corresponding information and network gain according to the allocation result. The allocation results include: a sub-key pool matching the user key request exists, and the corresponding key resources are allocated to the requester; no matching sub-key pool exists, and the corresponding key resources are allocated to the user from the public key pool; the key resources in the public key pool cannot satisfy the request, and the request is blocked; the request waiting time exceeds the threshold, and the request fails. For each allocation result, the corresponding network gain is calculated.
To further illustrate the relationship between the reinforcement learning neural network and the key pre-allocation strategy, as shown in fig. 6, the reinforcement learning neural network comprises an input layer, a hidden layer and an output layer, wherein:
the input layer receives network state information, including: the number of keys in the public key pool and each sub-key pool, the link key generation rate, and the current time.
Preferably, the neural network input is represented by a vector of the form [p0, p1, p2, …, pm, q, t], where p0 is the number of key resources in the public key pool; p1, p2, …, pm are the numbers of key resources in the m sub-key pools; q is the link key generation rate; and t is the current time.
The number of hidden layers and the number of neurons in each layer can affect the training speed and the training effect, and can be set according to the complexity of a specific network.
The output layer contains k neurons (corresponding to actions 1, 2, 3, …, k). The output of the neural network has different meanings under different reinforcement learning methods: for example, in the DQN (Deep Q-Network) method the k outputs are the estimated returns of the corresponding actions, while in the Policy Gradients method the k outputs are the probabilities of taking each action. According to the output form, the selected number k of key resources is divided from the public key pool and placed into an empty sub-key pool using a first-fit algorithm.
Preferably, for the reinforcement learning method of this embodiment, the reward of the neural network is the network gain of the allocation results of user key requests over a period of time; for a user key request, the specific scheme is as follows:
there is a sub-key pool matching the user key request, and its key resources can be directly allocated to the requester: the network gain is 2;
no sub-key pool matching the user key request exists, and the corresponding key resources must be allocated to the user from the public key pool: the network gain is 1;
the key resources in the public key pool cannot satisfy the key request, the request is blocked, and key resources are allocated from the public key pool after a wait of t: the network gain is 1/t ∈ (0, 1);
the waiting time exceeds the threshold after the request is blocked, and the request fails: the network gain is 0.
By maximizing the reward (network gain) through training, the reinforcement learning neural network can meet the performance requirements of the QKD network: a higher request success rate, shorter request waiting time, and higher key resource utilization.
The same or similar reference numerals correspond to the same or similar parts;
the positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (10)

1. A QKD network key resource pre-allocation method based on reinforcement learning is characterized by comprising the following steps:
s1: the control layer receives a quantum key request of a user;
s2: carrying out routing selection according to the user request;
s3: judging whether each link on the selected path has a sub-key pool matched with the user key request, if so, returning the key resource to the requester; if not, go to step S4;
s4: judging whether the key resources in the current public key pool can meet the request, if so, distributing the key resources for the user from the public key pool; otherwise, executing step S5;
s5: placing the request in a blocking queue waiting for key resources; if the request waiting time exceeds a threshold value, deleting the request from the queue and returning distribution failure information;
s6: the application layer receives the feedback information and stores the feedback information into a reinforcement learning library;
s7: and randomly extracting a plurality of records from the learning library, inputting them into the reinforcement learning neural network for training, storing the model and applying the pre-allocation strategy.
2. The method for pre-distributing QKD network key resources based on reinforcement learning according to claim 1, wherein in step S1, the user request message includes: user identity ID, quantum key request quantity, source node and destination node.
3. The method for pre-distributing QKD network key resources based on reinforcement learning according to claim 2, wherein in step S2 the routing algorithm is: the user request contains the number of requested keys and the source and destination node information; n shortest paths are selected by the K shortest path algorithm (KSP), the priorities D1, D2, …, Dn of the paths are calculated from the key resources in the key pool of each link along each path and from the path length, and the path with the highest priority is selected.
4. The method for pre-allocating QKD network key resources based on reinforcement learning of claim 3, characterized in that the concrete process of said step S2 is:
s21: selecting n shortest paths as alternative routes according to a KSP algorithm;
s22: judging whether each link of the n candidate paths has a sub-key pool matching the number of keys requested by the user; if at least one path satisfies this, executing step S23; otherwise, executing S24;
s23: selecting the shortest path from the plurality of routes meeting the request matching condition as the route of the request;
s24: for each path selected by the KSP algorithm, taking the minimum key resource quantity K among the public key pools of the links on the path as the key resource of the path; calculating the priority D = K/L of each path, where L is the path length; and selecting the path with the highest priority as the result of the routing.
5. The method for pre-allocating QKD network key resources based on reinforcement learning of claim 4, characterized in that the concrete process of the step S7 is as follows:
s71: acquiring historical information from a learning library, wherein the historical information comprises: the number of keys in the public key pool and each sub-key pool, the link key generation rate, the current time and the network gain;
s72: converting the network information into a vector, inputting the vector into a reinforcement learning neural network, and obtaining output through the network;
s73: distributing a certain amount of key resources from the public key pool to be placed into the sub-key pool according to the output of the neural network, namely a pre-distribution strategy;
s74: and carrying out key distribution according to the user request, and feeding back corresponding information and network benefits according to different distribution results.
6. The method for pre-allocating QKD network key resources based on reinforcement learning of claim 5, wherein the network information in the step S72 includes: the number of keys in the public key pool and each sub-key pool, the link key generation rate, and the current time.
7. The reinforcement learning-based QKD network key resource pre-allocation method according to claim 6, wherein in step S73 the pre-allocation policy includes: observing the states of the public key pool and the sub-key pools; when the key resources in the public key pool exceed a threshold and an empty sub-key pool exists, inputting the state information of the public key pool and the sub-key pools, the link key generation rate and the time into the reinforcement learning neural network to obtain k outputs; and selecting one of the k outputs as the pre-allocation quantity, dividing that number of key blocks from the public key pool, and placing them into an empty sub-key pool.
8. The method of claim 7, wherein the reinforcement learning neural network comprises an input layer, a hidden layer and an output layer, and wherein: the input layer receives network state information, including: the number of keys in the public key pool and each sub-key pool, the link key generation rate, and the current time.
9. The method of claim 8, wherein the neural network input is represented by a vector of the form [p0, p1, p2, …, pm, q, t], where p0 is the number of key resources in the public key pool; p1, p2, …, pm are the numbers of key resources in the m sub-key pools; q is the link key generation rate; and t is the current time.
10. The reinforcement learning-based QKD network key resource pre-allocation method according to claim 9, wherein the reward for the neural network is a network gain of the allocation result of the user key request over a period of time; for a user key request, the specific strategy is as follows:
there is a sub-key pool matching the user key request, and its key resources can be directly allocated to the requester: the network gain is 2;
no sub-key pool matching the user key request exists, and the corresponding key resources must be allocated to the user from the public key pool: the network gain is 1;
the key resources in the public key pool cannot satisfy the key request, the request is blocked, and key resources are allocated from the public key pool after a wait of t: the network gain is 1/t ∈ (0, 1);
the latency exceeds a threshold after the request is blocked, the request fails: the network gain is 0.
CN202111679797.9A 2021-12-31 2021-12-31 QKD network key resource pre-allocation method based on reinforcement learning Active CN114499842B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111679797.9A CN114499842B (en) 2021-12-31 2021-12-31 QKD network key resource pre-allocation method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111679797.9A CN114499842B (en) 2021-12-31 2021-12-31 QKD network key resource pre-allocation method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN114499842A true CN114499842A (en) 2022-05-13
CN114499842B CN114499842B (en) 2023-06-30

Family

ID=81509888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111679797.9A Active CN114499842B (en) 2021-12-31 2021-12-31 QKD network key resource pre-allocation method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN114499842B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115694815A (en) * 2023-01-03 2023-02-03 国网天津市电力公司电力科学研究院 Communication encryption method and device for power distribution terminal
CN117176345A (en) * 2023-10-31 2023-12-05 中电信量子科技有限公司 Quantum cryptography network key relay dynamic routing method, device and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190149327A1 (en) * 2017-11-14 2019-05-16 Alibaba Group Holding Limited Method and system for quantum key distribution and data processing
US20210067331A1 (en) * 2016-11-28 2021-03-04 Quantumctek (Guangdong) Co., Ltd. Method for issuing quantum key chip, application method, issuing platform and system
CN112769550A (en) * 2020-12-29 2021-05-07 中天通信技术有限公司 Load balancing quantum key resource distribution system facing data center
CN113179514A (en) * 2021-03-25 2021-07-27 北京邮电大学 Quantum key distribution method and related equipment in relay coexistence scene

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210067331A1 (en) * 2016-11-28 2021-03-04 Quantumctek (Guangdong) Co., Ltd. Method for issuing quantum key chip, application method, issuing platform and system
US20190149327A1 (en) * 2017-11-14 2019-05-16 Alibaba Group Holding Limited Method and system for quantum key distribution and data processing
CN112769550A (en) * 2020-12-29 2021-05-07 中天通信技术有限公司 Load balancing quantum key resource distribution system facing data center
CN113179514A (en) * 2021-03-25 2021-07-27 北京邮电大学 Quantum key distribution method and related equipment in relay coexistence scene

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115694815A (en) * 2023-01-03 2023-02-03 国网天津市电力公司电力科学研究院 Communication encryption method and device for power distribution terminal
CN115694815B (en) * 2023-01-03 2023-03-28 国网天津市电力公司电力科学研究院 Communication encryption method and device for power distribution terminal
CN117176345A (en) * 2023-10-31 2023-12-05 中电信量子科技有限公司 Quantum cryptography network key relay dynamic routing method, device and system
CN117176345B (en) * 2023-10-31 2024-01-09 中电信量子科技有限公司 Quantum cryptography network key relay dynamic routing method, device and system

Also Published As

Publication number Publication date
CN114499842B (en) 2023-06-30

Similar Documents

Publication Publication Date Title
Chen et al. Exploring fog computing-based adaptive vehicular data scheduling policies through a compositional formal method—PEPA
CN114499842B (en) QKD network key resource pre-allocation method based on reinforcement learning
Liu et al. Asynchronous deep reinforcement learning for collaborative task computing and on-demand resource allocation in vehicular edge computing
Cao et al. Multi-tenant provisioning for quantum key distribution networks with heuristics and reinforcement learning: A comparative study
CN108076158B (en) Minimum load route selection method and system based on naive Bayes classifier
CN106411749A (en) Path selection method for software defined network based on Q learning
CN112769550B (en) Load balancing quantum key resource distribution system facing data center
CN112737776B (en) Data center-oriented quantum key resource allocation method for load balancing
Syed et al. Design of resources allocation in 6G cybertwin technology using the fuzzy neuro model in healthcare systems
CN106537824A (en) Method and apparatus for reducing response time in information-centric networks
Chen et al. A heuristic remote entanglement distribution algorithm on memory-limited quantum paths
WO2022143987A1 (en) Tree model training method, apparatus and system
Kamran et al. DECO: Joint computation scheduling, caching, and communication in data-intensive computing networks
Chen et al. Q-DDCA: Decentralized dynamic congestion avoid routing in large-scale quantum networks
Moreira et al. Task allocation framework for software-defined fog v-RAN
Pham Optimizing service function chaining migration with explicit dynamic path
Ikeda et al. Performance evaluation of an intelligent CAC and routing framework for multimedia applications in broadband networks
Yi et al. Cost and security-aware resource allocation in optical data center networks
Wong et al. A Century-Long Challenge in Teletraffic Theory: Blocking Probability Evaluation for Overflow Loss Systems with Mutual Overflow
Li et al. Distributed deep learning inference acceleration using seamless collaboration in edge computing
CN114666805B (en) Optical network planning method and system suitable for multi-granularity service
Chen et al. A quantum key distribution routing scheme for hybrid-trusted QKD network system
Dai et al. The capacity region of entanglement switching: Stability and zero latency
CN113708982B (en) Service function chain deployment method and system based on group learning
CN109474464A (en) A kind of fast network update method based on OpenNF mechanism

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240102

Address after: 510663 room A105, first floor, ladder a, No. 11, panglv Road, Science City, Guangzhou high tech Industrial Development Zone, Guangdong Province

Patentee after: Guangdong Yukopod Technology Development Co.,Ltd.

Address before: 510898 No. 55, Zhongshan Avenue West, Tianhe District, Guangzhou, Guangdong

Patentee before: SOUTH CHINA NORMAL University

TR01 Transfer of patent right