CN116681787A - Deep reinforcement learning-based graph coloring method - Google Patents

Deep reinforcement learning-based graph coloring method

Info

Publication number
CN116681787A
Authority
CN
China
Prior art keywords
agent
coloring
graph coloring
graph
reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310643641.8A
Other languages
Chinese (zh)
Inventor
李小龙
黄珂
陈晓红
董莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University of Technology
Original Assignee
Hunan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University of Technology filed Critical Hunan University of Technology
Priority to CN202310643641.8A priority Critical patent/CN116681787A/en
Publication of CN116681787A publication Critical patent/CN116681787A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/001 Texturing; Colouring; Generation of texture or colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/90 Determination of colour characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Generation (AREA)

Abstract

A graph coloring method based on deep reinforcement learning: a graph coloring environment is created; a message-passing neural network is used as the agent for deep reinforcement learning; the agent explores the graph coloring environment randomly and stores the data generated at each step in an experience replay pool; transition information of the generated data is randomly sampled from the experience replay pool for value updates, the current state is input into the agent to output the Q-value of each action, the vertex corresponding to the maximum Q-value is taken as the optimal action, a reward value is calculated, and the agent is trained; the trained agent is then used to color the graph and to output a coloring scheme and a color number. The application converts the graph coloring problem into a unidirectional Markov chain model so that the agent does not fall into cyclic trajectories during exploration, designs a strategy of increasing the number of colors on demand while incrementally constructing the solution so that no prior knowledge of the color number is required, and finally designs a reward function that guides the agent to learn the optimal coloring strategy.

Description

Deep reinforcement learning-based graph coloring method
Technical Field
The application belongs to the field of deep learning, and particularly relates to a graph coloring method based on deep reinforcement learning.
Background
The graph coloring problem (GCP) is a well-known combinatorial optimization problem in the field of operations research. Its goal is to color the nodes of a graph with as few colors as possible while ensuring that adjacent nodes do not share the same color. GCP has many practical applications, such as timetabling, fault diagnosis, mobile radio frequency allocation, and register allocation. Traditional methods for solving the graph coloring problem mainly comprise exact algorithms, approximation algorithms, and heuristic algorithms. Exact algorithms, such as those using branch-and-bound or other enumeration methods, can produce optimal solutions for small-scale problems; however, owing to their non-polynomial time complexity, these methods become intractable for large-scale problems. Approximation algorithms improve computational efficiency and come with theoretical guarantees, but they cannot guarantee that the minimum color number is obtained in scenarios where high-quality solutions are required. Heuristic algorithms, such as tabu search, can find good solutions in an acceptable time, but designing them requires extensive domain knowledge. Furthermore, these algorithms are iterative in nature and require a new search for every new instance, so they are not applicable to practical scenarios with time constraints and flexibility requirements. It is therefore crucial to find a graph coloring method that satisfies the need for fast solutions and generalizes well.
Disclosure of Invention
The application aims to provide a graph coloring method based on deep reinforcement learning for coloring nodes of a graph.
The technical scheme of the application is as follows:
a graph coloring method based on deep reinforcement learning is characterized in that,
creating a graph coloring environment; using a message-passing neural network as the agent for deep reinforcement learning;
the agent explores randomly in the graph coloring environment with an epsilon-greedy strategy, and the data generated at each step are stored in an experience replay pool;
when the amount of data in the experience replay pool reaches M, transition information of the generated data is randomly sampled from the experience replay pool for value updates; the current state is input into the agent, which outputs the Q-value of each action; the vertex corresponding to the maximum Q-value is taken as the optimal action; a reward value is calculated; and the agent is trained;
and the trained agent is used to color the graph and to output a coloring scheme and a color number.
Further, the creating a graph coloring environment specifically includes:
the graph coloring problem is converted into a unidirectional Markov chain model, the potential energy relations among the states are constructed, and a strategy of increasing the number of colors on demand is then adopted to create the graph coloring environment.
Further, a reward function with an invalid action penalty (IAP) is designed to guide the agent to learn the optimal coloring strategy.
Further, the design principles of the IAP are specifically as follows:
The IAP should take precedence over the other penalties; because the goal of the agent is to maximize the return, when the reward for increasing the color number is lower than the reward for an invalid action, the agent may choose the invalid action instead of the action that increases the color number, which may result in the agent failing to complete the graph coloring task within a limited number of steps before its exploration stops.
The IAP and the main-objective reward should remain consistent overall; an excessively large penalty would interfere with the agent's learning of the main objective.
Further, the reward function formula is:
wherein χ(s_t) is the color number of state s_t, the set in the formula denotes the invalid actions, and a denotes the currently executed action, i.e., the selection of an already-colored vertex.
Further, the strategy of increasing the number of colors on demand is specifically as follows:
the number of colors is initialized to 1, and during the incremental construction of the solution the number of colors is increased whenever the existing colors cannot satisfy the constraint that adjacent vertices have different colors.
Further, training the agent.
The technical effects of the application are as follows:
The application provides a graph coloring method based on deep reinforcement learning. The graph coloring problem is converted into a unidirectional Markov chain model so that the agent does not fall into cyclic trajectories during exploration, and the potential energy relations among the states are constructed. A strategy of increasing the number of colors on demand is designed for the process of incrementally constructing the solution, so that no prior knowledge of the number of colors is required. Finally, a reward function with an IAP is designed to guide the agent to learn the optimal coloring strategy.
Drawings
The accompanying drawings illustrate various embodiments by way of example in general and not by way of limitation, and together with the description and claims serve to explain the inventive embodiments. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. Such embodiments are illustrative and not intended to be exhaustive or exclusive of the present apparatus or method.
FIG. 1 illustrates a schematic diagram of a deep reinforcement learning-based graph coloring algorithm framework of the present application;
FIG. 2 shows a schematic diagram of the unidirectional Markov chain model of the graph coloring process of the present application.
Detailed Description
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
A graph coloring method based on deep reinforcement learning,
first, the graphics shading problem GCP is converted to a unidirectional Markov chain model, which can simplify the complexity and ease of handling of the graphics shading process. In one aspect, the application provides that no matter what decision is performed, each state will return to the previous state, which allows the agent to not get into the loop track during the exploration process. On the other hand, based on the change in solution subset, the relationship of potential energy between states on the Markov chain is given. It was then demonstrated that when the return value was maximum, all vertices were colored and the number of colors was minimal.
The initial state s(1, 0) indicates that the graph coloring task starts from a blank graph with a color number of 1. A state s(χ, k) records the color number χ of the current state and the number k of vertices that have been colored (k ≤ n). λ_(χ,k) indicates that state s(χ, k) has been repeatedly visited λ_(χ,k) times. When all vertices are colored (k = n), the terminal state is reached. Apart from the initial state and the terminal state, each state has three transition paths: (1) coloring an uncolored vertex and increasing the color number, i.e., s(χ, k) → s(χ+1, k+1); (2) coloring an uncolored vertex without increasing the color number, i.e., s(χ, k) → s(χ, k+1); (3) selecting an already-colored vertex, in which case the state remains unchanged, i.e., s(χ, k) → s(χ, k).
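By way of illustration only, this unidirectional state model can be sketched in Python as follows; the names State and transition are assumptions introduced for this sketch and are not part of the claimed method, and the sketch merely shows that none of the three transition paths ever returns to a previous state.

from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    chi: int  # number of colors used so far (χ)
    k: int    # number of vertices already colored

def transition(s: State, colors_new_vertex: bool, needs_new_color: bool) -> State:
    if not colors_new_vertex:              # (3) an already-colored vertex is selected: state unchanged
        return s
    if needs_new_color:                    # (1) s(χ, k) -> s(χ+1, k+1)
        return State(s.chi + 1, s.k + 1)
    return State(s.chi, s.k + 1)           # (2) s(χ, k) -> s(χ, k+1)

s = State(1, 0)                            # initial state: blank graph, color number 1
s = transition(s, True, False)             # State(chi=1, k=1)
s = transition(s, True, True)              # State(chi=2, k=2)
print(s)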
Then, the application provides a deep reinforcement learning framework that adopts an MPNN as the agent for deep reinforcement learning and aggregates the relevant information among the vertices of the graph. In particular, a reward function based on the unidirectional Markov chain is designed to guide the agent toward states of increasing potential energy, so that the agent learns how to reach the state of maximum potential energy in a limited time. On this basis, a GCP-oriented Q-learning algorithm is proposed to train the agent. In the proposed algorithm, the strategy of increasing the number of colors on demand means that the method does not require prior knowledge of the optimal number of colors. Specifically, the number of colors is initialized to 1, and the number of colors is increased whenever the existing colors cannot satisfy the constraint that adjacent vertices have different colors during the incremental construction of the solution.
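A minimal sketch of this on-demand color-increase strategy is given below; the function name assign_color_on_demand and the triangle example are assumptions made purely for illustration and do not limit the application.

def assign_color_on_demand(vertex, neighbors, coloring, num_colors):
    """Return (color, new_num_colors): reuse an existing color if any is feasible,
    otherwise introduce a new color, starting from a single color overall."""
    forbidden = {coloring[u] for u in neighbors if u in coloring}
    for c in range(num_colors):
        if c not in forbidden:             # an existing color still satisfies the constraint
            return c, num_colors
    return num_colors, num_colors + 1      # every existing color conflicts: add one color

# Example: a triangle graph forces three colors when its vertices are colored in order.
edges = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
coloring, k = {}, 1
for v in (0, 1, 2):
    coloring[v], k = assign_color_on_demand(v, edges[v], coloring, k)
print(coloring, k)                          # {0: 0, 1: 1, 2: 2} and a color number of 3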
As a graph neural network framework, the MPNN can effectively propagate and aggregate vertex information on the graph, such as colors and degrees, in the graph coloring problem. Accordingly, the application designs an MPNN architecture whose message-passing function is as follows:
wherein θ_1^(k) denotes the neural network parameters of the k-th message-passing layer, N(v) is the set of vertices adjacent to vertex v, w_uv is the weight of the edge connecting vertex v and vertex u, h_v^(k) is the feature of vertex v in the k-th round of message passing, and ζ_v is the adjacent-edge feature of vertex v.
The update function is as follows:
wherein θ_2^(k) denotes the neural network parameters of the k-th vertex-update layer, and the square brackets denote the concatenation operation.
The output function of the readout phase is as follows:
wherein θ_3 and θ_4 are the parameters of the two-layer neural network in the readout phase, and V is the set of vertices.
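Purely for illustration, a generic message-passing, update, and readout pipeline of this kind can be sketched in numpy as follows; the layer sizes, the ReLU nonlinearity, the number of message-passing rounds, and sum aggregation over weighted neighbors are all assumptions of this sketch rather than the exact architecture of the application.

import numpy as np

rng = np.random.default_rng(0)
d = 8                                        # vertex feature dimension (assumed)
W_msg = rng.normal(size=(d, d)) * 0.1        # stands in for θ_1: message-passing layer
W_upd = rng.normal(size=(2 * d, d)) * 0.1    # stands in for θ_2: vertex-update layer (concatenation -> d)
W_out1 = rng.normal(size=(d, d)) * 0.1       # stands in for θ_3: first readout layer
W_out2 = rng.normal(size=(d, 1)) * 0.1       # stands in for θ_4: second readout layer

def mpnn_q_values(h, adj):
    """h: (n, d) vertex features; adj: (n, n) weighted adjacency matrix (w_uv)."""
    for _ in range(2):                       # two rounds of message passing (assumed)
        m = np.maximum(adj @ h @ W_msg, 0.0)                         # aggregate neighbor messages
        h = np.maximum(np.concatenate([h, m], axis=1) @ W_upd, 0.0)  # update: [old feature, message]
    return (np.maximum(h @ W_out1, 0.0) @ W_out2).ravel()            # readout: one Q-value per vertex

n = 5
adj = (rng.random((n, n)) < 0.4).astype(float)
adj = np.triu(adj, 1) + np.triu(adj, 1).T    # symmetric weights, no self-loops
print(mpnn_q_values(rng.normal(size=(n, d)), adj))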
Finally, the application provides a specific implementation of the reward function. To prevent the state from remaining stationary, an invalid action penalty (IAP) is added to the reward function to mask the invalid actions generated during the decision process (an already-colored vertex cannot be selected for coloring again). Because the magnitude of the IAP is difficult to set and it is prone to conflict with the main-objective reward, it often leads to poor results. The application therefore proposes the following design principles of the IAP for GCP: (1) The IAP should take precedence over the other penalties. Because the goal of the agent is to maximize the return (the sum of all rewards), when the reward for increasing the color number is lower than the reward for an invalid action, the agent may choose the invalid action instead of the action that increases the color number, which may result in the agent failing to complete the graph coloring task within a limited number of steps before its exploration stops. (2) The IAP and the main-objective reward should remain consistent overall; an excessively large penalty would interfere with the agent's learning of the main objective, so the IAP is coordinated with the graph coloring reward. The reward function formula is as follows:
wherein χ(s_t) is the color number of state s_t, the set in the formula denotes the invalid actions, and a denotes the currently executed action (i.e., the selection of an already-colored vertex).
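Purely as an illustration of the two design principles, a reward of the following shape could be used; the specific constants -2.0 and -1.0 are assumptions chosen only so that the IAP takes precedence over the color-increase penalty while remaining on a comparable scale, and they do not reproduce the exact formula of the application.

def reward(chi_prev, chi_next, action_is_invalid):
    if action_is_invalid:      # an already-colored vertex was selected: invalid action penalty (IAP)
        return -2.0            # strictly worse than any other penalty, but on a comparable scale
    if chi_next > chi_prev:    # a new color had to be introduced
        return -1.0
    return 0.0                 # a vertex was colored without increasing the color number

print(reward(3, 3, True), reward(3, 4, False), reward(3, 3, False))   # -2.0 -1.0 0.0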
Firstly, a graph coloring environment is created according to the Markov chain model and the strategy of increasing the number of colors on demand: a coloring action is performed, the vertex is colored according to the proposed on-demand color-increase strategy, the reward value is calculated according to the proposed reward function formula with the IAP, and finally the vertex states (such as the vertex colors, the colors available to each vertex, and the current color number) are updated.
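As an illustration of such an environment, the following minimal sketch (the class name GraphColoringEnv and all numeric values are assumptions introduced for this sketch) performs a coloring action, applies the on-demand color-increase rule, computes an illustrative reward, and updates the vertex states.

class GraphColoringEnv:
    """Minimal illustrative environment: state = (vertex colors, current color number)."""
    def __init__(self, adjacency):
        self.adj = adjacency                       # dict: vertex -> list of neighbor vertices
        self.reset()

    def reset(self):
        self.coloring = {}                         # vertex -> color index
        self.num_colors = 1                        # color number starts at 1
        return self._state()

    def _state(self):
        return dict(self.coloring), self.num_colors

    def step(self, vertex):
        done = len(self.coloring) == len(self.adj)
        if vertex in self.coloring:                # invalid action: vertex already colored
            return self._state(), -2.0, done       # illustrative IAP value
        forbidden = {self.coloring[u] for u in self.adj[vertex] if u in self.coloring}
        chi_prev = self.num_colors
        color = next((c for c in range(self.num_colors) if c not in forbidden), None)
        if color is None:                          # on-demand color increase
            color, self.num_colors = self.num_colors, self.num_colors + 1
        self.coloring[vertex] = color              # update the vertex state
        reward = -1.0 if self.num_colors > chi_prev else 0.0
        done = len(self.coloring) == len(self.adj)
        return self._state(), reward, done

env = GraphColoringEnv({0: [1, 2], 1: [0, 2], 2: [0, 1]})
print(env.step(0), env.step(1), env.step(2), env.step(2))   # the last call is an invalid action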
Secondly, the agent explores randomly on randomly generated graphs with an epsilon-greedy strategy, and the data (s, a, r, s') generated at each step (where s is the current state, a is the action, r is the reward for executing action a in the current state, and s' is the next state) are stored in the experience replay pool.
Thirdly, when the data in the experience replay pool reach a certain amount, transition information (s, a, r, s') is randomly sampled for value updates; the current state s is input into the agent, which outputs the Q-value of each action; the vertex corresponding to the largest Q-value, namely argmax Q(s, A) (where A is the action set), is taken as the optimal action a*; the reward value is calculated; and the agent is trained.
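This sampling-and-update step can be sketched as follows; the linear stand-in Q-function, the discount factor, the learning rate, and the batch size are assumptions that replace the MPNN agent purely so that the one-step target r + γ·max Q(s', ·) and the argmax action selection can be shown in runnable form.

import random
import numpy as np

rng = np.random.default_rng(0)
n_actions, d_state = 4, 6
W = rng.normal(size=(d_state, n_actions)) * 0.1            # stand-in Q-network parameters

def q_values(state):
    return state @ W                                       # Q(s, a) for every action a

# A replay pool of transitions (s, a, r, s'); filled with random data for illustration.
replay_pool = [(rng.normal(size=d_state), int(rng.integers(n_actions)), -1.0,
                rng.normal(size=d_state)) for _ in range(100)]

gamma, lr, batch_size = 0.99, 1e-2, 8
for s, a, r, s_next in random.sample(replay_pool, batch_size):
    target = r + gamma * np.max(q_values(s_next))          # one-step TD target
    td_error = target - q_values(s)[a]
    W[:, a] += lr * td_error * s                           # move Q(s, a) toward the target

best_action = int(np.argmax(q_values(replay_pool[0][0])))  # a* = argmax Q(s, A)
print(best_action)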
Fourthly, the trained agent is used to color the graph and to output a coloring scheme and a color number.
The foregoing is only a preferred embodiment of the present application, but the scope of protection of the present application is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art within the technical scope disclosed by the present application, according to the technical solution of the present application and its inventive concept, shall fall within the scope of protection of the present application.

Claims (6)

1. A graph coloring method based on deep reinforcement learning is characterized in that,
creating a graph coloring environment; using a message-passing neural network as the agent for deep reinforcement learning;
the agent explores randomly in the graph coloring environment with an epsilon-greedy strategy, and the data generated at each step are stored in an experience replay pool;
when the amount of data in the experience replay pool reaches M, where M is a system parameter, transition information of the generated data is randomly sampled from the experience replay pool for value updates; the current state is input into the agent, which outputs the Q-value of each action; the vertex corresponding to the maximum Q-value is taken as the optimal action and a reward value is calculated; and the agent is trained;
and the trained agent is used to color the graph and to output a coloring scheme and a color number.
2. The deep reinforcement learning-based graph coloring method according to claim 1, wherein the creating a graph coloring environment is specifically:
the graph coloring problem is converted into a unidirectional Markov chain model, the potential energy relations among the states are constructed, and a strategy of increasing the number of colors on demand is then adopted to create the graph coloring environment.
3. The deep reinforcement learning-based graph coloring method according to claim 1, wherein a reward function with an invalid action penalty (IAP) is designed to guide the agent to learn the optimal coloring strategy.
4. The deep reinforcement learning-based graph coloring method according to claim 3, wherein the design principles of the IAP are specifically as follows:
the IAP should take precedence over the other penalties; because the goal of the agent is to maximize the return, when the reward for increasing the color number is lower than the reward for an invalid action, the agent may choose the invalid action instead of the action that increases the color number, which may result in the agent failing to complete the graph coloring task within a limited number of steps before its exploration stops;
the IAP and the main-objective reward should remain consistent overall; an excessively large penalty would interfere with the agent's learning of the main objective.
5. The deep reinforcement learning based graph coloring method of claim 3, wherein the reward function formula is:
wherein χ(s_t) is the color number of state s_t, the set in the formula denotes the invalid actions, and a denotes the currently executed action, i.e., the selection of an already-colored vertex.
6. The deep reinforcement learning-based graph coloring method according to claim 2, wherein the strategy of increasing the number of colors on demand is specifically as follows:
the number of colors is initialized to 1, and during the incremental construction of the solution the number of colors is increased whenever the existing colors cannot satisfy the constraint that adjacent vertices have different colors.
CN202310643641.8A 2023-06-01 2023-06-01 Deep reinforcement learning-based graph coloring method Pending CN116681787A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310643641.8A CN116681787A (en) 2023-06-01 2023-06-01 Deep reinforcement learning-based graph coloring method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310643641.8A CN116681787A (en) 2023-06-01 2023-06-01 Deep reinforcement learning-based graph coloring method

Publications (1)

Publication Number Publication Date
CN116681787A true CN116681787A (en) 2023-09-01

Family

ID=87780399

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310643641.8A Pending CN116681787A (en) 2023-06-01 2023-06-01 Deep reinforcement learning-based graph coloring method

Country Status (1)

Country Link
CN (1) CN116681787A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118196439A (en) * 2024-05-20 2024-06-14 山东浪潮科学研究院有限公司 Certificate photo color auditing method based on visual language model and multiple agents


Similar Documents

Publication Publication Date Title
CN113435606A (en) Method and device for optimizing reinforcement learning model, storage medium and electronic equipment
CN112434171A (en) Knowledge graph reasoning and complementing method and system based on reinforcement learning
CN116681787A (en) Deep reinforcement learning-based graph coloring method
CN113780002A (en) Knowledge reasoning method and device based on graph representation learning and deep reinforcement learning
CN113988508B (en) Power grid regulation strategy optimization method based on reinforcement learning
CN112990485A (en) Knowledge strategy selection method and device based on reinforcement learning
Wu et al. Matrix representation of stability definitions for the graph model for conflict resolution with reciprocal preference relations
CN113947320A (en) Power grid regulation and control method based on multi-mode reinforcement learning
CN109726676A (en) The planing method of automated driving system
Li et al. Solving open shop scheduling problem via graph attention neural network
CN115099606A (en) Training method and terminal for power grid dispatching model
CN114839884A (en) Underwater vehicle bottom layer control method and system based on deep reinforcement learning
CN117273057A (en) Multi-agent collaborative countermeasure decision-making method and device based on reinforcement learning
Tassel et al. An end-to-end reinforcement learning approach for job-shop scheduling problems based on constraint programming
Huang et al. Solving the inverse graph model for conflict resolution using a hybrid metaheuristic algorithm
Ling et al. A deep reinforcement learning based real-time solution policy for the traveling salesman problem
Kim et al. Solving pbqp-based register allocation using deep reinforcement learning
Zhang et al. Universal value iteration networks: When spatially-invariant is not universal
Wang et al. Discovering Lin-Kernighan-Helsgaun heuristic for routing optimization using self-supervised reinforcement learning
CN115186828A (en) Behavior decision method, device and equipment for man-machine interaction and storage medium
CN114995818A (en) Method for automatically configuring optimized parameters from Simulink model to C language
Wu et al. Discrete teaching-learning-based optimization algorithm for traveling salesman problems
Li et al. Diverse Policy Optimization for Structured Action Space
Han et al. Robot path planning in dynamic environments based on deep reinforcement learning
CN113869488A (en) Game AI intelligent agent reinforcement learning method facing continuous-discrete mixed decision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination