CN114942895B - Address mapping strategy design method based on reinforcement learning - Google Patents
- Publication number
- CN114942895B (application CN202210714310.4A)
- Authority
- CN
- China
- Prior art keywords
- bim
- reinforcement learning
- strategy
- network
- learning model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to an address mapping strategy design method based on reinforcement learning. A binary invertible matrix (Binary Invertible Matrix, BIM) is used to represent mainstream address mapping strategies, and the address mapping strategy with the best row-buffer (line cache) hit rate is trained in combination with a reinforcement learning model. The invertibility of the BIM guarantees a valid one-to-one mapping between physical addresses and memory cell addresses, and the BIM offers flexible expression of address mapping strategies, low hardware overhead, and other advantages that are difficult to replace.
Description
Technical Field
The invention relates to an address mapping strategy design method based on reinforcement learning.
Background
In computer architecture, processor performance and memory performance have long improved at unequal rates, so memory access latency has become a major factor limiting system performance. Since the "memory wall" problem was first identified, improving the performance of hardware accelerators has been a key research objective in computer architecture, and the memory controller is one of the keys to improving accelerator performance. Researchers at home and abroad have optimized memory controllers from many angles to reduce system latency. However, most address mapping strategies are highly application-specific: they cannot be generalized to other workloads and lack the flexibility needed to achieve high-performance memory access in domain-specific accelerators.
Disclosure of Invention
The invention aims to provide an address mapping strategy design method based on reinforcement learning, which uses a binary invertible matrix to represent mainstream address mapping strategies and, in combination with a reinforcement learning model, trains the address mapping strategy with the optimal row-buffer hit rate.
In order to achieve the above purpose, the technical scheme of the invention is as follows: an address mapping strategy design method based on reinforcement learning uses a binary invertible matrix BIM to represent the address mapping strategy and trains the strategy with the optimal row-buffer hit rate in combination with a reinforcement learning model. The implementation is as follows: the one-dimensional expansion of the BIM is taken as the input of the reinforcement learning model; the row-buffer hit rate of the initial BIM is taken as the model's current optimal value H_best; the model selects actions according to their probabilities to obtain candidate BIMs; when the row-buffer hit rate computed for a candidate BIM is higher than that of the current BIM, the model replaces the current BIM with the candidate; the reward value is then recalculated and the model parameters are updated. The model iterates and optimizes in this way and converges, under a preset stopping rule, to the trained BIM, simultaneously yielding the address mapping strategy with the highest row-buffer hit rate.
Compared with the prior art, the invention has the following beneficial effects: the invention combines, for the first time, a binary invertible matrix BIM with reinforcement learning for the address mapping strategy design of a memory controller. The BIM is extremely flexible in expressing address mapping strategies and can correctly represent all current mainstream strategies. In addition, the invention employs a reinforcement learning model based on the policy gradient, so that the BIM learns the address mapping strategy with the highest row-buffer hit rate for the different access patterns of a neural network accelerator. The trained BIM can then be implemented in hardware within the memory controller.
Drawings
Fig. 1 is a representation of an address mapping policy.
Fig. 2 is a schematic diagram showing mainstream address mapping policies expressed by BIM.
FIG. 3 is a schematic diagram of reinforcement learning strategy network optimization BIM.
FIG. 4 is an optimized iterative BIM algorithm.
FIG. 5 is a Mini-batch training reinforcement learning model algorithm.
FIG. 6 is a schematic diagram of a reinforcement learning model system workflow.
Detailed Description
The technical scheme of the invention is specifically described below with reference to the accompanying drawings.
The invention relates to an address mapping strategy design method based on reinforcement learning, which uses a binary invertible matrix BIM to represent the address mapping strategy and trains the strategy with the optimal row-buffer hit rate in combination with a reinforcement learning model. The implementation is as follows: the one-dimensional expansion of the BIM is taken as the input of the reinforcement learning model; the row-buffer hit rate of the initial BIM is taken as the model's current optimal value H_best; the model selects actions according to their probabilities to obtain candidate BIMs; when the row-buffer hit rate computed for a candidate BIM is higher than that of the current BIM, the model replaces the current BIM with the candidate; the reward value is then recalculated and the model parameters are updated. The model iterates and optimizes in this way and converges, under a preset stopping rule, to the trained BIM, simultaneously yielding the address mapping strategy with the highest row-buffer hit rate.
The following is a specific implementation procedure of the present invention.
The principle of a memory address mapping strategy is that physical addresses are mapped to specific DRAM cell locations according to certain rules. FIG. 1 illustrates the currently prevailing memory address mapping strategies using a simplified 8-bit physical address: the first 2 bits are Bank bits, followed by 4 row bits, and the last 2 bits are column address bits. FIG. 1(a) shows BRC, which maps addresses to physical locations in Bank-row-column order. FIG. 1(b) shows RBC, which permutes the Bank bits and the row bits, placing the row bits before the Bank bits and leaving the column bits unchanged. FIG. 1(c) shows bit reversal, i.e., the initial Bank bits and row bits are arranged in reverse order. FIG. 1(d) shows Permutation-based mapping, which XORs the Bank bits with part of the row address bits to generate new Bank address bits. FIG. 1(e) is the memory address mapping strategy based on a binary invertible matrix (Binary Invertible Matrix, BIM), which multiplies the initial physical address by the BIM to obtain the corresponding mapped address.
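As an illustration of this principle, the following Python sketch applies a BIM to an address using only AND/XOR logic. It is a model for illustration, not the patent's implementation; the 8-bit layout and the MSB-first bit order are assumptions based on the FIG. 1 description, and the RBC row order shown is hypothetical.

```python
def map_address(bim, addr, bits=8):
    """Multiply an address bit-vector by the BIM over GF(2): AND gates
    form the bit products and XOR gates accumulate the sums."""
    vec = [(addr >> i) & 1 for i in reversed(range(bits))]  # MSB-first bits
    out = []
    for row in bim:
        acc = 0
        for r, v in zip(row, vec):
            acc ^= r & v
        out.append(acc)
    return int("".join(map(str, out)), 2)

# Identity BIM: address unchanged, i.e. the plain BRC layout of FIG. 1(a).
identity = [[1 if i == j else 0 for j in range(8)] for i in range(8)]

# Moving the two Bank-bit rows after the four row-bit rows gives an
# RBC-style remap (hypothetical row order, for illustration only).
rbc = [identity[i] for i in (2, 3, 4, 5, 0, 1, 6, 7)]

print(bin(map_address(rbc, 0b10110011)))
```

Permutation-matrix BIMs reproduce bit-reordering policies such as BRC, RBC, and bit reversal, while rows containing several 1s reproduce XOR-based Permutation policies like that of FIG. 1(d).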
All of the strategies described above can be represented by a binary invertible matrix BIM: the required mapping is obtained by multiplying the original address by the BIM. Because the BIM consists only of 0s and 1s, the mapping can be realized in hardware with nothing more than AND gates and XOR gates, which perform the multiplications and additions respectively; this keeps the hardware overhead of memory address mapping low. The invertibility of the BIM guarantees a valid one-to-one mapping between physical addresses and memory cell addresses. As shown in FIG. 2, the mainstream address mapping policies of FIGS. 2(a)-(d) can each be expressed by a BIM. Because of these advantages of flexible expression and low hardware overhead, the BIM is chosen as the carrier of the memory address mapping strategy in the reinforcement-learning-based memory controller system.
1. Reinforcement learning optimization of BIM
The optimization of the BIM in the invention consists of applying elementary row/column transformations, within a policy gradient algorithm model, to a matrix initialized as the binary identity matrix. The action space of the reinforcement learning model consists of all possible row/column swaps of the binary invertible matrix.
(1) Policy network design
The invention uses a policy network π to learn the actions that optimize the BIM toward an address mapping strategy with higher access efficiency. The policy network is designed as two cascaded fully connected layers; the first layer uses ReLU as its activation function to introduce non-linearity, and the output of the second fully connected layer is fed to a Softmax function. FIG. 3 shows an example of BIM optimization. The design expands the BIM row by row into one-dimensional data serving as the input of the policy network. Based on the output probability distribution, the model selects an action from the action space as the current optimization action for the BIM. The BIM is transformed according to this action into a new binary invertible matrix, which serves as the input of the policy network at the next time step. In the example, the BIM is simplified to a 6x6 binary invertible matrix used as the address mapping policy; a row/column transformation is selected according to the model output, and the optimization iterates in this loop.
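A minimal NumPy sketch of such a policy network follows. The 6x6 size matches the FIG. 3 example, but the hidden width, the weight initialization, and the untrained random weights are illustrative assumptions, not values given in the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

b = 6                      # simplified 6x6 BIM, as in the example of FIG. 3
n_in = b * b               # the flattened BIM is the network input
n_hidden = 32              # hidden width is an assumption, not from the text
n_actions = b              # b-1 first-row swaps plus one NOP (see section (2))

W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
W2 = rng.normal(0.0, 0.1, (n_actions, n_hidden))

def policy(bim):
    """Two cascaded fully connected layers: ReLU, then softmax over actions."""
    x = np.asarray(bim, dtype=float).reshape(-1)
    h = np.maximum(0.0, W1 @ x)            # first layer + ReLU
    logits = W2 @ h
    e = np.exp(logits - logits.max())      # numerically stable softmax
    return e / e.sum()

probs = policy(np.eye(b))                  # start from the identity BIM
action = rng.choice(n_actions, p=probs)    # sample an action by probability
```

Sampling from the softmax output, rather than taking the argmax, is what lets the model explore the action space during training.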
(2) Action space optimization
In the binary invertible matrix BIM model, the number of actions in the action space of the reinforcement learning model is 2·C(b,2) = b(b-1), where b is the number of rows/columns of the binary invertible matrix. For a 32x32 BIM, the total action space is 992, i.e., each BIM transformation has 992 choices. Since training the reinforcement learning model requires many iterations of learning, such a large action space makes the search excessively long and degrades the performance of the model. To solve the problem of an oversized action space, this section compresses the action space during BIM optimization.
From linear algebra, any sequence of row/column swaps of a matrix can be expressed through elementary permutation matrices. As shown in equation (1), a row swap of the BIM is performed by left-multiplying the BIM by a permutation matrix M_pre, and a column swap by right-multiplying the BIM by a permutation matrix M_post:

BIM' = M_pre · BIM (row swap),  BIM' = BIM · M_post (column swap)    (1)
Because the BIM optimization starts from the identity matrix and permutation matrices are closed under multiplication, a series of row/column transformations can be equivalently realized using row transformations alone. Therefore, this work compresses the action space to the set of row-transformation actions only. The transformation expression is as follows:

M_pre · I · M_post = (M_pre · M_post) · I    (2)
After this compression of the action space, the action space is halved. To reduce the action search space further, this work restricts the BIM transformations to swaps between the first row and each of the other rows, giving b-1 possible actions. The feasibility basis of this design is that any swap of two rows i and j can be composed from first-row swaps, since (i j) = (0 i)(0 j)(0 i), so the optimization result of the BIM is not affected. In addition, a no-operation (NOP) action is added to the action space. In summary, the action space of the BIM model is finally reduced to b actions; for b = 32, the action space contains 32 actions.
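The compressed action space can be sketched as below. The decomposition check illustrates why first-row swaps suffice; the b = 6 size and the action encoding (0 for NOP) are illustrative assumptions.

```python
b = 6
bim = [[1 if i == j else 0 for j in range(b)] for i in range(b)]

def apply_action(bim, action):
    """Action 0 is the NOP; action i (1 <= i <= b-1) swaps row 0 with row i,
    so the whole action space has exactly b elements."""
    new = [row[:] for row in bim]
    if action:
        new[0], new[action] = new[action], new[0]
    return new

# A swap of two arbitrary rows i and j decomposes into first-row swaps:
# (i j) = (0 i)(0 j)(0 i), so restricting to b actions loses no generality.
m = bim
for a in (2, 4, 2):
    m = apply_action(m, a)

direct = [row[:] for row in bim]
direct[2], direct[4] = direct[4], direct[2]
assert m == direct       # three first-row swaps reproduce the (2 4) swap
```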
(3) Iterative optimization
The reinforcement learning model optimizes the address mapping strategy BIM iteratively. First, the rows of the BIM are expanded into a one-dimensional vector used as input, and the row-buffer hit rate H of the initial BIM address mapping strategy is measured and taken as the current optimal value H_best of the model. BIM optimization runs for a preset number of k iterations; after each iteration the row hit rate of the result is measured, and if it is higher than H_best, that BIM is kept as the optimal address mapping strategy. The BIM is iteratively optimized in this way, and H_best increases monotonically over the iterations. Pseudocode for the iterative optimization of the address mapping strategy BIM is shown in FIG. 4.
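The iteration of FIG. 4 can be sketched as follows. The hit-rate evaluator here is a random stand-in for the real trace-driven row-buffer measurement, and uniform action sampling replaces the trained policy network; both are assumptions for illustration only.

```python
import random
random.seed(1)

def row_hit_rate(bim):
    """Stand-in for the row-buffer hit-rate test; the real value comes from
    simulating a memory access trace under the candidate mapping."""
    return random.random()

def apply_action(bim, action):
    new = [row[:] for row in bim]
    if action:                              # action 0 is the NOP
        new[0], new[action] = new[action], new[0]
    return new

b, k = 6, 50
bim = [[1 if i == j else 0 for j in range(b)] for i in range(b)]
best_bim, h_best = bim, row_hit_rate(bim)   # initial BIM sets H_best

for _ in range(k):                          # k optimization iterations
    action = random.randrange(b)            # placeholder for sampling from pi
    candidate = apply_action(bim, action)
    h = row_hit_rate(candidate)
    if h > h_best:                          # keep only improving mappings
        best_bim, h_best = candidate, h
        bim = candidate
```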
2. Model training
During model training, the policy network generates the next action a_t at the current time step, and the current BIM is transformed by this action into the BIM of the next time step. After k steps, the policy network obtains a reward value r_k = H_k. Reinforcement learning maximizes the cumulative reward value, and at the same time yields the BIM-based address mapping strategy with the highest row hit rate.
The invention iterates the optimization of the model with the policy gradient algorithm mentioned above. The formula for the cumulative reward value is:

R_t = γ^(k+1) · r_k    (3)

where γ is the discount factor. The value function V_φ(BIM_t) is used to predict the cumulative reward value; the neural network containing the parameter φ is updated by means of the policy gradient.
The value network has the same intermediate structure as the policy network, two fully connected layers; the difference is that the output of the value network is a single scalar describing the predicted cumulative reward. The advantage of an action expresses how much better it is for the agent to select that action in the current state than for the policy network to select an action at random. The specific formula is as follows:

A_t = R_t - V_t    (4)
The maximized objective function is:

J(θ) = E_{a_t~π_θ}[A_t · log π_θ(a_t | BIM_t)]

The policy gradient is:

∇_θ J(θ) = E[A_t · ∇_θ log π_θ(a_t | BIM_t)]

The loss function of the value network is:

L(φ) = (R_t - V_φ(BIM_t))^2

The gradient of the value network is:

∇_φ L(φ) = -2 · (R_t - V_φ(BIM_t)) · ∇_φ V_φ(BIM_t)
In the network model, the gradients of the parameters are computed with the back-propagation algorithm; lr_π and lr_v are the learning rates of the policy network and the value network respectively (see FIG. 5 for the specific formulas), and both are set to 0.001 in this work.
The invention updates the model parameters with the Mini-batch method. In the experiments, the batch size is set to 64, meaning that within one batch the policy network performs 64 iterative updates; the experience (actions, rewards, etc.) collected over these 64 iterations is used to update the model parameters. However, a naive Mini-batch implementation stores all the input data, intermediate results, and other data, causing serious storage overhead. To solve this problem, the experiments accumulate the gradients within one batch as the parameter gradients; the accumulated gradients of the policy network and the value network are g_θ and g_φ (see FIG. 5 for the specific formulas). The algorithm for training the reinforcement learning model with the Mini-batch method is shown in FIG. 5.
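The storage-saving idea can be sketched as running sums. The per-step gradient values below are placeholders, not the patent's formulas (which appear in FIG. 5); only the batch size and the 0.001 learning rates come from the text.

```python
# Instead of retaining every step's inputs and intermediate results,
# keep only the running gradient sums g_theta (policy) and g_phi (value).
batch = 64
lr_pi, lr_v = 0.001, 0.001          # learning rates used in this work

g_theta, g_phi = 0.0, 0.0
for step in range(batch):
    # placeholder per-step gradients; in the real model these come from
    # back-propagation through the policy and value networks
    step_g_theta, step_g_phi = 0.01, 0.02
    g_theta += step_g_theta
    g_phi += step_g_phi

# One parameter update per batch, from the accumulated gradients.
theta_step = lr_pi * g_theta / batch
phi_step = lr_v * g_phi / batch
```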
3. Workflow process
The overall process of the iterative optimization training of the invention is shown in FIG. 6. The one-dimensional expansion of the 32x32 binary invertible matrix BIM is taken as the input of the policy network and the value network, and forward inference is performed through both networks. During training, the row hit rate determines whether the BIM is updated: the model selects actions according to their probabilities to obtain candidate BIMs, and when the computed row-buffer hit rate of a candidate BIM is higher than that of the current BIM, the system replaces the current BIM with the candidate. The row-buffer hit rate of the new BIM is then recomputed as the reward value, and the parameters of both networks are updated. The system iterates and optimizes in this way and converges, under the set stopping rule, to a trained BIM strategy. At the same time, the address mapping strategy with the highest row hit rate is obtained, which can be ported to a hardware implementation in the FPGA MIG IP.
The above is a preferred embodiment of the present invention; all changes made according to the technical solution of the present invention whose resulting effects do not exceed the scope of that solution belong to the protection scope of the present invention.
Claims (3)
1. An address mapping strategy design method based on reinforcement learning, characterized in that a binary invertible matrix BIM is used to represent the address mapping strategy, and the address mapping strategy with the best row-buffer hit rate is trained in combination with a reinforcement learning model; the specific implementation is as follows: the one-dimensional expansion of the binary invertible matrix BIM is taken as the input of the reinforcement learning model; the row-buffer hit rate of the initial BIM is taken as the model's current optimal value H_best; the model selects actions according to their probabilities to obtain candidate BIMs; when the row-buffer hit rate computed for a candidate BIM is higher than that of the current BIM, the model replaces the current BIM with the candidate; the reward value is then recalculated and the model parameters are updated; the model iterates and optimizes in this way and converges, under a preset stopping rule, to the trained BIM, simultaneously yielding the address mapping strategy with the highest row-buffer hit rate; the action space of the reinforcement learning model consists of all possible row/column swaps of the binary invertible matrix BIM, whose number is 2·C(b,2) = b(b-1), where b is the number of rows/columns of the binary invertible matrix; to solve the problem of an oversized action space of the reinforcement learning model, the action space is compressed, specifically as follows:
as shown in the following equation, a row swap of the BIM is performed by left-multiplying the BIM by a permutation matrix M_pre, and a column swap by right-multiplying the BIM by a permutation matrix M_post:

BIM' = M_pre · BIM (row swap),  BIM' = BIM · M_post (column swap)
since the optimization starts from the identity matrix and permutation matrices are closed under multiplication, a series of row/column transformations is equivalently realized using row transformations alone; thus the action space is compressed to the set of row-transformation actions only; the transformation expression is as follows:

M_pre · I · M_post = (M_pre · M_post) · I
after the above compression, the action space is halved; to reduce the action search space further, the transformations of the binary invertible matrix BIM are restricted to swaps between the first row and each of the other rows, giving b-1 possible actions; at the same time, a no-operation (NOP) action is added to the action space; the action space of the reinforcement learning model is finally reduced to b actions.
2. The reinforcement-learning-based address mapping strategy design method according to claim 1, characterized in that the reinforcement learning model consists of a policy network and a value network; the policy network consists of two cascaded fully connected layers, the first layer using ReLU as its activation function and the output of the second layer being fed to a Softmax function; during training of the reinforcement learning model, the policy network generates the next action a_t at the current time step, and the current BIM is transformed by this action into the BIM of the next time step; after a preset number of iterations k, the policy network obtains a reward value r_k = H_k, where H_k is the row-buffer hit rate of the BIM after k iterations; the formula for the cumulative reward value is:

R_t = γ^(k+1) · r_k

where γ is the discount factor;
the value network has the same intermediate structure as the policy network, two fully connected layers, the difference being that the output of the value network is a single scalar describing the predicted cumulative reward; the advantage of an action expresses how much better the reward of selecting that action in the current state is than letting the policy network select an action at random; the specific formula is as follows:

A_t = R_t - V_t

where A_t is the advantage function and V_t is the return estimated after selecting actions according to the policy π in state s_t;
the maximized objective function is:

J(θ) = E_{a_t~π_θ}[A_t · log π_θ(a_t | BIM_t)]

wherein J(θ) is the maximized objective function, and maximizing J(θ) continuously optimizes the parameter θ of the neural network model; π_θ is the policy π parameterized by θ, i.e., the policy learned in the corresponding environment to maximize the cumulative reward; BIM_t denotes the current binary invertible matrix;

the policy gradient, i.e., the derivative of the maximized objective function, is:

∇_θ J(θ) = E[A_t · ∇_θ log π_θ(a_t | BIM_t)]

the loss function of the value network is:

L(φ) = (R_t - V_φ(BIM_t))^2

the gradient of the value network is:

∇_φ L(φ) = -2 · (R_t - V_φ(BIM_t)) · ∇_φ V_φ(BIM_t)
the value function V_φ(BIM_t) is used to predict the cumulative reward value, and the neural network containing the parameter φ is updated by means of the policy gradient; in the reinforcement learning model, the gradients of the parameters are computed with the back-propagation algorithm, and lr_π and lr_v are the learning rates of the policy network and the value network, respectively.
3. The reinforcement-learning-based address mapping strategy design method according to claim 2, characterized in that the parameters of the reinforcement learning model are updated with the Mini-batch method, the gradients within one batch being accumulated as the parameter gradients, the accumulated gradients of the policy network and the value network being g_θ and g_φ.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210714310.4A CN114942895B (en) | 2022-06-22 | 2022-06-22 | Address mapping strategy design method based on reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210714310.4A CN114942895B (en) | 2022-06-22 | 2022-06-22 | Address mapping strategy design method based on reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114942895A CN114942895A (en) | 2022-08-26 |
CN114942895B true CN114942895B (en) | 2024-06-04 |
Family
ID=82911016
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210714310.4A Active CN114942895B (en) | 2022-06-22 | 2022-06-22 | Address mapping strategy design method based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114942895B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111858396A (en) * | 2020-07-27 | 2020-10-30 | 福州大学 | Memory self-adaptive address mapping method and system |
CN113568845A (en) * | 2021-07-29 | 2021-10-29 | 北京大学 | Memory address mapping method based on reinforcement learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10452533B2 (en) * | 2015-07-14 | 2019-10-22 | Western Digital Technologies, Inc. | Access network for address mapping in non-volatile memories |
-
2022
- 2022-06-22 CN CN202210714310.4A patent/CN114942895B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111858396A (en) * | 2020-07-27 | 2020-10-30 | 福州大学 | Memory self-adaptive address mapping method and system |
CN113568845A (en) * | 2021-07-29 | 2021-10-29 | 北京大学 | Memory address mapping method based on reinforcement learning |
Non-Patent Citations (1)
Title |
---|
An Efficient Memory Access Strategy for Image Transposition and Block Processing (面向图像转置和分块处理的一种高效内存访问策略); Shen Huanghui, Wang Zhensong, Zheng Weimin; Journal of Computer Research and Development; 2013-01-15 (No. 01), pp. 188-196 *
Also Published As
Publication number | Publication date |
---|---|
CN114942895A (en) | 2022-08-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109948029B (en) | Neural network self-adaptive depth Hash image searching method | |
JP7451008B2 (en) | Quantum circuit determination methods, devices, equipment and computer programs | |
CN113098714B (en) | Low-delay network slicing method based on reinforcement learning | |
CN112491818B (en) | Power grid transmission line defense method based on multi-agent deep reinforcement learning | |
CN109951392B (en) | Intelligent routing method for medium and large networks based on deep learning | |
CN112613608A (en) | Reinforced learning method and related device | |
CN117875397B (en) | Parameter selection method and device to be updated, computing equipment and storage medium | |
CN114942895B (en) | Address mapping strategy design method based on reinforcement learning | |
CN114254545A (en) | Parallel connection refrigerator system load control optimization method, system, equipment and medium | |
CN112131089B (en) | Software defect prediction method, classifier, computer device and storage medium | |
CN117520956A (en) | Two-stage automatic feature engineering method based on reinforcement learning and meta learning | |
CN116502779A (en) | Traveling merchant problem generation type solving method based on local attention mechanism | |
CN116185498A (en) | Integrated memory and calculation chip, and calculation method and device thereof | |
CN109582911A (en) | For carrying out the computing device of convolution and carrying out the calculation method of convolution | |
CN114444697A (en) | Knowledge graph-based common sense missing information multi-hop inference method | |
CN110766133B (en) | Data processing method, device, equipment and storage medium in embedded equipment | |
Tang et al. | Modeling and optimization of a class of networked evolutionary games with random entrance and time delays | |
Ventura | Quantum computational intelligence: answers and questions | |
CN116306948B (en) | Quantum information processing device and quantum information processing method | |
CN117492371B (en) | Optimization method, system and equipment for active power filter model predictive control | |
CN116151171B (en) | Full-connection I Xin Moxing annealing treatment circuit based on parallel tempering | |
Li et al. | A One-Shot Reparameterization Method for Reducing the Loss of Tile Pruning on DNNs | |
CN117775224A (en) | Marine hybrid power energy management method and system based on MOEAD algorithm | |
CN118014054B (en) | Mechanical arm multitask reinforcement learning method based on parallel recombination network | |
US11983606B2 (en) | Method and device for constructing quantum circuit of QRAM architecture, and method and device for parsing quantum address data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |