CN113377884A - Event corpus purification method based on multi-agent reinforcement learning - Google Patents

Event corpus purification method based on multi-agent reinforcement learning Download PDF

Info

Publication number
CN113377884A
CN113377884A
Authority
CN
China
Prior art keywords
data
training
agent
reinforcement learning
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110773927.9A
Other languages
Chinese (zh)
Other versions
CN113377884B (en)
Inventor
后敬甲
王悦
白璐
崔丽欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central University of Finance and Economics
Original Assignee
Central University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central University of Finance and Economics
Priority to CN202110773927.9A
Publication of CN113377884A
Application granted
Publication of CN113377884B
Legal status: Active (Current)
Anticipated expiration

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F 16/284 Relational databases
    • G06F 16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to an event corpus purification method based on multi-agent reinforcement learning. Before model training begins, the environment and the agents are initialized and reset, and the corresponding training parameters are set; the agents produce the series of data required for training by executing corresponding purification-optimization actions in the environment, and the data are sampled and stored in a data buffer for subsequent training; when the amount of data in the buffer reaches a set value, this data is used to train and update the real networks of all agents; after the real networks are updated, the target networks of all agents are updated by intermittently copying parameters; these steps are repeated until the number of training iterations reaches the preset value. By purifying and optimizing the labeled data, the method alleviates the problem of label noise during training of the sequence-labeling joint extraction model and improves the effect of the joint event entity-relation extraction task.

Description

Event corpus purification method based on multi-agent reinforcement learning
Technical Field
The invention relates to the field of multi-agent reinforcement learning methods, in particular to an event corpus purification method based on multi-agent reinforcement learning.
Background
Reinforcement learning is a machine learning method that, according to the number of agents involved, can be divided into single-agent reinforcement learning and multi-agent reinforcement learning (MARL). MARL has the wider range of application scenarios and is a key tool for solving many real-world problems. According to the relationship among the agents' tasks, multi-agent reinforcement learning can be divided into fully cooperative tasks, fully competitive tasks, and mixed tasks; here only fully cooperative tasks are considered.
In multi-agent reinforcement learning under a fully cooperative task, the agents aim to maximize the joint reward: each agent selects actions according to its own policy, executes them in the environment, obtains the corresponding reward and feedback, and uses them to update its policy. These steps are repeated until the joint reward converges to its maximum, at which point each agent has reached the optimal policy for the current environment.
At present, the MADDPG (Multi-Agent Deep Deterministic Policy Gradient) algorithm is one of the leading reinforcement learning methods for multi-agent environments. It overcomes the difficulty that traditional value-based algorithms (such as DQN) have in continuous environments, improves the training efficiency of the traditional policy-based algorithm (DPG) by introducing deep learning, and further improves training through an experience replay pool and the mechanism of "centralized training, decentralized execution".
However, MADDPG still suffers from poor exploration of the joint solution space and from sub-optimality: in a multi-agent reinforcement learning environment, as the number of agents increases, the size of the joint policy space grows exponentially, so the degree to which the agents explore the policy space during training decreases, the training result tends to converge to a globally sub-optimal solution, and a better training effect cannot be achieved.
Entity-relation extraction refers to detecting entity mentions in unstructured text and identifying the semantic relations between them at the same time. Traditional entity-relation extraction methods handle the task in a serial fashion: entities are extracted first, and their relations are identified afterwards. The serial approach is simple, and the two subtasks are independent and flexible, each corresponding to its own sub-model, but the correlation between the two subtasks is ignored.
Joint entity-relation extraction combines entity recognition and relation extraction in a single model, which can effectively integrate entity information and relation information and achieves a better effect than the serial entity-relation extraction methods; however, because entities and relations still have to be extracted separately in essence, the model produces additional redundant information.
To solve the problem that joint entity-relation extraction models produce additional redundant information, research has proposed converting the joint extraction task into a tagging problem: by designing tags that carry relation information, entities and their relations can be extracted directly with a sequence labeling model, without identifying entities and relations separately.
The sequence-labeling joint extraction model is an efficient joint event extraction model, but its training requires a large amount of high-quality labeled data. The remote supervision method can effectively automate data labeling, but it relies on the assumption that if two entities have a relation in a given corpus, then every sentence containing both entities expresses that relation. As a result, labeled data sets produced by remote supervision suffer from label noise, which adversely affects the joint extraction model.
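For illustration only, the short Python sketch below (not part of the patent) shows how this assumption labels sentences and why label noise arises; the knowledge triple and the sentences are invented for the example.

# Hedged sketch of remote-supervision-style labeling and the noise it creates.
# The relation pair and the sentences below are invented for illustration.
knowledge = {("Company A", "Person B"): "founded_by"}

sentences = [
    "Company A was founded by Person B in 1998.",      # label is correct
    "Person B criticized Company A at the hearing.",   # label is noise
]

def remote_label(sentence, kb):
    """Attach every relation whose two entities both appear in the sentence."""
    labels = []
    for (e1, e2), rel in kb.items():
        if e1 in sentence and e2 in sentence:
            labels.append((e1, rel, e2))  # assumed to hold, even when it does not
    return labels

for s in sentences:
    print(s, "->", remote_label(s, knowledge))

The second sentence receives the same "founded_by" label as the first even though it does not express the relation, which is exactly the label noise that the purification method targets.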
Disclosure of Invention
In view of the above technical problems, the invention provides an event corpus purification method based on multi-agent reinforcement learning.
To solve the problems in the prior art, the invention provides an event corpus purification method based on multi-agent reinforcement learning, which comprises the following steps:
before model training begins, the environment and the agents are initialized and reset, and the corresponding training parameters are set;
the agents produce the series of data required for training by executing corresponding purification-optimization actions in the environment; the data are sampled and stored in a data buffer for subsequent training;
when the amount of data in the data buffer reaches a set value, this data is used to train and update the real networks of all agents;
after the real networks are updated, the target networks of all agents are updated by intermittently copying parameters;
the above steps are repeated until the number of training iterations reaches the preset number.
Preferably, initializing and resetting the environment and the agents before model training begins and setting the corresponding training parameters specifically comprises: preprocessing the event corpus data and inputting the corpus as the environment parameters of the multi-agent reinforcement learning model.
Preferably, the agents producing the series of data required for training by executing corresponding purification-optimization actions in the environment, and sampling and storing the data in a data buffer for subsequent training, specifically comprises:
the multi-agent reinforcement learning model generates an action set for the agent group according to the input environment parameters;
the agent group executes the action set, selecting the corresponding event knowledge from the corpus to form an event knowledge set;
the event knowledge set is mapped into word vectors, which are input into the sequence labeling joint model;
the sequence labeling joint model labels the input word vectors and compares the result with the test set, so as to verify the event purification effect of the current multi-agent reinforcement learning model and output an evaluation index.
Preferably, when the amount of data in the data buffer reaches a set value, starting to train and update the real networks of all agents using the data specifically comprises:
converting the evaluation index into a reward value according to a preset reward function, and feeding the reward value back into the training of the multi-agent reinforcement learning model so as to optimize the model.
Preferably, after the real networks are updated, updating the target networks of all agents by intermittently copying parameters further comprises:
extracting the network parameters of each layer of each agent as parameter vectors, subtracting the corresponding layer parameter vectors one by one to obtain the pairwise parameter-vector differences among the agents, multiplying these differences by a differentiation factor, and feeding the result back to the updated agents to complete the final agent update.
Compared with the prior art, the event corpus purification method based on multi-agent reinforcement learning of the invention has the following beneficial effects:
1. For the multi-agent reinforcement learning environment, the degree to which the joint policy space is explored is increased, the training effect of the multi-agent reinforcement learning algorithm is improved, and the multi-agent reinforcement learning model is optimized;
2. The invention extracts the parameters of each agent's sub-networks into parameter vectors that represent the agent's policy; by maximizing the differences among these parameter vectors, the repetition of policy exploration among agents is reduced and the exploration of the joint policy solution space is increased;
3. Based on the optimized multi-agent reinforcement learning model, the invention purifies and optimizes the labeled data, thereby alleviating the problem of label noise during training of the sequence-labeling joint extraction model and improving the effect of the joint event entity-relation extraction task.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 is a flowchart of an event corpus purification method based on multi-agent reinforcement learning according to an embodiment of the present invention.
Fig. 2 is a training flowchart of an event corpus purification method based on multi-agent reinforcement learning according to an embodiment of the present invention.
Fig. 3 is a flowchart of a data sampling portion of a multi-agent reinforcement learning-based event corpus purification method according to an embodiment of the present invention.
Fig. 4 is a network updating flow chart of the multi-agent reinforcement learning-based event corpus purification method according to the embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a sequence annotation model of an event corpus purification method based on multi-agent reinforcement learning according to an embodiment of the present invention.
Fig. 6 is another flowchart of the event corpus purification method based on multi-agent reinforcement learning according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As shown in Fig. 1, the present invention provides an event corpus purification method based on multi-agent reinforcement learning, which comprises:
S1: before model training begins, the environment and the agents are initialized and reset, and the corresponding training parameters are set;
S2: the agents produce the series of data required for training by executing corresponding purification-optimization actions in the environment; the data are sampled and stored in a data buffer for subsequent training;
S3: when the amount of data in the data buffer reaches a set value, this data is used to train and update the real networks of all agents;
S4: after the real networks are updated, the target networks of all agents are updated by intermittently copying parameters;
S5: the above steps are repeated until the number of training iterations reaches the preset number.
The event corpus purification method based on multi-agent reinforcement learning provided by the invention mainly comprises four parts: training-environment and parameter initialization, data sampling, real-network training, and target-network updating.
The invention mainly consists of two models: a multi-agent reinforcement learning model based on a policy search method that maximizes neural-network parameter-vector differences, and a sequence labeling joint model based on the Bi-LSTM-CRF structure.
In the invention, the sequence labeling module serves as the effect-verification and reward-feedback part of the corpus purification model. The selected model structure is shown in Fig. 5 and mainly comprises two layers: a Bi-LSTM layer and a CRF layer.
The Bi-LSTM model performs excellently on sequence labeling tasks: it can effectively combine and exploit long-range context information and, as a neural network, can fit nonlinear data. However, because its optimization objective is to find the most probable label at each time step and only then assemble these labels into a sequence, the output label sequence is often inconsistent.
The CRF model complements the strengths and weaknesses of the Bi-LSTM model to a certain extent. Its advantage is that the whole input text can be scanned through feature templates, so linear weighted combinations of local features over the whole text are taken into account, and its optimization objective is the sequence with the highest overall probability rather than the most probable label at each position. Its disadvantages are, first, that choosing the feature templates requires some prior knowledge of the training corpus: the features that strongly influence the labeling must be identified from statistics of the corpus, too many features cause the model to overfit while too few cause it to underfit, and combining features is itself a difficult task; second, that during training the CRF model is limited by the window size specified by the feature template, so long-range context information is hard to take into account.
Based on the complementary strengths and weaknesses of the two models, the Bi-LSTM-CRF model that combines them is selected: a linear CRF layer is added on top of the hidden layer of the conventional Bi-LSTM model. It serves as the sequence labeling module of the invention to verify the training effect of the corpus purification model, and the training result is fed back into the training of the corpus purification model to optimize it.
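For illustration, a minimal sketch of such a Bi-LSTM-CRF tagger is given below; it assumes PyTorch and the third-party pytorch-crf package, and the layer sizes and names are placeholders rather than values specified by the patent.

import torch
import torch.nn as nn
from torchcrf import CRF  # assumed dependency: the pytorch-crf package

class BiLSTMCRF(nn.Module):
    """Bi-LSTM emission scores followed by a linear-chain CRF layer."""
    def __init__(self, vocab_size, num_tags, emb_dim=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden // 2, batch_first=True,
                            bidirectional=True)
        self.emission = nn.Linear(hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, tokens, tags, mask):
        feats, _ = self.lstm(self.embed(tokens))
        # negative log-likelihood of the gold tag sequence under the CRF
        return -self.crf(self.emission(feats), tags, mask=mask, reduction="mean")

    def decode(self, tokens, mask):
        feats, _ = self.lstm(self.embed(tokens))
        return self.crf.decode(self.emission(feats), mask=mask)

In this structure the Bi-LSTM supplies per-token emission scores with long-range context, while the CRF scores whole label sequences, matching the complementary roles described above.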
As shown in Fig. 5, before model training starts, the environment and the agents need to be initialized and reset, and setting the corresponding training parameters specifically includes: preprocessing the event corpus data and inputting the corpus as the environment parameters of the multi-agent reinforcement learning model.
As shown in Fig. 6, the agents produce the series of data required for training by executing corresponding purification-optimization actions in the environment, and the data are sampled and stored in a data buffer for subsequent training; this specifically includes:
the multi-agent reinforcement learning model generates an action set for the agent group according to the input environment parameters;
the agent group executes the action set, selecting the corresponding event knowledge from the corpus to form an event knowledge set;
the event knowledge set is mapped into word vectors, which are input into the sequence labeling joint model;
the sequence labeling joint model labels the input word vectors and compares the result with the test set, so as to verify the event purification effect of the current multi-agent reinforcement learning model and output an evaluation index.
As shown in Fig. 6, when the amount of data in the data buffer reaches a set value, starting to train and update the real networks of all agents using the data specifically includes:
converting the evaluation index into a reward value according to a preset reward function, and feeding the reward value back into the training of the multi-agent reinforcement learning model so as to optimize the model.
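As a hedged illustration, the snippet below shows one plausible way of converting the evaluation index into a reward; the choice of F1 as the index and the baseline-difference form of the reward are assumptions, since the patent only states that a preset reward function is used.

def evaluation_to_reward(f1_now, f1_baseline, scale=10.0):
    """Hypothetical reward: improvement of the joint-extraction F1 over a baseline run."""
    return scale * (f1_now - f1_baseline)

# e.g. a purified subset that lifts F1 from 0.62 to 0.65 yields a reward of about 0.3
reward = evaluation_to_reward(0.65, 0.62)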
Data sampling and agent network updates are detailed as follows:
as shown in fig. 2, the detailed steps of data sampling are as follows:
step 1-1: initializing sampling process parameters: the maximum data storage capacity max-epsilon-length, the sampled and stored data quantity t is 1;
step 1-2: acquiring a state X of a current environment, wherein the X is a vector formed by a series of environment parameters;
step 1-3: each Agent i takes the environment state X as input, generates An action Ai through the network operation of a real Actor in the Agent i, and all actions selected by agents form An action group A (A1, A2, … and An);
step 1-4: all agents perform respective actions in the current environment, namely: in the environment state X, executing the action group A (A1, A2, …, An), obtaining a new environment state X' and simultaneously obtaining a joint reward value R;
step 1-5: obtaining a complete data tuple (X, A, R, X') and storing the complete data tuple in a data cache pool D;
step 1-6: updating the current environmental state: x' > X;
step 1-7: executing the steps until the data replacement amount in the data cache pool D reaches the maximum data storage amount, namely: and when t is greater than max-epsilon-length, finishing data sampling and starting learning.
As shown in fig. 3, the detailed steps of the agent network update are as follows:
the following operations are performed for each pair of all Agent agents i:
step 2-1: randomly sampling a data tuple (X, A, R, X') of minipatch from a data cache pool D, wherein the size of the minipatch can be set autonomously;
step 2-2: calculating a target Q value according to the randomly sampled data tuples;
step 2-3: updating a real criticic network of the Agent i in a mode of minimizing a loss function, and calculating the loss function by taking an actual Q value and a target Q value as factors;
step 2-4: updating a real Actor network of the Agent i in a gradient descending manner, and calculating a strategy gradient of the model network;
step 2-5: parameter vectors of an Actor network and a Critic network of the Agent i are respectively extracted and recorded as: mi and Ni;
step 2-6: and (3) making a difference between the parameter vector of Agent i and the parameter vector of Agent (i-1), and recording as: Sub-Mi and Sub-Ni;
step 2-7: multiplying the Sub-Mi and the Sub-Ni by a differentiation factor beta, and feeding back and updating the original network respectively;
step 2-8: the steps are circulated until all the agents finish the updating of the real network;
step 2-9: and updating the target networks of all the agents in a soft updating mode, namely: the parameters of the real network are periodically copied into the target network.
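For illustration, the sketch below follows steps 2-1 to 2-4 for one agent in PyTorch. The agent interface (actor, critic, target copies, optimizers), the tuple layout of the data cache pool, and all hyperparameter values are assumptions; the parameter-difference feedback of steps 2-5 to 2-7 is sketched separately after the formulas below.

import random
import torch
import torch.nn.functional as F

def update_agent(i, agents, D, batch_size=64, gamma=0.95):
    """Sketch of steps 2-1 to 2-4 for Agent i. Assumed interfaces: each agent
    exposes actor/critic networks, target_actor/target_critic copies and
    optimizers; every action is a single scalar, so the joint action of
    n agents is an n-dimensional vector."""
    batch = random.sample(list(D), batch_size)                     # step 2-1: minibatch
    X = torch.stack([torch.as_tensor(b[0], dtype=torch.float32) for b in batch])
    A = torch.tensor([b[1] for b in batch], dtype=torch.float32)   # joint actions, shape (batch, n)
    R = torch.tensor([b[2] for b in batch], dtype=torch.float32).unsqueeze(1)
    X_next = torch.stack([torch.as_tensor(b[3], dtype=torch.float32) for b in batch])

    agent = agents[i]
    with torch.no_grad():                                          # step 2-2: target Q value
        A_next = torch.cat([ag.target_actor(X_next) for ag in agents], dim=1)
        y = R + gamma * agent.target_critic(X_next, A_next)

    critic_loss = F.mse_loss(agent.critic(X, A), y)                # step 2-3: minimize the loss
    agent.critic_opt.zero_grad(); critic_loss.backward(); agent.critic_opt.step()

    a_i = agent.actor(X)                                           # fresh action of Agent i, keeps its gradient
    A_new = torch.cat([A[:, :i], a_i, A[:, i + 1:]], dim=1)
    actor_loss = -agent.critic(X, A_new).mean()                    # step 2-4: policy-gradient update
    agent.actor_opt.zero_grad(); actor_loss.backward(); agent.actor_opt.step()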
The target Q value is:
y = R + γ·Q(X′, a1′, …, an′)
where X is the environment-state characterization parameter, ai is an action, Q is the Q-value computation function whose parameters are X and ai, R is the reward value, and γ is the decay factor;
the loss function is:
L(θi) = (1/S)·Σj (yj − Qu(Xj, a1, …, an))²
where S is the total number of agents in the environment, yj is the target Q value of the agent, and Qu is the actual Q value of the agent;
the strategy gradient is:
Figure BDA0003153522780000083
mu is the agent policy, and sigma is the policy network input parameter;
The parameters of the real network are periodically copied into the target network using the following formula:
θ′i ← τ·θi + (1−τ)·θ′i
where θ is a network parameter and τ is the coefficient of the parameter copy performed when the network is updated.
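A minimal PyTorch sketch of this soft update is shown below; the function name and the assumption that the real and target networks are nn.Module instances are illustrative, not specified by the patent.

import torch

@torch.no_grad()
def soft_update(real_net, target_net, tau=0.01):
    """theta' <- tau * theta + (1 - tau) * theta' for every parameter pair."""
    for p, p_targ in zip(real_net.parameters(), target_net.parameters()):
        p_targ.mul_(1.0 - tau).add_(tau * p)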
The sequence-labeling joint extraction model is an efficient joint event entity-relation extraction model, but its training requires a large amount of high-quality labeled data. Although remote supervision can effectively increase the amount of labeled data, the labeled data set it generates suffers from label noise, which adversely affects the model. To address this problem, the invention purifies and optimizes the labeled data on the basis of the improved multi-agent reinforcement learning model, thereby alleviating the problem of label noise during training of the sequence-labeling joint extraction model and improving the effect of the joint event entity-relation extraction task.
The embodiment of the invention provides an event corpus purification method based on multi-agent reinforcement learning. On top of the original MADDPG training procedure, after the agents' policies are updated, the network parameters of each layer of each agent are extracted as parameter vectors; the corresponding layer parameter vectors are then subtracted one by one to obtain the pairwise parameter-vector differences among the agents, and these differences are multiplied by a differentiation factor and fed back to the updated agent to complete its final update. By maximizing the neural-network parameter-vector differences, the degree to which the agents explore the joint policy space during training is expanded, so the training result approaches the global optimal solution more closely.
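The sketch below illustrates this parameter-vector difference feedback for the Actor networks (the Critic networks would be handled the same way), using PyTorch's parameter/vector utilities; the concrete feedback rule shown, adding β times the difference to the updated parameters, is an interpretation for illustration rather than a formula given verbatim by the patent.

import torch
from torch.nn.utils import parameters_to_vector, vector_to_parameters

@torch.no_grad()
def apply_parameter_difference(agents, beta=0.05):
    """Steps 2-5 to 2-7: extract each agent's Actor parameter vector Mi,
    take the pairwise difference with the previous agent, scale it by the
    differentiation factor beta, and feed it back into the updated network."""
    M = [parameters_to_vector(a.actor.parameters()) for a in agents]             # step 2-5
    for i in range(1, len(agents)):
        sub_m = M[i] - M[i - 1]                                                   # step 2-6
        vector_to_parameters(M[i] + beta * sub_m, agents[i].actor.parameters())  # step 2-7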
Multi-agent reinforcement learning (MARL) is a key tool for solving many real-world problems, but reinforcement learning algorithms in multi-agent environments face a typical problem: as the number of agents increases, the joint policy solution space grows exponentially, which leads to the poor policy-space exploration and policy sub-optimality that such algorithms find hard to avoid. By studying the policy-space exploration method, the invention improves the efficiency with which the agents explore the joint policy solution space and increases the degree to which it is covered, so that coverage approaches that of the full policy solution space and the current optimal policy moves closer to the global optimal solution.
The agents explore the policy solution space independently of one another, and random exploration inevitably covers parts of the space repeatedly, which reduces exploration efficiency to a certain extent. The invention provides a policy exploration method that maximizes the differences among neural-network parameter vectors: the parameter vectors that make up each agent's neural networks are extracted, the agent group's exploration of the policy solution space is combined, and repeated exploration of the space is avoided to a certain extent by maximizing the differences among the agents' parameter vectors. This increases the degree to which the joint policy solution space is explored, brings coverage closer to the full policy solution space, and thus improves the training effect and the model compared with the original algorithm.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is intended to include such modifications and variations.

Claims (5)

1. An event corpus purification method based on multi-agent reinforcement learning, characterized by comprising:
before model training begins, initializing and resetting the environment and the agents, and setting the corresponding training parameters;
the agents producing the series of data required for training by executing corresponding purification-optimization actions in the environment, and sampling and storing the data in a data buffer for subsequent training;
when the amount of data in the data buffer reaches a set value, using this data to train and update the real networks of all agents;
after the real networks are updated, updating the target networks of all agents by intermittently copying parameters;
repeating the above steps until the number of training iterations reaches the preset number.
2. The event corpus purification method based on multi-agent reinforcement learning according to claim 1, wherein initializing and resetting the environment and the agents before model training begins and setting the corresponding training parameters specifically comprises: preprocessing the event corpus data and inputting the corpus as the environment parameters of the multi-agent reinforcement learning model.
3. The event corpus purification method based on multi-agent reinforcement learning according to claim 1, wherein the agents producing the series of data required for training by executing corresponding purification-optimization actions in the environment, and sampling and storing the data in the data buffer for subsequent training, specifically comprises:
the multi-agent reinforcement learning model generating an action set for the agent group according to the input environment parameters;
the agent group executing the action set, selecting the corresponding event knowledge from the corpus to form an event knowledge set;
mapping the event knowledge set into word vectors and inputting the word vectors into the sequence labeling joint model;
the sequence labeling joint model labeling the input word vectors and comparing the result with the test set, so as to verify the event purification effect of the current multi-agent reinforcement learning model and output an evaluation index.
4. The event corpus purification method based on multi-agent reinforcement learning according to claim 1, wherein, when the amount of data in the data buffer reaches a set value, starting to train and update the real networks of all agents using the data specifically comprises:
converting the evaluation index into a reward value according to a preset reward function, and feeding the reward value back into the training of the multi-agent reinforcement learning model so as to optimize the model.
5. The event corpus purification method based on multi-agent reinforcement learning according to claim 1, wherein, after the real networks are updated, updating the target networks of all agents by intermittently copying parameters further comprises:
extracting the network parameters of each layer of each agent as parameter vectors, subtracting the corresponding layer parameter vectors one by one to obtain the pairwise parameter-vector differences among the agents, multiplying these differences by a differentiation factor, and feeding the result back to the updated agents to complete the final agent update.
CN202110773927.9A 2021-07-08 2021-07-08 Event corpus purification method based on multi-agent reinforcement learning Active CN113377884B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110773927.9A CN113377884B (en) 2021-07-08 2021-07-08 Event corpus purification method based on multi-agent reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110773927.9A CN113377884B (en) 2021-07-08 2021-07-08 Event corpus purification method based on multi-agent reinforcement learning

Publications (2)

Publication Number Publication Date
CN113377884A true CN113377884A (en) 2021-09-10
CN113377884B CN113377884B (en) 2023-06-27

Family

ID=77581381

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110773927.9A Active CN113377884B (en) 2021-07-08 2021-07-08 Event corpus purification method based on multi-agent reinforcement learning

Country Status (1)

Country Link
CN (1) CN113377884B (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109947954A (en) * 2018-07-09 2019-06-28 北京邮电大学 Multitask coordinated recognition methods and system
JP2020046792A (en) * 2018-09-18 2020-03-26 Zホールディングス株式会社 Information processor, information processing method and program
CN110008332A (en) * 2019-02-13 2019-07-12 阿里巴巴集团控股有限公司 The method and device of trunk word is extracted by intensified learning
CN109978176A (en) * 2019-03-05 2019-07-05 华南理工大学 A kind of multiple agent cooperative learning methods based on state dynamic sensing
CN110110086A (en) * 2019-05-13 2019-08-09 湖南星汉数智科技有限公司 A kind of Chinese Semantic Role Labeling method, apparatus, computer installation and computer readable storage medium
CN110807069A (en) * 2019-10-23 2020-02-18 华侨大学 Entity relationship joint extraction model construction method based on reinforcement learning algorithm
CN110990590A (en) * 2019-12-20 2020-04-10 北京大学 Dynamic financial knowledge map construction method based on reinforcement learning and transfer learning
CN111160035A (en) * 2019-12-31 2020-05-15 北京明朝万达科技股份有限公司 Text corpus processing method and device
CN111312354A (en) * 2020-02-10 2020-06-19 东华大学 Breast medical record entity identification and annotation enhancement system based on multi-agent reinforcement learning
CN111382575A (en) * 2020-03-19 2020-07-07 电子科技大学 Event extraction method based on joint labeling and entity semantic information
CN112541339A (en) * 2020-08-20 2021-03-23 同济大学 Knowledge extraction method based on random forest and sequence labeling model
CN112487811A (en) * 2020-10-21 2021-03-12 上海旻浦科技有限公司 Cascading information extraction system and method based on reinforcement learning
CN112801290A (en) * 2021-02-26 2021-05-14 中国人民解放军陆军工程大学 Multi-agent deep reinforcement learning method, system and application

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FANG Baofu et al., "Emotion-based heterogeneous multi-agent reinforcement learning under sparse rewards", Pattern Recognition and Artificial Intelligence, vol. 34, no. 3, pp. 223-231 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114897168A (en) * 2022-06-20 2022-08-12 支付宝(杭州)信息技术有限公司 Fusion training method and system of wind control model based on knowledge representation learning

Also Published As

Publication number Publication date
CN113377884B (en) 2023-06-27

Similar Documents

Publication Publication Date Title
Kurach et al. Neural random-access machines
Lin et al. An efficient deep reinforcement learning model for urban traffic control
CN113469186B (en) Cross-domain migration image segmentation method based on small number of point labels
CN113657561A (en) Semi-supervised night image classification method based on multi-task decoupling learning
CN113139651A (en) Training method and device of label proportion learning model based on self-supervision learning
CN109558898B (en) Multi-choice learning method with high confidence based on deep neural network
CN116089883B (en) Training method for improving classification degree of new and old categories in existing category increment learning
CN114463605A (en) Continuous learning image classification method and device based on deep learning
CN113742488A (en) Embedded knowledge graph completion method and device based on multitask learning
CN113537365A (en) Multitask learning self-adaptive balancing method based on information entropy dynamic weighting
JP6230987B2 (en) Language model creation device, language model creation method, program, and recording medium
US20240127087A1 (en) Machine learning knowledge management based on lifelong boosting in presence of less data
CN113377884A (en) Event corpus purification method based on multi-agent reinforcement learning
CN113095229A (en) Unsupervised domain self-adaptive pedestrian re-identification system and method
CN108829846A (en) A kind of business recommended platform data cluster optimization system and method based on user characteristics
US11875250B1 (en) Deep neural networks with semantically weighted loss functions
CN113590748B (en) Emotion classification continuous learning method based on iterative network combination and storage medium
CN110334395A (en) The satellite momentum wheel fault diagnosis method and system of initialization EM algorithm based on JADE
CN110443344B (en) Momentum wheel fault diagnosis method and device based on K2ABC algorithm
Lyu et al. Elastic Multi-Gradient Descent for Parallel Continual Learning
JP2019133496A (en) Content feature quantity extracting apparatus, method, and program
CN111753995B (en) Local interpretable method based on gradient lifting tree
US20220405599A1 (en) Automated design of architectures of artificial neural networks
CN116958695A (en) Target domain information guiding-based transfer learning image classification method and device
CN116030502A (en) Pedestrian re-recognition method and device based on unsupervised learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant