CN117332784A - Intelligent knowledge enhancement method based on hierarchical graph attention and dynamic meta-learning - Google Patents

Intelligent knowledge enhancement method based on hierarchical graph attention and dynamic meta-learning

Info

Publication number
CN117332784A
Authority
CN
China
Prior art keywords
learning
attention
learning model
layer
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311278469.7A
Other languages
Chinese (zh)
Inventor
屠静
王亚
苏岳
万晶晶
李伟伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuoshi Future Beijing technology Co ltd
Original Assignee
Zhuoshi Future Beijing technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuoshi Future Beijing technology Co ltd filed Critical Zhuoshi Future Beijing technology Co ltd
Priority to CN202311278469.7A
Publication of CN117332784A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/096Transfer learning
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00ICT specially adapted for the handling or processing of medical references
    • G16H70/40ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Primary Health Care (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Epidemiology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Toxicology (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides an intelligent knowledge enhancement method based on hierarchical graph attention and dynamic meta-learning, which comprises the following steps: S101, performing data preprocessing to obtain structured data; S102, training an adaptive transfer learning model based on the structured data; S103, performing feature extraction by adopting the trained adaptive transfer learning model; S104, constructing a plurality of hierarchical graph attention networks based on different feature extraction results according to different subtasks, and optimizing the hierarchical graph attention networks by adopting a dynamic meta-learning algorithm; S105, outputting results of different subtasks by adopting the optimized hierarchical graph attention networks. The method accurately captures complex knowledge structures, flexibly realizes inter-domain knowledge transfer and adaptation, and efficiently responds to dynamic environment changes, so it has wide application prospects and practical value in the field of intelligent knowledge enhancement and provides a powerful and flexible tool for solving practical problems.

Description

Intelligent knowledge enhancement method based on hierarchical graph attention and dynamic meta-learning
Technical Field
The invention relates to the technical field of knowledge-enhanced learning, and in particular to an intelligent knowledge enhancement method based on hierarchical graph attention and dynamic meta-learning.
Background
Today, with the rapid development of big data and artificial intelligence technology, knowledge acquisition, representation and enhancement are becoming important research directions. In particular, in fields such as medicine, finance and education, how to accurately and efficiently extract knowledge from huge data sources, and how to enhance and optimize that knowledge through intelligent algorithms, are currently important challenges. The graph neural network (GNN), transfer learning and meta-learning techniques currently employed in isolation generally suffer from the following problems when enhancing and optimizing such knowledge: (1) Graph neural network (GNN): the graph neural network is a powerful structured-data learning framework that has emerged in recent years. It can capture complex relations between objects, but the traditional GNN lacks support for hierarchical structure; some methods use a GNN to learn relation and entity representations in a knowledge graph, but they usually attend only to a static graph structure and neglect the hierarchy and dynamics of knowledge. (2) Transfer learning: transfer learning aims to transfer knowledge from one field to another and reduces the workload of manual feature engineering. Although some domain-adaptive transfer learning schemes exist, many existing methods cannot adapt well to the characteristics of different fields; they mainly focus on specific tasks, such as image classification or text analysis, and lack versatility and extensibility. (3) Meta-learning: meta-learning is a method of learning how to learn. While it can provide flexibility between tasks, conventional meta-learning schemes often lack adaptability to dynamic environments, are often complex, require a large number of manual adjustments, and are not directly combined with knowledge representation and enhancement.
Disclosure of Invention
The invention aims to provide an intelligent knowledge enhancement method based on hierarchical graph attention and dynamic meta-learning, so as to solve the problems set out in the background art.
The invention is realized by the following technical scheme: an intelligent knowledge enhancement method based on hierarchical graph attention and dynamic meta-learning, the method comprising the following steps:
s101, acquiring general original data in any main field, and preprocessing the data to obtain structured data;
s102, training an adaptive transfer learning model based on the structured data;
s103, performing feature extraction by adopting a self-adaptive transfer learning model after training based on subtasks related to the main field;
s104, constructing a plurality of hierarchical graph attention networks based on different feature extraction results according to different subtasks, and optimizing the hierarchical graph attention networks by adopting a dynamic element learning algorithm;
s105, outputting results of different subtasks by adopting the optimized hierarchical graph attention network.
Optionally, the data preprocessing process includes: inputting the relevant raw data into a named entity recognition model, which changes the raw data into a structured representation, wherein the named entity recognition model comprises an embedding layer, a self-attention mechanism layer, a multi-head attention layer and an output layer.
Optionally, the adaptive transfer learning model includes a BERT language pre-training layer, a bidirectional long short-term memory (BiLSTM) layer, a conditional random field (CRF) layer and a transfer learning module. The BERT language pre-training layer is used to vectorize the structured data and convert it into a machine-readable form; the BiLSTM layer is used to further process the vectorized data and extract vectorized feature data; the CRF layer is used to decode the output of the BiLSTM layer to obtain a predicted labeling sequence; and the transfer learning module is used to transfer the parameters of the adaptive transfer learning model trained on general raw data to a new model in a specific target domain.
Optionally, based on the subtasks related to the main field, performing feature extraction by adopting the trained adaptive transfer learning model specifically comprises:
based on a subtask related to the main field, obtaining structured data of the specific target domain related to the subtask, and constructing a deep transfer learning model based on the target domain;
transferring the training parameters of the BERT language pre-training layer in the adaptive transfer learning model to the target-domain-based deep transfer learning model to perform word embedding on the input structured data of the specific target domain, obtaining each word vector in all sentences;
transferring the training parameters of the BiLSTM layer in the adaptive transfer learning model to the target-domain-based deep transfer learning model, and then inputting the word vectors into that model for training;
and transferring the training parameters of the CRF layer in the adaptive transfer learning model to the target-domain-based deep transfer learning model, which decodes its output to obtain the feature output result.
Optionally, for each subtask, a hierarchical graph attention network related to the subtask is constructed, wherein the nodes in the hierarchical graph attention network represent different object entities in the subtask, and the features output by the target-domain-based deep transfer learning model serve as the attributes of the nodes in the corresponding hierarchical graph attention network.
Optionally, the dynamic meta-learning algorithm adopts a meta-learning scheme based on FOMAML (first-order model-agnostic meta-learning).
Optionally, in the FOMAML-based meta-learning scheme, the parameters of the hierarchical graph attention networks are dynamically adjusted according to the loss functions of different subtasks.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides an intelligent knowledge enhancement method based on hierarchical graph attention and dynamic element learning, which comprises the steps of 1. Capturing and representing the hierarchical structure and complexity of knowledge by adopting a hierarchical graph attention network. Compared with the simple relation mining in the prior art, the method can draw a richer and more accurate knowledge graph. The importance of different layers and relations is reasonably measured by introducing an attention mechanism, so that a real-world complex knowledge structure is reflected more accurately;
2. more flexible cross-domain migration capability: through BERT-based field self-adaptive migration learning, the invention realizes flexible migration and self-adaptation among different fields. The existing migration learning method is poor in effect when large differences exist between the source field and the target field, the characteristic of the embodiment of the application is beneficial to breaking barriers among the fields and promoting knowledge sharing and integration among different fields, and therefore the universality and expansibility of the model are improved; 3. more efficient dynamic adaptation: the meta learning scheme based on FOMAML can be adopted to quickly adapt to the change of task demands and environmental conditions. Compared with the existing static model and complex meta-learning scheme, the scheme not only enables the model to respond to changes in real time, but also reduces the calculation and storage requirements. The method provides an effective solution for maintaining the real-time performance and accuracy of the model in a dynamic and continuously-changing environment, and the advantages enable the method to have wide application prospect and practical value in the field of intelligent knowledge enhancement, and provide a powerful and flexible tool for solving the practical problem.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only preferred embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the intelligent knowledge enhancement method based on hierarchical graph attention and dynamic meta-learning.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, exemplary embodiments according to the present invention will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some embodiments of the present invention and not all embodiments of the present invention, and it should be understood that the present invention is not limited by the example embodiments described herein. Based on the embodiments of the invention described in the present application, all other embodiments that a person skilled in the art would have without inventive effort shall fall within the scope of the invention.
In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the invention may be practiced without one or more of these details. In other instances, well-known features have not been described in detail in order to avoid obscuring the invention.
It should be understood that the present invention may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes any and all combinations of the associated listed items.
In order to provide a thorough understanding of the present invention, detailed structures will be presented in the following description in order to illustrate the technical solutions presented by the present invention. Alternative embodiments of the invention are described in detail below, however, the invention may have other implementations in addition to these detailed descriptions.
Referring to fig. 1, an intelligent knowledge enhancement method based on hierarchical graph attention and dynamic meta-learning comprises the following steps:
s101, acquiring general original data in any main field, and preprocessing the data to obtain structured data;
specifically, the data preprocessing process includes: inputting relevant original data into a named entity recognition model, and changing the relevant original data into a structural representation by the named entity recognition model, wherein the named entity recognition model comprises an embedded layer, a self-attention mechanism layer, a multi-head attention layer and an output layer, and the core of the self-attention mechanism layer is to capture the dependency relationship inside a sequence by calculating the relationship between each element and other elements in an input sequence. It relates to the following formula:
Attention(Q,K,V)=softmax(QK^T/sqrt(d_k))*V
Q, K and V represent the query, key and value respectively, and d_k is the dimension of the key. The softmax function converts the attention scores into a probability distribution. The formula determines the weights of the different values through query-key similarity calculations, allowing the model to give different attention to different parts of the input sequence.
Multi-head self-attention means that the self-attention process described above is performed not just once but multiple times in parallel, with each "head" focusing on different information. Each head has its own weight matrices for queries, keys and values, capturing information in different aspects.
Through the multi-head self-attention mechanism, the model can understand the complex relationships between the individual elements of the input sequence. This is particularly important for named entity recognition (NER), because the recognition of an entity often depends on its context. For example, a word may have different meanings in different contexts, and multi-head self-attention can effectively capture these dependencies, improving the accuracy of NER.
Illustratively, the specific steps by which the named entity recognition model transforms the relevant raw data into a structured representation are as follows (a runnable sketch of this pipeline is given after the step list):
(1) Text cleaning: remove irrelevant symbols, punctuation, redundant spaces, etc. from the text and convert it into a normalized form. Let d be one sample in the original data set D and d' the cleaned sample; the process can be expressed as d' = Clean(d), where Clean denotes the cleaning function;
(2) Word segmentation: divide the text into words or phrases as the input units of the model. Let d' be the cleaned sample and w the segmented sample; the process can be expressed as w = Tokenize(d'), where Tokenize denotes the word segmentation function;
(3) Stemming and lemmatization: convert words into their base forms to reduce the number of distinct words the model must process. Let w be the segmented sample and s the stemmed sample; the process can be expressed as s = Stem(w), where Stem denotes the stemming function;
(4) Part-of-speech tagging and word sense disambiguation: determine the part of speech of each word and resolve word-sense ambiguity from context. Let s be the stemmed sample and p the tagged and disambiguated sample; the process can be expressed as p = POS(s), where POS denotes the part-of-speech tagging and word sense disambiguation function;
(5) Word vector construction: convert each word or phrase into a vector in a high-dimensional space so that the model can process the text data. Let p be the tagged and disambiguated sample and v the word-vectorized sample; the process can be expressed as v = Vector(p), where Vector denotes the function constructing word vectors.
In summary, through the above five steps, the named entity recognition model changes the relevant raw data into a structured representation. Within the self-attention layers, the outputs of the multiple heads are combined and processed through a linear layer to form the final output of the layer, which encodes the complex interactions among the parts of the input sequence and provides rich information for the subsequent steps. The data preprocessing step thus effectively captures the internal dependencies of the input sequence through the multi-head self-attention mechanism and provides powerful support for named entity recognition (NER), combining strong attention capability with the diversity of the multi-head mechanism to obtain a deep understanding of the input data. It can be understood that, in the medical field, a Transformer-based multi-head self-attention mechanism is used for named entity recognition (NER) to identify key entities such as diseases, drugs and symptoms.
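For illustration, a minimal sketch of the five preprocessing steps, assuming NLTK-style helpers (the library, the regex cleaning rule and the embedding lookup are assumptions; word sense disambiguation is reduced to plain POS tagging in this sketch):

```python
import re
import nltk                      # assumes punkt / averaged_perceptron_tagger data are installed
from nltk.stem import PorterStemmer

def clean(d: str) -> str:        # d' = Clean(d): strip irrelevant symbols, normalize whitespace
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s.,]", " ", d)).strip().lower()

def tokenize(d_clean: str) -> list[str]:   # w = Tokenize(d'): split text into word units
    return nltk.word_tokenize(d_clean)

def stem(w: list[str]) -> list[str]:       # s = Stem(w): reduce each word to its base form
    stemmer = PorterStemmer()
    return [stemmer.stem(tok) for tok in w]

def pos(s: list[str]):                     # p = POS(s): attach a part-of-speech tag per token
    return nltk.pos_tag(s)

def vectorize(p, embedding: dict):         # v = Vector(p): map each (token, tag) to a vector
    return [embedding.get(tok, embedding["<unk>"]) for tok, _ in p]

# v = Vector(POS(Stem(Tokenize(Clean(d)))))
def preprocess(d: str, embedding: dict):
    return vectorize(pos(stem(tokenize(clean(d)))), embedding)
```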
S102, training an adaptive transfer learning model based on the structured data;
specifically, the self-adaptive migration learning model comprises a Bert language pre-training layer, a bidirectional long-short-term memory network BiLSTM layer, a conditional random field CRF layer and a migration learning module, wherein the Bert language pre-training model is used for vectorizing the structured data and converting the structured data into a machine-readable form, the bidirectional long-short-term memory network BiLSTM layer is used for further processing the vectorized data and extracting vectorized characteristic data, the conditional random field CRF layer is used for decoding an output result of the bidirectional long-short-term memory network BiLSTM to obtain a prediction labeling sequence, and the migration learning module is used for migrating parameters of the self-adaptive migration learning model trained based on general original data to a new model in a specific target field.
Illustratively, the BERT language pre-training layer serves as a feature extractor that transfers learned features from the source task to the target task. The specific process is as follows:
Pre-training stage: BERT is pre-trained on a large amount of unlabeled data to learn a generic language representation.
Fine-tuning stage: fine-tuning is performed for the specific source and target tasks. The fine-tuning process of the model can be expressed by the following formula:
L(θ)=ΣL_t(y_t,f(x_t;θ))
wherein: l (θ) is the total loss function. L_t (y_t, f (x_t; θ)) is the loss function of the target task, where y_t is the target variable, x_t is the input feature, and θ is the model parameter.
Domain adaptation: the domain adaptation is mainly to align domain differences between the source task and the target task, so that knowledge of the source task can be better migrated to the target task. This may involve some domain-aligned techniques, such as domain-invariant feature learning.
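A minimal sketch of such a BERT + BiLSTM + CRF tagger, assuming the HuggingFace transformers and pytorch-crf packages (the package choices, hidden size and checkpoint name are assumptions; the patent names no concrete libraries):

```python
import torch
from transformers import BertModel          # assumption: HuggingFace transformers
from torchcrf import CRF                    # assumption: pytorch-crf package

class BertBiLSTMCRF(torch.nn.Module):
    """BERT pre-training layer -> BiLSTM layer -> CRF decoding layer."""
    def __init__(self, num_labels: int, hidden: int = 256,
                 bert_name: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)     # vectorizes the structured text
        self.bilstm = torch.nn.LSTM(self.bert.config.hidden_size, hidden,
                                    batch_first=True, bidirectional=True)
        self.emit = torch.nn.Linear(2 * hidden, num_labels)  # per-token emission scores
        self.crf = CRF(num_labels, batch_first=True)         # decodes the labeling sequence

    def forward(self, input_ids, attention_mask, labels=None):
        feats = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        feats, _ = self.bilstm(feats)                        # further processes the vectorized data
        emissions = self.emit(feats)
        if labels is not None:                               # training: CRF negative log-likelihood
            return -self.crf(emissions, labels, mask=attention_mask.bool())
        return self.crf.decode(emissions, mask=attention_mask.bool())  # predicted label sequence
```

Under this sketch, the total loss L(θ) above corresponds to the CRF negative log-likelihood summed over the target-task batches.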
S103, based on the subtasks related to the main field, performing feature extraction by adopting the trained adaptive transfer learning model;
further, the process specifically comprises the following steps:
based on a subtask related to the main field, obtaining structured data of the specific target domain related to the subtask, and constructing a deep transfer learning model based on the target domain;
transferring the training parameters of the BERT language pre-training layer in the adaptive transfer learning model to the target-domain-based deep transfer learning model to perform word embedding on the input structured data of the specific target domain, obtaining each word vector in all sentences;
transferring the training parameters of the BiLSTM layer in the adaptive transfer learning model to the target-domain-based deep transfer learning model, and then inputting the word vectors into that model for training;
and transferring the training parameters of the CRF layer in the adaptive transfer learning model to the target-domain-based deep transfer learning model, which decodes its output to obtain the feature output result. A minimal sketch of this parameter migration follows.
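The sketch below reuses the BertBiLSTMCRF class from the previous sketch (the function name and freeze option are illustrative assumptions, and the source and target label sets are assumed to coincide, since the CRF parameters are transferred directly):

```python
def migrate_parameters(source: BertBiLSTMCRF, target: BertBiLSTMCRF,
                       freeze_bert: bool = True) -> BertBiLSTMCRF:
    # Transfer the BERT pre-training, BiLSTM and CRF parameters trained on the
    # general main-field data into the target-domain deep transfer learning model.
    target.bert.load_state_dict(source.bert.state_dict())
    target.bilstm.load_state_dict(source.bilstm.state_dict())
    target.crf.load_state_dict(source.crf.state_dict())
    # The emission layer is left untouched: its size depends on the target-domain label set.
    if freeze_bert:  # optionally keep the general language representation fixed during fine-tuning
        for p in target.bert.parameters():
            p.requires_grad = False
    return target
```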
S104, constructing a plurality of hierarchical graph attention networks based on different feature extraction results according to different subtasks, and optimizing the hierarchical graph attention networks by adopting a dynamic meta-learning algorithm.
S105, outputting results of different subtasks by adopting the optimized hierarchical graph attention network.
It can be understood that a plurality of hierarchical graph attention networks are constructed based on different feature extraction results according to different subtasks, wherein the nodes in each hierarchical graph attention network represent different object entities in the subtask, and the features output by the target-domain-based deep transfer learning model serve as the attributes of the nodes in the corresponding hierarchical graph attention network.
In the invention, the hierarchical graph attention mechanism adopts multi-layer graph convolution, where each layer captures neighborhood information of a different range. The specific operations are as follows:
attention calculation: the attention weight of each node to its neighbors is calculated as:
α_{ij}=softmax_j(LeakyReLU(a^T[W*x_i||W*x_j]))
wherein α_{ij} is the attention weight of node i with respect to node j, W is a weight matrix that linearly transforms the node features, a is a weight vector used to calculate the importance of a node pair, and || denotes the concatenation operation.
Feature update: the new feature representation for each node is calculated as:
h_i^{(l)}=σ(Σ_jα_{ij}*W*x_j)
wherein h_i^{(l)} is the new feature representation of node i in the l-th layer, σ is a nonlinear activation function such as ReLU, and Σ_j denotes summation over all neighbors j of node i.
Layered structure: through multi-layer graph convolution operations, neighborhood information of progressively wider range can be captured layer by layer, realizing a deep understanding of the whole graph. Through the hierarchical graph attention mechanism, the model can capture the local characteristics of nodes while understanding the global structure of the whole graph. This is critical for many knowledge-driven tasks such as recommendation systems and knowledge graph mining.
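For a single head, the attention and update formulas above correspond to the following sketch (a simplified dense-adjacency illustration; multi-head aggregation is omitted and all names are assumptions):

```python
import torch
import torch.nn.functional as F

class GraphAttentionLayer(torch.nn.Module):
    """One single-head layer of the graph attention mechanism (dense-adjacency sketch)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = torch.nn.Linear(in_dim, out_dim, bias=False)  # W: linear transform of node features
        self.a = torch.nn.Parameter(torch.empty(2 * out_dim))  # a: importance vector for node pairs
        torch.nn.init.normal_(self.a, std=0.02)

    def forward(self, x, adj):
        # x: (N, in_dim) node features; adj: (N, N) adjacency, assumed to include self-loops
        h = self.W(x)                                          # W * x
        N = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(N, N, -1),    # [W*x_i || W*x_j] for every pair
                           h.unsqueeze(0).expand(N, N, -1)], dim=-1)
        e = F.leaky_relu(pairs @ self.a)                       # LeakyReLU(a^T [W*x_i || W*x_j])
        e = e.masked_fill(adj == 0, float("-inf"))             # attend only to graph neighbors
        alpha = torch.softmax(e, dim=-1)                       # α_ij = softmax_j(...)
        return torch.relu(alpha @ h)                           # h_i^(l) = σ(Σ_j α_ij W x_j)
```

Stacking several such layers yields the layered structure described above: the l-th layer aggregates information from l-hop neighborhoods.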
Illustratively, in the medical field, a multi-level medical graph model includes the following layers:
First layer: the patient individual layer, containing an individual's basic information, medical history, laboratory results, etc.
Second layer: the disease classification layer, classifying and connecting different diseases and symptoms.
Third layer: the treatment protocol layer, containing various drugs, surgeries and other treatment methods.
As a further example, consider a diabetic patient who needs a personalized treatment regimen; the graph model of the hierarchical graph attention network (HGAT) is:
Individual layer: at the individual level, the HGAT first analyzes the patient's personal information, including age, gender, weight, blood glucose level, family history, etc.
Disease classification layer: the HGAT links the patient to the diabetes disease classification and further identifies other underlying diseases or symptoms associated with diabetes.
Treatment plan layer: at this level, the HGAT analyzes the various possible treatment regimens, including medication, diet control, exercise programs, etc., and recommends a personalized treatment regimen based on the patient's personal information and disease classification.
Knowledge enhancement: through the hierarchical attention mechanism, the HGAT can automatically mine useful information from the various layers and use it to augment existing medical knowledge bases for more accurate diagnosis and treatment in the future.
For optimizing the hierarchical graph attention networks with a dynamic meta-learning algorithm, the dynamic meta-learning algorithm in the embodiments of the application adopts a FOMAML-based meta-learning scheme. The aim of meta-learning is to train a model on a series of tasks so that it can quickly adapt to new tasks with a small number of samples and a small number of gradient updates. Assume there is a set of tasks T, where each task T_i is defined by a loss function L_i. The goal of meta-learning is to find a set of initialization parameters θ that can quickly adapt to all tasks.
FOMAML (first-order model-agnostic meta-learning) is a model-agnostic meta-learning algorithm applicable to any differentiable model. Its key point is that only first-order gradient information is used for the meta-update. The specific steps are as follows:
Intra-task update: for each task T_i, the parameters are first adjusted by several gradient updates. Specifically, the gradient is calculated on the training set of task T_i and the parameters are updated:
θ'_i = θ − α ∇_θ L_i(θ)
wherein θ'_i is the parameter updated for task T_i, α is the learning rate, and ∇_θ L_i(θ) is the gradient of the loss function L_i with respect to θ.
Meta-update: the meta-update is then calculated from the updated parameters of all tasks. Specifically, the gradient is calculated on the test set of each task T_i and the meta-parameters are updated:
θ ← θ − β Σ_i ∇ L_i(θ'_i)
wherein β is the meta-learning rate and Σ_i denotes summation over all tasks.
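A minimal first-order sketch of these two update steps (the task objects with support_loss/query_loss callables are assumptions for illustration; this is FOMAML in its plainest form, not the patent's exact training code):

```python
import copy
import torch

def fomaml_step(model: torch.nn.Module, tasks, alpha: float = 0.01,
                beta: float = 0.001, inner_steps: int = 1) -> None:
    """One FOMAML meta-update over a batch of tasks (first-order sketch)."""
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for task in tasks:
        fast = copy.deepcopy(model)                      # θ'_i starts from the shared θ
        inner_opt = torch.optim.SGD(fast.parameters(), lr=alpha)
        for _ in range(inner_steps):                     # intra-task update on the training set
            inner_opt.zero_grad()
            task.support_loss(fast).backward()           # L_i(θ) on the task's training set
            inner_opt.step()                             # θ'_i = θ - α ∇_θ L_i(θ)
        fast.zero_grad()
        task.query_loss(fast).backward()                 # L_i(θ'_i) on the task's test set
        for g, p in zip(meta_grads, fast.parameters()):
            if p.grad is not None:                       # first-order: reuse ∇_{θ'} L_i as-is
                g.add_(p.grad)
    with torch.no_grad():                                # meta-update: θ ← θ - β Σ_i ∇ L_i(θ'_i)
        for p, g in zip(model.parameters(), meta_grads):
            p.add_(g, alpha=-beta)
```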
Illustratively, in the medical field, disease types and treatment regimens are highly diverse, and individual differences among patients are large. A highly flexible and adaptable model is therefore needed to handle different types of medical tasks, such as disease diagnosis, personalized treatment regimen generation and drug recommendation.
Task definition:
task one: disease diagnosis (e.g. cancer diagnosis, diabetes diagnosis, etc.)
Task two: drug recommendation (for specific diseases or symptoms)
Task three: personalized treatment plan generation (e.g., sports and diet plans)
Task four: high risk group prediction (e.g., heart disease high risk patients)
Meta-learning using FOMAML:
First, the model can be pre-trained over multiple tasks. The FOMAML meta-learning scheme is then used to dynamically adjust the model parameters according to the loss functions of the different tasks.
Dynamic task adjustment:
For example, suppose the model is found to have high accuracy on the cancer diagnosis task (task one) but performs poorly on the drug recommendation task (task two). Based on the FOMAML algorithm, the model can automatically adjust to optimize performance on task two without significantly affecting performance on task one.
Task fine tuning and personalization:
The model can be quickly fine-tuned for newly added tasks or patient-specific data. For example, when a personalized treatment regimen is required for a particular patient (task three), the model can be quickly fine-tuned on that patient's data to generate a targeted treatment regimen.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention; any modification, equivalent replacement, improvement or the like made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (7)

1. An intelligent knowledge enhancement method based on hierarchical graph attention and dynamic meta-learning, characterized by comprising the following steps:
s101, acquiring general original data in any main field, and preprocessing the data to obtain structured data;
s102, training an adaptive transfer learning model based on the structured data;
S103, based on the subtasks related to the main field, performing feature extraction by adopting the trained adaptive transfer learning model;
S104, constructing a plurality of hierarchical graph attention networks based on different feature extraction results according to different subtasks, and optimizing the hierarchical graph attention networks by adopting a dynamic meta-learning algorithm;
s105, outputting results of different subtasks by adopting the optimized hierarchical graph attention network.
2. The intelligent knowledge enhancement method based on hierarchical graph attention and dynamic meta-learning according to claim 1, wherein the data preprocessing process comprises: inputting the relevant raw data into a named entity recognition model, which changes the raw data into a structured representation, wherein the named entity recognition model comprises an embedding layer, a self-attention mechanism layer, a multi-head attention layer and an output layer.
3. The intelligent knowledge enhancement method based on hierarchical graph attention and dynamic meta-learning according to claim 2, wherein the adaptive transfer learning model comprises a BERT language pre-training layer, a bidirectional long short-term memory (BiLSTM) layer, a conditional random field (CRF) layer and a transfer learning module; the BERT language pre-training layer is used to vectorize the structured data into a machine-readable form, the BiLSTM layer is used to further process the vectorized data and extract vectorized feature data, the CRF layer is used to decode the output of the BiLSTM layer to obtain a predicted labeling sequence, and the transfer learning module is used to transfer the parameters of the adaptive transfer learning model trained on general raw data to a new model in a specific target domain.
4. The intelligent knowledge enhancement method based on hierarchical graph attention and dynamic meta-learning according to claim 3, wherein, based on the subtasks related to the main field, performing feature extraction by adopting the trained adaptive transfer learning model specifically comprises:
based on a subtask related to the main field, obtaining structured data of the specific target domain related to the subtask, and constructing a deep transfer learning model based on the target domain;
transferring the training parameters of the BERT language pre-training layer in the adaptive transfer learning model to the target-domain-based deep transfer learning model to perform word embedding on the input structured data of the specific target domain, obtaining each word vector in all sentences;
transferring the training parameters of the BiLSTM layer in the adaptive transfer learning model to the target-domain-based deep transfer learning model, and then inputting the word vectors into that model for training;
and transferring the training parameters of the CRF layer in the adaptive transfer learning model to the target-domain-based deep transfer learning model, which decodes its output to obtain the feature output result.
5. The intelligent knowledge enhancement method based on hierarchical graph attention and dynamic meta-learning according to claim 4, wherein, for each subtask, a hierarchical graph attention network related to the subtask is constructed; nodes in the hierarchical graph attention network represent different object entities in the subtask, and the features output by the target-domain-based deep transfer learning model serve as the attributes of the nodes in the corresponding hierarchical graph attention network.
6. The intelligent knowledge enhancement method based on hierarchical graph attention and dynamic meta-learning according to claim 1, wherein the dynamic meta-learning algorithm adopts a FOMAML-based meta-learning scheme.
7. The intelligent knowledge enhancement method based on hierarchical graph attention and dynamic meta-learning according to claim 6, wherein, in the FOMAML-based meta-learning scheme, the parameters of the hierarchical graph attention networks are dynamically adjusted according to the loss functions of different subtasks.
CN202311278469.7A 2023-09-28 2023-09-28 Intelligent knowledge enhancement method based on hierarchical graph attention and dynamic meta-learning Pending CN117332784A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311278469.7A CN117332784A (en) 2023-09-28 2023-09-28 Intelligent knowledge enhancement method based on hierarchical graph attention and dynamic meta-learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311278469.7A CN117332784A (en) 2023-09-28 2023-09-28 Intelligent knowledge enhancement method based on hierarchical graph attention and dynamic meta-learning

Publications (1)

Publication Number Publication Date
CN117332784A true CN117332784A (en) 2024-01-02

Family

ID=89278562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311278469.7A Pending CN117332784A (en) 2023-09-28 2023-09-28 Intelligent knowledge enhancement method based on hierarchical graph attention and dynamic meta-learning

Country Status (1)

Country Link
CN (1) CN117332784A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111967266A (en) * 2020-09-09 2020-11-20 中国人民解放军国防科技大学 Chinese named entity recognition model and construction method and application thereof
US20220147836A1 (en) * 2020-11-06 2022-05-12 Huazhong University Of Science And Technology Method and device for text-enhanced knowledge graph joint representation learning
US20220198276A1 (en) * 2020-12-17 2022-06-23 Zhejiang Lab Method and platform for pre-trained language model automatic compression based on multilevel knowledge distillation
CN116167378A (en) * 2023-02-16 2023-05-26 广东工业大学 Named entity recognition method and system based on countermeasure migration learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YAN Meiyang et al.: "Dual-stream deep transfer learning with multi-source domain confusion", Journal of Image and Graphics, no. 12, 16 December 2019 (2019-12-16), pages 191-202 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination