CN116484868A - Cross-domain named entity recognition method and device based on diffusion model generation


Info

Publication number
CN116484868A
Authority
CN
China
Prior art keywords: model, target, generation, diffusion model, diffusion
Legal status: Pending
Application number
CN202310483045.8A
Other languages
Chinese (zh)
Inventor
李雄
李刚
杨恩好
崔广
袁庆龙
李婵娟
Current Assignee: Zhongke Zidong Information Technology Beijing Co ltd
Original Assignee: Zhongke Zidong Information Technology Beijing Co ltd
Application filed by Zhongke Zidong Information Technology Beijing Co ltd
Priority to CN202310483045.8A
Publication of CN116484868A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

Provided are a cross-domain named entity recognition method and device based on a generative diffusion model. Feature extraction is performed on source data through a first generative diffusion model to obtain an original generation set, the source data being annotated text data; the first generative diffusion model is fine-tuned according to the original generation set and a preset excitation model to obtain a first target generative diffusion model; feature extraction is performed on target data through a second generative diffusion model and the first target generative diffusion model to obtain a target generation set, the second generative diffusion model and the first target generative diffusion model having the same network architecture, and the target data and the source data belonging to different domains; and the target generation set is input into a pre-trained named entity recognition model to obtain an entity naming result for the target data, so that named entity recognition across different domains is realized quickly and efficiently.

Description

Cross-domain named entity recognition method and device based on diffusion model generation
Technical Field
The present invention belongs to the field of computer technology, and in particular relates to a cross-domain named entity recognition method and device based on a generative diffusion model.
Background
Named Entity Recognition (NER) is an information extraction process by which entities mentioned in unstructured text can be identified and classified, for example text recognition in the medical field or multimodal information analysis in the news field. Machine learning models are typically used to perform named entity recognition. However, to perform accurate named entity recognition, a large amount of annotated data (e.g., unstructured sentences with pre-categorized words) is required to train the machine learning model. Furthermore, named entities are typically domain-specific. Thus, a machine learning model trained on data associated with a particular domain can generally perform named entity recognition effectively only for that domain, and not for any other domain.
Because constructing large amounts of annotated data requires a great deal of effort, computing resources and time, training data is typically available only for a small number of domains; existing annotated data sets are small, their entity types differ, and multiple data sets cannot simply be merged. How to realize named entity recognition across different domains quickly and efficiently is therefore a technical problem that currently needs to be solved.
Disclosure of Invention
In view of the foregoing problems in the prior art, an object of the present invention is to provide a cross-domain named entity recognition method and device based on a generative diffusion model, which can quickly and efficiently recognize named entities across different domains.
In order to solve the above technical problems, the specific technical solutions are as follows:
In one aspect, provided herein is a cross-domain named entity recognition method based on a generative diffusion model, the method comprising:
performing feature extraction on source data through a first generative diffusion model to obtain an original generation set, wherein the source data is annotated text data;
fine-tuning the first generative diffusion model according to the original generation set and a preset excitation model to obtain a first target generative diffusion model;
performing feature extraction on target data through a second generative diffusion model and the first target generative diffusion model to obtain a target generation set, wherein the second generative diffusion model and the first target generative diffusion model have the same network architecture, and the target data and the source data belong to different domains;
and inputting the target generation set into a pre-trained named entity recognition model to obtain an entity naming result for the target data.
Further, the first generative diffusion model is obtained by:
acquiring training set data;
establishing an initial generative model with a Markov chain structure;
based on the Markov chain structure, performing diffusion processing based on an ordinary differential equation on the training set data in the latent space of a variational autoencoder to obtain a plurality of continuous diffusion variables, wherein the diffusion variables satisfy the following Gaussian distribution: q_t(x_t | x_0) = N(x_t; γ_t x_0, σ_t² I), where q_t is the forward marginal distribution, N(·) is a Gaussian distribution, x_t is the t-th diffusion variable, γ_t is the t-th scaling coefficient, σ_t² is the square of the t-th real-valued noise scale, and I is the identity matrix;
training the initial generative model based on a preset score model and the diffusion variables until a converged first generative diffusion model is obtained, wherein the scoring function in the preset score model is: ∇_{x_t} log q_t(x_t | x_0) = −ε / σ_t, where ε is the noise variable.
Further, the expression of the ordinary differential equation in the first generative diffusion model is as follows:
dx = −(1/2) β(t) [x + ∇_x log q_t(x)] dt
where β(t) is a Gaussian distribution parameter (the noise schedule).
Further, the preset excitation model is obtained through the following steps:
determining a plurality of generated sentences according to the original generation set;
marking the quality of the plurality of generated sentences by manual annotation to obtain a ranked sentence sequence;
training an initial excitation model through a preset excitation loss function and the ranked sentence sequence until a trained excitation model is obtained, wherein the preset excitation loss function is expressed as: loss = −Σ_{i<j} σ(r_i − r_j), summed over all pairs in which sentence i is ranked before sentence j, where σ is the sigmoid function and r_i is the excitation-model score of the i-th sentence;
and normalizing the trained excitation model with a bias to obtain the preset excitation model.
Further, fine-tuning the first generative diffusion model according to the original generation set and the preset excitation model to obtain the first target generative diffusion model includes:
inputting the generated sentences in the original generation set into the preset excitation model, and selecting the sentences with scores greater than zero into an optimized generation set;
and fine-tuning the first generative diffusion model through a proximal policy optimization algorithm according to the optimized generation set to obtain the first target generative diffusion model.
Further, the loss function corresponding to the proximal policy optimization algorithm is:
L^CLIP(θ) = Ê_t [ min( r_t(θ) Â_t, clip(r_t(θ), 1 − ε, 1 + ε) Â_t ) ]
where: Ê_t is the expectation; r_t(θ) is the ratio between the new and old policies; ε is a hyperparameter; Â_t is an estimate of the advantage function; π_θ is the policy; a_t is the action vector and s_t is the state vector.
Further, performing feature extraction on the target data through the second generative diffusion model and the first target generative diffusion model to obtain the target generation set includes:
establishing the second generative diffusion model based on the network structure of the first target generative diffusion model, wherein the Markov chain in the second generative diffusion model has the following joint distribution conditioned on the observation y: p_θ(x_{0:T} | y) = p_θ^(T)(x_T | y) ∏_{t=0}^{T−1} p_θ^(t)(x_t | x_{t+1}, y), where x_0 is the final output of the diffusion model; and the factorized variational distribution conditioned on y is: q(x_{1:T} | x_0, y) = q^(T)(x_T | x_0, y) ∏_{t=0}^{T−1} q^(t)(x_t | x_{t+1}, x_0, y);
and inputting the target data into the second generative diffusion model to obtain the target generation set, wherein the target generation set comprises marked sentence sequences.
Further, the pre-trained named entity recognition model comprises an input sequence encoder, a tag encoder and a tag predictor;
the input sequence encoder comprises a BERT model trained on source-domain data, expressed as: [h_1, h_2, …, h_N] = f_In(x_1, x_2, …, x_N), where h_i is a d_1-dimensional token representation vector;
the tag encoder comprises a pre-trained Bi-LSTM model, expressed as: [e_1, e_2, …, e_N] = f_In(s_1, s_2, …, s_N), where s_k is obtained by embedding the tag sequence through a tag lookup table G, e_k is the output of the Bi-LSTM, G ∈ R^(k×d_2), k represents the number of unique tags in the source domain or the target domain, and d_2 is the dimension of the tag embedding;
the tag predictor comprises a Bi-Attention model, and is used for performing knowledge fusion on the output results of the input sequence encoder and the tag encoder to obtain the entity naming result for the target data.
Further, inputting the target generation set into the pre-trained named entity recognition model to obtain the entity naming result for the target data includes:
determining a marked sentence sequence according to the target generation set;
taking the marked sentence sequence as the input sequence and encoding it with the trained BERT model to obtain the token representation vectors corresponding to the marked sentence sequence;
randomly initializing and constructing a tag lookup table based on the previously marked tags in the source domain or the target domain;
performing feature extraction on the previously marked tags based on the tag lookup table and the pre-trained Bi-LSTM to obtain tag representation vectors;
projecting the tag representation vector, through a fully connected neural network layer, to the same dimension as the token representation vector;
obtaining the label background information of the input sequence according to the projected tag representation vector and a preset attention-weight calculation rule;
calculating the label context information according to the label background information and the tag representation vectors, in combination with a preset attention module;
concatenating the label background information and the label context information to obtain the label-aware information;
and fusing the label-aware information and the token representation vectors to obtain the entity naming result for the target data.
In another aspect, provided herein is a cross-domain named entity recognition device based on a generative diffusion model, the device comprising:
an original generation set acquisition module, configured to perform feature extraction on source data through a first generative diffusion model to obtain an original generation set, wherein the source data is annotated text data;
a first target generative diffusion model determination module, configured to fine-tune the first generative diffusion model according to the original generation set and a preset excitation model to obtain a first target generative diffusion model;
a target generation set acquisition module, configured to perform feature extraction on target data through a second generative diffusion model and the first target generative diffusion model to obtain a target generation set, wherein the second generative diffusion model and the first target generative diffusion model have the same network architecture, and the target data and the source data belong to different domains;
and an entity naming result obtaining module, configured to input the target generation set into a pre-trained named entity recognition model to obtain an entity naming result for the target data.
By adopting the above technical solution, the cross-domain named entity recognition method and device based on a generative diffusion model perform feature extraction on source data through the first generative diffusion model to obtain the original generation set, the source data being annotated text data; fine-tune the first generative diffusion model according to the original generation set and the preset excitation model to obtain the first target generative diffusion model; perform feature extraction on target data through the second generative diffusion model and the first target generative diffusion model to obtain the target generation set, the second generative diffusion model and the first target generative diffusion model having the same network architecture and the target data and the source data belonging to different domains; and input the target generation set into the pre-trained named entity recognition model to obtain the entity naming result for the target data, thereby realizing named entity recognition across different domains quickly and efficiently.
The foregoing and other objects, features and advantages will be apparent from the following more particular description of preferred embodiments, as illustrated in the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments herein or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments herein, and that other drawings can be obtained from these drawings by a person skilled in the art without inventive effort.
FIG. 1 is a flow chart of the steps of a cross-domain named entity recognition method based on a generative diffusion model provided by embodiments herein;
FIG. 2 is a schematic diagram illustrating the difference between the denoising generative diffusion model and the denoising diffusion repair model in embodiments herein;
FIG. 3 is a schematic diagram of the excitation model training process in embodiments herein;
FIG. 4 is an overall framework diagram of the method provided in embodiments herein;
FIG. 5 is a schematic structural diagram of a cross-domain named entity recognition device based on a generative diffusion model according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a computer device provided in embodiments herein.
Description of the reference numerals:
501. original generation set acquisition module;
502. first target generative diffusion model determination module;
503. target generation set acquisition module;
504. entity naming result obtaining module.
Detailed Description
The following description of the embodiments of the present disclosure will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the disclosure. All other embodiments, based on the embodiments herein, which a person of ordinary skill in the art would obtain without undue burden, are within the scope of protection herein.
It should be noted that the terms "first," "second," and the like in the description and claims herein and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or device.
Named entity recognition may be used in a variety of applications. For example, named entity recognition may be used to enhance understanding and interpretation of unstructured sentences by providing context and meaning to words. The interpretation of the sentence may then be used to create structured data for storage, analysis, and/or automated response. In one example, named entity recognition may be used to interpret descriptions provided by third parties (e.g., product descriptions from merchants). In another example, named entity recognition may be used to interpret a user's meaning and/or intent based on the user's chat utterance. The user's meaning and/or intent may then be used to generate an automated response for the user.
As described above, named entities are typically domain-specific; that is, a named entity that is applicable to one domain is generally not applicable to another domain. For example, named entities such as scientist and compound are specific to the science domain only (and not to other domains), while named entities such as musician, prize and song are specific to the music domain only (and not to other domains). Furthermore, certain words may be associated with different named entities when they are classified under different domains. For example, the phrase "Prince of Persia" may be classified as a person in the history domain, as a movie title in the movie domain, or as a game title in the video game domain.
In the prior art, a large amount of annotated data is generally required to train a machine learning model for named entity recognition in a specific domain, which consumes a large amount of computing resources while the effect remains mediocre. How to learn a new domain from entity recognition models that have already been trained on other domains has therefore become a technical problem to be solved.
In order to solve the above problems, the embodiments of the present disclosure provide a cross-domain named entity recognition method based on a generative diffusion model, which can quickly and efficiently recognize named entities across different domains. FIG. 1 is a flow chart of the steps of the cross-domain named entity recognition method based on a generative diffusion model according to embodiments herein. The present specification provides the operation steps of the method as described in the examples or flowcharts, but more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many execution orders and does not represent the only execution order. When an actual system or device product is executed, the steps may be executed sequentially or in parallel according to the method shown in the embodiments or the drawings. As shown in FIG. 1, the method may include:
A cross-domain named entity recognition method based on a generative diffusion model, the method comprising:
S101: performing feature extraction on source data through a first generative diffusion model to obtain an original generation set, wherein the source data is annotated text data;
S102: fine-tuning the first generative diffusion model according to the original generation set and a preset excitation model to obtain a first target generative diffusion model;
S103: performing feature extraction on target data through a second generative diffusion model and the first target generative diffusion model to obtain a target generation set, wherein the second generative diffusion model and the first target generative diffusion model have the same network architecture, and the target data and the source data belong to different domains;
S104: inputting the target generation set into a pre-trained named entity recognition model to obtain an entity naming result for the target data (a minimal sketch of how these four steps fit together is given after this list).
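For intuition only, the following minimal sketch shows how steps S101 to S104 fit together. Every class and function name in it is a hypothetical stand-in introduced for illustration; the diffusion, excitation and NER components are reduced to trivial placeholders and are not the claimed implementation.

```python
# Hypothetical sketch of steps S101-S104; all names are illustrative stand-ins.

class FirstGenerativeDiffusionModel:
    def generate(self, annotated_source_sentences):
        # S101: feature extraction on annotated source-domain text,
        # producing an "original generation set" of candidate sentences.
        return [s + " (generated)" for s in annotated_source_sentences]

    def fine_tune_with_reward(self, generation_set, reward_model):
        # S102: keep only sentences the preset excitation (reward) model
        # scores above zero; the PPO fine-tuning itself is omitted here.
        kept = [s for s in generation_set if reward_model(s) > 0]
        return FirstGenerativeDiffusionModel()  # the "first target" model


def second_diffusion_generate(target_sentences, target_model):
    # S103: the second (DDRM-style) model shares the network architecture of
    # the first target model and conditions on unlabeled target-domain text,
    # returning marked sentence sequences (here trivially tagged "O").
    return [(s, ["O"] * len(s.split())) for s in target_sentences]


def ner_model(marked_sequence):
    # S104: stand-in for the pre-trained NER model (BERT + Bi-LSTM + Bi-Attention).
    sentence, tags = marked_sequence
    return list(zip(sentence.split(), tags))


if __name__ == "__main__":
    source = ["[PER John] works at [ORG Acme]"]
    reward = lambda sentence: 1.0                      # hypothetical excitation model
    first = FirstGenerativeDiffusionModel()
    original_set = first.generate(source)                                # S101
    first_target = first.fine_tune_with_reward(original_set, reward)     # S102
    target_set = second_diffusion_generate(["Mary joined Beta Labs"],
                                           first_target)                 # S103
    print([ner_model(item) for item in target_set])                      # S104
```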
It will be appreciated that the first generative diffusion model may be a denoising diffusion probabilistic model (DDPM) and the second generative diffusion model may be a denoising diffusion repair model (denoising diffusion restoration model, DDRM). The source data is already-annotated data of the source domain, in which the entity relationships have also been annotated. The target domain corresponding to the target data is adjacent to the source domain, so the entity relationships in the target domain can be learned by learning the entity relationships of the data in the source domain.
Specifically, the denoising diffusion probabilistic model (DDPM) is a gradual generative model: it first takes a data sample and gradually turns it into random noise, then performs the inverse transformation from random noise back to a data sample, and finally generates the required data sample by repeatedly performing this inverse transformation. During training, the generative model learns a model distribution p_θ(x) that approximates the data distribution q(x) from samples; the samples form a Markov chain indexed by t within the total number of steps T specified as a hyperparameter. We can train the reverse-process mean function approximator μ_θ to predict ε by modifying its parameters; after a total of T gradient-descent denoising steps, a complete sentence is obtained from random noise.
The denoising diffusion repair model (DDRM) solves unsupervised learning problems by using a pre-trained generative diffusion model. Common methods usually solve such problems iteratively, which is computationally demanding and sensitive to hyperparameter tuning, whereas DDRM achieves very good results with only a few NFEs (neural function evaluations). We define DDRM as a Markov chain conditioned on y, where y is the condition we introduce; the optimal solution of DDPM plays the role of pre-training, and x_0 is the final diffusion output. Similar to DDPM, it also has a conditional distribution q(x) and a model distribution p_θ(x). It can be shown mathematically that the optimal solution of DDPM is also the optimal solution of DDRM.
In the embodiments of the present specification, the first generative diffusion model is obtained by:
acquiring training set data;
establishing an initial generative model with a Markov chain structure;
based on the Markov chain structure, performing diffusion processing based on an ordinary differential equation on the training set data in the latent space of a variational autoencoder to obtain a plurality of continuous diffusion variables, wherein the diffusion variables satisfy the following Gaussian distribution: q_t(x_t | x_0) = N(x_t; γ_t x_0, σ_t² I), where q_t is the forward marginal distribution, N(·) is a Gaussian distribution, x_t is the t-th diffusion variable, γ_t is the t-th scaling coefficient, σ_t² is the square of the t-th real-valued noise scale, and I is the identity matrix;
training the initial generative model based on a preset score model and the diffusion variables until a converged first generative diffusion model is obtained, wherein the scoring function in the preset score model is: ∇_{x_t} log q_t(x_t | x_0) = −ε / σ_t, where ε is the noise variable (a numerical sketch of this forward diffusion step is given below).
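As a purely numerical illustration of the Gaussian marginal and the scoring function above, the sketch below samples x_t = γ_t·x_0 + σ_t·ε and the corresponding conditional score. The linear schedule for γ_t and the toy vector size are assumptions made only for this example, not the schedule of the claimed model.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_diffuse(x0, t, T=1000):
    """Sample x_t ~ N(gamma_t * x0, sigma_t^2 I) under an assumed variance-preserving schedule."""
    gamma_t = 1.0 - t / T                      # scaling coefficient gamma_t (illustrative schedule)
    sigma_t = np.sqrt(1.0 - gamma_t ** 2)      # noise scale sigma_t
    eps = rng.standard_normal(x0.shape)        # noise variable epsilon
    x_t = gamma_t * x0 + sigma_t * eps
    return x_t, eps, sigma_t

x0 = rng.standard_normal(16)                   # a latent vector from the VAE latent space (toy size)
x_t, eps, sigma_t = forward_diffuse(x0, t=500)
score = -eps / sigma_t                         # conditional score, matching the scoring function above
print(x_t[:4], score[:4])
```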
It will be appreciated that the first generative diffusion model in this embodiment is an improvement of the denoising diffusion model. DDPM is an excellent image generation model, but it cannot be applied directly to text generation and requires some specific improvements. There are many excellent applications of diffusion models in the text generation field that can be referred to, such as Diffusion-LM, which converts discrete words into a series of continuous latent vectors through word embedding in the forward process, keeps adding Gaussian noise to each continuous vector in forward diffusion, and in the backward process keeps denoising and finally rounds each latent vector to the nearest word embedding. However, this model has a notable defect: it does not explicitly model the generation quality of the whole sentence. Another defect is that the generation length is fixed in advance, so denoising generation has to be restarted to produce a different length, which greatly limits the usage scenarios.
First, the following denoising diffusion module is introduced. The diffusion model is a generative model with a Markov chain structure, which can be represented by the following chain, where x ∈ R^n:
x_T → x_{T−1} → … → x_1 → x_0
It has the following joint distribution:
p_θ(x_{0:T}) = p(x_T) ∏_{t=0}^{T−1} p_θ^(t)(x_t | x_{t+1})
After all x variables are derived, only x_0 is retained as the sample of the generative model. In order to train the diffusion model, a fixed factorization can be introduced for the variational inference distribution:
q(x_{1:T} | x_0) = ∏_{t=1}^{T} q(x_t | x_{t−1})
This yields the evidence lower bound (ELBO) of the maximum likelihood objective, where the evidence refers to the probability density of the data or observable variables, and the evidence lower bound refers to the inequality whose left side is the logarithm of the evidence and whose right side is its lower bound.
One particular property of the generative diffusion model is that, for all t < T, p_θ^(t) and q^(t) are all chosen as conditional Gaussian distributions, so x_t can be regarded as the result of x_0 being corrupted by Gaussian noise. Therefore, the ELBO can be reduced to the following denoising autoencoder objective:
Σ_{t=1}^{T} w_t E_q [ ‖ x_0 − f_θ(x_t, t) ‖² ]
where f_θ is a neural network parameterized by θ whose function is to recover a noise-free observation from the noisy x_t, and w_t are positive weights.
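The denoising autoencoder objective above can be illustrated with the following sketch of a single training step. The tiny network, the dimensions and the schedule values are assumptions chosen only to keep the example self-contained; they are not the claimed architecture.

```python
import torch
import torch.nn as nn

denoiser = nn.Sequential(nn.Linear(16, 64), nn.SiLU(), nn.Linear(64, 16))  # toy f_theta
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

def training_step(x0, gamma_t, sigma_t, weight=1.0):
    """One step of the denoising objective: weight * ||x0 - f_theta(x_t)||^2."""
    eps = torch.randn_like(x0)
    x_t = gamma_t * x0 + sigma_t * eps          # corrupt x0 with Gaussian noise
    loss = weight * ((x0 - denoiser(x_t)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

x0 = torch.randn(32, 16)                        # a batch of latent vectors (toy)
print(training_step(x0, gamma_t=0.7, sigma_t=(1 - 0.7 ** 2) ** 0.5))
```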
In our improvement, diffusion can be performed in the latent space of a variational autoencoder (VAE); the diffusion likewise takes place in a continuous space, but unlike Diffusion-LM there is no need to train word embeddings, and a series of optimization techniques are adopted so that the diffusion process is only responsible for controlling the text properties in a low-dimensional continuous space. Once the latent vector has the desired properties, it is handed to a decoder to generate the text. This has several advantages: on the one hand, the fixed-length limitation of Diffusion-LM is avoided; on the other hand, text fluency is guaranteed because text generation is still handled by the autoregressive decoder.
A score model is built such that the training objective of DDPM is exactly equivalent to that of the score model. The forward diffusion kernel is known to be Gaussian and, by the closure property of Gaussian distributions under composition, x_t conditioned on x_0 is still Gaussian, expressed as follows:
q(x_t | x_0) = N(x_t; γ_t x_0, σ_t² I)
From the above Gaussian distribution, the following score function expression can be obtained:
∇_{x_t} log q(x_t | x_0) = −(x_t − γ_t x_0) / σ_t² = −ε / σ_t
The stochastic differential equation (SDE) view of DDPM can be connected with an ordinary differential equation (ODE): the SDE of the diffusion process has a corresponding deterministic ODE solution, so the huge computational cost of the thousands of steps represented by the SDE formulation of a diffusion model can be avoided by means of an ODE solver; at the same time, diffusion is unified from the discrete to the continuous view, which makes conditionally controlled generation particularly natural. The forward process is an SDE and the backward process is also an SDE, so the specific form of the ODE corresponding to the reverse SDE of DDPM can be obtained as follows:
dx = −(1/2) β(t) [x + ∇_x log q_t(x)] dt
in the embodiment of the present specification, the procedure for creating the denoising diffusion repair model, that is, the second generation diffusion model, is as follows:
1) Variant targeting of DDRM
The key idea of DDRM is to find an unsupervised solution that is also suitable for supervised learning objectives, which can also be defined as a markov chain for any linear inverse problem:
x T →x T-1 →…→x 1 →x 0
Unlike before, one more condition y is:
x 0 is the final output of the diffusion model, consider the following factorial variation distribution conditioned on y:
the ELBO target of the y-based diffusion model was derived, and we next demonstrated the relationship of the distribution of DDRM to that of DDPM.
2) Diffusion process for data recovery
First, consider the singular value decomposition (SVD) of H and diffuse in its spectral space. The core idea is to tie the noise in the condition y to x_{1:T}, so as to ensure that the diffusion result x_0 is consistent with the expected result. Using the SVD, we identify the data missing from y and synthesize it with a diffusion process; at the same time, the condition y also undergoes a denoising process. For example, when noise is used, the sentence space is also the spectral space, and the model should complete the missing components of that space, that is, complete the structure of the missing sentence. For a general H, its singular value decomposition can be expressed as:
H = U Σ V^T
where U ∈ R^(m×m) and V ∈ R^(n×n) are orthogonal matrices and Σ ∈ R^(m×n) is a rectangular diagonal matrix containing the singular values of H in descending order. We abbreviate the values in the spectral space: x̄_t = V^T x_t denotes the transformed variable and x̄_t^(i) its i-th index, while ȳ denotes the observation y transformed into the same spectral space and ȳ^(i) its i-th index. Because V is an orthogonal matrix, x_t can be recovered from x̄_t by multiplying by V. For x̄_t^(i) we can define the variational distribution index-wise as a Gaussian whose mean and variance depend on the corresponding singular value and the noise levels, as analyzed below.
Here τ is a hyperparameter that controls the variance of the transitions. The above construction treats each index of the spectral space separately: if the corresponding singular value is 0, y provides no direct information for that index and the update is analogous to conventional unconditional generation; if the singular value is not 0, the update takes the information provided by y into account, depending on whether the noise level of that index in y is greater than the noise level in the diffusion model.
Since we define q^(t) as a series of conditional Gaussians, we likewise define the model distribution p_θ as a series of conditional distributions. Similar to DDPM, our goal at each step t is a prediction of x_0; we use the notation x_{θ,t} to denote the prediction made by the model f_θ(x_{t+1}, t+1), and we define x̄_{θ,t}^(i) as the i-th index of V^T x_{θ,t}.
DDRM with trainable parameters θ can then be defined analogously, with x_0 in the variational distribution replaced by the model prediction x_{θ,t}.
theoretically we have to give each a different H and σ y The linear inverse problem of (2) retrains a different model, which is not the case in practice.
According to theorem, it is known that: hypothesis modelAnd->There is no weight sharing at t+.t', then when τ=1 and +. > In this case, the ELBO target of DDRM may be rewritten to the form of DDPM target. The learning goal of the DDRM is a weighted square error in the tensor space, so the pre-trained DDPM model is well approximate to the optimal solution, so we can use the approximate diffusion model for the linear inverse problem with different super parameters, i.e. we can use the optimal solution of the DDPM to make the approximate optimal solution of the DDRM according to different data fields, and the detailed structure diagram of the DDPM and the DDRM is shown in fig. 2.
Therefore, in the proposed model, the ODE corresponding to the DDRM, which is also the ODE corresponding to the DDRM, can be expressed as the following formula:
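The spectral-space view used by DDRM can be illustrated with the following toy sketch, all of whose concrete values are assumptions: it builds a random linear operator H, moves the observation y into the spectral space via the SVD, fills the indices that y cannot determine with stand-in generated values (where the full model would run the conditional diffusion process), and maps the result back to the data space.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 5
H = rng.standard_normal((m, n))             # a generic linear degradation operator (toy)
U, s, Vt = np.linalg.svd(H, full_matrices=True)

x_true = rng.standard_normal(n)
y = H @ x_true                               # observed (degraded) data

# Move the observation into the spectral space of H.
y_bar = np.zeros(n)
y_bar[: s.size] = (U.T @ y)[: s.size] / s    # indices with non-zero singular values

# Indices whose singular value is (effectively) zero get no information from y,
# so they are filled by unconditional generation; here a stand-in random sample.
x_bar = y_bar.copy()
x_bar[s.size:] = rng.standard_normal(n - s.size)

x_recovered = Vt.T @ x_bar                   # back to the data space via V
print(np.allclose(H @ x_recovered, y, atol=1e-8))   # the observation is reproduced
```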
in this embodiment of the present disclosure, in step S101, the process of fine tuning the first generated diffusion model to obtain the first target generated diffusion model may be a reinforcement learning process, specifically, when generating the diffusion model to generate text, some data that is not helpful to the user at all is often generated, which is generally called that the model is inconsistent with the user 'S requirement, which is also called that the generated data is inconsistent with the user' S requirement, and what we need to do is fine tuning the language model through human feedback, so that the generated diffusion model has a strong randomness, and avoiding occurrence of such toxic data is particularly important for a language model deployed and used in an application program.
In summary, we have made progress in optimizing the task of identifying named entities based on generating a challenge model by having the generated diffusion model generate data of the same class as the target domain data according to the user's mind, we first resort to reinforcement learning of human feedback, which technique uses human preferences as a reward signal to fine-tune our model, we extract source domain data features through DDPM, annotate by staff, build a training dataset, then we build a Reward Model (RM) on this dataset, i.e. preset incentive model, to predict what data we need, what the optimal solution of DDPM should be, finally we use this RM as a reward function, fine-tune our supervised learning baseline using PPO algorithm to maximize rewards.
Further, the preset excitation model is obtained through the following steps:
determining a plurality of generated sentences according to the original generation set;
marking the quality of the plurality of generated sentences by manual annotation to obtain a ranked sentence sequence;
training an initial excitation model through a preset excitation loss function and the ranked sentence sequence until a trained excitation model is obtained, wherein the preset excitation loss function is expressed as: loss = −Σ_{i<j} σ(r_i − r_j), summed over all pairs in which sentence i is ranked before sentence j, where σ is the sigmoid function and r_i is the excitation-model score of the i-th sentence;
and normalizing the trained excitation model with a bias to obtain the preset excitation model.
A schematic diagram of the excitation (reward) model training process is shown in FIG. 3.
The basic idea of RLHF is to optimize a language model directly with human feedback using reinforcement learning, i.e., to use human feedback on the generated text as a performance measure and further use this feedback as a loss to optimize the model. RLHF is a complex concept involving multiple models and different training phases, which can be decomposed into three steps:
1. pre-training a language model (LM);
2. aggregating the generated data and training a reward model (RM);
3. fine-tuning the LM by reinforcement learning.
The DDPM-based generative model is the pre-trained language model, and the data it generates is the data for training the reward model RM. Drawing on the idea of ChatGPT, a ranked-sequence annotation scheme is adopted instead of direct scoring: annotators only need to order the different answers, and this idea of replacing an absolute task with a relative task makes it easier for annotators to produce consistent annotation results.
Now assume a ranked sequence; to train the scoring model we use the following loss function:
loss = Σ_{i<j} σ(r_i − r_j)
To normalize the differences better, every pairwise difference is squashed between 0 and 1 by a sigmoid function; as can be seen from the formula, the value of the loss equals the sum, over the ordered list, of all terms of the form "reward of the earlier-ranked sentence minus reward of the later-ranked sentence". We want the model to maximize the score gap between "good sentences" and "bad sentences", and because gradient descent actually performs minimization, we take the negative of the loss:
loss = −loss
First, we acquire a data set through the DDPM, in which each row is a ranked sequence: sentences ranked earlier are treated as positive and sentences ranked later as negative. We expect to train a reward model from these sequences such that sentences closer to the positive end receive larger reward values. For the backbone, we choose ERNIE as the reference model and connect a linear layer to the pooled-output layer of the model to obtain a one-dimensional reward.
We have found that if we simply shuffle the comparison results into one data set, the trained reward model overfits. Instead, we train all the comparisons within each prompt as a single batch element, which greatly improves computational efficiency; because it no longer overfits, it also improves validation accuracy and reduces the log loss. Specifically, before the reinforcement learning process begins, we normalize the reward model with a bias so that the labeled demonstrations achieve an average score of 0 before reinforcement learning starts.
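A minimal sketch of the ranking loss described above follows; it sums σ(r_front − r_back) over all ordered pairs of one ranked sequence and negates the result for minimization. The toy scores and the plain double loop are illustrative assumptions, not the training code of this embodiment.

```python
import torch

def ranking_loss(rewards_in_rank_order):
    """Pairwise ranking loss over one ranked sequence:
    sum of sigmoid(r_front - r_back) over all ordered pairs, then negated."""
    r = rewards_in_rank_order
    total = 0.0
    for i in range(len(r)):
        for j in range(i + 1, len(r)):
            total = total + torch.sigmoid(r[i] - r[j])
    return -total                      # negate so that gradient descent enlarges the margins

scores = torch.tensor([2.1, 0.3, -1.0], requires_grad=True)   # reward-model outputs, best first
loss = ranking_loss(scores)
loss.backward()
print(loss.item(), scores.grad)
```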
Further, fine-tuning the first generative diffusion model according to the original generation set and the preset excitation model to obtain the first target generative diffusion model includes:
inputting the generated sentences in the original generation set into the preset excitation model, and selecting the sentences with scores greater than zero into an optimized generation set;
and fine-tuning the first generative diffusion model through the proximal policy optimization algorithm according to the optimized generation set to obtain the first target generative diffusion model.
It will be appreciated that the fine-tuning task of the first generative diffusion model can be expressed as a reinforcement learning (RL) task, and the embodiments of the present disclosure may employ the proximal policy optimization (PPO) algorithm to accomplish the fine-tuning. A policy is a language model that accepts a prompt and returns a sequence of text (or a probability distribution over text); the action space of this policy is all the tokens in the vocabulary of the LM, the observation space is the possible input token sequences, and the reward function is a combination of the preference model and a constraint on policy shift. The behavior policy is used to produce data, while the target policy is the policy to be updated and optimized; if the two are the same policy, we call it on-policy (online), otherwise off-policy (offline).
The core of the PPO algorithm is the following policy loss function:
L^CLIP(θ) = Ê_t [ min( r_t(θ) Â_t, clip(r_t(θ), 1 − ε, 1 + ε) Â_t ) ]
where:
r_t(θ) = π_θ(a_t | s_t) / π_θold(a_t | s_t)
represents the ratio between the new and old policies, and ε is a hyperparameter ensuring that, when multiple policy updates are performed on the same batch of data, the difference between the new and old policies is not too large; in general we set ε = 0.2. In the policy loss function, Â_t is the estimate of the advantage function; we generally use GAE to calculate the advantage function:
Â_t = Σ_{l=0}^{∞} (γλ)^l δ_{t+l}
where δ_t is:
δ_t = r_t + γ V(s_{t+1}) − V(s_t)
with γ the discount factor, λ the GAE coefficient, r_t the reward at step t and V the value function.
in summary, PPO is a new strategy gradient reinforcement learning method that alternately samples data through interactions with the environment and uses a random gradient-rising strategy to replace the objective function, whereas standard strategy gradient methods require a complete gradient update to be performed on each data sample, PPO allows for small batch updates over multiple time periods, is easier to implement than trusted region strategy optimization, has less sample complexity, and has better balance. The specific fine tuning process is not described in the embodiment of the present specification.
In the embodiments of the present description, the named entity recognition problem is treated as a tag sequence labeling problem, where named entities can be regarded as the tags of tokens. Given an input sequence x with N tokens, the goal of NER is to output a corresponding tag sequence y of the same length, i.e., to model P(y|x). Our named entity recognition module is mainly composed of three parts: an input sequence encoder that encodes the input sequence x, a tag encoder that encodes the previously predicted tags, and a tag predictor that predicts the NER tag of each token.
In the cross-domain NER task, a data set is obtained from each of the source domain and the target domain, denoted D_S and D_T respectively; the purpose is to learn valuable information from D_S and transfer it to D_T. The principle of the general NER framework can be represented by the following formula:
P(y | x) = ∏_{i=1}^{N} P(y_i | x)
In order to better utilize the relationship between tokens and tags, the framework adopted here, unlike prior methods, can be expressed as an autoregressive model, whose principle is:
P(y | x) = ∏_{i=1}^{N} P(y_i | x, y_{1:i−1})
Further, the input sequence encoder comprises a BERT model trained on source-domain data, expressed as: [h_1, h_2, …, h_N] = f_In(x_1, x_2, …, x_N), where h_i is a d_1-dimensional token representation vector;
the tag encoder comprises a pre-trained Bi-LSTM model, expressed as: [e_1, e_2, …, e_N] = f_In(s_1, s_2, …, s_N), where s_k is obtained by embedding the tag sequence through a tag lookup table G, e_k is the output of the Bi-LSTM, G ∈ R^(k×d_2), k represents the number of unique tags in the source domain or the target domain, and d_2 is the dimension of the tag embedding;
the tag predictor comprises a Bi-Attention model, and is used for performing knowledge fusion on the output results of the input sequence encoder and the tag encoder to obtain the entity naming result for the target data.
In the embodiments of the present disclosure, inputting the target generation set into the pre-trained named entity recognition model to obtain the entity naming result for the target data includes:
determining a marked sentence sequence according to the target generation set;
taking the marked sentence sequence as the input sequence and encoding it with the trained BERT model to obtain the token representation vectors corresponding to the marked sentence sequence;
randomly initializing and constructing a tag lookup table based on the previously marked tags in the source domain or the target domain;
performing feature extraction on the previously marked tags based on the tag lookup table and the pre-trained Bi-LSTM to obtain tag representation vectors;
projecting the tag representation vector, through a fully connected neural network layer, to the same dimension as the token representation vector;
obtaining the label background information of the input sequence according to the projected tag representation vector and a preset attention-weight calculation rule;
calculating the label context information according to the label background information and the tag representation vectors, in combination with a preset attention module;
concatenating the label background information and the label context information to obtain the label-aware information;
and fusing the label-aware information and the token representation vectors to obtain the entity naming result for the target data.
Illustratively, FIG. 4 shows an overall framework diagram of the method provided in this specification. Further, the named entity recognition model may have the following structure:
(1) Input sequence encoder
The pre-trained BERT model, denoted f_In, is used to encode the input sequence:
[h_1, h_2, …, h_N] = f_In(x_1, x_2, …, x_N)
where h_i is a d_1-dimensional vector; the aim is to obtain the context information of the corresponding token.
(2) Tag encoder
In order to model the relationship between the token sequence and the tag sequence, a new tag encoder is used to extract context information from the tag sequence. Unlike previous methods, the named entity tags are predicted based on both the commonly used current token representation and the tag information extracted from the previous tags. A randomly initialized tag lookup table G ∈ R^(k×d_2) needs to be constructed, where k represents the number of unique tags in the source domain or the target domain and d_2 is the dimension of the tag embedding. We use a Bi-LSTM to encode the tag sequence, expressed as:
[e_1, e_2, …, e_N] = f_In(s_1, s_2, …, s_N)
where s_k is obtained by embedding the tag y_k through the tag lookup table G, and e_k is the output of the Bi-LSTM; its purpose is to obtain the context information of the previous tags.
(3) Tag predictor
The tag predictor predicts the NER tags using the context information of the input sequence and of the previous tag sequence. To combine these two kinds of information, we introduce a simple and efficient Bi-Attention module. Specifically, we use the last hidden state of the Bi-LSTM as the representation of the previous tags and treat it as the query vector, while the token representations from the input sequence encoder are regarded as the keys.
Before performing the matrix product, e_{i−1} can be projected, through a fully connected layer, onto the same dimension as h_i:
e′_{i−1} = W_2 · e_{i−1} + b_2
e′_{i−1}, like h_i, is then a d_1-dimensional vector, and the attention weights are calculated with the softmax function:
α_{i,j} = softmax_j( h_j^T e′_{i−1} )
These weights can be seen as a probability distribution and are used to calculate a weighted sum of the input token representations, which we denote c_i:
c_i = Σ_{j=1}^{N} α_{i,j} h_j
Since the label background information c_i is derived from e_{i−1}, it can represent the relationship between the tag of the current token and the whole input sequence. In addition, to capture the relation between the current token x_i and the previously predicted tags y_{1:i−1} and to increase the sensitivity of x_i to previously predicted named entity tags, we first take the label background information as an intermediate state and map it into a two-dimensional vector; the context information of the previous tags is then calculated, again using an attention module over e_1, e_2, …, e_{i−1}, whose weights indicate the relevance of token i to e_1, e_2, …, e_{i−1}. Finally, the label background information over the input sequence and the context information obtained from the previous tags are concatenated as the final label-aware information z_i.
To further fuse the tag-related knowledge into the token x_i, h_i and the related label-aware information z_i are combined, and g is the final sequence representation (a simplified sketch of this attention-and-fusion step is given below).
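A simplified sketch of the Bi-Attention step described above follows: it projects the previous-tag representation e_{i−1} to the token dimension, computes softmax attention over the token representations h to obtain the label background information, and concatenates it with h_i. The dimensions are toy assumptions, and the second attention over e_1, …, e_{i−1} and the final prediction layer are omitted for brevity.

```python
import torch
import torch.nn as nn

d1, d2, N = 32, 16, 6                       # token dim, tag-embedding dim, sentence length (toy)
h = torch.randn(N, d1)                      # token representations from the input sequence encoder
e_prev = torch.randn(d2)                    # Bi-LSTM state summarizing previously predicted tags

proj = nn.Linear(d2, d1)                    # W2, b2: project e_{i-1} to the token dimension
e_proj = proj(e_prev)                       # e'_{i-1}

attn = torch.softmax(h @ e_proj, dim=0)     # attention weights over the input tokens
label_background = attn @ h                 # weighted sum of token representations (c_i)

i = 3                                       # current position
label_aware = torch.cat([label_background, h[i]])   # fused representation for prediction
print(label_aware.shape)                    # torch.Size([64])
```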
In practice, the named entity recognition model may also include a pre-training and fine-tuning process, which can be divided into two phases. In the first phase, to enhance the text feature extraction capability for the target domain, the input sequence encoder is trained on the domain-related corpora (i.e., source domain and target domain) to reduce the differences in background and text distribution between the source and target domains, to obtain more effective features from the target domain, to learn textual knowledge, and to enhance the effect of the feature extractor. More importantly, for shared named entity tags, valuable tag embeddings can be learned before the target domain is accessed.
In the second phase, the model is fine-tuned on the target domain and used for D_T. With the pre-trained shared tag embeddings of the Bi-LSTM-encoded tag sequences, the model can further learn the relationships between named entity tags and the named entity tags specific to the target domain (i.e., tags that only exist in the target domain), as well as the inherent tag dependency information. This further helps the model exploit the knowledge of the source domain to better understand these unseen tags in the target domain.
The embodiments of the present specification can achieve the following beneficial effects:
1. Combination of DDPM and DDRM in their ODE versions. The present invention rewrites the denoising diffusion probabilistic model (DDPM) and the denoising diffusion repair model (DDRM), originally constructed from stochastic differential equations (SDEs), as models constructed from ordinary differential equations (ODEs); the generation procedure is reduced from thousands of steps to tens of steps, which greatly improves the efficiency of the diffusion model and remedies the defect that SDE-based diffusion models generate too slowly.
Because DDPM can only solve supervised learning problems, DDRM is introduced in order to solve the named entity recognition problem on the unsupervised target domain; it can directly reuse the optimal solution of DDPM for unsupervised learning while ensuring high accuracy.
2. Introduction of reinforcement learning based on human feedback
The birth of ChatGPT has drawn extensive attention worldwide, and the principle behind it has attracted the attention of practitioners. For the data set generated by the DDPM from the source-domain data, the present invention orders the data by manual annotation in a ranked-sequence manner, replacing the traditional way of directly scoring the generated text and solving the problem that, in the traditional way, scores from different annotators are difficult to unify.
3. Use of the relationship between tokens and tags
Traditional NER methods pay little attention to the relationship between tokens and tags. The framework we adopt can be expressed as an autoregressive model, which can be extended to the tags of the source and target domains, and a new tag encoder is employed that can exploit the relationship between tokens and tags to perform the named entity recognition task.
Based on the method provided above, the embodiments of the present disclosure further provide a cross-domain named entity recognition device based on a generative diffusion model, as shown in FIG. 5, the device comprising:
an original generation set acquisition module 501, configured to perform feature extraction on source data through a first generative diffusion model to obtain an original generation set, wherein the source data is annotated text data;
a first target generative diffusion model determination module 502, configured to fine-tune the first generative diffusion model according to the original generation set and a preset excitation model to obtain a first target generative diffusion model;
a target generation set acquisition module 503, configured to perform feature extraction on target data through a second generative diffusion model and the first target generative diffusion model to obtain a target generation set, wherein the second generative diffusion model and the first target generative diffusion model have the same network architecture, and the target data and the source data belong to different domains;
and an entity naming result obtaining module 504, configured to input the target generation set into a pre-trained named entity recognition model to obtain an entity naming result for the target data.
The beneficial effects obtained by the device are consistent with those obtained by the method, and are not repeated in the embodiments of the present disclosure.
The present embodiments provide a computer device whose internal structure may be as shown in FIG. 6. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by the processor to implement the cross-domain named entity recognition method based on a generative diffusion model.
It will be appreciated by those skilled in the art that the structure shown in fig. 6 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto.
It should also be understood that in the embodiments herein, the term "and/or" merely describes an association relationship between associated objects, indicating that three relationships may exist. For example, "A and/or B" may represent three cases: A exists alone, both A and B exist, and B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided herein, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices, or elements, or may be an electrical, mechanical, or other form of connection.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the elements may be selected according to actual needs to achieve the objectives of the embodiments herein.
Specific examples are set forth herein to illustrate the principles and embodiments herein and are merely illustrative of the methods herein and their core ideas; also, as will be apparent to those of ordinary skill in the art in light of the teachings herein, many variations are possible in the specific embodiments and in the scope of use, and nothing in this specification should be construed as a limitation on the invention.

Claims (10)

1. A cross-domain named entity recognition method based on generating a diffusion model, the method comprising:
extracting features of source data through a first generation diffusion model to obtain an original generation set, wherein the source data is marked text data;
fine tuning the first generated diffusion model according to the original generation set and a preset excitation model to obtain a first target generated diffusion model;
extracting characteristics of target data through a second generation diffusion model and the first target generation diffusion model to obtain a target generation set, wherein the second generation diffusion model and the first target generation diffusion model have the same network framework, and the target data and the source data belong to different fields;
and inputting the target generation set into a pre-trained named entity recognition model to obtain an entity naming result for the target data.
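By way of illustration only, the data flow across the four steps above can be sketched as follows; every callable here (first_diffusion, reward_model, ppo_finetune, build_second_diffusion, ner_model) is a hypothetical placeholder and not an element of the claims.

```python
# Illustrative data flow only; every callable is a hypothetical placeholder.

def cross_domain_ner(source_texts, target_texts, first_diffusion,
                     reward_model, ppo_finetune, build_second_diffusion,
                     ner_model):
    # Step 1: feature extraction over annotated source-domain text.
    original_set = [first_diffusion(s) for s in source_texts]
    # Step 2: fine-tune the first diffusion model against the excitation (reward) model.
    tuned_first = ppo_finetune(first_diffusion, original_set, reward_model)
    # Step 3: a second diffusion model sharing the tuned architecture processes target-domain text.
    second_diffusion = build_second_diffusion(tuned_first)
    target_set = [second_diffusion(t) for t in target_texts]
    # Step 4: the pre-trained NER model labels the generated target set.
    return [ner_model(x) for x in target_set]
```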
2. The method of claim 1, wherein the first generated diffusion model is obtained by:
acquiring training set data;
establishing an initial generation model with a Markov chain structure;
based on the Markov chain structure, performing ordinary differential equation-based diffusion processing on the training set data in the latent space of a variational autoencoder to obtain a plurality of continuous diffusion variables, wherein the diffusion variables satisfy the following Gaussian distribution: q_t(x_t | x_0) = N(x_t; γ_t·x_0, σ_t²·I), wherein q_t is the forward marginal distribution; N(·) denotes a Gaussian distribution; x_t is the t-th diffusion variable; γ_t is the t-th scaling coefficient; σ_t² is the square of the t-th noise scale; and I is the identity matrix;
training the initial generation model based on a preset score model and the diffusion variables until a converged first generation diffusion model is obtained, wherein the score function in the preset score model is: ∇_{x_t} log q_t(x_t | x_0) = −ε / σ_t, where ε is the noise variable.
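As a non-authoritative sketch of the forward diffusion and score training in claim 2, the snippet below perturbs VAE latents with the Gaussian kernel above and fits an ε-parameterised score network by denoising score matching; the cosine-style schedule, the network shape, and the toy data are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

# Minimal sketch: perturb VAE latents with q_t(x_t | x_0) = N(x_t; gamma_t * x_0, sigma_t^2 * I)
# and train a score network with denoising score matching. Schedule and network are assumed.

def gamma_sigma(t):
    # Variance-preserving style schedule (an assumption, not the claimed schedule).
    gamma = torch.cos(0.5 * torch.pi * t)
    sigma = torch.sin(0.5 * torch.pi * t)
    return gamma, sigma

score_net = nn.Sequential(nn.Linear(17, 64), nn.SiLU(), nn.Linear(64, 16))

def dsm_loss(x0):
    t = torch.rand(x0.shape[0], 1)               # uniform diffusion times
    gamma, sigma = gamma_sigma(t)
    eps = torch.randn_like(x0)                    # noise variable epsilon
    xt = gamma * x0 + sigma * eps                 # sample from q_t(x_t | x_0)
    pred_eps = score_net(torch.cat([xt, t], dim=-1))
    # score(x_t) = -eps / sigma, so predicting eps is equivalent up to scaling.
    return ((pred_eps - eps) ** 2).mean()

x0 = torch.randn(8, 16)                           # toy latent batch
loss = dsm_loss(x0)
loss.backward()
```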
3. The method of claim 2, wherein the expression of the ordinary differential equation in the first generated diffusion model is:
wherein β(t) is a Gaussian distribution parameter.
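The claimed ordinary differential equation is not reproduced in the text above. Assuming the common variance-preserving form dx = -½·β(t)·[x + ∇_x log q_t(x)]·dt, a simple Euler integration could be sketched as follows; the linear β(t) schedule and the stand-in score function are assumptions rather than the claimed expression.

```python
import torch

def beta(t, beta_min=0.1, beta_max=20.0):
    # Linear noise schedule (an assumption, not the claimed schedule).
    return beta_min + t * (beta_max - beta_min)

def score(x, t):
    # Stand-in for a trained score model; here the score of a standard Gaussian.
    return -x

def probability_flow_sample(x, steps=100):
    # Euler integration of dx = -1/2 * beta(t) * (x + score(x, t)) dt from t=1 down to t=0.
    dt = -1.0 / steps
    t = 1.0
    for _ in range(steps):
        drift = -0.5 * beta(t) * (x + score(x, t))
        x = x + drift * dt
        t = t + dt
    return x

x_T = torch.randn(4, 16)          # start from Gaussian noise in the latent space
x_0 = probability_flow_sample(x_T)
```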
4. The method according to claim 1, wherein the preset excitation model is obtained by:
determining a plurality of generated sentences according to the original generation set;
marking the quality of the plurality of generated sentences through manual annotation to obtain a marked sentence sequence;
training an initial excitation model through a preset excitation loss function and the sentence sequence until a trained excitation model is obtained, wherein the preset excitation loss function is expressed as:
and normalizing the trained excitation model by using the deviation to obtain the preset excitation model.
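The preset excitation loss expression is likewise not reproduced above. One common choice for learning a reward from human quality rankings is a pairwise ranking loss, sketched below purely as an assumption; the scorer architecture and the mean-subtraction used to illustrate the deviation-based normalization are not taken from the claims.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy excitation (reward) model over sentence feature vectors; an assumption.
reward_net = nn.Sequential(nn.Linear(32, 64), nn.Tanh(), nn.Linear(64, 1))

def pairwise_ranking_loss(better, worse):
    # Encourage r(better) > r(worse) for every human-ranked sentence pair.
    r_b = reward_net(better)
    r_w = reward_net(worse)
    return -F.logsigmoid(r_b - r_w).mean()

def normalized_reward(features):
    # Deviation-style normalization sketched as subtracting the batch mean reward.
    r = reward_net(features)
    return r - r.mean()

better = torch.randn(16, 32)      # features of higher-ranked sentences (toy data)
worse = torch.randn(16, 32)       # features of lower-ranked sentences (toy data)
loss = pairwise_ranking_loss(better, worse)
loss.backward()
```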
5. The method of claim 1, wherein the fine tuning the first generated diffusion model to obtain a first target generated diffusion model according to the original generated set and a preset excitation model, comprises:
inputting the generated sentences in the original generation set into the preset excitation model, and selecting sentences with scores greater than zero into an optimized generation set;
and fine tuning the first generated diffusion model through a proximal policy optimization algorithm according to the optimized generation set to obtain a first target generated diffusion model.
6. The method of claim 5, wherein the loss function corresponding to the proximal policy optimization algorithm is:
L^CLIP(θ) = E_t[ min( r_t(θ)·Â_t, clip(r_t(θ), 1−ε, 1+ε)·Â_t ) ],
wherein
r_t(θ) = π_θ(a_t | s_t) / π_θ_old(a_t | s_t),
wherein: E_t denotes the expectation; r_t(θ) is the ratio between the new and old policies; ε is a hyperparameter; Â_t is an estimate of the advantage function; π_θ is the policy; a_t is the action vector; and s_t is the state vector.
7. The method of claim 1, wherein extracting features from the target data by the second generated diffusion model and the first target generated diffusion model to obtain a target generation set, comprises:
establishing a second generation diffusion model based on the network structure of the first target generation diffusion model, wherein the Markov chain in the second generation diffusion model has a joint distribution conditioned on the noise y, in which x_0 is the final output of the diffusion model, and a factorized variance distribution conditioned on the noise y;
and inputting the target data into a second generation diffusion model to obtain a target generation set, wherein the target generation set comprises a sentence sequence with marks.
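As an illustration of claim 7, the loop below runs a reverse Markov chain whose transitions are conditioned on a noise/condition vector y derived from the target data; the ε-predicting denoiser, the step count, and the Gaussian transition form follow the standard DDPM recipe and are assumptions rather than the claimed parameterization.

```python
import torch
import torch.nn as nn

T = 50
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

# Stand-in epsilon predictor conditioned on the noise/condition y (an assumption).
denoiser = nn.Sequential(nn.Linear(16 + 16 + 1, 64), nn.SiLU(), nn.Linear(64, 16))

@torch.no_grad()
def conditional_reverse_chain(y):
    x = torch.randn(y.shape[0], 16)                     # x_T ~ N(0, I)
    for t in reversed(range(T)):
        t_feat = torch.full((y.shape[0], 1), t / T)
        eps = denoiser(torch.cat([x, y, t_feat], dim=-1))
        # Posterior mean of p(x_{t-1} | x_t, y) under the epsilon parameterization.
        mean = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x                                            # x_0: final output of the chain

y = torch.randn(4, 16)                                  # condition built from target data (toy)
x0 = conditional_reverse_chain(y)
```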
8. The method of claim 1, wherein the pre-trained named entity recognition model comprises an input sequence encoder, a tag encoder, and a tag predictor;
the input sequence encoder comprises a BERT model trained by source domain data, expressed as: [h_1, h_2, …, h_N] = f_In(x_1, x_2, …, x_N), wherein h_i is a d_1-dimensional marker vector;
the tag encoder includes a pre-trained Bi-LSTM model, expressed as: [e_1, e_2, …, e_N] = f_In(s_1, s_2, …, s_N), wherein s_k is obtained by label-embedding the label sequence through a label lookup table G, e_k is the output of the Bi-LSTM, G ∈ R^{k×d_2}, k represents the number of unique tags in the source domain or the target domain, and d_2 is the tag embedding dimension;
the tag predictor comprises a Bi-Attention model, and is used for performing knowledge fusion on the output results of the input sequence encoder and the tag encoder so as to obtain an entity naming result for the target data.
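For illustration, a tag encoder along the lines of claim 8 can be sketched as a randomly initialized label lookup table G ∈ R^{k×d2} followed by a bidirectional LSTM; the dimensions and the toy label sequence below are assumptions.

```python
import torch
import torch.nn as nn

class LabelEncoder(nn.Module):
    """Sketch of the tag encoder: label lookup table G (k x d2) followed by a Bi-LSTM."""

    def __init__(self, num_labels, d2=32, hidden=64):
        super().__init__()
        self.G = nn.Embedding(num_labels, d2)               # label lookup table
        self.bilstm = nn.LSTM(d2, hidden, bidirectional=True, batch_first=True)

    def forward(self, label_ids):
        s = self.G(label_ids)                                # s_k: label embeddings
        e, _ = self.bilstm(s)                                # e_k: Bi-LSTM outputs
        return e

encoder = LabelEncoder(num_labels=9)                         # e.g. BIO tags over 4 entity types
label_ids = torch.randint(0, 9, (2, 12))                     # toy batch of label sequences
label_repr = encoder(label_ids)                              # shape: (2, 12, 128)
```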
9. The method of claim 8, wherein inputting the target generation set into a pre-trained named entity recognition model to obtain an entity naming result for the target data comprises:
determining a marked sentence sequence according to the target generation set;
encoding the marked sentence sequence, as an input sequence, with the trained BERT model to obtain a mark representation vector corresponding to the marked sentence sequence;
randomly initializing and constructing a label lookup table based on labels previously marked in the source domain or the target domain;
performing feature extraction on the previously marked labels based on the label lookup table and the pre-trained Bi-LSTM to obtain a label expression vector;
projecting the label expression vector through a fully connected neural network layer to the same dimension as the mark representation vector;
obtaining label background information of the input sequence according to the projected label expression vector and a preset attention weight calculation rule;
calculating label context information according to the label background information and the label expression vector in combination with a preset attention module;
concatenating the label background information and the label context information to obtain label perception information;
and performing fusion processing on the label perception information and the mark representation vector to obtain an entity naming result for the target data.
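The fusion steps of claim 9 can be sketched functionally as follows, assuming the mark representations come from a BERT-style encoder and the label representations from a Bi-LSTM as above; the linear projection, the scaled dot-product attention weights, and the concatenation-plus-linear fusion stand in for the unspecified preset rules and modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d1, d_lab, N, k = 768, 128, 12, 9
h = torch.randn(2, N, d1)                 # mark (token) representation vectors from BERT (toy)
e = torch.randn(2, N, d_lab)              # label expression vectors from the Bi-LSTM (toy)

project = nn.Linear(d_lab, d1)            # project labels to the mark-representation dimension
e_proj = project(e)

# Label background information via scaled dot-product attention weights (assumed rule).
weights = F.softmax(torch.bmm(h, e_proj.transpose(1, 2)) / d1 ** 0.5, dim=-1)
background = torch.bmm(weights, e_proj)

# Label context information from a second attention pass over the labels (assumed module).
ctx_weights = F.softmax(torch.bmm(background, e_proj.transpose(1, 2)) / d1 ** 0.5, dim=-1)
context = torch.bmm(ctx_weights, e_proj)

# Concatenate background and context into label perception information,
# then fuse with the mark representations and classify into entity tags.
perception = torch.cat([background, context], dim=-1)
fuse = nn.Linear(d1 + 2 * d1, k)
logits = fuse(torch.cat([h, perception], dim=-1))            # (2, N, k) tag scores
```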
10. A cross-domain named entity recognition device based on generating a diffusion model, the device comprising:
the original generation set acquisition module is used for extracting characteristics of source data through the first generation diffusion model to obtain an original generation set, wherein the source data is marked text data;
the first target generation diffusion model determining module is used for fine tuning the first generation diffusion model according to the original generation set and a preset excitation model to obtain a first target generation diffusion model;
the target generation set acquisition module is used for extracting characteristics of target data through a second generation diffusion model and the first target generation diffusion model to obtain a target generation set, the second generation diffusion model and the first target generation diffusion model have the same network framework, and the target data and the source data belong to different fields;
and the entity naming result obtaining module is used for inputting the target generation set into a pre-trained named entity recognition model to obtain an entity naming result for the target data.
CN202310483045.8A 2023-04-28 2023-04-28 Cross-domain named entity recognition method and device based on diffusion model generation Pending CN116484868A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310483045.8A CN116484868A (en) 2023-04-28 2023-04-28 Cross-domain named entity recognition method and device based on diffusion model generation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310483045.8A CN116484868A (en) 2023-04-28 2023-04-28 Cross-domain named entity recognition method and device based on diffusion model generation

Publications (1)

Publication Number Publication Date
CN116484868A true CN116484868A (en) 2023-07-25

Family

ID=87217591

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310483045.8A Pending CN116484868A (en) 2023-04-28 2023-04-28 Cross-domain named entity recognition method and device based on diffusion model generation

Country Status (1)

Country Link
CN (1) CN116484868A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116992875A (en) * 2023-09-27 2023-11-03 之江实验室 Text generation method, apparatus, computer device and storage medium
CN116992875B (en) * 2023-09-27 2024-01-09 之江实验室 Text generation method, apparatus, computer device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination