WO2022001333A1 - Fine-grained entity recognition method based on hyperbolic space representation and label-text interaction - Google Patents

Fine-grained entity recognition method based on hyperbolic space representation and label-text interaction Download PDF

Info

Publication number
WO2022001333A1
WO2022001333A1 · PCT/CN2021/090507
Authority
WO
WIPO (PCT)
Prior art keywords
entity
label
matrix
context
model
Prior art date
Application number
PCT/CN2021/090507
Other languages
English (en)
French (fr)
Inventor
刘杰
Original Assignee
首都师范大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 首都师范大学 filed Critical 首都师范大学
Publication of WO2022001333A1 publication Critical patent/WO2022001333A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs

Definitions

  • the present application belongs to the field of fine-grained entity recognition, and in particular relates to a fine-grained entity recognition method based on hyperbolic space representation and label-text interaction.
  • Named entity recognition has long been a foundational research task in natural language processing, underpinning information extraction, question answering systems, and machine translation. Its purpose is to identify the components of a text that denote named entities and to classify them.
  • compared with ordinary entity recognition, fine-grained entity recognition covers not only simple label classification (such as person names and place names) but also finer, more complex classification at different entity granularities (such as occupation and company).
  • fine-grained named entity recognition often carries more information, providing valuable prior knowledge and supplying more knowledge to downstream tasks such as relation extraction, event extraction, coreference resolution, and question answering.
  • Fine-grained entity recognition can provide more refined, hierarchical entity information at different granularities and is better suited to real-world, complex application scenarios.
  • the hierarchy and granularity of entities are reflected through the hierarchical relationships among the labels.
  • how to model a better hierarchical relationship among labels is the focus of research.
  • among existing methods, one uses a graph neural network based on label co-occurrence information; another uses hyperbolic space to obtain the label hierarchy.
  • however, co-occurrence information derived from the labels themselves contains a certain amount of noise, and co-occurrence relations reflect only part of the correlation; the hyperbolic-space method is effective mainly for fine-grained entities and performs poorly on coarse-grained entities.
  • moreover, the fixed mapping between labels and text leads to a fixed number of label predictions, and obtaining the label hierarchy and building a better modeling representation of the text are usually treated as two separate tasks.
  • because the two tasks are carried out independently, the construction of the label relationships lacks the guidance of textual information: the label structure is usually built on its own and only afterwards allowed a simple interaction with the text, ignoring the relationship between the text and the labels.
  • the present application provides a fine-grained entity recognition method based on hyperbolic space representation and label text interaction, which can solve one of the above technical problems.
  • the present application provides a fine-grained entity recognition method based on hyperbolic space representation and label text interaction, comprising the following steps:
  • the pre-trained graph convolutional neural network model is a model obtained by training based on the labels in the training set and the corresponding label association matrix;
  • the pre-trained hyperbolic space-based label-text interaction mechanism model is a model obtained by training based on the entity-context representation in the training set, the word-level label relationship matrix and the corresponding label classification results.
  • the fine-grained entity recognition method based on hyperbolic space representation and label-text interaction proposed in the embodiments of the present application builds on a label-text interaction mechanism and exploits the hierarchical nature of the data in the fine-grained entity recognition task, strengthening this hierarchy in hyperbolic space, a space that naturally fits it, so that labels and text match better.
  • step S1 includes:
  • the character-based convolutional neural network model is used to encode the entity;
  • the Bi-LSTM model is used to encode the context, outputting a hidden state at each time step; the hidden states are then passed through a self-attention layer on top to obtain the contextual features;
  • step S12 includes:
  • in step S121, after a linear transformation by the connection layer W_m ∈ R^{hm×hc} and a tanh operation (hm and hc are both feature dimensions), the mapping satisfies: m_proj = tanh(M · W_m)
  • m_proj is the mapping function (the mapped entity representation)
  • tanh is the hyperbolic tangent function
  • W_m is the connection layer
  • M is the entity.
  • the association matrix in step S122 satisfies the following formula: A = m_proj · W_a · C^⊤, A ∈ R^{1×lc}
  • A is the association matrix
  • W_a is a learnable matrix used to capture the feedback from the interaction between the entity mention and the relevant parts of the contextual features
  • C is the contextual features
  • lc is the number of context tokens.
  • step S123 includes:
  • the association matrix is first normalized, Â = softmax(A); the feedback information from the initial interaction between the encoded entity and the contextual features is then obtained as r_c = Â · C
  • r_c is the feedback information from the initial interaction between the encoded entity and the contextual features.
  • the information of the entity-context interaction in step S124 satisfies the following formulas: r = ρ(W_r[r_c; m_proj; r_c − m_proj]), g = σ(W_g[r_c; m_proj; r_c − m_proj]), o = g ∗ r + (1 − g) ∗ m_proj
  • r is the entity-context fused feature
  • ρ is the GELU (Gaussian error linear unit) activation, σ is the sigmoid function, and g is the resulting gate
  • o is the information of the interaction between the entity and the context
  • W_r is the learnable matrix for the fused feature r
  • W_g is the learnable matrix for the gate g.
  • the training process of the graph convolutional neural network model includes:
  • the label is used as a node of the graph in the graph convolutional neural network model, and the co-occurrence information of the label is used as an edge to obtain a label association matrix;
  • the word-level label relation matrix follows the following propagation rule in the graph convolutional neural network model: W'_O = D̂^{-1/2} Â D̂^{-1/2} W_O T
  • W'_O is the word-level label relation matrix, D̂ is the diagonal degree matrix, Â is the output of operating on the label association matrix (Â = A_L + I_N), A'_word is the word-level association matrix, W_O is a randomly initialized parameter matrix, and T is a transformation matrix;
  • A_word is the word-level label association matrix, from which A'_word is obtained.
  • the training process of the hyperbolic space-based label-text interaction mechanism model includes:
  • the entity-context representation and the label relation matrix are input into the hyperbolic-space-based label-text interaction mechanism model, and the final label classification result of the entity is output, satisfying: p = σ(W'_O · f(o; C))
  • p is the final label classification result of the entity
  • σ is the sigmoid normalization function
  • f is the matrix concatenation function
  • N is the number of labels
  • d_f is the matrix dimension after concatenation, with W'_O ∈ R^{N×d_f}.
  • the fine-grained entity recognition method based on hyperbolic space representation and label text interaction in this application proposes a label text interaction mechanism based on hyperbolic space.
  • An attention module is used to obtain the correlation between context and labels, which then assists the construction of the label relationships.
  • This hierarchical relationship is strengthened in hyperbolic space, a space that naturally fits it, and the Poincaré distance replaces the original cosine-similarity calculation, making labels and text match better.
  • FIG. 1 is a flowchart of a fine-grained entity recognition method based on hyperbolic space representation and label text interaction provided by the present application
  • FIG. 2 is a schematic diagram of a hierarchical structure of label data in Embodiment 1 of the application;
  • Fig. 3 is the structural diagram of hyperbolic space in the embodiment 1 of the application.
  • FIG. 4 is a schematic diagram of a model framework provided by the application.
  • Fig. 5 is the label distribution ratio diagram of Ultra-Fine data set and OntoNotes data set in the embodiment 2 of the application;
  • FIG. 6 is a precision-recall diagram of the label-text interaction mechanism model of the present application and the models in the comparative experiments in Embodiment 2 of the present application.
  • a method for designing hierarchy-aware loss by a given label hierarchy is proposed.
  • a method for jointly representing word and type in Euclidean space is proposed. These methods all pre-define the label type structure based on the entity type data set.
  • in practical applications, a knowledge base cannot contain every type. For example, if person/female/teacher is predefined but there is no person/female/nurse form, then the nurse category absent from the knowledge base cannot be identified effectively. Therefore, for the large number of unknown, undefined new types, models trained on such knowledge bases can hardly learn to recognize them effectively.
  • entity recognition is proposed in a more open scenario containing datasets of over 10,000 unknown types.
  • it is proposed to introduce a graph propagation layer, which uses the co-occurrence information of the labels to generate an adjacency matrix of the labels to capture the deep-level potential label relationships.
  • Fine-grained named entity recognition often produces different results with different contexts, and at the same time has certain logical regularity.
  • How to establish a representation that conforms to contextual logic and relational logic for different textual contexts is a key challenge. For example, within one context, if an entity is a "judge", the probability that it is simultaneously a "defendant" is very low; this accords with our intuition, because the two identities are far apart yet appear in the same context. Across different contexts, however, for identities that are close together, simply assuming that an entity is unlikely to be both a "teacher" and a "student" is problematic: a person can be a teacher at school and a student at the gym. The logic therefore rests on the contextual relationship, and when we ignore the relationship between the contextual text and the labels, the performance of the model suffers.
  • an encoding method for joint embedding learning based on Euclidean space is proposed.
  • in Euclidean space it is impossible to represent arbitrary hierarchical information in the embedding, which causes information loss for data with hierarchical structure.
  • hyperbolic space is more suitable for embedding coding of hierarchical information than Euclidean space. Because the distance from the center of the source point to the edge in the hyperbolic space increases exponentially, the number of types contained in each layer will also increase exponentially with the increase of the number of layers, and the two have a natural structural fit.
  • hyperbolic space is better than Euclidean space for very fine-grained data.
  • fine-grained entity tasks include not only ultra-fine-grained entities but also coarse-grained entities, and it is not enough to perform well at a certain granularity.
  • text entities themselves have no hierarchical structure, so how to better match them against the hierarchical labels in hyperbolic space is also a problem that needs to be solved.
  • the fine-grained entity recognition method based on hyperbolic space representation and label-text interaction proposes a hyperbolic-space-based label-text interaction mechanism that obtains the correlation between context and labels through an attention module, which then assists the label relationship generation process.
  • This hierarchical relationship is strengthened in hyperbolic space, a space that naturally fits it, and the Poincaré distance replaces the original cosine-similarity calculation, making labels and text match better.
  • the flowchart of the fine-grained entity recognition method based on hyperbolic space representation and label text interaction includes the following steps:
  • the entity is represented as M ⁇ R hm
  • the context feature is represented as C ⁇ R lc ⁇ hc
  • both hm and hc are feature dimensions
  • lc is the number of contextual annotations.
  • step S12 specifically includes:
  • m_proj is the mapping function (the mapped entity representation), m_proj = tanh(M · W_m)
  • tanh is the hyperbolic tangent function
  • W_m ∈ R^{hm×hc} is the connection layer
  • M is the entity.
  • A is the association matrix
  • W a is the learnable matrix, which is used to obtain the feedback of the interaction between entity mentions and contextual features
  • C is the contextual features.
  • the association matrix is normalized to satisfy: Â = softmax(A)
  • the feedback information from the initial interaction between the encoded entity and the contextual features is then obtained, satisfying: r_c = Â · C
  • r_c is the feedback information from the initial interaction between the encoded entity and the contextual features.
  • r is the entity-context fused feature
  • ρ is the GELU (Gaussian error linear unit) activation, σ is the sigmoid function, and g is the resulting gate
  • o is the output, i.e., the information of the interaction between the entity and the context
  • W_r is the learnable matrix for the fused feature r
  • W_g is the learnable matrix for the gate g.
  • the pre-trained graph convolutional neural network model is a model obtained by training based on the labels in the training set and the corresponding label association matrix.
  • the training process of the graph convolutional neural network model includes:
  • the co-occurrence information of the labels is obtained.
  • the vector of the labels in the dataset is embedded into the hyperbolic space, and the adjacent points are calculated according to the cosine similarity to generate a correlation matrix, which is used as the basis for the co-occurrence information.
  • Hyperbolic geometry is the study of non-Euclidean spaces of constant negative curvature.
  • in two dimensions, hyperbolic space can be modeled as an open disk without boundary, the so-called Poincaré disk, which represents an infinite space.
  • generalized to n dimensions, the Poincaré disk model becomes the Poincaré ball.
  • the distance between two points u and v satisfies the following formula: d_H(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²)))
  • d_H(u, v) is the distance between the two points u and v on the Poincaré ball.
  • Figure 3 shows the structure diagram of a hyperbolic space.
  • the items at the top of the hierarchy are placed near the origin, and the items at the bottom are placed near infinity.
  • Accuracy can be improved when vector similarity is used to represent type relationships.
  • the hierarchy reflects the annotated type distribution, and hyperbolic space is superior to Euclidean space in this respect.
  • the label is used as the node of the graph in the graph convolutional neural network model, and the co-occurrence information of the label is used as the edge to obtain the label association matrix.
  • entity types are usually represented as a tree-like structure.
  • the nodes in the graph are generally directly represented as entity types, and the edges between the nodes are relatively ambiguous, and it is unknown which nodes need to be connected by edges.
  • A type co-occurrence matrix (i.e., the label association matrix) is therefore needed: given two types t_1 and t_2 that are both true types of an entity, if a dependency exists between them, an edge connects the two nodes.
  • a co-occurrence matrix is established as the adjacency matrix of the co-occurrence relation graph through the co-occurrence information of the labels.
  • W'_O is the word-level label relation matrix, D̂ is the diagonal degree matrix, Â is the output of operating on the label association matrix, A'_word is the word-level association matrix, W_O is a randomly initialized parameter matrix, and T is a transformation matrix.
  • Â = A_L + I_N, where A_L is the label association matrix, i.e., the adjacency matrix, and I_N is the identity matrix used to add self-loop edges
  • A_word is the word-level label association matrix.
  • the word-level label relation matrix is obtained through the word-level label association matrix. From the formula above, it can be seen that the prediction of the true type t_i of an entity depends on its nearest neighbors. Therefore, this application adopts 1-hop propagation and omits the nonlinear activation of the graph convolutional neural network, since it would introduce unnecessary constraints on the scale of the label weight matrix.
  • the pre-trained hyperbolic space-based label-text interaction mechanism model is a model obtained by training based on the entity-context representation in the training set, the word-level label relationship matrix and the corresponding label classification results.
  • the entity-context representation and the label relation matrix are input into the hyperbolic-space-based label-text interaction mechanism model, and the final label classification result of the entity is output, satisfying: p = σ(W'_O · f(o; C))
  • p is the probability of the current label, i.e., the final label classification result of the entity
  • σ is the sigmoid normalization function
  • f is the matrix concatenation function
  • N is the number of labels
  • d_f is the matrix dimension after concatenation, with W'_O ∈ R^{N×d_f}.
  • FIG. 4 is a schematic diagram of the model framework of the application: after the entity and the context are encoded, they interact through the Attention model to obtain the entity-context representation; in hyperbolic space, the label relation matrix is obtained from the labels in the dataset with the graph convolutional neural network model; based on the entity-context representation and the label relation matrix, combined with the hyperbolic-space label-text interaction mechanism model, the final label classification result of the entity is obtained.
  • label-context interaction is also based on an attention layer. Taking the word-level label relationship matrix as the target and the context as the memory, the Attention mechanism can be used for interaction.
  • the fine-grained entity recognition method based on hyperbolic space representation and label text interaction provided by this application is compared with other models.
  • the experiments are carried out on the same public datasets as the baseline models; Table 1 lists some of the experimental parameters.
  • the main experimental dataset is the Ultra-Fine dataset, which contains 10,331 labels, most of them defined as free-form unknown phrases.
  • the training set is annotated by distant supervision, mainly using KB, Wikipedia, and head-word-based relation dependency trees as annotation sources, finally yielding 25.4M training samples, plus about 6,000 crowdsourced samples that contain 5 ground-truth labels each on average.
  • OntoNotes is a dataset with a smaller amount of data and less complexity.
  • the main purpose is to reflect the scalability of our model: it is not only effective for datasets with a large number of ultra-fine-grained entities and rich co-occurrence information, but also for small-volume datasets such as OntoNotes.
  • OntoNotes dataset contains only about 1.5 labels per sample on average.
  • baseline models (AttentiveNER model, MultiTask model, LabelGCN model, and FREQ model) are selected for comparison in this embodiment.
  • this task is similar to our method in that it uses a GCN to capture label relationships, but the essential difference is that we consider not only the relationships among the labels themselves but also add the contextual information of the text through an interaction mechanism, which improves performance, and we introduce hyperbolic space to enhance the relational representation between labels.
  • our model therefore also performs better, and thanks to the added textual information the recall is significantly improved.
  • the model of the present application adopts hyperbolic space to enhance the representation of label relations.
  • the FREQ task mainly improves the accuracy of ultra-fine-grained entities, and the improvement of coarse-grained and fine-grained entities is not obvious, resulting in a poor overall effect.
  • hyperbolic space is more suitable for complex data tasks than Euclidean space, but it does not work well for coarse-grained ones.
  • although our model uses hyperbolic space for the embedding, it also retains the Euclidean embedding information, so good overall performance is achieved.
  • Figure 6 is a precision-recall diagram of the model, using the same experimental settings and evaluation protocol as the LabelGCN model to assess overall performance. As Figure 6 shows, the model provided by this application (denoted Ours) performs best at the break-even point.
  • baseline models (AttentiveNER model, AFET model, LNR model, NFETC model, MultiTask model, and LabelGCN model) are selected for comparison in this embodiment.
  • the embodiments of the present application may be provided as a method, a system or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • any reference signs placed between parentheses shall not be construed as limiting the claim.
  • the word “comprising” does not exclude the presence of elements or steps not listed in a claim.
  • the word “a” or “an” preceding an element does not exclude the presence of a plurality of such elements.
  • the present application may be implemented by means of hardware comprising several different components and by means of a suitably programmed computer. In the claims enumerating several means, several of these means can be embodied by one and the same item of hardware.
  • the words first, second, third, etc. are used for convenience only and do not imply any order. These words can be understood as part of the part name.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

Provided is a fine-grained entity recognition method based on hyperbolic space representation and label-text interaction, comprising the steps of: S1, interacting the annotated entities and contexts in a dataset to obtain an entity-context representation; S2, in hyperbolic space, obtaining a word-level label relation matrix from the labels in the dataset in combination with a pre-trained graph convolutional neural network model; S3, inputting the entity-context representation and the word-level label relation matrix into a pre-trained hyperbolic-space-based label-text interaction mechanism model, which outputs the final label classification result of the entity. The method solves the prior-art problems of noisy co-occurrence relations and poor label-text mapping and matching in hyperbolic space.

Description

Fine-grained entity recognition method based on hyperbolic space representation and label-text interaction
Technical Field
The present application belongs to the field of fine-grained entity recognition, and in particular relates to a fine-grained entity recognition method based on hyperbolic space representation and label-text interaction.
Background
Named entity recognition has long been a foundational research task in natural language processing, underpinning information extraction, question answering systems, and machine translation. Its purpose is to identify the components of a text that denote named entities and to classify them.
Compared with ordinary entity recognition, fine-grained entity recognition covers not only simple label classification (such as person names and place names) but also finer, more complex classification at different entity granularities (such as occupation and company). For other natural language processing tasks, fine-grained named entity recognition often carries more information, providing valuable prior knowledge and supplying more knowledge to downstream tasks such as relation extraction, event extraction, coreference resolution, and question answering.
Fine-grained entity recognition can provide more refined, hierarchical entity information at different granularities and is better suited to real-world, complex application scenarios. The hierarchy and granularity of entities are generally reflected through the hierarchical relationships among the labels, and how to model a better label hierarchy is the focus of research. Among existing methods, one uses a graph neural network based on label co-occurrence information in order to obtain label hierarchies suitable for more open, practical applications; another uses hyperbolic space to obtain the label hierarchy.
However, co-occurrence information based on the labels themselves contains a certain amount of noise, and co-occurrence relations reflect only part of the correlation. The hyperbolic-space method is effective mainly for fine-grained entities and performs poorly on coarse-grained ones, and the fixed label-text mapping leads to a fixed number of label predictions. Obtaining the label hierarchy and building a better modeling representation of the text are usually treated as two separate, independent tasks, so the construction of the label relationships lacks the guidance of textual information: the label structure is typically built on its own and only afterwards allowed a simple interaction with the text, ignoring the relationship between text and labels.
Summary
The present application provides a fine-grained entity recognition method based on hyperbolic space representation and label-text interaction, which can solve one of the above technical problems.
To achieve the above purpose, the main technical solutions adopted by the present application are as follows.
The present application provides a fine-grained entity recognition method based on hyperbolic space representation and label-text interaction, comprising the following steps:
S1. Based on the entities and contexts in the dataset, interact the entities with the contexts to obtain an entity-context representation;
S2. In hyperbolic space, based on the labels annotating the entities in the dataset and in combination with a pre-trained graph convolutional neural network model, obtain the word-level label relation matrix corresponding to the labels;
the pre-trained graph convolutional neural network model is a model trained on the labels in the training set and the corresponding label association matrix;
S3. Input the entity-context representation and the word-level label relation matrix into a pre-trained hyperbolic-space-based label-text interaction mechanism model, which outputs the final label classification result of the entity;
the pre-trained hyperbolic-space-based label-text interaction mechanism model is a model trained on the entity-context representations, the word-level label relation matrices, and the corresponding label classification results in the training set.
The fine-grained entity recognition method based on hyperbolic space representation and label-text interaction proposed in the embodiments of the present application builds on a label-text interaction mechanism and exploits the hierarchical nature of the data in the fine-grained entity recognition task, strengthening this hierarchy in hyperbolic space, a space that naturally fits it, so that labels and text match better.
Optionally, step S1 comprises:
S11. Based on the entities and contexts in the dataset, encode the entities and contexts with learned models;
a character-based convolutional neural network model encodes the entities; a Bi-LSTM model encodes the contexts, outputting a hidden state at each time step, and the hidden states are then passed through a self-attention layer on top to obtain the contextual features;
S12. Concatenate the encoded entity and the contextual features to obtain the entity-context representation.
Optionally, step S12 comprises:
S121. Transform the encoded entity through a mapping function so that the matrix space of the encoded entity matches the dimensions of the matrix space of the contextual features;
S122. Generate the association matrix between the encoded entity and the contextual features through an Attention model;
S123. From the association matrix, obtain the feedback information of the initial interaction between the encoded entity and the contextual features;
S124. From that feedback information, obtain the information of the entity-context interaction;
S125. Concatenate the entity-context interaction information with the contextual features side by side to obtain the entity-context representation.
Optionally, in step S121, after the linear transformation by the connection layer W_m ∈ R^{hm×hc} and the tanh operation, where hm and hc are both feature dimensions, the following relationship is satisfied:
m_proj = tanh(M · W_m)
where m_proj is the mapping function (the mapped entity representation), tanh is the hyperbolic tangent function, W_m is the connection layer, and M is the entity.
Optionally, the association matrix in step S122 satisfies the following formula:
A = m_proj · W_a · C^⊤, A ∈ R^{1×lc}
where A is the association matrix, W_a is a learnable matrix used to capture the feedback from the interaction between the entity mention and the relevant parts of the contextual features, C is the contextual features, and lc is the number of context tokens.
Optionally, step S123 comprises:
normalizing the association matrix so that it satisfies the following formula:
Â = softmax(A)
where Â is the normalized association matrix;
and then obtaining, from the normalized association matrix and the contextual features, the feedback information of the initial interaction between the encoded entity and the contextual features, satisfying the following formula:
r_c = Â · C
where r_c is the feedback information of the initial interaction between the encoded entity and the contextual features.
Optionally, the information of the entity-context interaction in step S124 satisfies the following formulas:
r = ρ(W_r [r_c; m_proj; r_c − m_proj])
g = σ(W_g [r_c; m_proj; r_c − m_proj])
o = g ∗ r + (1 − g) ∗ m_proj
where r is the entity-context fused feature, ρ is the GELU (Gaussian error linear unit) activation, σ is the sigmoid function, g is the resulting gate, o is the information of the entity-context interaction, W_r is the learnable matrix for the fused feature, and W_g is the learnable matrix for the gate.
Optionally, the training process of the graph convolutional neural network model comprises:
101. In hyperbolic space, obtain the co-occurrence information of the labels from the labels in the dataset;
102. Take the labels as the nodes of the graph in the graph convolutional neural network model and the co-occurrence information of the labels as the edges, and obtain the label association matrix;
103. Input the label association matrix into the pre-trained graph convolutional neural network model to obtain the word-level label relation matrix corresponding to the labels.
Optionally, the word-level label relation matrix follows the propagation rule below in the graph convolutional neural network model:
W'_O = D̂^{-1/2} Â D̂^{-1/2} W_O T
where W'_O is the word-level label relation matrix, D̂ is the diagonal degree matrix, Â is the output of operating on the label association matrix, A'_word is the word-level association matrix, W_O is a randomly initialized parameter matrix, and T is a transformation matrix;
A'_word satisfies the following formula:
A'_word = A_word + I_N
where A_word is the word-level label association matrix.
Optionally, the training process of the hyperbolic-space-based label-text interaction mechanism model comprises:
based on the label-text attention mechanism, inputting the entity-context representation and the label relation matrix into the hyperbolic-space-based label-text interaction mechanism model, which outputs the final label classification result of the entity, satisfying the following formula:
p = σ(W'_O · f(o; C)), W'_O ∈ R^{N×d_f}
where p is the final label classification result of the entity, σ is the sigmoid normalization function, f is the matrix concatenation function, N is the number of labels, and d_f is the matrix dimension after concatenation.
The beneficial effects of the present application are as follows.
The fine-grained entity recognition method based on hyperbolic space representation and label-text interaction of the present application proposes a hyperbolic-space-based label-text interaction mechanism in which an attention module obtains the correlation between context and labels and then assists the label relationship generation process. At the same time, it exploits the hierarchical nature of the data in the fine-grained entity recognition task, strengthens this hierarchy in hyperbolic space, a space that naturally fits it, and replaces the original cosine-similarity calculation with the Poincaré distance, so that labels and text match better.
Brief Description of the Drawings
FIG. 1 is a flowchart of the fine-grained entity recognition method based on hyperbolic space representation and label-text interaction provided by the present application;
FIG. 2 is a schematic diagram of the hierarchical structure of the label data in Embodiment 1 of the present application;
FIG. 3 is a structural diagram of hyperbolic space in Embodiment 1 of the present application;
FIG. 4 is a schematic diagram of the model framework provided by the present application;
FIG. 5 shows the label distribution ratios of the Ultra-Fine and OntoNotes datasets in Embodiment 2 of the present application;
FIG. 6 is a precision-recall diagram of the label-text interaction mechanism model of the present application and the models in the comparative experiments in Embodiment 2 of the present application.
Detailed Description
To better explain the present application and facilitate understanding, the present application is described in detail below through specific embodiments with reference to the accompanying drawings.
Fine-grained entity recognition can provide more refined, hierarchical entity information at different granularities and is better suited to real-world, complex application scenarios. The hierarchy and granularity of entities are generally reflected through the hierarchical relationships among the labels, and how to model a better label hierarchy is the focus of research.
In a first related embodiment of the present application, a method is proposed that designs a hierarchy-aware loss from a given label hierarchy. In a second related embodiment, a method is proposed that jointly represents word and type in Euclidean space. These methods all predefine the label type structure from an entity-type dataset. In practical application scenarios, however, a knowledge base cannot contain every type: if person/female/teacher is predefined but there is no person/female/nurse form, the nurse category absent from the knowledge base cannot be identified effectively. Therefore, for the large number of unknown, undefined new types, models trained on such knowledge bases can hardly learn to recognize them. In a third related embodiment, entity recognition is carried out in a more open scenario, on a dataset containing over 10,000 unknown types. In a fourth related embodiment, a graph propagation layer is introduced that uses the co-occurrence information of the labels to generate a label adjacency matrix and capture deep latent label relationships. Considering the labels' co-occurrence information alone, however, may introduce a certain amount of noise that affects the results, because the context is ignored.
Fine-grained named entity recognition often produces different results in different contexts while still exhibiting a certain logical regularity. How to build a representation that conforms to contextual and relational logic for different textual contexts is a key challenge. For example, within one context, if an entity is a "judge", the probability that it is simultaneously a "defendant" is very low; this accords with our intuition, because the two identities are far apart yet appear in the same context. Across different contexts, however, for identities that are close together, simply assuming that an entity is unlikely to be both a "teacher" and a "student" is problematic: a person can be a teacher at school and a student at the gym. The logic therefore rests on the contextual relationship, and when the relationship between the contextual text and the labels is ignored, the performance of the model suffers.
In a fifth related embodiment of the present application, an encoding method based on joint embedding learning in Euclidean space is proposed. In Euclidean space, however, it is impossible to represent arbitrary hierarchical information in the embedding, which causes information loss for hierarchically structured data. In a sixth related embodiment, hyperbolic space is proposed as better suited than Euclidean space for embedding hierarchical information: the distance from the origin to the edge of hyperbolic space grows exponentially, and the number of types contained in each layer likewise grows exponentially with depth, so the two have a natural structural fit. In a seventh related embodiment, hyperbolic space is shown to work better than Euclidean space for very fine-grained data. However, fine-grained entity tasks include not only ultra-fine-grained entities but also coarse-grained ones, and performing well at a single granularity is not enough. Moreover, in hyperbolic space the text entities themselves have no hierarchical structure, so how to match them better against the hierarchical labels also needs to be solved.
Based on the above, the fine-grained entity recognition method based on hyperbolic space representation and label-text interaction proposed in the embodiments of the present application provides a hyperbolic-space-based label-text interaction mechanism in which an attention module obtains the correlation between context and labels and then assists the label relationship generation process. At the same time, it exploits the hierarchical nature of the data in the fine-grained entity recognition task, strengthens this hierarchy in hyperbolic space, a space that naturally fits it, and replaces the original cosine-similarity calculation with the Poincaré distance, so that labels and text match better.
For a better understanding of the above technical solutions, exemplary embodiments of the present application are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the present application, it should be understood that the present application can be implemented in various forms and should not be limited by the embodiments set forth here; rather, these embodiments are provided so that the present application can be understood more clearly and thoroughly and its scope conveyed fully to those skilled in the art.
Embodiment 1
As shown in FIG. 1, the flowchart of the fine-grained entity recognition method based on hyperbolic space representation and label-text interaction provided by this embodiment comprises the following steps:
S1. Based on the entities and contexts in the dataset, interact the entities with the contexts to obtain the entity-context representation.
This specifically comprises the following steps:
S11. Based on the entities and contexts in the dataset, encode the entities and contexts with learned models: a character-based Convolutional Neural Network (CNN) model encodes the entities; a Bi-LSTM model encodes the contexts, outputting a hidden state at each time step, and the hidden states are then passed through a self-attention layer on top to obtain the contextual features.
The entity is represented as M ∈ R^{hm} and the contextual features as C ∈ R^{lc×hc}, where hm and hc are both feature dimensions and lc is the number of context tokens.
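By way of illustration only (this sketch is not part of the original disclosure; module names and sizes such as CharCNNEntityEncoder, n_chars, and heads are assumptions), the encoding of step S11 could look as follows in PyTorch:

```python
# Illustrative sketch of step S11: char-CNN entity encoder and
# Bi-LSTM + self-attention context encoder (all sizes are assumptions).
import torch
import torch.nn as nn

class CharCNNEntityEncoder(nn.Module):
    """Character-based CNN that encodes an entity mention into M in R^hm."""
    def __init__(self, n_chars=128, char_dim=30, hm=100, kernel=3):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim)
        self.conv = nn.Conv1d(char_dim, hm, kernel, padding=1)

    def forward(self, char_ids):                     # (batch, n_char)
        x = self.embed(char_ids).transpose(1, 2)     # (batch, char_dim, n_char)
        x = torch.relu(self.conv(x))                 # (batch, hm, n_char)
        return x.max(dim=2).values                   # max-pool -> (batch, hm)

class ContextEncoder(nn.Module):
    """Bi-LSTM over the context with a self-attention layer on top,
    producing contextual features C in R^{lc x hc}."""
    def __init__(self, vocab=10000, word_dim=100, hc=100, heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, word_dim)
        self.lstm = nn.LSTM(word_dim, hc // 2, bidirectional=True,
                            batch_first=True)
        self.attn = nn.MultiheadAttention(hc, heads, batch_first=True)

    def forward(self, word_ids):                     # (batch, lc)
        h, _ = self.lstm(self.embed(word_ids))       # hidden state per step
        c, _ = self.attn(h, h, h)                    # self-attention on top
        return c                                     # (batch, lc, hc)
```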
S12. Concatenate the encoded entity and the contextual features to obtain the entity-context representation.
Further, step S12 specifically comprises:
S121. Transform the encoded entity through the mapping function so that the matrix space of the encoded entity matches the dimensions of the matrix space of the contextual features. Specifically, after the linear transformation by the connection layer W_m ∈ R^{hm×hc} and the tanh operation, the following relationship is satisfied:
m_proj = tanh(M · W_m)   (1)
where m_proj is the mapping function (the mapped entity representation), tanh is the hyperbolic tangent function, W_m is the connection layer, and M is the entity.
S122. Generate the association matrix between the encoded entity and the contextual features through the Attention model, satisfying the following formula:
A = m_proj · W_a · C^⊤, A ∈ R^{1×lc}   (2)
where A is the association matrix, W_a is a learnable matrix used to capture the feedback from the interaction between the entity mention and the relevant parts of the contextual features, and C is the contextual features.
S123. From the association matrix, obtain the feedback information of the initial interaction between the encoded entity and the contextual features.
First, the association matrix is normalized, satisfying the following formula:
Â = softmax(A)   (3)
where Â is the normalized association matrix.
From the normalized association matrix and the contextual features, the feedback information of the initial interaction between the encoded entity and the contextual features is obtained, satisfying the following formula:
r_c = Â · C   (4)
where r_c is the feedback information of the initial interaction between the encoded entity and the contextual features.
S124. Based on the feedback information of the initial interaction between the encoded entity and the contextual features, obtain the information of the entity-context interaction, satisfying the following formulas:
r = ρ(W_r [r_c; m_proj; r_c − m_proj])   (5)
g = σ(W_g [r_c; m_proj; r_c − m_proj])   (6)
o = g ∗ r + (1 − g) ∗ m_proj   (7)
where r is the entity-context fused feature, ρ is the GELU (Gaussian error linear unit) activation, σ is the sigmoid function, g is the resulting gate, o is the output, i.e., the information of the entity-context interaction, W_r is the learnable matrix for the fused feature, and W_g is the learnable matrix for the gate.
S125. Concatenate the entity-context interaction information with the contextual features side by side, f[o; C], to obtain the entity-context representation.
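As an illustration of steps S121 to S125 (a sketch, not the original implementation; softmax is assumed as the normalization in formula (3) and GELU as the activation ρ in formula (5)):

```python
# Illustrative sketch of the entity-context interaction, formulas (1)-(7),
# followed by the concatenation of step S125.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EntityContextInteraction(nn.Module):
    def __init__(self, hm=100, hc=100):
        super().__init__()
        self.W_m = nn.Linear(hm, hc, bias=False)      # connection layer
        self.W_a = nn.Parameter(torch.randn(hc, hc))  # learnable matrix
        self.W_r = nn.Linear(3 * hc, hc, bias=False)
        self.W_g = nn.Linear(3 * hc, hc, bias=False)

    def forward(self, M, C):         # M: (batch, hm), C: (batch, lc, hc)
        m_proj = torch.tanh(self.W_m(M))                        # (1)
        A = torch.einsum('bh,hk,blk->bl', m_proj, self.W_a, C)  # (2)
        A_hat = F.softmax(A, dim=-1)                            # (3)
        r_c = torch.einsum('bl,blh->bh', A_hat, C)              # (4)
        z = torch.cat([r_c, m_proj, r_c - m_proj], dim=-1)
        r = F.gelu(self.W_r(z))                                 # (5)
        g = torch.sigmoid(self.W_g(z))                          # (6)
        o = g * r + (1 - g) * m_proj                            # (7)
        return torch.cat([o.unsqueeze(1), C], dim=1)            # S125: f[o; C]
```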
S2. In hyperbolic space, based on the labels annotating the entities in the dataset and in combination with the pre-trained graph convolutional neural network model, obtain the word-level label relation matrix corresponding to the labels. The pre-trained graph convolutional neural network model is a model trained on the labels in the training set and the corresponding label association matrix.
The training process of the graph convolutional neural network model comprises:
In hyperbolic space, obtain the co-occurrence information of the labels from the labels in the dataset. Specifically, the label vectors in the dataset are embedded into hyperbolic space, neighboring points are computed according to cosine similarity, and a correlation matrix is generated as the basis for the co-occurrence information.
Hyperbolic geometry is the study of non-Euclidean spaces of constant negative curvature. In two dimensions, hyperbolic space can be modeled as an open disk without boundary, the so-called Poincaré disk, which represents an infinite space. A point approaching infinity in hyperbolic space corresponds to a point approaching the rim of the Poincaré disk. Generalized to n dimensions, the Poincaré disk model becomes the Poincaré ball. On the Poincaré ball, the distance between two points u and v satisfies the following formula:
d_H(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²)))   (8)
where d_H(u, v) is the distance between the two points u and v on the Poincaré ball.
Taking the origin O and two points x_1, x_2 in the space as an example: as x_1 and x_2 move toward the rim of the Poincaré ball, the path between them converges toward the origin O, which can be viewed as a continuous analogue of a tree hierarchy, in which the shortest path between sibling nodes always passes through their ancestors. Meanwhile, the distance from points near the rim to the origin O grows exponentially. Fine-grained labels with a tree hierarchy likewise grow exponentially in number as depth increases. Structurally, hyperbolic space therefore has a natural affinity with hierarchical data. FIG. 2 shows the hierarchical structure of the label data.
As shown in FIG. 3, the structural diagram of hyperbolic space: by embedding the hierarchy in the Poincaré ball, items at the top of the hierarchy are placed near the origin while items at the bottom are placed near infinity. Accuracy can be improved when vector similarity is used to represent type relations. On very fine-grained datasets, the hierarchy reflects the annotated type distribution, and in this respect hyperbolic space is superior to Euclidean space.
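For illustration (not part of the original disclosure; the clipping constant is an assumption for numerical safety), the Poincaré distance of formula (8) can be computed as follows:

```python
# Illustrative sketch of the Poincaré distance on the unit ball.
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """d_H(u, v) = arcosh(1 + 2*||u-v||^2 / ((1-||u||^2)(1-||v||^2)))."""
    uu = np.clip(1.0 - np.sum(u * u), eps, None)   # 1 - ||u||^2
    vv = np.clip(1.0 - np.sum(v * v), eps, None)   # 1 - ||v||^2
    duv = np.sum((u - v) ** 2)                     # ||u - v||^2
    return np.arccosh(1.0 + 2.0 * duv / (uu * vv))

# Points nearer the rim are exponentially farther from the origin:
print(poincare_distance(np.zeros(2), np.array([0.90, 0.0])))  # ~2.94
print(poincare_distance(np.zeros(2), np.array([0.99, 0.0])))  # ~5.29
```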
The labels are taken as the nodes of the graph in the graph convolutional neural network model, and the co-occurrence information of the labels as the edges, to obtain the label association matrix.
In the fine-grained entity recognition task, entity types are usually represented as a tree structure. In graph-based models, the nodes of the graph generally represent entity types directly, while the edges between nodes are relatively ambiguous, and it is unknown which nodes should be connected by edges. A type co-occurrence matrix (i.e., the label association matrix) is therefore needed: given two types t_1 and t_2 that are both true types of an entity, if a dependency exists between them, an edge connects the two nodes. Such a co-occurrence matrix is built from the co-occurrence information of the labels and serves as the adjacency matrix of the co-occurrence relation graph.
103. Input the label association matrix into the graph convolutional neural network model to obtain the word-level label relation matrix corresponding to the labels. In hyperbolic space, such pairwise dependencies can be computed with the Poincaré distance. To encode this neighborhood information, the present application follows the propagation rule of graph convolutional neural networks. Specifically:
The word-level label relation matrix follows the propagation rule below in the graph convolutional neural network model:
W'_O = D̂^{-1/2} Â D̂^{-1/2} W_O T   (9)
where W'_O is the word-level label relation matrix, D̂ is the diagonal degree matrix, Â is the output of operating on the label association matrix, A'_word is the word-level association matrix, W_O is a randomly initialized parameter matrix, and T is a transformation matrix.
Here Â satisfies the following formula:
Â = A_L + I_N   (10)
where A_L is the label association matrix, i.e., the adjacency matrix, and I_N is the identity matrix used to add self-loop edges to the feature matrix.
A'_word satisfies the following formula:
A'_word = A_word + I_N   (11)
where A_word is the word-level label association matrix.
In summary, the word-level label relation matrix is obtained from the word-level label association matrix. From the formulas above, it can be seen that the prediction of the true type t_i of an entity depends on its nearest neighbors. Therefore, the present application adopts 1-hop propagation and omits the nonlinear activation of the graph convolutional neural network, since it would introduce unnecessary constraints on the scale of the label weight matrix.
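A minimal sketch of the 1-hop propagation described above (illustrative only; it follows the symmetric-normalization reading of formulas (9) to (11)):

```python
# Illustrative sketch of the 1-hop label graph propagation without
# nonlinear activation.
import numpy as np

def propagate_labels(A_word, W_O, T):
    """A_word: (N, N) word-level label association matrix,
    W_O: (N, d) randomly initialized label matrix, T: (d, d) transform.
    Returns the word-level label relation matrix W'_O."""
    N = A_word.shape[0]
    A_hat = A_word + np.eye(N)                  # add self-loops (I_N)
    deg = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))    # D^{-1/2}
    # one hop of propagation, no nonlinearity, as described above
    return D_inv_sqrt @ A_hat @ D_inv_sqrt @ W_O @ T
```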
S3. Input the entity-context representation and the word-level label relation matrix into the pre-trained hyperbolic-space-based label-text interaction mechanism model, which outputs the final label classification result of the entity. The pre-trained hyperbolic-space-based label-text interaction mechanism model is a model trained on the entity-context representations, the word-level label relation matrices, and the corresponding label classification results in the training set.
The training process of the hyperbolic-space-based label-text interaction mechanism model comprises:
based on the label-text attention mechanism, inputting the entity-context representation and the label relation matrix into the hyperbolic-space-based label-text interaction mechanism model, which outputs the final label classification result of the entity, satisfying the following formula:
p = σ(W'_O · f(o; C)), W'_O ∈ R^{N×d_f}   (12)
where p is the probability of the current label, i.e., the final label classification result of the entity, σ is the sigmoid normalization function, f is the matrix concatenation function, N is the number of labels, and d_f is the matrix dimension after concatenation.
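The following minimal sketch (illustrative; it assumes the reading of formula (12) in which the relation matrix W'_O ∈ R^{N×d_f} serves as the classifier weights) shows the classification step:

```python
# Illustrative sketch of formula (12): per-label sigmoid scores from the
# concatenated entity-context representation.
import numpy as np

def classify(W_O_prime, o, C):
    """o: (hc,) interaction output, C: (lc, hc) contextual features,
    W_O_prime: (N, d_f) with d_f = hc + lc*hc under this sketch."""
    fused = np.concatenate([o, C.reshape(-1)])   # f(o; C), dimension d_f
    logits = W_O_prime @ fused                   # (N,)
    p = 1.0 / (1.0 + np.exp(-logits))            # sigmoid, one score per label
    return p                                     # labels with p > 0.5 predicted
```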
Further, as shown in FIG. 4, the schematic diagram of the model framework of the present application: after the entity and the context are encoded, they interact through the Attention model to obtain the entity-context representation; in hyperbolic space, the label relation matrix is obtained from the labels in the dataset with the graph convolutional neural network model; based on the entity-context representation and the label relation matrix, combined with the hyperbolic-space label-text interaction mechanism model, the final label classification result of the entity is obtained.
Further, similar to the entity-context interaction, the label-context interaction is also based on an attention layer: taking the word-level label relation matrix as the target and the context as the memory, the Attention mechanism can be used for the interaction, as sketched below.
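As an illustrative sketch only (not from the original disclosure), the label-context attention just described, with the word-level label relation matrix as the query and the context as the memory, might be implemented as:

```python
# Illustrative sketch of label-context attention: labels attend over
# context tokens to produce context-aware label features.
import numpy as np

def label_context_attention(W_O_prime, C):
    """W_O_prime: (N, hc) label relation matrix, C: (lc, hc) context."""
    scores = W_O_prime @ C.T                             # (N, lc) affinities
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn = scores / scores.sum(axis=1, keepdims=True)    # softmax over tokens
    return attn @ C                                      # (N, hc)
```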
Embodiment 2
In this embodiment, the fine-grained entity recognition method based on hyperbolic space representation and label-text interaction provided by the present application is compared experimentally with other models. To follow the principle of consistent comparison, the experiments use the same public datasets as the baseline models. Table 1 lists some of the experimental parameters.
Table 1. Some of the experimental parameters
Learning rate 0.001
Batch size 1000
Position embedding size 50
Dropout on context C 0.3
Dropout on mention M 0.4
Hidden dimension of LSTM 100
Dropout on fused featuref(Ultra-Fine) 0.2
Dropout on fused featuref(OntoNotes) 0.3
The main experimental dataset is the Ultra-Fine dataset, which contains 10,331 labels, most of them defined as free-form unknown phrases. The training set is annotated by distant supervision, mainly using KB, Wikipedia, and head-word-based relation dependency trees as annotation sources, finally yielding 25.4M training samples, plus about 6,000 crowdsourced samples that contain 5 ground-truth labels each on average.
To better demonstrate the extensibility and transferability of the experiments, this embodiment also runs experiments on the widely used OntoNotes dataset. Unlike Ultra-Fine, OntoNotes is a smaller and less complex dataset. The main purpose is to show the scalability of our model: it is effective not only for datasets with many ultra-fine-grained entities and rich co-occurrence information, but also for small-volume datasets such as OntoNotes, which contains only about 1.5 labels per sample on average.
Together, the two datasets cover both complex scenarios and model performance in relatively simple ones. FIG. 5 shows the label distribution ratios of the Ultra-Fine and OntoNotes datasets.
(1) Ultra-Fine dataset
For the Ultra-Fine dataset, the baseline models (AttentiveNER, MultiTask, LabelGCN, and FREQ) are selected for comparison in this embodiment.
Table 2 shows the comparison results and ablation results of the model provided by the present application against each baseline model on the Ultra-Fine dataset.
Table 2. Comparison and ablation results on the Ultra-Fine dataset (the table body is provided as images in the original publication and is not reproduced here)
Note: P = precision, R = recall, F1 = the F1 evaluation metric.
As Table 2 shows, the model provided by the present application achieves the current best results on almost every evaluation metric, especially recall. For fairness, all models use the same decision threshold of 0.5. Compared with the AttentiveNER model, our model clearly improves F1 but has slightly lower precision; this is because when binary cross-entropy (BCE) is used as the training loss, the model tends to predict the single most relevant label and is less sensitive to the others, yielding high precision but low recall. Our model outperforms it in both the balance of the two and overall performance. Compared with the MultiTask model, our model is better on all evaluation metrics. The LabelGCN task is similar to our method in that it uses a GCN to capture label relationships, but the essential difference is that we consider not only the relationships among the labels themselves but also add the contextual information of the text through an interaction mechanism to improve performance, and we introduce hyperbolic space to enhance the relational representation between labels; accordingly, we also perform better, and with the added textual information the recall improves markedly. Compared with the FREQ model, our model adopts hyperbolic space to strengthen the representation of label relations, whereas FREQ mainly improves the precision of ultra-fine-grained entities, with little improvement on coarse- and fine-grained entities, so its overall result is worse. As that model's authors note, hyperbolic space suits complex data tasks better than Euclidean space but handles coarse granularity poorly. Although our model uses hyperbolic space for the embedding, it also retains the Euclidean embedding information, so good overall performance is achieved.
The ablation experiments show that without the label-text interaction module the result falls 0.9% short of the best, and without the hyperbolic-space module it falls 0.5% short. It can thus be seen that the label-text interaction module contributes the most to the experimental improvement, which matches the original intent of our model design: introducing textual information to build relationships with the labels and improving the label relation representation indeed yields better results. Although hyperbolic space alone brings a less obvious improvement, it still helps the final result. Finally, the model achieves its best result when label-text interaction and hyperbolic space act together, indicating on the one hand that textual information plays a large role in building the label relationships, and on the other that bringing the interaction-derived relation representation into hyperbolic space improves the result further.
Further, FIG. 6 shows the precision-recall diagram of the model, using the same experimental settings and evaluation protocol as the LabelGCN model to assess overall performance. As FIG. 6 shows, the model provided by the present application (denoted Ours) performs best at the break-even point.
Table 3 shows the evaluation comparison between the model of the present application and the LabelGCN model.
Table 3. Evaluation comparison between the model of the present application and the LabelGCN model
Model  Mi-P  Mi-R  Mi-F  Ma-F
LabelGCN 50.2 25.3 33.7 36.6
Ours 46.2 28.1 34.9 37.8(↑1.2)
(2) OntoNotes dataset
For the OntoNotes dataset, the baseline models (AttentiveNER, AFET, LNR, NFETC, MultiTask, and LabelGCN) are selected for comparison in this embodiment.
Table 4 shows the comparison results between the model provided by the present application and each baseline model on the OntoNotes dataset.
Table 4. Comparison results on the OntoNotes dataset
Model Accuracy Macro-F1 Micro-F1
AttentiveNER 51.7 71.0 64.9
AFET 55.1 71.1 64.7
LNR 57.2 71.5 66.1
NFETC 60.2 76.4 70.2
MultiTask 59.5 76.8 71.8
LabelGCN 59.6 77.8 72.2
Our Model 60.5 79.0 72.7
Note: Accuracy = accuracy; Macro-F1 = macro-averaged F1; Micro-F1 = micro-averaged F1.
As Table 4 shows, the model of the present application outperforms the other models on every evaluation metric. On OntoNotes, the same experimental settings and evaluation criteria as the LabelGCN model are likewise adopted. Because the label-text interaction information is added, label relationships can still be built from the context even when the labels' own co-occurrence information is sparse, so performance improves; at the same time, the best accuracy is achieved.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations thereof, can be implemented by computer program instructions.
It should be noted that, in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present application may be implemented by means of hardware comprising several distinct components and by means of a suitably programmed computer. In a claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The words first, second, third, and the like are used for convenience of description only and do not denote any order; they may be understood as part of the name of a component.
Furthermore, it should be noted that in the description of this specification, references to the terms "one embodiment", "some embodiments", "an embodiment", "an example", "a specific example", or "some examples" mean that specific features, structures, materials, or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the present application. In this specification, schematic expressions of these terms do not necessarily refer to the same embodiment or example, and the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, where no contradiction arises, those skilled in the art may combine different embodiments or examples described in this specification and the features thereof.
Although preferred embodiments of the present application have been described, those skilled in the art may make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the claims should be construed to include the preferred embodiments and all changes and modifications falling within the scope of the present application.
Obviously, those skilled in the art can make various modifications and variations to the present application without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is intended to include them as well.

Claims (10)

  1. A fine-grained entity recognition method based on hyperbolic space representation and label-text interaction, characterized by comprising the following steps:
    S1. based on the entities and contexts in a dataset, interacting the entities with the contexts to obtain an entity-context representation;
    S2. in hyperbolic space, based on the labels annotating the entities in the dataset and in combination with a pre-trained graph convolutional neural network model, obtaining a word-level label relation matrix corresponding to the labels;
    wherein the pre-trained graph convolutional neural network model is a model trained on the labels in the training set and the corresponding label association matrix;
    S3. inputting the entity-context representation and the word-level label relation matrix into a pre-trained hyperbolic-space-based label-text interaction mechanism model, which outputs the final label classification result of the entity;
    wherein the pre-trained hyperbolic-space-based label-text interaction mechanism model is a model trained on the entity-context representations, the word-level label relation matrices, and the corresponding label classification results in the training set.
  2. The fine-grained entity recognition method of claim 1, wherein step S1 comprises:
    S11. based on the entities and contexts in the dataset, encoding the entities and contexts with learned models;
    wherein a character-based convolutional neural network model encodes the entities, and a Bi-LSTM model encodes the contexts, outputting a hidden state at each time step, the hidden states then being passed through a self-attention layer on top to obtain the contextual features;
    S12. concatenating the encoded entity and the contextual features to obtain the entity-context representation.
  3. The fine-grained entity recognition method of claim 2, wherein step S12 comprises:
    S121. transforming the encoded entity through a mapping function so that the matrix space of the encoded entity matches the dimensions of the matrix space of the contextual features;
    S122. generating the association matrix between the encoded entity and the contextual features through an Attention model;
    S123. obtaining, from the association matrix, the feedback information of the initial interaction between the encoded entity and the contextual features;
    S124. obtaining, from that feedback information, the information of the entity-context interaction;
    S125. concatenating the entity-context interaction information with the contextual features side by side to obtain the entity-context representation.
  4. The fine-grained entity recognition method of claim 3, wherein in step S121, after a linear transformation by the connection layer W_m ∈ R^{hm×hc} and a tanh operation, hm and hc both being feature dimensions, the following relationship is satisfied:
    m_proj = tanh(M · W_m)
    where m_proj is the mapping function, tanh is the hyperbolic tangent function, W_m is the connection layer, and M is the entity.
  5. The fine-grained entity recognition method of claim 4, wherein the association matrix in step S122 satisfies the following formula:
    A = m_proj · W_a · C^⊤, A ∈ R^{1×lc}
    where A is the association matrix, W_a is a learnable matrix used to capture the feedback from the interaction between the entity mention and the relevant parts of the contextual features, C is the contextual features, and lc is the number of context tokens.
  6. The fine-grained entity recognition method of claim 5, wherein step S123 comprises:
    normalizing the association matrix so that it satisfies the following formula:
    Â = softmax(A)
    where Â is the normalized association matrix;
    and then obtaining, from the normalized association matrix and the contextual features, the feedback information of the initial interaction between the encoded entity and the contextual features, satisfying the following formula:
    r_c = Â · C
    where r_c is the feedback information of the initial interaction between the encoded entity and the contextual features.
  7. The fine-grained entity recognition method of claim 6, wherein the information of the entity-context interaction in step S124 satisfies the following formulas:
    r = ρ(W_r [r_c; m_proj; r_c − m_proj])
    g = σ(W_g [r_c; m_proj; r_c − m_proj])
    o = g ∗ r + (1 − g) ∗ m_proj
    where r is the entity-context fused feature, ρ is the GELU (Gaussian error linear unit) activation, σ is the sigmoid function, g is the resulting gate, o is the information of the entity-context interaction, W_r is the learnable matrix for the fused feature, and W_g is the learnable matrix for the gate.
  8. The fine-grained entity recognition method of claim 7, wherein the training process of the graph convolutional neural network model comprises:
    101. in hyperbolic space, obtaining the co-occurrence information of the labels from the labels in the dataset;
    102. taking the labels as the nodes of the graph in the graph convolutional neural network model and the co-occurrence information of the labels as the edges, and obtaining the label association matrix;
    103. inputting the label association matrix into the pre-trained graph convolutional neural network model to obtain the word-level label relation matrix corresponding to the labels.
  9. The fine-grained entity recognition method of claim 8, wherein the word-level label relation matrix follows the propagation rule below in the graph convolutional neural network model:
    W'_O = D̂^{-1/2} Â D̂^{-1/2} W_O T
    where W'_O is the word-level label relation matrix, D̂ is the diagonal degree matrix, Â is the output of operating on the label association matrix, A'_word is the word-level association matrix, W_O is a randomly initialized parameter matrix, and T is a transformation matrix;
    A'_word satisfies the following formula:
    A'_word = A_word + I_N
    where A_word is the word-level label association matrix.
  10. The fine-grained entity recognition method of claim 9, wherein the training process of the hyperbolic-space-based label-text interaction mechanism model comprises:
    based on the label-text attention mechanism, inputting the entity-context representation and the word-level label relation matrix into the hyperbolic-space-based label-text interaction mechanism model, which outputs the final label classification result of the entity, satisfying the following formula:
    p = σ(W'_O · f(o; C)), W'_O ∈ R^{N×d_f}
    where p is the final label classification result of the entity, σ is the sigmoid normalization function, f is the matrix concatenation function, N is the number of labels, and d_f is the matrix dimension after concatenation.
PCT/CN2021/090507 2020-06-30 2021-04-28 Fine-grained entity recognition method based on hyperbolic space representation and label-text interaction WO2022001333A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010622631.2A CN111782768B (zh) 2020-06-30 2020-06-30 Fine-grained entity recognition method based on hyperbolic space representation and label-text interaction
CN202010622631.2 2020-06-30

Publications (1)

Publication Number Publication Date
WO2022001333A1 true WO2022001333A1 (zh) 2022-01-06

Family

ID=72761486

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/090507 WO2022001333A1 (zh) 2020-06-30 2021-04-28 Fine-grained entity recognition method based on hyperbolic space representation and label-text interaction

Country Status (2)

Country Link
CN (1) CN111782768B (zh)
WO (1) WO2022001333A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114580424A (zh) * 2022-04-24 2022-06-03 之江实验室 一种用于法律文书的命名实体识别的标注方法和装置
CN115935994A (zh) * 2022-12-12 2023-04-07 重庆邮电大学 一种智能识别电商标题方法
CN116304061A (zh) * 2023-05-17 2023-06-23 中南大学 基于层次文本图结构学习的文本分类方法、装置及介质
CN117609902A (zh) * 2024-01-18 2024-02-27 知呱呱(天津)大数据技术有限公司 一种基于图文多模态双曲嵌入的专利ipc分类方法及系统

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111782768B (zh) 2020-06-30 2021-04-27 首都师范大学 Fine-grained entity recognition method based on hyperbolic space representation and label-text interaction
CN113111302B (zh) * 2021-04-21 2023-05-12 上海电力大学 一种基于非欧空间的信息提取方法
CN114139531B (zh) * 2021-11-30 2024-05-14 哈尔滨理工大学 一种基于深度学习的医疗实体预测方法及系统
CN114722823B (zh) * 2022-03-24 2023-04-14 华中科技大学 构建航空知识图谱的方法及装置、计算机可读介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100829401B1 (ko) * 2006-12-06 2008-05-15 한국전자통신연구원 세부분류 개체명 인식 장치 및 방법
CN107797992A (zh) * 2017-11-10 2018-03-13 北京百分点信息科技有限公司 命名实体识别方法及装置
CN109062893A (zh) * 2018-07-13 2018-12-21 华南理工大学 一种基于全文注意力机制的商品名称识别方法
CN109919175A (zh) * 2019-01-16 2019-06-21 浙江大学 一种结合属性信息的实体多分类方法
US20200110809A1 (en) * 2018-02-01 2020-04-09 Jungle Disk, L.L.C. Collating Information From Multiple Sources To Create Actionable Categories And Associated Suggested Actions
US20200151396A1 (en) * 2018-01-31 2020-05-14 Jungle Disk, L.L.C. Natural language generation using pinned text and multiple discriminators
CN111782768A (zh) 2020-06-30 2020-10-16 首都师范大学 Fine-grained entity recognition method based on hyperbolic space representation and label-text interaction

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110597970B (zh) * 2019-08-19 2023-04-07 华东理工大学 一种多粒度医疗实体联合识别的方法及装置


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIN Guanghe, ZHANG Shaowu, LIN Hongfei: "Named Entity Identification Based on Fine-Grained Word Representation", Journal of Chinese Information Processing, vol. 32, no. 11, 30 November 2018, XP055883722, ISSN: 1003-0077 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114580424A (zh) * 2022-04-24 2022-06-03 之江实验室 一种用于法律文书的命名实体识别的标注方法和装置
CN114580424B (zh) * 2022-04-24 2022-08-05 之江实验室 一种用于法律文书的命名实体识别的标注方法和装置
CN115935994A (zh) * 2022-12-12 2023-04-07 重庆邮电大学 一种智能识别电商标题方法
CN115935994B (zh) * 2022-12-12 2024-03-08 芽米科技(广州)有限公司 一种智能识别电商标题方法
CN116304061A (zh) * 2023-05-17 2023-06-23 中南大学 基于层次文本图结构学习的文本分类方法、装置及介质
CN116304061B (zh) * 2023-05-17 2023-07-21 中南大学 基于层次文本图结构学习的文本分类方法、装置及介质
CN117609902A (zh) * 2024-01-18 2024-02-27 知呱呱(天津)大数据技术有限公司 一种基于图文多模态双曲嵌入的专利ipc分类方法及系统
CN117609902B (zh) * 2024-01-18 2024-04-05 北京知呱呱科技有限公司 一种基于图文多模态双曲嵌入的专利ipc分类方法及系统

Also Published As

Publication number Publication date
CN111782768A (zh) 2020-10-16
CN111782768B (zh) 2021-04-27

Similar Documents

Publication Publication Date Title
WO2022001333A1 (zh) Fine-grained entity recognition method based on hyperbolic space representation and label-text interaction
Cho et al. Biomedical named entity recognition using deep neural networks with contextual information
US11631007B2 (en) Method and device for text-enhanced knowledge graph joint representation learning
CN108681557B (zh) 基于自扩充表示和相似双向约束的短文本主题发现方法及系统
Peng et al. Dynamic network embedding via incremental skip-gram with negative sampling
Lei et al. Deep learning application on code clone detection: A review of current knowledge
Zhan et al. Multi-task compositional network for visual relationship detection
Chen et al. A multi-channel deep neural network for relation extraction
CN114896388A (zh) 一种基于混合注意力的层级多标签文本分类方法
Yang et al. Co-embedding network nodes and hierarchical labels with taxonomy based generative adversarial networks
Zhang et al. Sentiment classification for Chinese text based on interactive multitask learning
CN107391565B (zh) 一种基于主题模型的跨语言层次分类体系匹配方法
Fu et al. Bag of meta-words: A novel method to represent document for the sentiment classification
Moyano Learning network representations
CN114564563A (zh) 一种基于关系分解的端到端实体关系联合抽取方法及系统
Roudsari et al. Comparison and analysis of embedding methods for patent documents
Babur Statistical analysis of large sets of models
CN115687609A (zh) 一种基于Prompt多模板融合的零样本关系抽取方法
Al-Tameemi et al. Multi-model fusion framework using deep learning for visual-textual sentiment classification
CN113239143A (zh) 融合电网故障案例库的输变电设备故障处理方法及系统
CN111259106A (zh) 一种结合神经网络和特征演算的关系抽取方法
CN113392929B (zh) 一种基于词嵌入与自编码器融合的生物序列特征提取方法
Xu et al. Cross-media retrieval based on pseudo-label learning and semantic consistency algorithm
Ma et al. Semantic-Aware Dual Contrastive Learning for Multi-label Image Classification
Jian et al. An improved memory networks based product model classification method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21831837

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04.05.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21831837

Country of ref document: EP

Kind code of ref document: A1