CN114881014A - Entity alias relationship acquisition method, entity alias relationship training device and storage medium

Info

Publication number: CN114881014A
Application number: CN202210425656.2A
Authority: CN (China)
Prior art keywords: text, sequence, entity, layer, feature vector
Other languages: Chinese (zh)
Inventors: 王子奕, 刘嘉伟, 鞠剑勋, 李健
Current and original assignee: Shanghai Zhilv Information Technology Co., Ltd.
Application filed by: Shanghai Zhilv Information Technology Co., Ltd.
Priority to: CN202210425656.2A
Legal status: Pending

Classifications

    • G06F 40/279: Handling natural language data; natural language analysis; recognition of textual entities
    • G06F 18/214: Pattern recognition; design or setup of recognition systems; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/253: Pattern recognition; fusion techniques of extracted features
    • G06F 40/30: Handling natural language data; semantic analysis
    • G06N 3/047: Neural networks; probabilistic or stochastic networks
    • G06N 3/049: Neural networks; temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08: Neural networks; learning methods


Abstract

The embodiments of the present disclosure provide an entity alias relationship acquisition method, a training method, an apparatus, and a storage medium, applied to an entity alias relationship acquisition model. An input text is processed by the embedding layer of the model to obtain a text feature vector sequence; a text semantic feature vector sequence is obtained by the feature coding layer; a text prediction tag sequence is obtained by the sequence labeling layer, from which each entity mention fragment in the text is obtained; the text semantic feature vector sequence and the text prediction tag sequence are fused by the tag fusion layer to obtain a text enhanced feature vector sequence; an entity alias relationship probability matrix is obtained by the multi-head selection layer; and the entity alias relationship acquisition result between two entity mention fragments is obtained according to the positions, within the entity mention fragments to which they belong, of the words corresponding to the rows and columns of the screened probability values. The method realizes a scheme for accurately and efficiently mining entity alias relationships, which benefits the construction of high-quality knowledge graphs.

Description

Entity alias relationship acquisition method, entity alias relationship training device and storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to an entity alias relationship acquisition method, an entity alias relationship training method, an entity alias relationship acquisition device and a storage medium.
Background
Knowledge graphs are considered an important cornerstone on the path from perceptual intelligence to cognitive intelligence. As a semantic network, a knowledge graph has extremely strong expressive power and modeling flexibility: it can accurately describe entities existing in the real world and the relationships among them, is easy for humans to understand, and is machine-friendly. The technology has therefore been widely applied in recent years to scenarios such as information retrieval, intelligent question answering, search recommendation, and e-commerce, largely making up for the traditional machine learning algorithms' lack of reasoning and association capabilities.
Constructing a high-quality knowledge graph is a major challenge in the industry today. As more and more knowledge data are mined by various methods, a pressing problem is how to integrate new knowledge with old knowledge, reduce information overload, and avoid entity redundancy. For example, "Guangzhou" and "Yangcheng" can both refer to the capital city of Guangdong Province in China, yet differ greatly at the literal level; if they are not aligned to the same entity, the performance of downstream tasks such as entity expansion and entity disambiguation will suffer.
Therefore, the realization of automatic mining of entity synonyms from a massive corpus is an indispensable task for constructing a high-quality knowledge graph.
Entity synonym mining generally involves determining how likely it is that two given entity mentions point to the same entity, with the primary goal of learning a metric function that can distinguish synonyms from non-synonyms. In the past, most feature engineering focused on the surface form of entity mentions, computing entity similarity from character-level features. However, many semantically similar entity mentions overlap little or not at all at the character level, while mentions with similar surface features cannot simply be assumed to represent the same entity, so that the judgment is difficult once divorced from the specific context.
One approach to entity synonym mining is to convert it into an entity relationship extraction task, because entity synonymy corresponds to an entity alias relationship, which can be regarded as a special kind of inter-entity relationship. However, existing schemes for extracting entity alias relationships accurately and efficiently still fall short.
Summary of the Invention
In view of the above disadvantages of the related art, an object of the present disclosure is to provide an entity alias relationship acquisition method, a training method, an apparatus, and a storage medium to solve the problems in the related art.
The first aspect of the present disclosure provides an entity alias relationship acquisition method, which is applied to an entity alias relationship acquisition model, where the entity alias relationship acquisition model includes an embedding layer, a feature coding layer, a sequence labeling layer, a tag fusion layer and a multi-head selection layer. The method comprises the following steps: acquiring an input text and inputting it into the entity alias relationship acquisition model; processing the input text based on the embedding layer to generate a word-level token sequence, a word-level position index sequence and a text fragment index sequence, and processing, based on the embedding layer, the fusion of the feature vectors respectively mapped from the word-level token sequence, the word-level position index sequence and the text fragment index sequence to obtain a text feature vector sequence; processing the text feature vector sequence based on the feature coding layer to obtain a text semantic feature vector sequence; processing the text semantic feature vector sequence based on the sequence labeling layer to obtain a text prediction tag sequence, and obtaining each entity mention fragment in the text according to the text prediction tag sequence; fusing the text semantic feature vector sequence and the text prediction tag sequence based on the tag fusion layer to obtain a text enhanced feature vector sequence; processing the text enhanced feature vector sequence based on the multi-head selection layer to obtain an entity alias relationship probability matrix, where each probability value in the entity alias relationship probability matrix represents the probability that an entity alias relationship exists between the entity mention fragments to which every two words in the input text respectively belong; and screening probability values reaching a preset threshold value from the entity alias relationship probability matrix, and obtaining an entity alias relationship acquisition result between two entity mention fragments according to the positions of the words respectively corresponding to the rows and columns of the screened probability values within the entity mention fragments to which they belong.
In an embodiment of the first aspect, the obtaining a text feature vector sequence based on fusion of sequence vectors respectively mapped by the word-level token sequence, the word-level position index sequence, and the text segment index sequence includes: respectively coding the word-level token sequence, the word-level position index sequence and the text segment index sequence to obtain each coding sequence; converting each of the encoded sequences into each of the sequence vectors; and summing and normalizing the sequence vectors to obtain the text feature vector sequence.
In an embodiment of the first aspect, the feature encoding layer comprises a multi-headed self-attention layer and a fully-connected layer; the processing the text feature vector sequence based on the feature coding layer to obtain a text semantic feature vector sequence comprises: processing the text feature vector sequence through a multi-head self-attention layer to obtain an intermediate feature vector sequence; and processing the intermediate characteristic vector sequence through a full connection layer to obtain the text semantic characteristic vector sequence.
In an embodiment of the first aspect, the processing the text semantic feature vector sequence based on a sequence labeling layer to obtain a text prediction tag sequence, and obtaining each entity mention fragment in the text according to the text prediction tag sequence includes: predicting the position of a word corresponding to each characteristic value in the text semantic characteristic vector sequence in the entity mentioned fragment to obtain a prediction label so as to form the text prediction label sequence; and obtaining each entity mention segment based on the entity mention boundary marked by the text prediction tag sequence.
In an embodiment of the first aspect, the tag fusion layer comprises a gated neural network layer; the fusing the text semantic feature vector sequence and the text prediction tag sequence based on the tag fusion layer to obtain a text enhanced feature vector sequence, which comprises the following steps: and fusing the text semantic feature vector sequence and the text prediction tag sequence according to the word level position through a gated neural network layer to obtain a text enhanced feature vector sequence.
In an embodiment of the first aspect, the obtaining an entity alias relationship between two entity mention fragments according to positions of words respectively corresponding to rows and columns where the screened probability values are located in the respective entity mention fragments includes: and determining that entity alias relationship exists between the two entity mention fragments in response to that the words respectively corresponding to the row and the column where the probability value is located are located at the same boundary position in the respectively belonging entity mention fragments.
In an embodiment of the first aspect, the method for obtaining an entity alias relationship further includes: generating a mask sequence corresponding to the input text; and inputting the text feature vector sequence acted by the mask sequence into an entity alias relationship acquisition model.
A second aspect of the present disclosure provides a method for training an entity alias relationship acquisition model, where the entity alias relationship acquisition model includes an embedding layer, a feature coding layer, a sequence labeling layer, a tag fusion layer and a multi-head selection layer. The training method comprises the following steps: acquiring a training sample set and inputting it into the entity alias relationship acquisition model, wherein each training sample text in the training sample set has a corresponding text real tag sequence and an entity alias relationship label; generating a word-level token sequence, a word-level position index sequence and a text fragment index sequence based on each training sample text, and obtaining a text feature vector sequence based on the fusion of sequence vectors respectively mapped from the word-level token sequence, the word-level position index sequence and the text fragment index sequence; processing the text feature vector sequence based on the feature coding layer to obtain a text semantic feature vector sequence; processing the text semantic feature vector sequence based on the sequence labeling layer to obtain a text prediction tag sequence, and obtaining each entity mention fragment in the text according to the text prediction tag sequence; calculating a first loss between the text prediction tag sequence and the text real tag sequence; fusing the text semantic feature vector sequence with the text prediction tag sequence or the text real tag sequence based on the tag fusion layer to obtain a text enhanced feature vector sequence; processing the text enhanced feature vector sequence based on the multi-head selection layer to obtain an entity alias relationship probability matrix, where each probability value in the entity alias relationship probability matrix represents the probability that an entity alias relationship exists between the entity mention fragments to which every two words in the text respectively belong; calculating a second loss based on each probability value in the entity alias relationship probability matrix and the entity alias relationship label, the entity alias relationship label being determined based on whether an entity alias relationship exists between the two words corresponding to the row and the column of each probability value; and obtaining a total loss based on the fusion of the first loss and the second loss, and updating the entity alias relationship acquisition model according to the total loss.
In an embodiment of the second aspect, in training rounds using a first part of the training sample texts, the text semantic feature vector sequence and the text prediction tag sequence are fused based on the tag fusion layer to obtain the text enhanced feature vector sequence; and in training rounds of the remaining second part of the training sample texts, the text semantic feature vector sequence and the text real tag sequence are fused based on the tag fusion layer to obtain the text enhanced feature vector sequence.
In an embodiment of the second aspect, the number of training samples in the second part is higher than the number of training samples in the first part.
In an embodiment of the second aspect, the obtaining a text feature vector sequence based on fusion of sequence vectors respectively mapped by the word-level token sequence, the word-level position index sequence, and the text segment index sequence includes: respectively coding the word-level token sequence, the word-level position index sequence and the text segment index sequence to obtain each coding sequence; converting each of the encoded sequences into each of the sequence vectors; and summing and normalizing the sequence vectors to obtain the text feature vector sequence.
In an embodiment of the second aspect, the feature encoding layer comprises a multi-headed self-attention layer and a fully-connected layer; the processing the text feature vector sequence based on the feature coding layer to obtain a text semantic feature vector sequence comprises: processing the text feature vector sequence through a multi-head self-attention layer to obtain an intermediate feature vector sequence; and processing the intermediate characteristic vector sequence through a full connection layer to obtain the text semantic characteristic vector sequence.
In an embodiment of the second aspect, the processing the text semantic feature vector sequence based on a sequence labeling layer to obtain a text prediction tag sequence, and obtaining each entity mention fragment in the text according to the text prediction tag sequence includes: predicting the position of a word corresponding to each characteristic value in the text semantic characteristic vector sequence in the entity mentioned fragment to obtain a prediction label so as to form the text prediction label sequence; and obtaining each entity mention segment based on the entity mention boundary marked by the text prediction tag sequence.
In an embodiment of the second aspect, the tag fusion layer comprises a gated neural network layer; the fusing the text semantic feature vector sequence and the text prediction tag sequence or the text real tag sequence based on the tag fusion layer to obtain a text enhanced feature vector sequence, which comprises the following steps: and fusing the text semantic feature vector sequence and the text prediction tag sequence according to the word level position through a gated neural network layer to obtain a text enhanced feature vector sequence.
In an embodiment of the second aspect, the positional correspondence refers to: the words respectively corresponding to the row and the column where a probability value is located being at the same boundary position in the entity mention fragments to which they respectively belong.
In an embodiment of the second aspect, the training method further includes: generating a mask sequence corresponding to the text; and inputting the text feature vector sequence acted by the mask sequence into an entity alias relationship acquisition model.
A third aspect of the present disclosure provides an entity alias relationship acquisition apparatus, which is applied to an entity alias relationship acquisition model, where the entity alias relationship acquisition model includes an embedding layer, a feature coding layer, a sequence labeling layer, a tag fusion layer and a multi-head selection layer. The apparatus comprises: an input module, for acquiring an input text and inputting it into the entity alias relationship acquisition model; the embedding layer, for processing the input text to generate a word-level token sequence, a word-level position index sequence and a text fragment index sequence, and obtaining a text feature vector sequence based on the fusion of the feature vectors respectively mapped from the word-level token sequence, the word-level position index sequence and the text fragment index sequence; the feature coding layer, for processing the text feature vector sequence to obtain a text semantic feature vector sequence; the sequence labeling layer, for processing the text semantic feature vector sequence to obtain a text prediction tag sequence and obtaining each entity mention fragment in the text according to the text prediction tag sequence; the tag fusion layer, for fusing the text semantic feature vector sequence and the text prediction tag sequence to obtain a text enhanced feature vector sequence; and the multi-head selection layer, for processing the text enhanced feature vector sequence to obtain an entity alias relationship probability matrix, where each probability value in the entity alias relationship probability matrix represents the probability that an entity alias relationship exists between the entity mention fragments to which every two words in the input text respectively belong, for screening probability values reaching a preset threshold value from the entity alias relationship probability matrix, and for obtaining an entity alias relationship acquisition result between two entity mention fragments according to the positions of the words respectively corresponding to the rows and columns of the screened probability values within the entity mention fragments to which they belong.
A fourth aspect of the present disclosure provides a training apparatus for an entity alias relationship acquisition model, where the entity alias relationship acquisition model includes an embedding layer, a feature coding layer, a sequence labeling layer, a tag fusion layer and a multi-head selection layer. The training apparatus comprises: an input module, for acquiring a training sample set and inputting it into the entity alias relationship acquisition model, each training sample text in the training sample set having a corresponding text real tag sequence and entity alias relationship label; the embedding layer, for generating a word-level token sequence, a word-level position index sequence and a text fragment index sequence based on each training sample text, and obtaining a text feature vector sequence based on the fusion of sequence vectors respectively mapped from the word-level token sequence, the word-level position index sequence and the text fragment index sequence; the feature coding layer, for processing the text feature vector sequence to obtain a text semantic feature vector sequence; the sequence labeling layer, for processing the text semantic feature vector sequence to obtain a text prediction tag sequence and obtaining each entity mention fragment in the text according to the text prediction tag sequence; a loss calculation module, for calculating a first loss between the text prediction tag sequence and the text real tag sequence; the tag fusion layer, for fusing the text semantic feature vector sequence with the text prediction tag sequence or the text real tag sequence to obtain a text enhanced feature vector sequence; and the multi-head selection layer, for processing the text enhanced feature vector sequence to obtain an entity alias relationship probability matrix, where each probability value in the entity alias relationship probability matrix represents the probability that an entity alias relationship exists between the entity mention fragments to which every two words in the text respectively belong. The loss calculation module is further configured to calculate a second loss based on each probability value in the entity alias relationship probability matrix and the entity alias relationship label, the entity alias relationship label being determined based on whether an entity alias relationship exists between the two words corresponding to the row and the column of each probability value; and the loss calculation module obtains a total loss based on the fusion of the first loss and the second loss and updates the entity alias relationship acquisition model according to the total loss.
A fifth aspect of the present disclosure provides a computer apparatus, comprising: a communicator, a memory, and a processor; the communicator is used for communicating with the outside; the memory stores program instructions; and the processor is configured to execute the program instructions to perform the entity alias relationship acquisition method according to any one of the first aspect, or to perform the training method according to any one of the second aspect.
A sixth aspect of the present disclosure provides a computer-readable storage medium storing program instructions that are executed to perform the entity alias relationship acquisition method according to any one of the first aspects; alternatively, the training method of any of the second aspects is performed.
As described above, the embodiments of the present disclosure provide an entity alias relationship acquisition method, a training method, an apparatus, and a storage medium, applied to an entity alias relationship acquisition model. An input text is processed by the embedding layer of the model to obtain a text feature vector sequence; a text semantic feature vector sequence is obtained by the feature coding layer; a text prediction tag sequence is obtained by the sequence labeling layer, from which each entity mention fragment in the text is obtained; the text semantic feature vector sequence and the text prediction tag sequence are fused by the tag fusion layer to obtain a text enhanced feature vector sequence; the text enhanced feature vector sequence is processed by the multi-head selection layer to obtain an entity alias relationship probability matrix; and the entity alias relationship acquisition result between two entity mention fragments is obtained according to the positions, within the entity mention fragments to which they belong, of the words corresponding to the rows and columns of the screened probability values. The method realizes a scheme for accurately and efficiently mining entity alias relationships, which benefits the construction of high-quality knowledge graphs.
Drawings
Fig. 1 shows a flowchart of a training method of an entity alias relationship obtaining model in an embodiment of the disclosure.
Fig. 2 is a schematic diagram of an architecture of an entity alias relationship obtaining model according to an embodiment of the disclosure.
Fig. 3 shows a schematic diagram of an entity alias relationship probability matrix in an example of the present disclosure.
Fig. 4 shows a schematic diagram of a training principle of an entity alias relationship obtaining model in an embodiment of the present disclosure under a specific implementation architecture.
Fig. 5 shows a flowchart of an entity alias relationship obtaining method in an embodiment of the disclosure.
Fig. 6 is a block diagram of an entity alias relationship obtaining apparatus according to an embodiment of the disclosure.
Fig. 7 is a block diagram of a training apparatus for an entity alias relationship obtaining model according to an embodiment of the disclosure.
Fig. 8 shows a schematic structural diagram of a computer device according to an embodiment of the disclosure.
Detailed Description
Embodiments of the present disclosure are described below with reference to specific examples, and other advantages and effects of the present disclosure will be readily apparent to those skilled in the art from the disclosure. The disclosure may be embodied or carried out in various other specific embodiments and with various modifications or alterations from various aspects and applications of the disclosure without departing from the spirit of the disclosure. It is to be noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings so that those skilled in the art to which the present disclosure pertains can easily carry out the embodiments. The present disclosure may be embodied in many different forms and is not limited to the embodiments described herein.
Reference in the representation of the present disclosure to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. Furthermore, the particular features, structures, materials, or characteristics shown may be combined in any suitable manner in any one or more embodiments or examples. Moreover, various embodiments or examples and features of different embodiments or examples presented in this disclosure can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first", "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the expressions of the present disclosure, "plurality" means two or more unless specifically defined otherwise.
In order to clearly explain the present disclosure, components that are not related to the description are omitted, and the same reference numerals are given to the same or similar components throughout the specification.
Throughout the specification, when a device is referred to as being "connected" to another device, this includes not only the case of being "directly connected" but also the case of being "indirectly connected" with another element interposed therebetween. In addition, when a device "includes" a certain component, unless otherwise stated, the device does not exclude other components, but may include other components.
Although the terms first, second, etc. may be used herein to describe various elements in some instances, these elements should not be limited by these terms. These terms are only used to distinguish one element from another, for example, a first interface and a second interface. Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used in this specification, specify the presence of stated features, steps, operations, elements, modules, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, modules, items, species, and/or groups thereof. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A, B and C". An exception to this definition occurs only when a combination of elements, functions, steps or operations is inherently mutually exclusive in some manner.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a", "an" and "the" include plural forms as long as the words do not expressly indicate a contrary meaning. The use of "including" in the specification is meant to specify the presence of stated features, regions, integers, steps, elements, and/or components, but does not preclude the presence or addition of other features, regions, integers, steps, elements, components, and/or groups thereof.
Spatially relative terms such as "lower," "upper," and the like may be used to more readily describe the relationship of one device to another as illustrated in the figures. Such terms are intended to cover not only the orientation indicated in the drawings but also other orientations of the device in use. For example, if the device in the figures is turned over, elements described as "below" other elements would then be oriented "above" the other elements; thus, the exemplary terms "under" and "beneath" can encompass both above and below. The device may be rotated by 90 degrees or other angles, and the terms expressing relative space are to be interpreted accordingly.
Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Terms defined in commonly used dictionaries should additionally be interpreted as having meanings consistent with related art documents and the present context, and should not be interpreted in an idealized or overly formal sense unless so defined.
In an actual scenario, in order to construct a high-quality knowledge graph, there is a need to mine entity relationships from a large amount of unstructured corpora. Moreover, since an entity may have various synonymous mentions, for example "Shanghai" and its nickname "Magic City" (魔都), accurately mining the entity alias relationships among these entity synonyms has a crucial impact on building a high-quality knowledge graph.
In view of this, in the embodiments of the present disclosure, a machine learning technique is used to establish and train an entity alias relationship obtaining model, so as to accurately mine entity mentions having entity alias relationships from corpora through the entity alias relationship obtaining model.
Fig. 1 shows a schematic flow chart of a training method of an entity alias relationship obtaining model in the embodiment of the present disclosure. Referring also to FIG. 2, in some embodiments, the entity alias relationship acquisition model 200 may include: an embedding layer 201, a feature encoding layer 202, a sequence annotation layer 203, a tag fusion layer 204, and a multi-head selection layer 205. The input text is processed by the above layers in turn, and corresponding loss functions are constructed to propagate loss in the opposite direction to update the model parameters.
In fig. 1, the process includes:
step S101: and acquiring a training sample set and inputting the entity alias relationship acquisition model, wherein each training sample text in the training sample set has a corresponding text real label sequence and an entity alias relationship label.
In some embodiments, the source of each text in the training sample set may include, but is not limited to, a known library, an online encyclopedia, a website, or the like. Alternatively, if the entity alias relationship acquisition model is to be dedicated to identifying entity alias relationships in texts of a certain field, texts of that field can be collected, e.g., from consulting websites and encyclopedias related to the travel industry.
Step S102: generating a word level token sequence, a word level position index sequence and a text fragment index sequence based on each training sample text, and obtaining a text characteristic vector sequence based on the fusion of sequence vectors respectively mapped by the word level token sequence, the word level position index sequence and the text fragment index sequence.
In some embodiments, a WordPiece tokenizer, such as the one used by the BERT model, may be used to generate the word-level token sequence, the word-level position index sequence, and the text segment index sequence from the text.
In some embodiments, for example, from the text "西子湖一般指西湖" ("Xizi Lake generally refers to West Lake"), a word-level token sequence {[CLS], "west", "son", "lake", "one", "generally", "finger", "west", "lake", [SEP]} is generated, and the word-level position index sequence may be the sequence {0, 1, ..., 9} corresponding to the 10 tokens. The text segment index sequence marks the segment (Segment) to which each word belongs; each segment can be, for example, a sentence, and segments can be separated by [SEP]. For example, with "西子湖一般指西湖" as one segment labeled A, the corresponding text segment index sequence may be represented as {A, A, A, A, A, A, A, A, A, A}.
In some embodiments, a corresponding MASK sequence may be generated for each text, and the text feature vector sequence acted on by the MASK sequence is input to the entity alias relationship acquisition model. For example, applying the MASK sequence to the word-level token sequence {[CLS], "west", "son", "lake", "one", "generally", "finger", "west", "lake", [SEP]} may mask the token "son", giving {[CLS], "west", [MASK], "lake", "one", "generally", "finger", "west", "lake", [SEP]}.
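To make the three index sequences and the mask concrete, the following is a minimal sketch of how they might be produced; it assumes the HuggingFace transformers library and the bert-base-chinese vocabulary, neither of which is named in this disclosure, and the mask shown is the padding/attention mask rather than the token-level [MASK] replacement illustrated above.

```python
# A minimal sketch, assuming HuggingFace `transformers` and `bert-base-chinese`
# (both are assumptions; the disclosure only says a BERT-style WordPiece tokenizer).
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
text = "西子湖一般指西湖"  # "Xizi Lake generally refers to West Lake"

enc = tokenizer(text)
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"])

print(tokens)                    # ['[CLS]', '西', '子', '湖', '一', '般', '指', '西', '湖', '[SEP]']
print(list(range(len(tokens))))  # word-level position index sequence: [0, 1, ..., 9]
print(enc["token_type_ids"])     # text segment index sequence: [0, 0, ..., 0] (one segment "A")
print(enc["attention_mask"])     # padding/attention mask sequence: [1, 1, ..., 1]
```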
For example, the word-level token sequence, the word-level position index sequence, and the text segment index sequence may each be encoded (e.g., one-hot encoded) to obtain encoded sequences; each encoded sequence may be converted into a sequence vector, and the sequence vectors are summed and normalized to obtain the text feature vector sequence.
The above implementation can be realized by the model embedding layer, which may illustratively adopt the BERT embedding scheme to combine the three kinds of information: word, position, and segment. Taking the word-level token sequence as an example, the token sequence of the input text is first converted through numerical processing into the corresponding sequence of index ids over the token vocabulary $V$, and one-hot encoding then yields the encoded sequence $X_{tok} \in \{0,1\}^{T \times |V|}$. Using a word embedding matrix $W_{tok} \in \mathbb{R}^{|V| \times h}$, the encoded sequence $X_{tok}$ is converted into $h$-dimensional dense vectors, i.e., the sequence vector:

$$E_{tok} = X_{tok} W_{tok}$$

Similarly, the word-level position index sequence and the text segment index sequence are handled with analogous operations to obtain the position information encoding, i.e., the sequence vector $E_{pos}$, and the segment information encoding, i.e., the sequence vector $E_{seg}$. The three vectors are then added and layer-normalized (LayerNorm) to obtain the following text feature vector sequence:

$$H^{(0)} = \mathrm{LayerNorm}\left(E_{tok} + E_{pos} + E_{seg}\right)$$
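A minimal PyTorch sketch of such an embedding layer follows; it replaces the explicit one-hot multiplication with an equivalent index lookup into nn.Embedding, and the vocabulary size, maximum length and segment count are illustrative assumptions, not values from the disclosure.

```python
import torch
import torch.nn as nn

# Sketch of the embedding layer: token + position + segment embeddings,
# summed and layer-normalized. Sizes are illustrative assumptions.
class EmbeddingLayer(nn.Module):
    def __init__(self, vocab_size=21128, max_len=128, num_segments=2, h=768):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, h)    # W_tok: lookup == one-hot @ W_tok
        self.pos = nn.Embedding(max_len, h)       # position embedding matrix
        self.seg = nn.Embedding(num_segments, h)  # segment embedding matrix
        self.norm = nn.LayerNorm(h)

    def forward(self, token_ids, position_ids, segment_ids):
        e = self.tok(token_ids) + self.pos(position_ids) + self.seg(segment_ids)
        return self.norm(e)  # H^(0) = LayerNorm(E_tok + E_pos + E_seg)
```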
step S103: and processing the text feature vector sequence based on a feature coding layer to obtain a text semantic feature vector sequence.
In some embodiments, the feature encoding layer includes a multi-headed self-attention layer and a fully-connected layer. In step S103, in the feature coding layer, processing the text feature vector sequence through the multi-head self-attention layer to obtain an intermediate feature vector sequence; and processing the intermediate characteristic vector sequence through a full connection layer to obtain the text semantic characteristic vector sequence.
The above principle is illustrated with an example. The feature coding layer may use, for example, a Transformer encoder (Transformer Encoder) structure, whose basic unit includes a multi-head self-attention (Multi-head Self-attention) layer and a fully connected layer. Multi-head self-attention means that the Query, Key and Value of each token are attended over by multiple heads to obtain multiple different outputs, which are then concatenated to obtain the final output for that token.
Since the text feature vector sequence is formed by the feature vector of each token, it is actually expressed as a sequence feature matrix $H^{(0)} = [e_1, e_2, \ldots, e_T]$, where $T$ is the number of tokens.
Assuming a total of $L$ encoder blocks are used, for $l = 1, 2, \ldots, L$ the processing is as follows:

1) In the multi-head self-attention layer, $N$ attention heads are used to extract context features at different grammatical or semantic levels. With the dimension of each head set to $d = h/N$, the Query, Key, Value and projection weights are, respectively, $W_Q^{(n)}, W_K^{(n)}, W_V^{(n)} \in \mathbb{R}^{h \times d}$ and $W_{proj} \in \mathbb{R}^{h \times h}$. In addition, a residual connection is added to the sequence features aggregated by the attention mechanism to control the upward flow of lower-layer information:

$$\mathrm{head}_n = \mathrm{softmax}\!\left(\frac{Q_n K_n^{\top}}{\sqrt{d}}\right) V_n, \quad Q_n = H^{(l-1)} W_Q^{(n)},\; K_n = H^{(l-1)} W_K^{(n)},\; V_n = H^{(l-1)} W_V^{(n)}$$

$$\mathrm{MultiHead}\left(H^{(l-1)}\right) = \left[\mathrm{head}_1; \ldots; \mathrm{head}_N\right] W_{proj}$$

$$\tilde{H}^{(l)} = \mathrm{LayerNorm}\!\left(H^{(l-1)} + \mathrm{MultiHead}\left(H^{(l-1)}\right)\right)$$

2) In the fully connected layer, the parameter matrices include $W_1 \in \mathbb{R}^{h \times h'}$ and $W_2 \in \mathbb{R}^{h' \times h}$, where $h' > h$ is the intermediate dimension. Here the feature vector is mapped to a high-dimensional space, activated by the Gaussian error linear unit (GELU), and then projected back to the original low-dimensional space; a residual connection is likewise added. The process is expressed as follows:

$$F^{(l)} = \mathrm{gelu}\!\left(\tilde{H}^{(l)} W_1 + b_1\right) W_2 + b_2$$

$$H^{(l)} = \mathrm{LayerNorm}\!\left(\tilde{H}^{(l)} + F^{(l)}\right)$$
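The following sketch assembles one such encoder block in PyTorch; the intermediate feed-forward width of 4h is an assumption borrowed from standard BERT configurations, not stated in the disclosure.

```python
import torch
import torch.nn as nn

# Sketch of one encoder block: N heads of dimension d = h/N, GELU feed-forward,
# residual + LayerNorm around each sublayer. 4*h intermediate size is assumed.
class EncoderBlock(nn.Module):
    def __init__(self, h=768, n_heads=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(h, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(h)
        self.ff = nn.Sequential(nn.Linear(h, 4 * h), nn.GELU(), nn.Linear(4 * h, h))
        self.norm2 = nn.LayerNorm(h)

    def forward(self, x):                  # x: (batch, T, h) = H^(l-1)
        a, _ = self.attn(x, x, x)          # multi-head self-attention
        x = self.norm1(x + a)              # residual + LayerNorm
        return self.norm2(x + self.ff(x))  # feed-forward sublayer -> H^(l)
```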
s104: and processing the text semantic feature vector sequence based on a sequence labeling layer to obtain a text prediction tag sequence, and obtaining each entity mention segment in the text according to the text prediction tag sequence.
In some embodiments, step S104 may specifically include: predicting the position of a word corresponding to each characteristic value in the text semantic characteristic vector sequence in the entity mentioned fragment to obtain a prediction label so as to form the text prediction label sequence; and obtaining each entity mention segment based on the entity mention boundary marked by the text prediction label sequence.
An example is taken for illustration. Suppose the sequence of length $T$ output by the $L$ encoder blocks of the feature coding layer is $H^{(L)} = [h^{(L)}_1, h^{(L)}_2, \ldots, h^{(L)}_T]$.

Illustratively, the sequence labeling layer may be implemented as a Softmax classification layer that predicts, for each token, the prediction label (i.e., a probability distribution over positions within an entity mention fragment), which is used to identify the boundaries of entity mention fragments in the sequence. Specifically, with a BIO labeling scheme, let the token label set be $S$, the projection matrix be $W_s \in \mathbb{R}^{h \times |S|}$, and the bias be $b_s \in \mathbb{R}^{|S|}$; the label probability distribution of the token at position $t$ is

$$\hat{y}_t = \mathrm{softmax}\!\left(W_s^{\top} h^{(L)}_t + b_s\right)$$
By way of example, the label B-X indicates that a token is the head of an entity of type X; the label I-X indicates that a token is inside an X entity; and O indicates that a token is outside any entity.

Thus, the labels at the positions of "西子湖一般指西湖" in the text real tag sequence can be expressed as "west" corresponding to B, "son" corresponding to I, "lake" corresponding to I, "one" corresponding to O, "generally" corresponding to O, "finger" corresponding to O, "west" corresponding to B, "lake" corresponding to I, i.e., "BIIOOOBI". Reading each B together with the I labels that follow it yields the two entity mention fragments "west-son-lake" (西子湖, Xizi Lake) and "west-lake" (西湖, West Lake).
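A minimal sketch of this BIO decoding step, using the example above; the helper name decode_mentions is hypothetical.

```python
# Decode entity mention fragments from a BIO tag sequence, as in "BIIOOOBI".
def decode_mentions(tokens, tags):
    """Return (text, (start, end)) spans: each B opens a span, following I's extend it."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return [("".join(tokens[s:e]), (s, e)) for s, e in spans]

tokens = list("西子湖一般指西湖")
print(decode_mentions(tokens, list("BIIOOOBI")))
# [('西子湖', (0, 3)), ('西湖', (6, 8))]
```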
Ideally, if the text prediction tag sequence is the same as the text real tag sequence, the prediction is accurate, which is the training target. Therefore, the sequence annotation loss between the text predicted tag sequence and the text true tag sequence is calculated, and the model parameters are updated with the loss minimization as the target.
S105: and calculating a first loss between the text prediction label sequence and the text real label sequence, namely a sequence annotation loss.
In a possible example, the first loss $L_{ner}$ is calculated using a cross-entropy function:

$$L_{ner} = -\sum_{t=1}^{T} y_t \log \hat{y}_t$$

where $y_t$ is the value at the $t$-th position of the text real tag sequence, and $\hat{y}_t$ is the value at the $t$-th position of the text prediction tag sequence.
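As a sketch, the first loss can be computed with a standard token-level cross entropy; the B/I/O-to-index mapping below is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

# Token-level cross entropy between predicted label logits and gold BIO tags.
# Mapping B=0, I=1, O=2 is assumed for illustration.
logits = torch.randn(1, 8, 3)                    # (batch, T, |S|) pre-softmax scores
gold = torch.tensor([[0, 1, 1, 2, 2, 2, 0, 1]])  # "BIIOOOBI"
l_ner = F.cross_entropy(logits.view(-1, 3), gold.view(-1))
```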
S106: and fusing the text semantic feature vector sequence with a text prediction tag sequence or a text real tag sequence based on a tag fusion layer to obtain a text enhanced feature vector sequence.
In some embodiments, the tag fusion layer comprises a gated neural network layer, which may be similar in principle to a gated recurrent unit (GRU). Step S106 may specifically include: fusing, through the gated neural network layer, the text semantic feature vector sequence and the text prediction tag sequence at each word-level position to obtain the text enhanced feature vector sequence.
The principle is illustrated with an example. The text real/prediction tag sequence represents the position of each token within an entity fragment and helps in determining entity relationships; in particular, for a table-filling style relation extraction task, introducing word-level entity tag information helps reduce noise, so that the entity alias relationship acquisition model can pay more attention to the interaction information between entity mention fragments.
In the training stage, for example, the text semantic feature vector sequence is fused with the text real tag sequence. The text real tag sequence is $y = [y_1, y_2, \ldots, y_T]$, where each $y_t$ may illustratively be a one-hot vector; it is processed through the label embedding matrix $W_{label} \in \mathbb{R}^{m \times |S|}$ to obtain:

$$v_t = W_{label}\, y_t$$
The label information is fused through a gated neural network layer similar to a GRU, whose principle is as follows:

$$r_t = \sigma\!\left(W_r\,[h^{(L)}_t; v_t] + b_r\right)$$

$$z_t = \sigma\!\left(W_z\,[h^{(L)}_t; v_t] + b_z\right)$$

$$g_t = \tanh\!\left(W_g\,[r_t \odot h^{(L)}_t; v_t] + b_g\right)$$

$$h_t = z_t \odot g_t + (1 - z_t) \odot h^{(L)}_t$$

where the square brackets denote concatenation of the elements on either side of the semicolon, $W_r, W_z, W_g$ and $b_r, b_z, b_g$ are network parameters, and $\odot$ denotes the Hadamard product operation. It can be seen that $h_t$ fuses $v_t$ (derived from $y_t$), $g_t$, and the information of the feature coding layer output $h^{(L)}_t$. The text enhanced feature vector sequence after fusing the label information is $H = [h_1, h_2, \ldots, h_T]$, $t = 1 \ldots T$; the sequence $H$ can be represented as a matrix formed by stacking the vectors $h_1 \sim h_T$.
Similarly, if the text semantic feature vector sequence is fused with the text prediction tag sequence, the text prediction tag sequence $\hat{y}_t$ replaces $y_t$ in the above description. In addition, when the model is actually applied, the predicted text prediction tag sequence $\hat{y}_t$ is likewise used to obtain the text enhanced feature vector sequence.
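A PyTorch sketch of such a gated fusion layer follows; since the exact gating equations above are themselves a reconstruction, the code should be read as one plausible instantiation consistent with the description, not as the disclosed implementation.

```python
import torch
import torch.nn as nn

# Sketch of the GRU-style tag fusion layer: embed the (one-hot or predicted)
# label y_t as v_t, then gate it against the encoder output h_t^(L).
class TagFusion(nn.Module):
    def __init__(self, h=768, n_labels=3, m=128):
        super().__init__()
        self.label_emb = nn.Linear(n_labels, m, bias=False)  # W_label (accepts soft or one-hot y_t)
        self.reset = nn.Linear(h + m, h)                     # r_t
        self.update = nn.Linear(h + m, h)                    # z_t
        self.cand = nn.Linear(h + m, h)                      # g_t

    def forward(self, h_enc, y):             # h_enc: (T, h); y: (T, n_labels)
        v = self.label_emb(y)                # v_t = W_label y_t
        hv = torch.cat([h_enc, v], dim=-1)
        r = torch.sigmoid(self.reset(hv))    # reset gate
        z = torch.sigmoid(self.update(hv))   # update gate
        g = torch.tanh(self.cand(torch.cat([r * h_enc, v], dim=-1)))  # candidate g_t
        return z * g + (1 - z) * h_enc       # h_t mixes g_t with h_t^(L)
```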
In some embodiments, when using the training samples of the training set, in training rounds using a first part of the training sample texts, the text semantic feature vector sequence and the text prediction tag sequence may be fused based on the tag fusion layer to obtain the text enhanced feature vector sequence; and in training rounds of the remaining second part of the training sample texts, the text semantic feature vector sequence and the text real tag sequence are fused based on the tag fusion layer to obtain the text enhanced feature vector sequence. Optionally, the number of training samples in the second part is higher than that in the first part. For example, in training rounds for 25% of the training sample texts, $\hat{y}_t$ is fused to obtain the text enhanced feature vector sequence, and in the remaining training rounds $y_t$ is fused to obtain the text enhanced feature vector sequence.
S107: and processing the text enhanced feature vector sequence based on a multi-head selection layer to obtain an entity alias relationship probability matrix.
Considering that the same subject in the text may be associated with a plurality of objects, a multi-head selection mechanism is introduced to solve the problem of multiple relation extraction.
And each probability value in the entity alias relationship probability matrix represents the probability of entity alias relationship existing between entity mention fragments to which every two words in the text belong respectively.
Referring to FIG. 3, a diagram of an entity alias relationship probability matrix in an example is shown. Taking "西子湖一般指西湖" as an example, the rows and columns respectively correspond to the index id of each word, and the probability at the intersection of a row and a column represents the probability that an entity alias relationship exists between the entity mention fragments to which the corresponding two words belong. Because the entity synonym relationship is what matters here, [CLS] and [SEP] are not considered and are omitted. The row index id can be denoted by $i$ (0-7) and the column index id by $j$ (0-7), with $i$ representing the subject and $j$ representing the object.
In some embodiments, the probability values in the entity alias relationship probability matrix may be calculated from an alias relationship score, computed as follows:

$$s(x_i, x_j) = w^{\top} \tanh\!\left(W_s h_i + W_o h_j + b\right)$$

where $W_s, W_o \in \mathbb{R}^{k \times h}$, $b \in \mathbb{R}^{k}$ and $w \in \mathbb{R}^{k}$, with $k$ the hidden dimension of the selection layer. The score is then converted into a probability value using the sigmoid activation function:

$$p_{i,j} = \sigma\!\left(s(x_i, x_j)\right)$$
Illustratively, since the tokens at entity boundary positions are determined at the sequence labeling layer, the entity alias relationship can be defined to exist only between the same boundary positions of entity mention fragments, e.g., between the head tokens; that is, if the tokens at positions $i$ and $j$ are the heads of two entity mention fragments and the two fragments have an alias relationship, then $r_{i,j} = r_{j,i} = 1$, and otherwise the label is 0.

For example, referring to fig. 3, an entity alias relationship exists between "west-son-lake" (Xizi Lake) and "west-lake" (West Lake), with "west-son-lake" as the subject and "west-lake" as the object. The "west" at row $i = 0$ and the "west" at column $j = 6$ are the head positions of the two fragments, so the probability value at position $(i = 0, j = 6)$ should ideally be 1; that is, the entity alias relationship label at this position is $r_{06} = 1$, and otherwise 0. If subject and object are not distinguished, then likewise $r_{60} = r_{06} = 1$.
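A minimal PyTorch sketch of this pairwise scoring follows; folding the shared bias b into the object projection is an implementation convenience assumed here, not a detail from the disclosure.

```python
import torch
import torch.nn as nn

# Sketch of the multi-head selection scoring layer:
# s(x_i, x_j) = w^T tanh(W_s h_i + W_o h_j + b), then p_ij = sigmoid(s).
class MultiHeadSelection(nn.Module):
    def __init__(self, h=768, k=100):
        super().__init__()
        self.subj = nn.Linear(h, k, bias=False)  # W_s
        self.obj = nn.Linear(h, k, bias=True)    # W_o (bias b folded in here)
        self.w = nn.Linear(k, 1, bias=False)     # w

    def forward(self, H):  # H: (T, h) text enhanced feature vectors
        s = self.subj(H).unsqueeze(1) + self.obj(H).unsqueeze(0)  # (T, T, k)
        scores = self.w(torch.tanh(s)).squeeze(-1)                # (T, T)
        return torch.sigmoid(scores)  # p_{i,j} probability matrix
```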
S108: and calculating a second loss based on each probability value in the entity alias relationship probability matrix and the entity alias relationship label.
In some embodiments, the second loss, i.e., the multi-head selection loss $L_{rel}$, may also be calculated with cross entropy, for example as shown in the following formula:

$$L_{rel} = -\sum_{i=1}^{T}\sum_{j=1}^{T}\left( r_{i,j} \log p_{i,j} + \left(1 - r_{i,j}\right) \log\left(1 - p_{i,j}\right) \right)$$
s109: and obtaining total loss based on the fusion of the first loss and the second loss, and updating the entity alias relationship acquisition model according to the total loss.
Illustratively, the total loss may be represented by the following formula:

$$L = L_{ner} + \alpha L_{rel}$$

where $\alpha > 0$ is a factor balancing the two loss terms; e.g., $\alpha = 1$ indicates that the two parts are equally important.
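Putting the two terms together, a sketch of the total training loss under the same assumptions as the snippets above; p is the (T, T) predicted probability matrix and r the 0/1 alias label matrix.

```python
import torch
import torch.nn.functional as F

# Combine the two losses: L = L_ner + alpha * L_rel.
def total_loss(tag_logits, gold_tags, p, r, alpha=1.0):
    l_ner = F.cross_entropy(tag_logits.view(-1, tag_logits.size(-1)),
                            gold_tags.view(-1))     # sequence labeling loss
    l_rel = F.binary_cross_entropy(p, r.float())    # multi-head selection loss
    return l_ner + alpha * l_rel
```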
Further, back propagation is performed according to the total loss to update the parameters of the entity alias relationship acquisition model; preferably, an algorithm that accelerates gradient descent, such as the AdamW optimization algorithm, is adopted, and the model parameters are iteratively updated until convergence.
To more intuitively illustrate the principles of the entity alias relationship acquisition model, reference may be made to the following FIG. 4 embodiment.
Fig. 4 is a schematic diagram illustrating a training principle of an entity alias relationship obtaining model in an embodiment of the present disclosure under a specific implementation architecture.
As illustrated in fig. 4, {[CLS], "west", "son", "lake", "one", "generally", "finger", "west", "lake", [SEP]} is processed by the embedding layer of the model and then input to the feature coding layer, which may be implemented based on BERT, for example, and outputs the text semantic feature vector sequence. The sequence labeling layer performs BIO labeling on the tokens "west", "son", "lake", "one", "generally", "finger", "west", "lake", predicting "west" (B: 0.8, I: 0.1, O: 0.1, with high probability of B), "son" (B: 0.1, I: 0.7, O: 0.2, with high probability of I), "lake" (B: 0.1, I: 0.8, O: 0.1, with high probability of I), "one" (B: 0.1, I: 0.1, O: 0.8, with high probability of O), "generally" (B: 0.1, I: 0.1, O: 0.8, with high probability of O), "finger" (B: 0.2, I: 0.1, O: 0.7, with high probability of O), "west" (B: 0.9, I: 0.1, O: 0.0, with high probability of B), "lake" (I: 0.7, with high probability of I). From this the text prediction tag sequence can be derived, while the text real tag sequence "BIIOOOBI" has probability 1.
The first loss $L_{ner}$ is calculated by the cross-entropy function given above.
In addition, the text prediction tag sequence and/or the text real tag sequence, together with the text semantic feature vector sequence, are input into the tag fusion layer for fusion, and the fused text enhanced feature vector sequence is input into the multi-head selection layer, where the score $s(x_i, x_j)$ between a subject (e.g., a subject entity, $h_i$) and an object (e.g., an object entity, $h_j$) is calculated and converted by sigmoid into $p_{i,j}$, forming the entity alias relationship probability matrix.
Since "west-son-lake" (Xizi Lake) and "west-lake" (West Lake) have an entity alias relationship, the tag values $r_{i,j}$ and $r_{j,i}$ corresponding to the probability value $p_{i,j}$ at the row-column intersection of their head tokens are 1, i.e., $r_{i,j} = r_{j,i} = 1$, and the second loss $L_{rel}$ is calculated using the cross-entropy formula given above.
The first loss and the second loss are integrated to obtain the total loss:

$$L = L_{ner} + \alpha L_{rel}$$
thereby updating the model.
Further, a possible implementation manner of the training method in the practical application example is provided, but not limited thereto.
Illustratively, the flow may be illustrated as:
(1) collect corpora from consultation websites or encyclopedias related to the travel industry, specify a maximum sequence length of 128 for the model, segment the original text, manually screen a portion of valid samples for annotation, and construct training and validation data sets;
(2) for an input text, generate a word-level token sequence, a position sequence, a segment id sequence and a mask sequence using a BERT word-piece tokenizer;
(3) input the token sequence, the position sequence and the segment sequence into the model embedding layer to obtain a sequence embedding matrix with vector dimension h = 768;
(4) the feature coding layer is a 12-layer transformer encoder whose multi-head self-attention mechanism extracts features using, for example, N = 12 heads; it receives the embedding vector sequence from the previous step and outputs a word sequence representation fused with context semantics, i.e., the text semantic feature vector sequence;
(5) pass the result of the previous step into the sequence labeling layer to obtain the tag distribution of each token in the sequence, and calculate the cross entropy loss $L_{ner}$ between the predicted tag distribution and the real tag distribution;
(6) randomly select 25% of the training samples to use the tag distribution predicted in the previous step, while the remaining samples use the real tag distribution; embed the tags into a label hidden space of dimension m = 128, and then pass them, together with the output of the feature coding layer, into the label fusion layer to obtain a sequence representation enhanced by entity tags, i.e., the text enhanced feature vector sequence;
(7) input the sequence representation from the previous step into the multi-head selection layer, whose hidden layer dimension is, e.g., k = 100, and calculate the probability value $p_{i,j}$ of an entity alias relationship between each pair of tokens;
(8) add the multi-head selection loss $L_{rel}$ and the sequence labeling loss $L_{ner}$ of step (5) with the weight factor α = 1 to obtain the total loss;
(9) minimize the total loss using the AdamW optimization algorithm, iteratively updating the model parameters until convergence.
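The following minimal PyTorch sketch illustrates steps (5)-(8) under the example dimensions above (h = 768, m = 128, k = 100, α = 1). The additive pair-scoring form, the module names, and the tag-id layout are illustrative assumptions, not the patent's exact formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

T, h, m, k, num_tags = 128, 768, 128, 100, 3   # 3 BIO tags; dims from the text

class MultiHeadSelection(nn.Module):
    """Scores every (subject, object) token pair; additive form assumed."""
    def __init__(self):
        super().__init__()
        self.u = nn.Linear(h + m, k)   # projects the subject token h_i
        self.w = nn.Linear(h + m, k)   # projects the object token h_j
        self.v = nn.Linear(k, 1)       # maps the combined score s(x_i, x_j) to a scalar

    def forward(self, x):              # x: (B, T, h+m) text enhanced sequence
        s = self.v(torch.tanh(self.u(x).unsqueeze(2) + self.w(x).unsqueeze(1)))
        return torch.sigmoid(s.squeeze(-1))      # (B, T, T) matrix of p_ij

def total_loss(tag_logits, tag_gold, pair_probs, pair_gold, alpha=1.0):
    # Step (5): sequence labeling loss L_ner, token-level cross entropy over B/I/O.
    l_ner = F.cross_entropy(tag_logits.reshape(-1, num_tags), tag_gold.reshape(-1))
    # Multi-head selection loss L_rel: binary cross entropy over all token pairs.
    l_rel = F.binary_cross_entropy(pair_probs, pair_gold.float())
    # Step (8): weighted sum of the two losses.
    return l_ner + alpha * l_rel
```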
Fig. 5 is a schematic flow chart illustrating an entity alias relationship obtaining method according to an embodiment of the present disclosure. For example, the method may use an entity alias relationship acquisition model trained by the aforementioned training method. Details of the model in the embodiment of fig. 5 may be found in the previous embodiments of the training method.
In the example of fig. 5, the entity alias relationship obtaining method includes:
step S501: and acquiring an input text and inputting the entity alias relationship acquisition model.
Illustratively, "west child lake generally refers to west lake" is input to the model, matching the token sequence of the fig. 4 example.
Step S502: and processing the input text based on the embedding layer to generate a word level token sequence, a word level position index sequence and a text fragment index sequence, and processing the fusion of the feature vectors respectively mapped by the word level token sequence, the word level position index sequence and the text fragment index sequence based on the embedding layer to obtain a text feature vector sequence.
Illustratively, the text feature vector sequence is the aforementioned $e_t$.
Step S503: and processing the text feature vector sequence based on a feature coding layer to obtain a text semantic feature vector sequence.
Illustratively, the text semantic feature vector sequence is the aforementioned $H^{(l)}$.
Step S504: and processing the text semantic feature vector sequence based on a sequence labeling layer to obtain a text prediction label sequence, and obtaining each entity mention fragment in the text according to the text prediction label sequence.
Illustratively, the text prediction tag sequence is the aforementioned "BIIOOOBI", and the entity mention fragments are "west child lake" and "west lake".
Step S505: and fusing the text semantic feature vector sequence and the text prediction tag sequence based on the tag fusion layer to obtain a text enhanced feature vector sequence.
Illustratively, the text enhanced feature vector sequence is, as before, $H = [h_1, h_2, \ldots, h_T]$.
Step S506: processing the text enhanced feature vector sequence based on a multi-head selection layer to obtain an entity alias relation probability matrix; and each probability value in the entity alias relationship probability matrix represents the probability of entity alias relationship existing between entity mention fragments to which every two words in the input text respectively belong.
Illustratively, the probability values in the entity alias relationship probability matrix are the aforementioned $p_{i,j}$.
Step S507: and screening probability values reaching a preset threshold value from the entity alias relationship probability matrix, and obtaining an entity alias relationship acquisition result between two entity mention fragments according to the positions of words respectively corresponding to the rows and columns of the screened probability values in the entity mention fragments to which the words respectively belong.
For example, with a preset threshold of 0.5, probability values exceeding 0.5 are taken out. If the probability value between "west" in "west child lake" and "west" in "west lake" is 0.6, it can be determined that the entity mention fragments "west child lake" and "west lake" have an entity alias relationship, i.e., they are entity synonyms.
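As a hedged sketch of this screening step, the snippet below thresholds the probability matrix at 0.5 and reads relations off the head (first) tokens of each mention fragment; the span representation and the head-token convention are assumptions for illustration:

```python
def extract_alias_pairs(probs, spans, threshold=0.5):
    """probs: (T, T) entity alias relationship probability matrix.
    spans: list of (start, end) token index pairs, one per entity mention
    fragment decoded from the predicted tag sequence (assumed format)."""
    pairs = []
    for a in spans:
        for b in spans:
            if a == b:
                continue
            # Read the relation at the row/column of the two head tokens.
            if probs[a[0], b[0]] > threshold:
                pairs.append((a, b))
    return pairs

# Example: with spans [(0, 2), (6, 7)] for "west child lake" and "west lake",
# probs[0, 6] = 0.6 > 0.5 reports the two fragments as entity aliases.
```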
In some embodiments, the obtaining a text feature vector sequence based on fusion of sequence vectors respectively mapped by the word-level token sequence, the word-level position index sequence, and the text segment index sequence includes: respectively coding the word-level token sequence, the word-level position index sequence and the text segment index sequence to obtain each coding sequence; converting each of the encoded sequences into each of the sequence vectors; and summing and normalizing the sequence vectors to obtain the text feature vector sequence.
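A minimal sketch of this embedding fusion, assuming BERT-style lookup tables for the three index sequences and LayerNorm as the normalization; the vocabulary and length constants are placeholders:

```python
import torch.nn as nn

class EmbeddingFusion(nn.Module):
    def __init__(self, vocab_size=21128, max_len=128, num_segments=2, h=768):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, h)     # word-level token sequence
        self.pos = nn.Embedding(max_len, h)        # word-level position index sequence
        self.seg = nn.Embedding(num_segments, h)   # text fragment index sequence
        self.norm = nn.LayerNorm(h)                # assumed normalization step

    def forward(self, tok_ids, pos_ids, seg_ids):
        # Sum the three mapped sequence vectors, then normalize.
        return self.norm(self.tok(tok_ids) + self.pos(pos_ids) + self.seg(seg_ids))
```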
In some embodiments, the feature encoding layer comprises a multi-headed self-attention layer and a fully-connected layer; the processing the text feature vector sequence based on the feature coding layer to obtain a text semantic feature vector sequence comprises: processing the text feature vector sequence through a multi-head self-attention layer to obtain an intermediate feature vector sequence; and processing the intermediate characteristic vector sequence through a full connection layer to obtain the text semantic characteristic vector sequence.
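A minimal sketch of such a feature coding layer built from PyTorch's stock transformer encoder (multi-head self-attention plus a fully connected feed-forward sublayer); the 12-layer, 12-head, 768-dimension configuration echoes the example values given earlier and is an assumption here:

```python
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(
    d_model=768, nhead=12, dim_feedforward=3072,  # self-attention + fully connected
    batch_first=True)
feature_encoder = nn.TransformerEncoder(encoder_layer, num_layers=12)

# semantic_seq = feature_encoder(text_feature_seq)  # (B, T, 768) in, same shape out
```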
In some embodiments, the processing the text semantic feature vector sequence based on a sequence labeling layer to obtain a text prediction tag sequence, and obtaining each entity mention fragment in the text according to the text prediction tag sequence includes: predicting the position of a word corresponding to each characteristic value in the text semantic characteristic vector sequence in the entity mentioned fragment to obtain a prediction label so as to form the text prediction label sequence; and obtaining each entity mention segment based on the entity mention boundary marked by the text prediction tag sequence.
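A hedged sketch of decoding entity mention fragments from a predicted tag sequence; the tag-id convention (0 = B, 1 = I, 2 = O) is an assumption introduced for illustration:

```python
def decode_bio_spans(tags):
    """tags: 1-D sequence of tag ids, assumed 0 = B, 1 = I, 2 = O.
    Returns (start, end) index pairs, one per entity mention fragment."""
    spans, start = [], None
    for i, t in enumerate(tags):
        if t == 0:                          # B marks a new mention boundary
            if start is not None:
                spans.append((start, i - 1))
            start = i
        elif t == 2 and start is not None:  # O closes the current mention
            spans.append((start, i - 1))
            start = None
    if start is not None:                   # mention running to the end
        spans.append((start, len(tags) - 1))
    return spans

# Example: "BIIOOOBI" -> [(0, 2), (6, 7)], i.e. "west child lake" and "west lake".
```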
In some embodiments, the tag fusion layer comprises a gated neural network layer; the fusing the text semantic feature vector sequence and the text prediction tag sequence based on the tag fusion layer to obtain a text enhanced feature vector sequence, which comprises the following steps: and fusing the text semantic feature vector sequence and the text prediction tag sequence according to word level positions through a gated neural network layer to obtain a text enhanced feature vector sequence.
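As a hedged sketch of a position-wise gated fusion; the specific gate below (a sigmoid gate over the concatenated semantic and label features) is a common construction and an assumption, not necessarily the patent's exact gate:

```python
import torch
import torch.nn as nn

class GatedLabelFusion(nn.Module):
    def __init__(self, h=768, m=128, num_tags=3):
        super().__init__()
        self.label_emb = nn.Linear(num_tags, m)  # embeds the B/I/O tag distribution
        self.gate = nn.Linear(h + m, h + m)      # one gate per word-level position

    def forward(self, semantic, tag_dist):
        # semantic: (B, T, h) features; tag_dist: (B, T, num_tags) distribution.
        label = self.label_emb(tag_dist)         # (B, T, m) label hidden space
        x = torch.cat([semantic, label], dim=-1) # fuse at each word-level position
        g = torch.sigmoid(self.gate(x))          # gating values in (0, 1)
        return g * x                             # (B, T, h+m) enhanced sequence
```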
In some embodiments, the obtaining an entity alias relationship between two entity mention fragments according to positions of words respectively corresponding to rows and columns where the screened probability values are located in the entity mention fragments to which the words respectively belong includes: and determining that entity alias relationship exists between the two entity mention fragments in response to that the words respectively corresponding to the row and the column where the probability value is located are located at the same boundary position in the respectively belonging entity mention fragments.
In some embodiments, the entity alias relationship obtaining method further includes: generating a mask sequence corresponding to the input text; and inputting the text feature vector sequence acted by the mask sequence into an entity alias relationship acquisition model.
The following describes the flow of the entity alias relationship obtaining method through a specific application example; the specific steps are as follows (a hedged end-to-end code sketch follows the list):
(1) the input text is normalized: full-width characters are converted to half-width, traditional characters are converted to simplified, and the overlong part of a sentence is truncated;
(2) the text is digitized to obtain the word-level token, position and segment id sequences together with the mask sequence, which are input into the entity alias relationship acquisition model for forward propagation calculation;
(3) the token tag distribution predicted by the sequence labeling layer is decoded to obtain each entity mention fragment;
(4) in the inference phase, the token tag distribution predicted in the previous step is directly projected using the label embedding matrix, input together with the output of the feature coding layer into the tag fusion layer, and the entity alias relationship probability matrix is finally obtained through the calculation of the multi-head selection layer.
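A hedged end-to-end sketch of this inference flow, reusing the hypothetical helpers sketched earlier (`decode_bio_spans`, `extract_alias_pairs`); the tokenizer call assumes a HuggingFace-style BERT tokenizer, and the model's return signature is an assumption:

```python
def infer_alias_pairs(text, tokenizer, model, threshold=0.5):
    # (1)-(2): normalize and digitize the text into id and mask sequences.
    enc = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")
    # Forward propagation: assumed to return per-token tag distributions
    # and the (1, T, T) entity alias relationship probability matrix.
    tag_dist, probs = model(enc["input_ids"], enc["attention_mask"])
    # (3): decode the predicted tag distribution into mention fragments.
    spans = decode_bio_spans(tag_dist.argmax(-1)[0].tolist())
    # (4): read alias pairs off the probability matrix by thresholding.
    return extract_alias_pairs(probs[0], spans, threshold)
```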
fig. 6 is a schematic block diagram of an entity alias relationship obtaining apparatus according to an embodiment of the present disclosure. Since the specific implementation of the entity alias relationship obtaining apparatus may refer to the previous embodiment of the entity alias relationship obtaining method, the technical details are not repeated here.
In fig. 6, the entity alias relationship obtaining apparatus 600 is applied to an entity alias relationship obtaining model 601, where the entity alias relationship obtaining model 601 includes: an embedding layer 611, a feature encoding layer 612, a sequence annotation layer 613, a tag fusion layer 614, and a multi-head selection layer 615.
The entity alias relationship obtaining apparatus 600 includes:
an input module 602, configured to obtain an input text and input the entity alias relationship obtaining model;
the embedding layer 611 is configured to process the input text to generate a word-level token sequence, a word-level position index sequence, and a text fragment index sequence, and obtain a text feature vector sequence based on fusion of feature vectors respectively mapped by the word-level token sequence, the word-level position index sequence, and the text fragment index sequence;
the feature coding layer 612 is configured to process the text feature vector sequence to obtain a text semantic feature vector sequence;
the sequence labeling layer 613 is configured to process the text semantic feature vector sequence to obtain a text prediction tag sequence, and obtain each entity mention segment in the text according to the text prediction tag sequence;
the label fusion layer 614 is configured to fuse the text semantic feature vector sequence and the text prediction label sequence to obtain a text enhanced feature vector sequence;
the multi-head selection layer 615 is configured to process the text enhanced feature vector sequence to obtain an entity alias relationship probability matrix; each probability value in the entity alias relationship probability matrix represents the probability that entity alias relationship exists between entity mention fragments to which every two words in the input text belong respectively; and screening probability values reaching a preset threshold value from the entity alias relationship probability matrix, and obtaining an entity alias relationship acquisition result between two entity mention fragments according to the positions of words respectively corresponding to the rows and columns of the screened probability values in the entity mention fragments to which the words respectively belong.
Fig. 7 is a block diagram of a training apparatus for an entity alias relationship obtaining model according to an embodiment of the disclosure. Since the specific implementation of the training apparatus 700 for the entity alias relationship obtaining model may refer to the previous embodiment of the training method for the entity alias relationship obtaining model, the technical details are not repeated here.
The entity alias relationship acquisition model 701 includes: an embedding layer 711, a feature encoding layer 712, a sequence annotation layer 713, a tag fusion layer 714, and a multi-head selection layer 715.
The training apparatus 700 includes:
an input module 702, configured to obtain a training sample set and input the entity alias relationship obtaining model, where each training sample text in the training sample set has a corresponding text real label sequence and an entity alias relationship label;
the embedding layer 711 is configured to generate a word-level token sequence, a word-level position index sequence, and a text fragment index sequence based on each training sample text, and obtain a text feature vector sequence based on fusion of sequence vectors respectively mapped by the word-level token sequence, the word-level position index sequence, and the text fragment index sequence;
the feature coding layer 712 is configured to process the text feature vector sequence to obtain a text semantic feature vector sequence;
the sequence labeling layer 713 is configured to process the text semantic feature vector sequence to obtain a text prediction tag sequence, and obtain each entity mention segment in the text according to the text prediction tag sequence;
a loss calculating module 703, configured to calculate a first loss between the text prediction tag sequence and the text real tag sequence;
the label fusion layer 714 is used for fusing the text semantic feature vector sequence with a text prediction label sequence or a text real label sequence to obtain a text enhanced feature vector sequence;
the multi-head selection layer 715 is configured to process the text enhanced feature vector sequence to obtain an entity alias relationship probability matrix; each probability value in the entity alias relationship probability matrix represents the probability that entity alias relationship exists between entity mention fragments to which every two words in the text belong respectively;
the loss calculating module 703 is configured to calculate a second loss based on each probability value in the entity alias relationship probability matrix and the entity alias relationship tag; the entity alias relationship label is determined based on whether an entity alias relationship exists between two words corresponding to the row and the column corresponding to each probability value;
the loss calculating module 703 is configured to obtain an overall loss based on the fusion of the first loss and the second loss, and update the entity alias relationship obtaining model according to the overall loss.
It should be noted that all or part of the functional modules in the embodiments of fig. 6 and 7 may be implemented by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a program instruction product. The program instruction product includes one or more program instructions. The processes or functions according to the present disclosure are produced in whole or in part when the program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The program instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another.
In addition, the apparatuses disclosed in the embodiments of fig. 6 and fig. 7 may be implemented with other module divisions. The above-described apparatus embodiments are merely illustrative; for example, the described module division is merely a logical division, and in actual implementation there may be other divisions: multiple modules may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be through interfaces, or an indirect coupling or communication connection of devices or modules, and may be electrical or take other forms.
In addition, the functional modules and sub-modules in the embodiments of fig. 6 and 7 may be integrated into one processing unit, each module may exist alone physically, or two or more modules may be integrated into one unit. The integrated component may be implemented in the form of hardware or of a software functional module. If the integrated component is implemented as a software functional module and sold or used as a separate product, it may also be stored in a computer readable storage medium. The storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
It should be noted that the flowcharts and method descriptions of the above-described embodiments may be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present disclosure also includes implementations in which functions are executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved.
For example, the order of the steps in the embodiments of fig. 1 and 5 may be changed in a specific scenario, and is not limited to the above representation.
Fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the disclosure.
In some embodiments, the computer device is configured to load program instructions to implement the aforementioned method embodiments (e.g., method steps of fig. 1, 5, etc.). The computer device may be implemented as a server, such as a server in a server cluster in the foregoing embodiments, and may serve as an implementation carrier for one or more nodes.
As shown in fig. 8, computer apparatus 800 is embodied in the form of a general purpose computing device. The components of computer device 800 may include, but are not limited to: the at least one processing unit 810, the at least one memory unit 820, and a bus 830 that couples the various system components including the memory unit 820 and the processing unit 810.
The storage unit stores program code executable by the processing unit 810, so that the computer apparatus implements the method steps described in the above embodiments of the present disclosure.
In some embodiments, the memory unit 820 may include volatile memory units such as a random access memory unit (RAM) 8201 and/or a cache memory unit 8202, and may further include a read only memory unit (ROM) 8203.
In some embodiments, the storage unit 820 may also include a program/utility 8204 having a set (at least one) of program modules 8205, such program modules 8205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
In some embodiments, bus 830 may include a data bus, an address bus, and a control bus.
In some embodiments, computer apparatus 800 may also communicate with one or more external devices 900 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.) through an input/output (I/O) interface 850. Optionally, computer device 800 further includes a display unit 840 coupled to the input/output (I/O) interface 850 for display. Computer device 800 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via a network adapter 860. As shown, the network adapter 860 communicates with the other modules of the computer device 800 via the bus 830. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the computer device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Embodiments of the present disclosure may also provide a computer-readable storage medium, which may contain program code and may be run on a device, such as a personal computer, to implement the execution of each step and sub-step in the above-described method embodiments (e.g., fig. 1, fig. 5, etc.) of the present disclosure. In the context of this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program code may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
In summary, the embodiments of the present disclosure provide an entity alias relationship obtaining method, a training method, an apparatus, and a storage medium, which are applied to an entity alias relationship acquisition model. An input text is processed by the embedding layer of the model to obtain a text feature vector sequence; the feature coding layer produces a text semantic feature vector sequence; the sequence labeling layer yields a text prediction tag sequence and each entity mention fragment in the text; the tag fusion layer fuses the text semantic feature vector sequence with the text prediction tag sequence to obtain a text enhanced feature vector sequence; the multi-head selection layer processes the text enhanced feature vector sequence to obtain an entity alias relationship probability matrix; and an entity alias relationship acquisition result between two entity mention fragments is obtained according to the positions, within their respective entity mention fragments, of the words corresponding to the rows and columns of the screened probability values. This realizes accurate and efficient mining of entity alias relationships and facilitates the construction of high-quality knowledge graphs.
The scheme provided by the embodiments of the present disclosure automatically discovers a large number of alias relationships from unstructured corpora and advances knowledge graph construction and fusion technology for the travel industry. Meanwhile, the fusion of word-level label information (the label fusion layer) effectively alleviates the limitation that the label imbalance problem in multi-relation extraction places on the model's learning capacity, improving precision by about 2%.
Finally, in the applicant's tests on the test data, the entity recognition f1 value reaches 92% and the relation extraction f1 value reaches 74%. Compared with traditional text surface-form matching algorithms and simple network modules such as CNNs and RNNs, the embodiments of the present disclosure require far less feature engineering while achieving higher performance.
The above-described embodiments merely illustrate the principles and effects of the present disclosure and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present disclosure. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical ideas of the present disclosure shall be covered by the claims of the present disclosure.

Claims (20)

1. An entity alias relationship acquisition method is applied to an entity alias relationship acquisition model, and the entity alias relationship acquisition model comprises: the system comprises an embedding layer, a characteristic coding layer, a sequence labeling layer, a tag fusion layer and a multi-head selection layer; the method comprises the following steps:
acquiring an input text and inputting the entity alias relationship acquisition model;
processing the input text based on an embedding layer to generate a word level token sequence, a word level position index sequence and a text fragment index sequence, and processing the fusion of feature vectors respectively mapped by the word level token sequence, the word level position index sequence and the text fragment index sequence based on the embedding layer to obtain a text feature vector sequence;
processing the text feature vector sequence based on a feature coding layer to obtain a text semantic feature vector sequence;
processing the text semantic feature vector sequence based on a sequence labeling layer to obtain a text prediction tag sequence, and obtaining each entity mention segment in the text according to the text prediction tag sequence;
fusing the text semantic feature vector sequence and the text prediction tag sequence based on a tag fusion layer to obtain a text enhanced feature vector sequence;
processing the text enhanced feature vector sequence based on a multi-head selection layer to obtain an entity alias relation probability matrix; each probability value in the entity alias relationship probability matrix represents the probability that entity alias relationship exists between entity mention fragments to which every two words in the input text belong respectively;
and screening probability values reaching a preset threshold value from the entity alias relationship probability matrix, and obtaining an entity alias relationship acquisition result between two entity mention fragments according to the positions of words respectively corresponding to the rows and columns of the screened probability values in the entity mention fragments to which the words respectively belong.
2. The entity alias relationship obtaining method according to claim 1, wherein obtaining the text feature vector sequence based on fusion of sequence vectors respectively mapped by the word-level token sequence, the word-level position index sequence, and the text fragment index sequence includes:
respectively coding the word-level token sequence, the word-level position index sequence and the text segment index sequence to obtain each coding sequence;
converting each of the encoded sequences into each of the sequence vectors;
and summing and normalizing the sequence vectors to obtain the text feature vector sequence.
3. The entity alias relationship obtaining method as claimed in claim 1, wherein the feature encoding layer includes a multi-headed self-attention layer and a full connection layer; the processing the text feature vector sequence based on the feature coding layer to obtain a text semantic feature vector sequence comprises:
processing the text feature vector sequence through a multi-head self-attention layer to obtain an intermediate feature vector sequence;
and processing the intermediate characteristic vector sequence through a full connection layer to obtain the text semantic characteristic vector sequence.
4. The entity alias relationship obtaining method according to claim 1, wherein the processing the text semantic feature vector sequence based on a sequence labeling layer to obtain a text prediction tag sequence, and obtaining each entity mention fragment in the text according to the text prediction tag sequence includes:
predicting the position of a word corresponding to each characteristic value in the text semantic characteristic vector sequence in the entity mentioned fragment to obtain a prediction label so as to form the text prediction label sequence;
and obtaining each entity mention segment based on the entity mention boundary marked by the text prediction tag sequence.
5. The entity alias relationship acquisition method according to claim 1, wherein the tag fusion layer includes a gated neural network layer; the fusing the text semantic feature vector sequence and the text prediction tag sequence based on the tag fusion layer to obtain a text enhanced feature vector sequence, which comprises the following steps:
and fusing the text semantic feature vector sequence and the text prediction tag sequence according to the word level position through a gated neural network layer to obtain a text enhanced feature vector sequence.
6. The method according to claim 1, wherein the obtaining an entity alias relationship between two entity mention fragments according to positions of words respectively corresponding to rows and columns of the screened probability values in the entity mention fragments to which the words respectively belong comprises:
and determining that entity alias relationship exists between the two entity mention fragments in response to that the words respectively corresponding to the row and the column where the probability value is located are located at the same boundary position in the respectively belonging entity mention fragments.
7. The entity alias relationship acquisition method according to claim 1, further comprising:
generating a mask sequence corresponding to the input text;
and inputting the text feature vector sequence acted by the mask sequence into an entity alias relationship acquisition model.
8. A training method for an entity alias relationship acquisition model is characterized in that the entity alias relationship acquisition model comprises the following steps: the system comprises an embedding layer, a characteristic coding layer, a sequence labeling layer, a tag fusion layer and a multi-head selection layer; the training method comprises the following steps:
acquiring a training sample set and inputting the entity alias relationship acquisition model, wherein each training sample text in the training sample set has a corresponding text real label sequence and an entity alias relationship label;
generating a word level token sequence, a word level position index sequence and a text fragment index sequence based on each training sample text, and obtaining a text characteristic vector sequence based on the fusion of sequence vectors respectively mapped by the word level token sequence, the word level position index sequence and the text fragment index sequence;
processing the text feature vector sequence based on a feature coding layer to obtain a text semantic feature vector sequence;
processing the text semantic feature vector sequence based on a sequence labeling layer to obtain a text prediction tag sequence, and obtaining each entity mention segment in the text according to the text prediction tag sequence;
calculating a first loss between the text prediction tag sequence and a text real tag sequence;
fusing the text semantic feature vector sequence with a text prediction tag sequence or a text real tag sequence based on a tag fusion layer to obtain a text enhanced feature vector sequence;
processing the text enhanced feature vector sequence based on a multi-head selection layer to obtain an entity alias relation probability matrix; each probability value in the entity alias relationship probability matrix represents the probability that entity alias relationship exists between entity mention fragments to which every two words in the text belong respectively;
calculating a second loss based on each probability value in the entity alias relationship probability matrix and the entity alias relationship label; the entity alias relationship label is determined based on whether an entity alias relationship exists between two words corresponding to the row and the column corresponding to each probability value;
and obtaining total loss based on the fusion of the first loss and the second loss, and updating the entity alias relationship acquisition model according to the total loss.
9. The training method according to claim 8, wherein in a training round using a first part of training sample text, fusing the text semantic feature vector sequence with a text prediction tag sequence based on a tag fusion layer to obtain a text enhanced feature vector sequence; and in the training turns of the rest of the second part of training sample texts, fusing the text semantic feature vector sequence and the text real tag sequence based on the tag fusion layer to obtain a text enhanced feature vector sequence.
10. The training method as claimed in claim 9, wherein the number of samples of the second part of training sample texts is higher than the number of samples of the first part of training sample texts.
11. The training method of claim 8, wherein obtaining the text feature vector sequence based on the fusion of the sequence vectors respectively mapped by the word-level token sequence, the word-level position index sequence, and the text segment index sequence comprises:
respectively coding the word-level token sequence, the word-level position index sequence and the text segment index sequence to obtain each coding sequence;
converting each of the encoded sequences into each of the sequence vectors;
and summing and normalizing the sequence vectors to obtain the text feature vector sequence.
12. The training method of claim 8, wherein the feature coding layer comprises a multi-headed self-attention layer and a full-connection layer; the processing the text feature vector sequence based on the feature coding layer to obtain a text semantic feature vector sequence comprises:
processing the text feature vector sequence through a multi-head self-attention layer to obtain an intermediate feature vector sequence;
and processing the intermediate characteristic vector sequence through a full connection layer to obtain the text semantic characteristic vector sequence.
13. The training method of claim 8, wherein the processing the text semantic feature vector sequence based on a sequence labeling layer to obtain a text prediction tag sequence, and obtaining each entity mention fragment in the text according to the text prediction tag sequence comprises:
predicting the position of a word corresponding to each characteristic value in the text semantic characteristic vector sequence in the entity mentioned fragment to obtain a prediction label so as to form the text prediction label sequence;
and obtaining each entity mention segment based on the entity mention boundary marked by the text prediction tag sequence.
14. The training method of claim 8, wherein the label fusion layer comprises a gated neural network layer; the fusing the text semantic feature vector sequence and the text prediction tag sequence or the text real tag sequence based on the tag fusion layer to obtain a text enhanced feature vector sequence, which comprises the following steps:
fusing, through a gated neural network layer, the text semantic feature vector sequence with the text prediction tag sequence or the text real tag sequence according to word-level position to obtain the text enhanced feature vector sequence.
15. Training method according to claim 8, wherein said positional correspondence refers to: the words respectively corresponding to the rows and columns in which the probability values are located at the same boundary position in the entity-referenced fragments to which they belong.
16. The training method of claim 8, further comprising:
generating a mask sequence corresponding to the text;
and inputting the text feature vector sequence acted by the mask sequence into an entity alias relationship acquisition model.
17. An entity alias relationship acquisition apparatus, applied to an entity alias relationship acquisition model, the entity alias relationship acquisition model comprising: the system comprises an embedding layer, a characteristic coding layer, a sequence labeling layer, a tag fusion layer and a multi-head selection layer; the device comprises:
the input module is used for acquiring an input text and inputting the entity alias relationship acquisition model;
the embedded layer is used for processing the input text to generate a word level token sequence, a word level position index sequence and a text fragment index sequence, and obtaining a text feature vector sequence based on the fusion of feature vectors respectively mapped by the word level token sequence, the word level position index sequence and the text fragment index sequence;
the feature coding layer is used for processing the text feature vector sequence to obtain a text semantic feature vector sequence;
the sequence labeling layer is used for processing the text semantic feature vector sequence to obtain a text prediction tag sequence and obtaining each entity mention fragment in the text according to the text prediction tag sequence;
the label fusion layer is used for fusing the text semantic feature vector sequence and the text prediction label sequence to obtain a text enhancement feature vector sequence;
the multi-head selection layer is used for processing the text enhanced feature vector sequence to obtain an entity alias relationship probability matrix; each probability value in the entity alias relationship probability matrix represents the probability that entity alias relationship exists between entity mention fragments to which every two words in the input text belong respectively; and screening probability values reaching a preset threshold value from the entity alias relationship probability matrix, and obtaining an entity alias relationship acquisition result between two entity mention fragments according to the positions of words respectively corresponding to the rows and columns of the screened probability values in the entity mention fragments to which the words respectively belong.
18. An apparatus for training an entity alias relationship acquisition model, wherein the entity alias relationship acquisition model comprises: the system comprises an embedding layer, a characteristic coding layer, a sequence labeling layer, a tag fusion layer and a multi-head selection layer; the training apparatus includes:
the input module is used for acquiring a training sample set and inputting the entity alias relationship acquisition model, and each training sample text in the training sample set has a corresponding text real label sequence and an entity alias relationship label;
the embedded layer is used for generating a word level token sequence, a word level position index sequence and a text fragment index sequence based on each training sample text, and obtaining a text characteristic vector sequence based on fusion of sequence vectors respectively mapped by the word level token sequence, the word level position index sequence and the text fragment index sequence;
the feature coding layer is used for processing the text feature vector sequence to obtain a text semantic feature vector sequence;
the sequence labeling layer is used for processing the text semantic feature vector sequence to obtain a text prediction tag sequence and obtaining each entity mention fragment in the text according to the text prediction tag sequence;
the loss calculation module is used for calculating a first loss between the text prediction tag sequence and the text real tag sequence;
the label fusion layer is used for fusing the text semantic feature vector sequence with a text prediction label sequence or a text real label sequence to obtain a text enhanced feature vector sequence;
the multi-head selection layer is used for processing the text enhanced feature vector sequence to obtain an entity alias relationship probability matrix; each probability value in the entity alias relationship probability matrix represents the probability that entity alias relationship exists between entity mention fragments to which every two words in the text belong respectively;
the loss calculation module is configured to calculate a second loss based on each probability value in the entity alias relationship probability matrix and the entity alias relationship tag; the entity alias relationship label is determined based on whether an entity alias relationship exists between two words corresponding to the row and the column corresponding to each probability value;
and the loss calculation module is used for obtaining total loss based on the fusion of the first loss and the second loss and updating the entity alias relationship acquisition model according to the total loss.
19. A computer device, comprising: a communicator, a memory, and a processor; the communicator is used for communicating with the outside; the memory stores program instructions; the processor is configured to execute the program instructions to perform the entity alias relationship acquisition method according to any one of claims 1 to 7; alternatively, a training method according to any one of claims 8 to 16 is performed.
20. A computer-readable storage medium storing program instructions that are executed to perform the entity alias relationship acquisition method according to any one of claims 1 to 7; alternatively, a training method according to any one of claims 8 to 16 is performed.
CN202210425656.2A 2022-04-21 2022-04-21 Entity alias relationship acquisition method, entity alias relationship training device and storage medium Pending CN114881014A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210425656.2A CN114881014A (en) 2022-04-21 2022-04-21 Entity alias relationship acquisition method, entity alias relationship training device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210425656.2A CN114881014A (en) 2022-04-21 2022-04-21 Entity alias relationship acquisition method, entity alias relationship training device and storage medium

Publications (1)

Publication Number Publication Date
CN114881014A true CN114881014A (en) 2022-08-09

Family

ID=82670800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210425656.2A Pending CN114881014A (en) 2022-04-21 2022-04-21 Entity alias relationship acquisition method, entity alias relationship training device and storage medium

Country Status (1)

Country Link
CN (1) CN114881014A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115859987A (en) * 2023-01-19 2023-03-28 阿里健康科技(中国)有限公司 Entity reference identification module and linking method, device, equipment and medium
CN116151242A (en) * 2023-04-19 2023-05-23 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Intelligent problem recommendation method, system and storage medium for programming learning scene
CN116910272A (en) * 2023-08-09 2023-10-20 西安工程大学 Academic knowledge graph completion method based on pre-training model T5
CN116910272B (en) * 2023-08-09 2024-03-01 西安工程大学 Academic knowledge graph completion method based on pre-training model T5
CN117594251A (en) * 2023-11-27 2024-02-23 广州方舟信息科技有限公司 Drug knowledge graph construction method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
Torfi et al. Natural language processing advancements by deep learning: A survey
Li et al. A survey on deep learning for named entity recognition
CN112214995B (en) Hierarchical multitasking term embedded learning for synonym prediction
Zhou et al. Deep learning for aspect-level sentiment classification: survey, vision, and challenges
US20220050967A1 (en) Extracting definitions from documents utilizing definition-labeling-dependent machine learning background
CN114881014A (en) Entity alias relationship acquisition method, entity alias relationship training device and storage medium
CN117076653B (en) Knowledge base question-answering method based on thinking chain and visual lifting context learning
Jiang et al. An LSTM-CNN attention approach for aspect-level sentiment classification
Wang et al. Learning with joint cross-document information via multi-task learning for named entity recognition
Wang et al. Data set and evaluation of automated construction of financial knowledge graph
Cao et al. Relmkg: reasoning with pre-trained language models and knowledge graphs for complex question answering
Wei et al. Sentiment classification of tourism reviews based on visual and textual multifeature fusion
Vo Se4exsum: An integrated semantic-aware neural approach with graph convolutional network for extractive text summarization
Zhu et al. SwitchNet: A modular neural network for adaptive relation extraction
Fu et al. Towards corpus and model: Hierarchical structured-attention-based features for Indonesian named entity recognition
Ma Artificial Intelligence‐Assisted Decision‐Making Method for Legal Judgment Based on Deep Neural Network
CN113779994B (en) Element extraction method, element extraction device, computer equipment and storage medium
CN115757325A (en) Intelligent conversion method and system for XES logs
Zhang et al. Deep learning for natural language processing
Xu et al. AHRNN: attention‐based hybrid robust neural network for emotion recognition
CN114818690A (en) Comment information generation method and device and storage medium
Xi et al. Chinese named entity recognition: applications and challenges
Vo An approach of syntactical text graph representation learning for extractive summarization
Hu et al. An empirical study on joint entities-relations extraction of Chinese text based on BERT
Zhang et al. Chinese named entity recognition fusing lexical and syntactic information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination