CN115983271B - Named entity recognition method and named entity recognition model training method - Google Patents


Info

Publication number
CN115983271B
Authority
CN
China
Prior art keywords: features, semantic, named entity, entity, decoding
Prior art date
Legal status (assumed; not a legal conclusion): Active
Application number: CN202211610737.6A
Other languages: Chinese (zh)
Other versions: CN115983271A (en)
Inventor
张惠蒙
黄昉
史亚冰
蒋烨
佘俏俏
Current Assignee (may be inaccurate; Google has not performed a legal analysis): Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211610737.6A
Publication of CN115983271A
Application granted
Publication of CN115983271B
Legal status: Active


Landscapes

  • Machine Translation (AREA)

Abstract

The disclosure provides a named entity recognition method and a named entity recognition model training method, relates to the field of artificial intelligence, in particular to the technical fields of natural language processing and deep learning, and can be applied to scenarios such as knowledge mining and knowledge graph construction. The named entity recognition method is implemented as follows: obtaining a tag sequence according to the entity type of the named entity to be recognized and the text to be recognized; semantically encoding the tag sequence to obtain semantic features of the entity type and the text to be recognized; decoding the semantic features to obtain a labeling matrix, wherein the labeling matrix indicates target tags in the tag sequence and semantic adjacency relations; and determining, according to the labeling matrix, a target named entity in the text to be recognized that belongs to the entity type, wherein the target tags are the tags corresponding to the target named entity, and the semantic adjacency relations include the adjacency between any two tags in the tag sequence that correspond to the target named entity.

Description

Named entity recognition method and named entity recognition model training method
Technical Field
The disclosure relates to the field of artificial intelligence, in particular to the technical fields of natural language processing and deep learning, and can be applied to scenarios such as knowledge mining and knowledge graph construction.
Background
Named Entity Recognition (NER) is one of the basic and important tasks in natural language processing. Named entities generally refer to entity names in text that have a specific meaning or strong referential force, and may include, for example, person names, place names, organization names, and the like.
Named entity recognition techniques may be applied, for example, to knowledge mining and knowledge graph construction.
Disclosure of Invention
The disclosure aims to provide a named entity recognition method, a named entity recognition model training method, a named entity recognition apparatus, an electronic device and a storage medium, so as to improve the recognition accuracy of named entities and the generality of applicable recognition scenarios.
According to one aspect of the present disclosure, there is provided a named entity recognition method, including: obtaining a tag sequence according to the entity type of the named entity to be recognized and the text to be recognized; semantically encoding the tag sequence to obtain semantic features of the entity type and the text to be recognized; decoding the semantic features to obtain a labeling matrix, wherein the labeling matrix indicates target tags in the tag sequence and semantic adjacency relations; and determining, according to the labeling matrix, a target named entity in the text to be recognized that belongs to the entity type, wherein the target tags are the tags corresponding to the target named entity, and the semantic adjacency relations include the adjacency between any two tags in the tag sequence that correspond to the target named entity.
According to another aspect of the present disclosure, there is provided a training method for a named entity recognition model, wherein the named entity recognition model includes an encoding sub-model and a decoding sub-model, the training method including: obtaining a tag sequence according to a sample text and the entity type of a named entity in the sample text, wherein the sample text has a labeling-matrix true value, and the labeling-matrix true value indicates target tags in the tag sequence and semantic adjacency relations; semantically encoding the tag sequence with the encoding sub-model to obtain semantic features of the entity type and the sample text; decoding the semantic features with the decoding sub-model to obtain a labeling-matrix predicted value, wherein the labeling-matrix predicted value indicates predicted target tags and predicted semantic adjacency relations; and training the named entity recognition model according to the difference between the labeling-matrix true value and the labeling-matrix predicted value, wherein the target tags correspond to a target named entity in the sample text that belongs to the entity type, and the semantic adjacency relations include the adjacency between any two tags in the tag sequence that correspond to the target named entity.
According to another aspect of the present disclosure, there is provided a named entity recognition apparatus, including: a tag sequence obtaining module, configured to obtain a tag sequence according to the entity type of the named entity to be recognized and the text to be recognized; a semantic encoding module, configured to semantically encode the tag sequence to obtain semantic features of the entity type and the text to be recognized; a feature decoding module, configured to decode the semantic features to obtain a labeling matrix, wherein the labeling matrix indicates target tags in the tag sequence and semantic adjacency relations; and an entity determining module, configured to determine, according to the labeling matrix, a target named entity in the text to be recognized that belongs to the entity type, wherein the target tags are the tags corresponding to the target named entity, and the semantic adjacency relations include the adjacency between any two tags in the tag sequence that correspond to the target named entity.
According to another aspect of the present disclosure, a training apparatus for a named entity recognition model is provided, wherein the named entity recognition model includes an encoding sub-model and a decoding sub-model. The apparatus includes: a tag sequence obtaining module, configured to obtain a tag sequence according to a sample text and the entity type of a named entity in the sample text, wherein the sample text has a labeling-matrix true value, and the labeling-matrix true value indicates target tags in the tag sequence and semantic adjacency relations; a semantic encoding module, configured to semantically encode the tag sequence with the encoding sub-model to obtain semantic features of the entity type and the sample text; a feature decoding module, configured to decode the semantic features with the decoding sub-model to obtain a labeling-matrix predicted value, wherein the labeling-matrix predicted value indicates predicted target tags and predicted semantic adjacency relations; and a model training module, configured to train the named entity recognition model according to the difference between the labeling-matrix true value and the labeling-matrix predicted value, wherein the target tags correspond to a target named entity in the sample text that belongs to the entity type, and the semantic adjacency relations include the adjacency between any two tags in the tag sequence that correspond to the target named entity.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the named entity recognition method and/or the named entity recognition model training method provided by the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the named entity recognition method and/or the named entity recognition model training method provided by the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program/instructions stored on a readable storage medium and/or an electronic device, which, when executed by a processor, implement the named entity recognition method and/or the named entity recognition model training method provided by the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a schematic diagram of an application scenario of a named entity recognition method and a named entity recognition model training method and apparatus according to an embodiment of the disclosure;
FIG. 2 is a flowchart of a named entity recognition method according to an embodiment of the present disclosure;
FIG. 3 is an implementation schematic diagram of a named entity identification method according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a labeling matrix according to a first embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a labeling matrix according to a second embodiment of the present disclosure;
FIG. 6 is a schematic diagram of determining a target named entity from a labeling matrix according to an embodiment of the disclosure;
FIG. 7 is a flow diagram of a training method for named entity recognition models according to an embodiment of the present disclosure;
FIG. 8 is a block diagram of a named entity recognition device according to an embodiment of the present disclosure;
FIG. 9 is a block diagram of a training apparatus for named entity recognition models according to an embodiment of the present disclosure;
FIG. 10 is a block diagram of an electronic device for implementing a named entity recognition method and/or a training method of named entity recognition models of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Named entities generally refer to entities in text that have a specific meaning or strong referential force, and may generally include person names, place names, organization names, dates and times, proper nouns, and the like. Named entity recognition technology is used to extract named entities from unstructured input text; depending on business requirements, it can also recognize entity fragments of more types and more complex structures than classic named entities, such as product names, models, prices, and the like.
With the development of deep learning technology, a deep learning model can be adopted to identify named entities.
For example, an entity extraction model may be employed to extract named entities from the input text; the entity extraction model may be a sequence labeling model. A sequence labeling model assigns a label, such as a person-name or place-name label, to each element in a natural language sequence. However, when a sequence labeling model is applied to the general named entity extraction task, it can only satisfy extraction requirements within a certain range of entity categories; that is, the category range of the named entities must be delineated in advance. Accordingly, the training data of the sequence labeling model should contain, and only contain, text with named entities within the delineated categories. A sequence labeling model can extract sentence-level semantic information of the input text using a bidirectional long short-term memory network (Bi-LSTM), the Transformer-based bidirectional encoder representation model BERT, the text model ERNIE, and the like, and on that basis judge the label of each character position in the input text with a conditional random field (CRF) model. Because the category range of the extracted named entities is limited, this method has low practicability in real applications. Furthermore, it cannot accurately extract nested entities and discontinuous entities. For nested entities, some character may simultaneously be a non-initial character of one named entity and the initial character of another named entity, but the sequence labeling model cannot predict multiple labels for the same position and thus cannot extract nested entities. Because the sequence labeling model only supports labeling consecutive named entities, discontinuous entities cannot be extracted either.
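As a hedged illustration of the sequence labeling scheme described above (the tokens, tags and decoder below are invented for the example, not taken from the patent), a minimal BIO-style decoder shows the one-tag-per-position constraint that prevents nested and discontinuous entities:

```python
def decode_bio(tokens, tags):
    """Decode BIO tags into (entity, type) pairs. Each position carries
    exactly one tag, so overlapping (nested) spans and gaps inside an
    entity cannot be expressed in this scheme."""
    spans, cur, typ = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):            # a new entity begins here
            if cur:
                spans.append(("".join(cur), typ))
            cur, typ = [tok], tag[2:]
        elif tag.startswith("I-") and cur:  # continue the current entity
            cur.append(tok)
        else:                               # "O": outside any entity
            if cur:
                spans.append(("".join(cur), typ))
            cur, typ = [], None
    if cur:
        spans.append(("".join(cur), typ))
    return spans

# One contiguous person name and one organization name decode cleanly:
tokens = ["X", "Y", "Z", "@", "A", "B", "C", "D"]
tags   = ["B-PER", "I-PER", "I-PER", "O", "B-ORG", "I-ORG", "I-ORG", "I-ORG"]
spans = decode_bio(tokens, tags)
```

A nested entity would need two tags at one position, and a discontinuous entity would need an "I-" tag after an "O" gap; neither fits this representation.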
For example, a boundary discrimination model based on prompt learning may also be employed to identify named entities in the input text; this model is superior to the sequence labeling model in effectiveness and extensibility. A boundary discrimination model is a named entity recognition scheme that recognizes a target segment by discriminating its start node and end node. Prompt learning transfers the modeling of the named entity categories to be extracted from the decoding layer to the encoding layer. In decoding, the input of the non-prompt mode is a sentence and the decoding dimension is 2*n (n being the number of categories: for each category, it must be judged whether each character is the initial or terminal character of a named entity of that category); the decoding dimension of the prompt mode is only 2, because the category information is contained in the input text. For example, if the input text is "[CLS] name [SEP] professor yyyy of university of XXXX", the prompt-learning-based boundary discrimination model need only extract the person-name entity "yyyy" from the text "professor yyyy of university of XXXX". The prompt-learning-based boundary discrimination model can extract sentence-level semantic information of the input text using a pre-trained network such as BERT or ERNIE, and process the semantic information through a fully connected layer to judge the start position and end position of a named entity.
Although the prompt-learning-based boundary discrimination model removes the limitation on the categories of extracted named entities, because it judges entity boundaries through start and end positions it can still extract only contiguous named entity fragments, and cannot extract discontinuous named entities.
In order to solve this problem, the disclosure provides a named entity recognition method, a named entity recognition model training method, and a corresponding apparatus, electronic device and storage medium. An application scenario of the method and apparatus provided in the present disclosure is described below with reference to fig. 1.
Fig. 1 is a schematic diagram of an application scenario of a named entity recognition method and a named entity recognition model training method and apparatus according to an embodiment of the disclosure.
As shown in fig. 1, the application scenario 100 of this embodiment may include a terminal device 110, and the terminal device 110 may be various electronic devices with processing functions, including but not limited to a smart phone, a tablet computer, a laptop computer, a desktop computer, a server, and the like.
The terminal device 110 may be installed with various client applications, such as a text processing class application, a named entity recognition class application, a voice interaction class application, an instant messaging class application, and the like, for example, which is not limited by the present disclosure.
The terminal device 110 may, for example, perform entity extraction on the input text 120 to be identified, to obtain a named entity 130 in the text 120 to be identified. The terminal device 110 may extract the named entity 130 in the text 120 to be identified, for example, using the sequence labeling model described above or a boundary discrimination model based on prompt learning.
In an embodiment, the terminal device 110 may employ the named entity recognition model 150 pre-trained by the server 140 to extract the named entity 130 in the text 120 to be recognized. Wherein the terminal device 110 may be communicatively connected to the server 140, for example, via a network. The network may include wired or wireless communication links. For example, the server 140 may be, for example, a background management server that provides support for the running of a client application installed in the terminal device 110, or may be a cloud server or a blockchain server, etc., which is not limited in this disclosure. The named entity recognition model 150 may be a model that is trained for the training method provided by the present disclosure.
In an embodiment, the terminal device 110 may also send the text 120 to be identified to the server 140 via the network, and the server 140 identifies the named entity in the text 120 to be identified.
It should be noted that the named entity recognition method provided in the present disclosure may be executed by the terminal device 110 or by the server 140. Accordingly, the named entity recognition apparatus provided in the present disclosure may be disposed in the terminal device 110 or in the server 140. The training method of the named entity recognition model provided by the present disclosure may be performed by the server 140. Accordingly, the training apparatus for the named entity recognition model provided in the present disclosure may be disposed in the server 140.
It should be understood that the number and type of terminal devices 110 and servers 140 in fig. 1 are merely illustrative. There may be any number and type of terminal devices 110 and servers 140 as desired for implementation.
The method for identifying named entities provided in the present disclosure will be described in detail below with reference to fig. 2 to 6.
Fig. 2 is a flow diagram of a method of identifying named entities according to an embodiment of the present disclosure.
As shown in fig. 2, the named entity recognition method 200 of this embodiment may include operations S210 to S240.
In operation S210, a tag sequence is obtained according to the entity type of the named entity to be recognized and the text to be recognized.
According to embodiments of the present disclosure, the entity type may be, for example, a person name type, a place name type, an organization name type, an item name, a literature name, a biological name, a musical work name, a game name, or the like.
This embodiment may perform character-level segmentation on the entity type of the named entity and the text to be recognized, taking the character as the unit, to obtain a character sequence, which may serve as the tag sequence. For example, if the entity type is "person name" and the text to be recognized is "professor XYZ of economics of ABCD university", the character sequence obtained by character-level segmentation is (person, name, A, B, C, D, university, teaching, X, Y, Z).
In an embodiment, a [CLS] tag may also be added before the character sequence to represent the start position of the tag sequence; the token vector obtained by semantically encoding this tag can be used for subsequent classification tasks. In an embodiment, a [SEP] tag may also be added between the characters of the entity type and the characters of the text to be recognized to distinguish the two, and a [SEP] tag may be added after the last character to mark the end position of the sequence. For example, the resulting tag sequence may be ([CLS], person, name, [SEP], A, B, C, D, university, college, academic, praise, academic, teaching, X, Y, Z, [SEP]).
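A minimal sketch of the tag-sequence construction described above; `build_tag_sequence` is a hypothetical helper name, and the example uses a Chinese-style character-level split since the original text is Chinese:

```python
def build_tag_sequence(entity_type, text):
    """Split the entity type and the text character by character, then
    frame them with special tags as described above: [CLS] marks the
    start, one [SEP] separates the entity type from the text, and a
    final [SEP] marks the end of the sequence."""
    return ["[CLS]"] + list(entity_type) + ["[SEP]"] + list(text) + ["[SEP]"]

# Entity type "person name" (人名) as the prompt, followed by the text:
seq = build_tag_sequence("人名", "ABCD大学的XYZ教授")
```

The same helper works when the text precedes the entity type, since the disclosure does not fix the order of the two parts.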
In an embodiment, word2vec conversion may also be performed on the character sequence to obtain a vector sequence; a [CLS] tag is added before the vector sequence, and a [SEP] tag is added between the vectors representing characters of the entity type and the vectors representing characters of the text to be recognized, yielding a tag sequence comprising the [CLS] tag, the vector sequence and the [SEP] tags.
In an embodiment, in the tag sequence, the characters of the text to be recognized may also precede the characters of the entity type; the present disclosure does not limit the order of the two.
In operation S220, the tag sequence is semantically encoded to obtain semantic features of the entity type and the text to be recognized.
In an embodiment, the tag sequence may be semantically encoded using a long short-term memory network (LSTM), a bidirectional LSTM (Bi-LSTM), an ERNIE network or a BERT network to obtain the semantic features. Alternatively, a BERT network may be combined with a Bi-LSTM: the tag sequence is encoded by BERT, the output of the BERT network serves as the input to the Bi-LSTM network, and the Bi-LSTM network processes the input features and outputs the semantic features.
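The patent's encoder is a pre-trained network feeding a Bi-LSTM; as a loose, hedged stand-in for the bidirectional part only (not the real recurrent computation), the toy function below concatenates a forward and a backward running summary per position, mimicking how a Bi-LSTM's two directional hidden states are joined:

```python
import numpy as np

def toy_bidirectional_encode(embeddings):
    """For each position, concatenate a forward context summary (mean of
    tokens up to and including it) with a backward context summary (mean
    of tokens from it to the end), analogous to the concatenated hidden
    states of a Bi-LSTM.  A toy stand-in for illustration only."""
    n = embeddings.shape[0]
    fwd = np.cumsum(embeddings, axis=0) / np.arange(1, n + 1)[:, None]
    bwd = np.cumsum(embeddings[::-1], axis=0)[::-1] / np.arange(n, 0, -1)[:, None]
    return np.concatenate([fwd, bwd], axis=1)   # (n, 2d), left + right context

out = toy_bidirectional_encode(np.eye(3))
```

The key property shared with a real Bi-LSTM is that every position's feature depends on both its left and right context, which the later pairwise decoding relies on.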
In operation S230, the semantic features are decoded to obtain a labeling matrix.
In an embodiment, the semantic features may be decoded using a decoding network to obtain a labeling matrix capable of indicating the target tags in the tag sequence and the adjacency relation between any two tags in the tag sequence that correspond to the target named entity.
The decoding network may be, for example, a network based on an attention mechanism, such as a network composed of a decoder of the Transformer architecture and a multi-layer perceptron, or a biaffine classifier based on the biaffine attention mechanism, which is not limited in this disclosure.
For example, the labeling matrix may include classification information for each tag in the tag sequence to indicate whether that tag is a target tag. A target tag may be, for example, the tag corresponding to the start or end character of a named entity of the given entity type in the tag sequence. The classification information may be similar, for example, to the output of the prompt-learning-based boundary discrimination model. Several tags in the tag sequence may correspond to the target named entity; for example, if the target named entity is "XYZ", three tags correspond to it, and the labeling matrix may indicate the adjacency between those three tags: the tag corresponding to "X" is adjacent to the tag corresponding to "Y", and the tag corresponding to "Y" is adjacent to the tag corresponding to "Z".
In an embodiment, the task of decoding the semantic features into the labeling matrix may be understood as two binary classification tasks. One is to classify each tag in the tag sequence: from the labeling matrix it can be determined whether each tag is a target tag. The other is to classify the adjacency relation of any two tags in the tag sequence: from the labeling matrix it can be determined whether an adjacency relation exists between any two tags.
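One concrete way to read the two binary classification results out of a labeling matrix — the layout below (diagonal cells flag target tags, off-diagonal cells flag adjacency) is a hypothetical convention for illustration, not one the patent fixes:

```python
def read_labeling_matrix(matrix, tags):
    """Diagonal entry (i, i) == 1 means tags[i] is a target tag;
    off-diagonal entry (i, j) == 1 means tags[i] and tags[j] have a
    semantic adjacency relation.  (Hypothetical matrix layout.)"""
    n = len(tags)
    targets = [tags[i] for i in range(n) if matrix[i][i]]
    adjacent = [(tags[i], tags[j])
                for i in range(n) for j in range(n)
                if i != j and matrix[i][j]]
    return targets, adjacent

# For the entity "XYZ": X and Z are the start/end target tags, and the
# X-Y and Y-Z cells carry the adjacency chain.
matrix = [[1, 1, 0],
          [0, 0, 1],
          [0, 0, 1]]
targets, adjacent = read_labeling_matrix(matrix, ["X", "Y", "Z"])
```

Both readouts are per-cell binary decisions, which is why the decoder can be any network capable of binary classification.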
It will be appreciated that the network that decodes the semantic features may be any network capable of performing these two binary classification tasks, which this disclosure does not limit.
In operation S240, a target named entity belonging to the entity type in the text to be recognized is determined according to the labeling matrix.
In this embodiment, the semantic adjacency relations indicated by the labeling matrix can be traversed to determine all tag pairs having an adjacency relation; the two tags of each pair are then ordered according to the order of their corresponding characters in the text to be recognized, yielding at least one ordered tag sequence. Adjacent identical tags in each ordered tag sequence may be de-duplicated, and finally the character fragment formed by the characters corresponding, in order, to the de-duplicated tag sequence can be taken as the target named entity.
Alternatively, this embodiment may first determine, based on the order of the target tags in the tag sequence, the first-ranked and last-ranked target tags. Taking the first tag as the starting point and the last tag as the end point, all tag pairs with adjacency relations are obtained by traversal, and the target named entity is obtained from the traversed tag pairs.
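The traversal described above can be sketched as follows — a minimal version assuming each tag has at most one successor, with an invented helper name:

```python
def chain_entity(start, adjacency):
    """Walk the semantic-adjacency pairs from the head tag until no
    successor remains, concatenating the characters of the target named
    entity.  Because adjacency is semantic rather than positional, the
    chain may skip characters, so discontinuous entities also decode."""
    nxt = dict(adjacency)        # assumes at most one successor per tag
    chars, cur = [start], start
    while cur in nxt:
        cur = nxt[cur]
        chars.append(cur)
    return "".join(chars)

# A contiguous entity "XYZ", and a discontinuous entity whose adjacency
# jumps over the unrelated character "b":
contiguous = chain_entity("X", [("X", "Y"), ("Y", "Z")])
discontinuous = chain_entity("a", [("a", "c"), ("c", "d")])
```

The second call is exactly the case a boundary discrimination model cannot express: the recovered entity "acd" omits a character lying between its parts in the surface text.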
In the above technical solution, the text to be recognized and the entity type are encoded simultaneously, so named entities of any entity type can be recognized and extracted by setting the required entity type, which improves the generality of the application scenarios of the recognition method. Furthermore, by encoding a tag sequence built from the entity type, the entity type is modeled in the input and encoding stages, so the decoding process is decoupled from the number of classification categories and multi-class labeling of the text to be recognized can be realized. Therefore, from the multi-classification result (the labeling matrix) obtained by decoding, not only contiguous entities but also nested and discontinuous entities can be parsed out. The recognition method of this embodiment thus achieves wider applicability and more accurate target named entities.
In an embodiment, a biaffine classifier based on the biaffine attention mechanism (Biaffine Attention) may be employed to decode the semantic features into the labeling matrix. Specifically, the semantic features are multiplied by a pre-learned biaffine matrix to obtain the labeling matrix. In this way, the cross features among the tags in the tag sequence can be better recognized, improving the accuracy of the resulting labeling matrix.
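A minimal numeric sketch of biaffine scoring — the shapes and parameter names below are assumptions for illustration, not the patent's exact formulation. Each pair of tag features h_i, h_j is scored as h_i^T U h_j plus a linear term over their concatenation, producing one raw labeling-matrix cell per pair:

```python
import numpy as np

def biaffine_scores(H, U, W, b):
    """Score every (i, j) tag pair.  H is an (n, d) matrix of tag
    features, U a (d, d) bilinear weight, W a (2d,) linear weight and b
    a scalar bias; returns the (n, n) raw score matrix, i.e. the
    pre-threshold labeling matrix."""
    n, d = H.shape
    bilinear = H @ U @ H.T                                   # h_i^T U h_j
    pairs = np.concatenate(
        [np.repeat(H, n, axis=0), np.tile(H, (n, 1))], axis=1)
    linear = (pairs @ W).reshape(n, n)                       # W [h_i; h_j]
    return bilinear + linear + b

# With identity features/weights and zero linear term, the scores reduce
# to the pairwise feature similarity:
scores = biaffine_scores(np.eye(2), np.eye(2), np.zeros(4), 0.0)
```

The bilinear term is what lets the classifier model interactions (cross features) between the two tags of a pair, rather than scoring each tag independently.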
Fig. 3 is an implementation schematic diagram of a named entity recognition method according to an embodiment of the present disclosure.
In an embodiment, in the process of decoding the semantic features, further feature extraction in at least two dimensions may be performed on the semantic features; the extracted features of the at least two dimensions are then fused, and finally the fused features are decoded to obtain the labeling matrix. Because features of at least two dimensions are considered during decoding, the expressive power of the features from which the labeling matrix is derived is improved, which in turn improves the precision of the target tags and semantic adjacency relations indicated by the labeling matrix and the accuracy of the resulting target named entity.
An embedding network (Embedding layer) can be used to embed the semantic features in at least two dimensions to obtain at least two embedded features, i.e., features of at least two dimensions. The at least two dimensions may include, for example, at least two of the following: a word dimension, a distance dimension and a region dimension. The word-dimension features may be, for example, features of the entity type and of each character in the text to be recognized, extracted from the semantic features. The distance-dimension features may be, for example, distance features between all characters of the entity type and the text to be recognized, extracted from the semantic features. The region-dimension features may be, for example, features extracted from the semantic features for distinguishing the entity type from the text to be recognized.
In an embodiment, before the further feature extraction in at least two dimensions, the semantic features may be normalized, and feature extraction in at least two dimensions is then performed on the normalized features. This helps avoid overfitting during decoding.
As shown in FIG. 3, in the embodiment 300, an encoding network 310 may be employed to encode the tag sequence 301 to obtain the semantic features. The encoding network 310 is composed of an ERNIE network 311 and a Bi-LSTM network 312: the tag sequence 301 is input into the ERNIE network 311, the features output by the ERNIE network 311 after processing the tag sequence 301 serve as input to the Bi-LSTM network 312, and the Bi-LSTM network processes the input features and outputs the semantic features. It will be appreciated that the tag sequence is derived from the prompt message (Prompt) and the text to be recognized (Context), where the prompt message is the entity type described above.
As shown in fig. 3, the semantic features may be input to a normalization network 321, which normalizes the semantic features to obtain normalized features. In an embodiment, the normalization network 321 may be, for example, a Conditional Layer Normalization (CLN) network, which, on the basis of a normalization layer, fuses the input features as conditions into the hyperparameters of the normalization layer so as to control the direction of the output features of the normalization network 321.
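A minimal NumPy sketch of conditional layer normalization follows: the condition vector generates the scale and shift of an ordinary layer norm, so the condition steers the normalized output. The projection matrices `gamma_w` and `beta_w` are hypothetical placeholders, not parameters disclosed in the patent.

```python
import numpy as np

def conditional_layer_norm(x, cond, gamma_w, beta_w, eps=1e-5):
    """Sketch of Conditional Layer Normalization (CLN).
    x:    (n, d) features to normalize
    cond: (d,)   condition vector fused into the norm's parameters
    gamma_w, beta_w: (d, d) assumed projection matrices"""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)
    gamma = cond @ gamma_w  # condition-dependent scale
    beta = cond @ beta_w    # condition-dependent shift
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
n, d = 5, 8
out = conditional_layer_norm(rng.normal(size=(n, d)), rng.normal(size=d),
                             rng.normal(size=(d, d)), rng.normal(size=(d, d)))
```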
After the normalized features are obtained, an embedding network may be used to embed them. Specifically, the embedding network may include at least two embedding layers corresponding respectively to at least two dimensions; this embodiment may input the normalized features into the at least two embedding layers, each of which outputs a feature of one dimension, yielding at least two features in total. For example, as shown in fig. 3, the embedding network 322 includes a distance-dimension embedding layer, a word-dimension embedding layer, and a region-dimension embedding layer, which process the normalized features respectively to yield a distance embedding feature (Distance embedding) 302, a word embedding feature (Word embedding) 303, and a region embedding feature (Region embedding) 304.
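The following sketch illustrates one plausible realization of the three dimensions (not the patented implementation): for n tokens, each (i, j) grid cell receives a word feature, a clipped relative-distance embedding, and a region embedding that distinguishes the prompt region from the text region. Table shapes and the clipping threshold are assumptions.

```python
import numpy as np

def build_grid_features(h, d_emb, r_emb, prompt_len, max_dist=10):
    """h: (n, d) normalized token features; d_emb: (max_dist+1, k) distance
    embedding table; r_emb: (2, k) region embedding table."""
    n = h.shape[0]
    # Distance dimension: embed the clipped |i - j| for every token pair.
    dist = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    dist_feat = d_emb[np.clip(dist, 0, max_dist)]            # (n, n, k)
    # Region dimension: 0 = prompt (entity type), 1 = text to be identified.
    region = (np.arange(n) >= prompt_len).astype(int)
    region_feat = r_emb[region[:, None] | region[None, :]]   # (n, n, k)
    # Word dimension: broadcast per-token features over the grid rows.
    word_feat = np.broadcast_to(h[:, None, :], (n, n, h.shape[1]))
    return np.concatenate([word_feat, dist_feat, region_feat], axis=-1)

rng = np.random.default_rng(1)
grid = build_grid_features(rng.normal(size=(6, 4)),
                           rng.normal(size=(11, 3)),
                           rng.normal(size=(2, 3)), prompt_len=3)
```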
After the at least two features are obtained, they may be fused to obtain a fusion feature. For example, the at least two features may be spliced using a concat() function to obtain the fusion feature. After the fusion feature is obtained, this embodiment may employ a classifier such as the multi-layer perceptron MLP 323 to decode the fusion feature, and the classifier outputs the labeling matrix 305.
Fig. 4 is a schematic diagram of obtaining a labeling matrix according to a first embodiment of the present disclosure.
In an embodiment, after the at least two embedded features are obtained, the feature obtained by splicing them may also be processed through a dilated convolution (Dilated Convolution, also called hole convolution) network, and the feature processed by the dilated convolution network may participate in decoding as the fusion feature. The fusion feature obtained after dilated convolution can thereby express context information over a larger receptive field, so that the contextual relationships among words in discontinuous named entities can be learned better, which helps improve the accuracy of the labeling matrix obtained by decoding and thus the recognition accuracy for discontinuous named entities.
In an embodiment, before the feature obtained by splicing the at least two embedded features is processed by the dilated convolution network, the spliced feature may, for example, be processed by a multi-layer perceptron to better fuse the at least two embedded features. In this way, the input of the dilated convolution network is the feature processed by the multi-layer perceptron.
As shown in fig. 4, in this embodiment 400, after embedded features of at least two dimensions (e.g., embedded feature 402, embedded feature 403, and embedded feature 404) are extracted via embedding layers of at least two dimensions, the obtained embedded features may first be spliced, e.g., using a concat() function, to obtain a spliced feature. The spliced feature then serves as the input of the multi-layer perceptron MLP1 423, which fully fuses the at least two embedded features to obtain a perceived feature.
The perceived feature may then be processed using the dilated convolution network 424 to obtain the fusion feature. Specifically, the perceived feature is input into the dilated convolution network 424, and the feature output by the dilated convolution network 424 serves as the fusion feature.
In an embodiment, at least two dilated convolution layers with different dilation rates may be arranged in parallel to process the perceived feature in parallel. The at least two convolution features obtained from the at least two dilated convolution layers are then spliced, and the spliced feature serves as the fusion feature. In this way, feature extraction over at least two different receptive fields can be performed while enlarging the receptive field of the expressed information, so that the fusion feature can express context information of different receptive fields, improving its expressive capability. This further helps learn the contextual relationships among words in the text to be recognized, which is beneficial to the accuracy of the labeling matrix obtained by decoding and to the recognition accuracy of named entities.
As shown in fig. 4, the dilated convolution network 424 may include, for example, a first dilated convolution layer 4241, a second dilated convolution layer 4242, and a third dilated convolution layer 4243 arranged in parallel, whose dilation rates may be, for example, 1, 2, and 3, respectively. This embodiment may input the perceived feature output by the multi-layer perceptron MLP1 423 into the three dilated convolution layers simultaneously; each of the three layers outputs one convolution feature, yielding three convolution features in total. The embodiment may then splice the three convolution features, e.g., using a concat() function, to obtain the fusion feature.
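The three parallel branches with dilation rates 1, 2, and 3 followed by concatenation can be sketched as below. For brevity this uses a hand-rolled 1-D dilated convolution with 'same' padding and random placeholder weights; the patented network operates on its own feature layout and trained parameters.

```python
import numpy as np

def dilated_conv1d(x, w, rate):
    """Minimal 'same'-padded 1-D dilated (hole) convolution.
    x: (L, c_in), w: (k, c_in, c_out) with odd kernel size k."""
    k = w.shape[0]
    pad = rate * (k // 2)
    xp = np.pad(x, ((pad, pad), (0, 0)))
    L = x.shape[0]
    out = np.zeros((L, w.shape[2]))
    for t in range(k):
        # Each kernel tap reads the input shifted by t * rate positions.
        out += xp[t * rate : t * rate + L] @ w[t]
    return out

# Three parallel branches (cf. layers 4241-4243) with rates 1, 2, 3,
# then concatenation of the three convolution features.
rng = np.random.default_rng(0)
x = rng.normal(size=(23, 8))                       # 23 tokens, 8 channels
ws = [rng.normal(size=(3, 8, 4)) for _ in range(3)]
fused = np.concatenate(
    [dilated_conv1d(x, w, rate) for w, rate in zip(ws, (1, 2, 3))], axis=-1)
```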
It is understood that the number of dilated convolution layers and their dilation rates in this embodiment 400 are merely examples to facilitate understanding of the present disclosure, and the present disclosure is not limited thereto.
After the fusion feature is obtained, a classifier such as the multi-layer perceptron MLP2 425 may be used to decode the fusion feature, thereby obtaining the labeling matrix 405.
Fig. 5 is a schematic diagram of obtaining a labeling matrix according to a second embodiment of the present disclosure.
In an embodiment, two different decoding networks may be provided to decode the semantic features, and the two labeling matrices obtained by the two decoding networks are then fused; the fused matrix serves as the labeling matrix obtained in operation S230 described above. Because the final labeling matrix is obtained from the decoding results of two different decoding networks, the accuracy of the labeling matrix can be improved, the named entity recognition method of this embodiment can be adapted to a plurality of different named entity recognition scenarios, the robustness of the method can be improved, and over-fitting caused by the inaccuracy of a single decoding network can be avoided.
The two different decoding networks may include, for example, the dual affine (biaffine) classifier described above and the multi-layer perceptron-based classifier. The multi-layer perceptron-based classifier may decode according to the principle shown in fig. 3, or according to the principle shown in fig. 4, to obtain a labeling matrix.
In the following, with reference to fig. 5, the implementation principle of the named entity recognition method of this embodiment will be described, taking as an example a multi-layer perceptron-based classifier that decodes according to the principle shown in fig. 4.
As shown in fig. 5, in this embodiment 500, similar to embodiment 300, the obtained tag sequence 501 may be semantically encoded using a coding network 510 to obtain semantic features. The coding network 510 includes an ERNIE network 511 and a Bi-LSTM network 512.
After the semantic features are obtained, they may be input into a normalization network 521 (which may be, for example, a CLN), which outputs normalized features. Subsequently, embodiment 500 may input the normalized features into the embedding layers of three dimensions included in the embedding network 522, respectively, and the three embedded features 502-504 output by those layers may be spliced to obtain a spliced feature. The spliced feature is processed by the multi-layer perceptron MLP1 523 to obtain a perceived feature. The perceived feature is input into the three dilated convolution layers 5241-5243 with different dilation rates included in the dilated convolution network 524, respectively, yielding three convolution features. The fusion feature is obtained by splicing the three convolution features. By inputting the fusion feature into the multi-layer perceptron MLP2 525, a second labeling matrix may be obtained.
While the second labeling matrix is derived, embodiment 500 may input the normalized features into the dual affine classifier Biaffine 526, which outputs a third labeling matrix. Finally, the embodiment may obtain the labeling matrix 505, i.e., the result of decoding the semantic features, by fusing the second labeling matrix and the third labeling matrix.
For example, the second labeling matrix and the third labeling matrix may be added to obtain the labeling matrix 505.
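As a concrete illustration of this fusion step, the sketch below adds two score tensors element-wise and reads off the final labels; the shapes, the class count, and the random scores are illustrative assumptions, not parameters from the disclosure.

```python
import numpy as np

# Element-wise addition of the second (MLP branch) and third (dual affine
# branch) labeling matrices, here represented as per-class score tensors.
rng = np.random.default_rng(0)
second = rng.normal(size=(23, 23, 4))   # (seq, seq, num_labels) scores
third = rng.normal(size=(23, 23, 4))
fused = second + third
labels = fused.argmax(axis=-1)          # final labeling matrix (class ids)
```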
FIG. 6 is a schematic diagram of determining a target named entity from a labeling matrix according to an embodiment of the disclosure.
According to an embodiment of the disclosure, when the target named entity is determined according to the labeling matrix, a marker directed graph may first be generated, for example, according to the semantic adjacency indicated by the labeling matrix and the tag sequence described above. The target named entity is then determined based on the position, in the marker directed graph, of the target marker indicated by the labeling matrix.
For example, according to the semantic adjacency indicated by the labeling matrix, the marker pairs formed by two markers having an adjacency relationship in the tag sequence may be determined by traversal, yielding a plurality of marker pairs. A pointing arrow is then added between the two markers of each marker pair according to their order in the tag sequence: the arrow points from the marker placed earlier in the tag sequence to the marker placed later. By adding pointing arrows to the markers in all marker pairs, the marker directed graph can be obtained.
For example, the target markers include the marker corresponding to the first word of the target named entity and the marker corresponding to the last word of the target named entity. This embodiment can locate the target markers in the marker directed graph, and then splice, following the direction of the link, the words corresponding to all markers on a link connecting two target markers, to obtain the target named entity.
For example, as shown in fig. 6, in this embodiment 600, the entity category is "disease", and the text to be identified is "脐静脉导管不能用于急性食管炎胃炎等" ("umbilical vein catheters cannot be used for acute esophagitis, gastritis, etc."). The resulting tag sequence is ([CLS], [disease], [SEP], [脐], [静], [脉], [导], [管], [不], [能], [用], [于], [急], [性], [食], [管], [炎], [胃], [炎], [等], …), which comprises 23 tags. The labeling matrix 601 obtained by employing the principles of the embodiments described above may therefore be of size 23 × 23. As shown in fig. 6, the elements of the upper-left region of the labeling matrix may indicate the adjacency between any two of the 23 tags, and the elements of the lower-right region indicate whether each of the 23 tags is a target marker. If there is an adjacency relationship between two tags, then in the labeling matrix 601, the element at the intersection of the row corresponding to the first tag (the one placed earlier in the tag sequence) and the column corresponding to the second tag (the one placed later) takes the value "NNW". For the target markers, the element at the intersection position of any two target markers in the labeling matrix takes the value "En". It will be appreciated that the values of the elements in the labeling matrix are merely examples to facilitate understanding of the disclosure, and the disclosure is not limited thereto, so long as the element values can distinguish between different classification results.
From this labeling matrix 601, it can be determined that the marker pairs formed by two markers having an adjacency relationship include the following six pairs: ([急], [性]), ([性], [食]), ([食], [管]), ([管], [炎]), ([性], [胃]), ([胃], [炎]). From the six marker pairs, a marker directed tree 602 may be obtained. From the target markers indicated by the labeling matrix, the target markers can be located in the marker directed tree as [急] and [炎], where there are two target markers [炎]: the [炎] in the marker pair ([管], [炎]) and the [炎] in the marker pair ([胃], [炎]). The links connecting the target markers include the following two links: [急] → [性] → [食] → [管] → [炎] and [急] → [性] → [胃] → [炎]. The target named entities are therefore 急性食管炎 (acute esophagitis) and 急性胃炎 (acute gastritis).
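The decoding walk of Fig. 6 can be sketched in pure Python as follows: adjacency pairs define arrows, the head/tail target markers bound the entities, and a depth-first walk from each head collects every path ending at a tail. Token indices are used to distinguish the two occurrences of 炎; the function and variable names are illustrative, not from the disclosure.

```python
def decode_entities(tokens, pairs, heads, tails):
    """Reconstruct named entities by walking arrows from head markers to
    tail markers in the marker directed graph."""
    nxt = {}
    for a, b in pairs:                 # arrow from earlier to later token
        nxt.setdefault(a, []).append(b)
    entities = []

    def walk(i, path):
        if i in tails:
            entities.append("".join(tokens[j] for j in path))
        for j in nxt.get(i, []):
            walk(j, path + [j])

    for h in heads:
        walk(h, [h])
    return entities

# The six marker pairs of Fig. 6, as token indices into 急性食管炎胃炎.
tokens = ["急", "性", "食", "管", "炎", "胃", "炎"]
pairs = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (5, 6)]
found = decode_entities(tokens, pairs, heads={0}, tails={4, 6})
# found == ["急性食管炎", "急性胃炎"]
```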
The method of the embodiments of the present disclosure adopts a hypergraph decoding scheme: the connection relationships between words can be learned through the upper-triangle elements of the hypergraph (labeling matrix), and the boundary information of named entities of the target type can be learned through the lower-triangle elements, so that named entities of all forms, such as continuous entities, nested entities, and discontinuous entities, can be recognized.
In order to facilitate implementation of the named entity recognition method provided in the present disclosure, the present disclosure further provides a training method of a named entity recognition model, and the training method will be described in detail below with reference to fig. 7.
Fig. 7 is a flow diagram of a training method of named entity recognition models according to an embodiment of the present disclosure.
As shown in fig. 7, the training method 700 of this embodiment may include operations S710 to S740. The named entity recognition model comprises a coding submodel and a decoding submodel.
In operation S710, a tag sequence is obtained according to the sample text and the entity type of the named entity in the sample text.
The entity type may be the type of any named entity in the sample text. The sample text may have a labeling matrix truth value. The labeling matrix truth value indicates the target marks and the semantic adjacency in the tag sequence. The semantic adjacency includes the adjacency between any two marks in the tag sequence that correspond to the target named entity. It can be appreciated that the labeling matrix truth value is similar to the labeling matrix described above, except that the information indicated by the truth value is ground-truth information, while the information indicated by the labeling matrix is predicted information; the truth value can be obtained by labeling in advance, while the labeling matrix is obtained by prediction of the named entity recognition model.
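A hypothetical construction of such a truth value from annotated entity spans is sketched below: "NNW" marks the adjacency between consecutive tokens of an entity (upper triangle), and "En" marks the (tail, head) token pair (lower triangle). The label names follow the example of Fig. 6; the placement convention is an assumption for illustration.

```python
def build_gold_matrix(n, entities):
    """Build an n x n labeling-matrix truth value.
    entities: list of token-index lists, one list per named entity."""
    m = [["O"] * n for _ in range(n)]
    for idx in entities:
        for a, b in zip(idx, idx[1:]):
            m[a][b] = "NNW"            # row = earlier token, col = later token
        m[idx[-1]][idx[0]] = "En"      # tail row, head column
    return m

# 急性食管炎 = tokens 0..4, 急性胃炎 = tokens 0, 1, 5, 6
gold = build_gold_matrix(7, [[0, 1, 2, 3, 4], [0, 1, 5, 6]])
```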
It is to be understood that the implementation principle of this operation S710 is similar to that of the operation S210 described above, and will not be described herein.
In operation S720, the coding submodel is used to semantically code the tag sequence to obtain semantic features of the entity type and the sample text.
According to an embodiment of the present disclosure, the principle of the encoding sub-model to semantically encode the tag sequence may be similar to the principle of the semantic encoding in operation S220 described above. For example, the coding sub-model may include the model formed by the ERNIE network and Bi-LSTM network described above, which is not limited by the present disclosure.
In operation S730, the semantic features are decoded using the decoding sub-model to obtain labeled matrix predictors.
The marking matrix predicted value indicates a target marking predicted value and a semantic adjacent relation predicted value. It can be understood that the implementation principle of the operation S730 is similar to that of the operation S230 described above, and the labeling matrix predicted value obtained in the operation S730 is similar to that obtained in the operation S230, which is not described herein.
In operation S740, the named entity recognition model is trained based on the differences between the labeling matrix true values and the labeling matrix predicted values.
For example, the embodiment may use a cross entropy loss function to calculate the difference between the true value of the labeling matrix and the predicted value of the labeling matrix, and use the calculated cross entropy loss value as the loss value of the named entity recognition model. The embodiment may train the named entity recognition model with the goal of minimizing the loss value. It will be appreciated that the loss function employed above is merely an example to facilitate an understanding of the present disclosure, which is not limited thereto.
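The cross-entropy computation of operation S740 can be sketched as below; the disclosure states only that a cross entropy loss may be used, so the exact reduction and masking are assumptions, and the random inputs are placeholders.

```python
import numpy as np

def matrix_cross_entropy(logits, gold):
    """Mean cross entropy between predicted labeling-matrix logits
    (n, n, num_labels) and integer truth values (n, n)."""
    z = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    n = gold.shape[0]
    # Pick the log-probability of the gold class at every matrix cell.
    picked = log_p[np.arange(n)[:, None], np.arange(n)[None, :], gold]
    return -picked.mean()

rng = np.random.default_rng(0)
loss = matrix_cross_entropy(rng.normal(size=(5, 5, 3)),
                            rng.integers(0, 3, size=(5, 5)))
```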
In one embodiment, texts containing named entities of multiple entity types may be obtained as sample texts. For example, for each of a plurality of entity types, texts each including a named entity of that type may be obtained, resulting in a plurality of texts for each entity type. The embodiment may then generate the sample texts from all the texts for the plurality of entity types. For example, after all the texts for the plurality of entity types are aggregated, the labeling matrix truth value for each text can be obtained through human-computer interaction, and each text can be annotated according to its truth value, thereby obtaining the sample texts.
The plurality of entity types may be determined, for example, from the categories of encyclopedia knowledge. For example, the plurality of entity types may include some or all of the following 23 types: person, film and television work, organization, article, literary work, biology, musical work, game, software, vehicle, place name, food, website, medicine, disease, holiday, treatment, substance, brand, time, event/activity, prize, and language.
In an embodiment, a title belonging to a certain type among the titles of encyclopedia knowledge may be used as a target named entity, and texts including the target named entity may be extracted from a text library, thereby obtaining texts for that type.
In an embodiment, when obtaining the texts including named entities of each entity type, the embodiment may obtain, for each subclass included in that entity type, texts including the named entities of that subclass, and take all the texts obtained for all subclasses of the entity type as the texts for that entity type.
For example, for the person type, the included subclasses may include a person-name subclass and a person-domain-word subclass; the named entities of the person-name subclass include names of people, and the named entities of the person-domain-word subclass may include category names that represent person identities, such as "host" or "athlete". For the organization type, the included subclasses may include an educational institution subclass, a medical institution subclass, a government department subclass, an enterprise & company subclass, and an other-organization subclass; the named entities of the other-organization subclass may include, for example, names of television stations, broadcast stations, and the like. For the article type, the included subclasses may include a science & technology product subclass, an electrical appliance subclass, a mechanical product subclass, a component subclass, and the like. For the literary work type, the included subclasses may include a literary work subclass, an artistic work subclass, and the like; the named entities of the literary work subclass may include names of published books, documents, and the like, and the named entities of the artistic work subclass may include names of paintings, and the like. It will be appreciated that the subclasses included in each type may be determined, for example, by categorizing knowledge in an encyclopedia, which is not limiting to the present disclosure.
By providing the plurality of entity categories and the subclasses included in each category, the sample texts can cover more practical application scenarios, so that the named entity recognition model can model and learn named entities of more categories, improving its capability to model named entity types. The trained named entity recognition model can therefore exhibit better understanding and reasoning generalization when faced with entity types it has not learned.
In an embodiment, after the named entity recognition model is trained using all the texts for the plurality of entity types, if the model is to be applied to a certain vertical field to facilitate recognition of a target entity type (which may be any of the plurality of entity types) in that field, sample texts for the target entity type may, for example, be generated from the texts for the target entity type, and the named entity recognition model may be fine-tuned according to those sample texts, i.e., trained a second time according to a principle similar to operations S710 to S740 described above. In this way, the named entity recognition model of this embodiment not only has high generalization, but can also support accurate recognition of named entities in a specific field, meeting users' higher precision requirements for recognizing named entities of specific types in specific fields.
According to an embodiment of the present disclosure, the decoding sub-model includes a dual affine classifier. The above-described operation S730 may be specifically implemented by: and decoding the semantic features by adopting a double affine classifier to obtain a first labeling matrix.
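As a non-authoritative sketch of how a dual affine (biaffine) classifier can score every pair of sequence positions into a labeling matrix, consider the NumPy implementation below: each cell combines a bilinear term h_i^T U h_j with a linear term over the concatenation [h_i; h_j]. All parameter shapes are assumptions.

```python
import numpy as np

def biaffine_scores(h_head, h_dep, U, W, b):
    """Dual affine (biaffine) pair scoring sketch.
    h_head, h_dep: (n, d) position features;
    U: (num_labels, d, d); W: (num_labels, 2d); b: (num_labels,)."""
    # Bilinear term: s[i, j, l] = h_head[i] @ U[l] @ h_dep[j]
    bilinear = np.einsum("id,lde,je->ijl", h_head, U, h_dep)
    n = h_head.shape[0]
    # Linear term over the concatenated pair [h_i; h_j].
    pair = np.concatenate(
        [np.repeat(h_head, n, axis=0), np.tile(h_dep, (n, 1))], axis=-1)
    linear = (pair @ W.T).reshape(n, n, -1)
    return bilinear + linear + b

rng = np.random.default_rng(0)
n, d, L = 6, 4, 3
scores = biaffine_scores(rng.normal(size=(n, d)), rng.normal(size=(n, d)),
                         rng.normal(size=(L, d, d)),
                         rng.normal(size=(L, 2 * d)), rng.normal(size=L))
```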
According to an embodiment of the present disclosure, a decoding sub-model includes an embedded network, a first fusion network, and a perceptron classifier connected in sequence. The step S730 may specifically include performing at least two-dimensional embedding processing on the semantic features by using an embedding network to obtain at least two embedded features; the at least two dimensions include at least two of the following dimensions: word dimension, distance dimension, region dimension. And then fusing at least two embedded features by adopting a first fusion network to obtain fused features. And then decoding the fusion features by adopting a perceptron classifier to obtain a second labeling matrix.
According to an embodiment of the present disclosure, an embedded network includes a conditional normalization layer and at least two embedded layers corresponding to at least two dimensions, respectively. The operation of embedding the semantic features in at least two dimensions by using the embedding network to obtain at least two embedded features may include the following operations: carrying out normalization processing on the semantic features by adopting a condition normalization layer to obtain normalized features; and inputting the normalized features into at least two embedded layers, and outputting at least two features by the at least two embedded layers in one-to-one correspondence.
According to an embodiment of the present disclosure, the first fusion network includes a splicing subnetwork, a multi-layer perceptron, and a dilated convolution subnetwork. The operation of fusing the at least two embedded features with the first fusion network to obtain the fusion feature may include: splicing the at least two embedded features with the splicing subnetwork to obtain a spliced feature; processing the spliced feature with the multi-layer perceptron to obtain a perceived feature; and processing the perceived feature with the dilated convolution subnetwork to obtain the fusion feature.
According to an embodiment of the present disclosure, the dilated convolution subnetwork includes at least two dilated convolution layers with different dilation rates, and a splicing layer. The operation of processing the perceived features with the dilated convolution subnetwork to obtain the fusion feature may include: processing the perceived features with the at least two dilated convolution layers to obtain at least two convolution features; and splicing the at least two convolution features with the splicing layer to obtain the fusion feature.
According to an embodiment of the present disclosure, the decoding sub-model further comprises a dual affine classifier and a second fusion network. The operation of decoding the semantic features by using the decoding sub-model to obtain the labeling matrix predicted value may further include: processing the normalized features by adopting a double affine classifier to obtain a third labeling matrix; and fusing the second labeling matrix and the third labeling matrix by adopting a second fusion network to obtain a labeling matrix predicted value obtained by decoding the semantic features.
Based on the named entity recognition method provided by the present disclosure, the present disclosure further provides a named entity recognition device, which will be described in detail below with reference to fig. 8.
Fig. 8 is a block diagram of a named entity recognition device according to an embodiment of the present disclosure.
As shown in fig. 8, the identifying device 800 of the named entity of this embodiment includes a tag sequence obtaining module 810, a semantic encoding module 820, a feature decoding module 830, and an entity determining module 840.
The tag sequence obtaining module 810 is configured to obtain a tag sequence according to the entity type of the named entity to be identified and the text to be identified. The target mark corresponds to a target named entity belonging to the entity type in the text to be identified; the semantic adjacency includes the adjacency between any two marks in the tag sequence that correspond to the target named entity. In an embodiment, the tag sequence obtaining module 810 may be configured to perform the operation S210 described above, which is not described herein again.
The semantic coding module 820 is configured to perform semantic coding on the tag sequence to obtain semantic features of the entity type and the text to be identified. In an embodiment, the semantic coding module 820 may be used to perform the operation S220 described above, which is not described herein again.
The feature decoding module 830 is configured to decode the semantic features to obtain a labeling matrix; the labeling matrix indicates the target marks and the semantic adjacency. In an embodiment, the feature decoding module 830 may be configured to perform the operation S230 described above, which is not described herein again.
The entity determining module 840 is configured to determine, according to the labeling matrix, a target named entity belonging to the entity class in the text to be identified. In an embodiment, the entity determining module 840 may be configured to perform the operation S240 described above, which is not described herein.
According to an embodiment of the disclosure, the feature decoding module 830 may be specifically configured to decode semantic features by using a dual affine classifier to obtain a first labeling matrix.
The feature decoding module 830 may include an embedding processing sub-module, a first fusion sub-module, and a decoding sub-module according to embodiments of the present disclosure. The embedding processing sub-module is used for carrying out embedding processing of at least two dimensions on the semantic features by adopting an embedding network to obtain at least two embedding features; the at least two dimensions include at least two of the following dimensions: word dimension, distance dimension, region dimension. The first fusion submodule is used for fusing at least two embedded features to obtain fusion features. And the decoding submodule is used for decoding the fusion characteristics to obtain a second labeling matrix.
According to an embodiment of the present disclosure, an embedded network includes at least two embedded layers corresponding to at least two dimensions, respectively. The embedding processing sub-module may include a normalization unit and an embedding processing unit. The normalization unit is used for carrying out normalization processing on the semantic features by adopting a conditional normalization network to obtain normalized features. The embedding processing unit is used for inputting the normalized features into at least two embedding layers, and outputting at least two features corresponding to the at least two embedding layers one by one.
According to an embodiment of the present disclosure, the first fusion sub-module may include a splicing unit, a perception processing unit, and a dilated convolution processing unit. The splicing unit is used for splicing the at least two embedded features to obtain a spliced feature. The perception processing unit is used for processing the spliced feature with a multi-layer perceptron to obtain a perceived feature. The dilated convolution processing unit is used for processing the perceived feature with a dilated convolution network to obtain the fusion feature.
According to an embodiment of the present disclosure, the dilated convolution processing unit includes a processing subunit and a fusion subunit. The processing subunit is used for processing the perceived feature with at least two dilated convolution layers having different dilation rates to obtain at least two convolution features. The fusion subunit is used for splicing the at least two convolution features to obtain the fusion feature.
The feature decoding module 830 may further include a classification sub-module and a second fusion sub-module according to an embodiment of the present disclosure. And the classification submodule is used for processing the normalized features by adopting a double affine classifier to obtain a third labeling matrix. The second fusion sub-module is used for fusing the second labeling matrix and the third labeling matrix to obtain a labeling matrix obtained by decoding the semantic features.
The entity determination module 840 may include a graph generation sub-module and an entity determination sub-module, according to embodiments of the present disclosure. The graph generation sub-module is used for generating a tag directed graph according to the semantic adjacency relation indicated by the labeling matrix and the tag sequence. The entity determination sub-module is used for determining the target named entity according to the position, in the tag directed graph, of the target tag indicated by the labeling matrix. The target tag includes a tag corresponding to the first word of the target named entity and a tag corresponding to the last word of the target named entity.
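Decoding through a tag directed graph can be pictured with a small sketch: the target-tag entries of the labeling matrix supply head tags (first words) and tail tags (last words), and the semantic-adjacency entries supply the edges. The greedy single-successor walk below is an assumption of mine for illustration, not the patent's exact procedure.

```python
def extract_entities(adjacency, head_tags, tail_tags, max_len=16):
    """Walk the tag digraph: start at each head tag, follow semantic-adjacency
    edges, and emit the visited positions once a tail tag is reached.
    adjacency: square 0/1 matrix (list of lists); head_tags/tail_tags: sets."""
    entities = []
    for start in head_tags:
        path, node = [start], start
        while len(path) <= max_len:
            if node in tail_tags:
                entities.append(tuple(path))   # positions spanning one entity
                break
            successors = [j for j, e in enumerate(adjacency[node]) if e]
            if not successors:
                break                          # dead end: no entity here
            node = successors[0]               # greedy: take the first edge
            path.append(node)
    return entities
```

A single-word entity is the degenerate case where a position is both a head tag and a tail tag, so the walk terminates immediately.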
Based on the training method of the named entity recognition model provided by the present disclosure, the present disclosure further provides a training device of the named entity recognition model, and the device will be described in detail below with reference to fig. 9.
Fig. 9 is a block diagram of a training apparatus for named entity recognition models according to an embodiment of the present disclosure.
As shown in fig. 9, the training apparatus 900 for a named entity recognition model of this embodiment may include a tag sequence obtaining module 910, a semantic encoding module 920, a feature decoding module 930, and a model training module 940. The named entity recognition model comprises a coding submodel and a decoding submodel.
The tag sequence obtaining module 910 is configured to obtain a tag sequence according to the sample text and the entity type of the named entity in the sample text. The sample text has true values of the labeling matrix; the labeling matrix truth value indicates the target label and semantic adjacency in the label sequence. The target mark corresponds to a target named entity belonging to the entity type in the sample text; the semantic adjacency includes adjacency between any two labels of the label sequence that correspond to the target named entity. In an embodiment, the tag sequence obtaining module 910 may be configured to perform the operation S710 described above, which is not described herein.
The semantic coding module 920 is configured to perform semantic coding on the tag sequence by using the coding submodel to obtain semantic features of the entity type and the sample text. In an embodiment, the semantic coding module 920 may be configured to perform the operation S720 described above, which is not described herein.
The feature decoding module 930 is configured to decode semantic features using the decoding submodel to obtain labeled matrix predicted values. The marking matrix predicted value indicates a target marking predicted value and a semantic adjacent relation predicted value. In an embodiment, the feature decoding module 930 may be configured to perform the operation S730 described above, which is not described herein.
The model training module 940 is configured to train the named entity recognition model according to the difference between the true value of the labeling matrix and the predicted value of the labeling matrix. In an embodiment, the model training module 940 may be configured to perform the operation S740 described above, which is not described herein.
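The disclosure does not name a specific loss for the difference between the labeling-matrix truth and the labeling-matrix prediction; element-wise binary cross-entropy over the matrix cells is one common choice for this kind of 0/1 matrix labeling and is sketched below (NumPy, illustrative only).

```python
import numpy as np

def labeling_matrix_loss(pred_logits, truth):
    """Element-wise binary cross-entropy between the predicted labeling matrix
    (raw logits) and the 0/1 ground-truth matrix, averaged over all cells."""
    prob = 1.0 / (1.0 + np.exp(-pred_logits))   # sigmoid per cell
    eps = 1e-9                                  # numerical safety for log
    return float(-np.mean(truth * np.log(prob + eps)
                          + (1.0 - truth) * np.log(1.0 - prob + eps)))
```

Minimizing this difference with gradient descent over the encoding and decoding sub-models is the usual realization of "training according to the difference" described above.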
According to an embodiment of the present disclosure, the training apparatus 900 of the named entity recognition model may further include a text acquisition module and a sample generation module. The text acquisition module is used for acquiring, for each entity type of a plurality of entity types, a text including a named entity of the entity type, to obtain a plurality of texts for the plurality of entity types. The sample generation module is used for generating the sample text according to the plurality of texts.
According to an embodiment of the present disclosure, each entity type includes at least one subclass. The text acquisition module is specifically configured to: for each subclass of the at least one subclass, acquire a text including a named entity of the subclass.
According to an embodiment of the present disclosure, the decoding sub-model includes a dual affine classifier. The above feature decoding module 930 may be specifically configured to decode semantic features by using a dual affine classifier to obtain a first labeling matrix.
According to an embodiment of the present disclosure, the decoding sub-model includes an embedding network, a first fusion network, and a perceptron classifier connected in sequence. The feature decoding module 930 may include an embedding processing sub-module, a first fusion sub-module, and a decoding sub-module. The embedding processing sub-module is used for performing embedding processing of at least two dimensions on the semantic features by adopting the embedding network to obtain at least two embedded features; the at least two dimensions include at least two of the following: a word dimension, a distance dimension, and a region dimension. The first fusion sub-module is used for fusing the at least two embedded features by adopting the first fusion network to obtain fusion features. The decoding sub-module is used for decoding the fusion features by adopting the perceptron classifier to obtain a second labeling matrix.
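The distance dimension mentioned above can be realized, for example, as a lookup of a learned embedding indexed by the (clipped) distance between each pair of words, so every cell of the eventual labeling matrix carries a pairwise-distance feature. The clipping scheme and table size below are illustrative assumptions.

```python
import numpy as np

def distance_features(seq_len, table):
    """Distance-dimension feature: cell (i, j) looks up an embedding for the
    clipped distance |i - j| between word i and word j.
    table: (max_dist, emb_dim) learned distance-embedding table (assumed)."""
    dist = np.abs(np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :])
    dist = np.minimum(dist, table.shape[0] - 1)   # clip to the table size
    return table[dist]                            # (seq, seq, emb_dim)
```

Word-dimension and region-dimension features would be produced analogously by their own embedding layers, and all of them are then spliced by the first fusion network.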
According to an embodiment of the present disclosure, the embedding network includes a conditional normalization layer and at least two embedding layers corresponding to the at least two dimensions, respectively. The embedding processing sub-module may include a normalization unit and an embedding processing unit. The normalization unit is used for normalizing the semantic features by adopting the conditional normalization layer to obtain normalized features. The embedding processing unit is used for inputting the normalized features into the at least two embedding layers and outputting at least two embedded features in one-to-one correspondence with the at least two embedding layers.
According to an embodiment of the present disclosure, the first fusion network includes a splicing sub-network, a multi-layer perceptron, and a hole convolution sub-network. The first fusion sub-module may include a splicing unit, a perception processing unit, and a hole convolution processing unit. The splicing unit is used for splicing the at least two embedded features by adopting the splicing sub-network to obtain spliced features. The perception processing unit is used for processing the spliced features by adopting the multi-layer perceptron to obtain perceived features. The hole convolution processing unit is used for processing the perceived features by adopting the hole convolution sub-network to obtain fusion features.
According to an embodiment of the present disclosure, the hole convolution sub-network includes at least two hole convolution layers having different hole rates, and a splicing layer. The hole convolution processing unit includes a processing subunit and a fusion subunit. The processing subunit is used for processing the perceived features by adopting the at least two hole convolution layers to obtain at least two convolution features. The fusion subunit is used for splicing the at least two convolution features by adopting the splicing layer to obtain the fusion features.
According to an embodiment of the present disclosure, the decoding sub-model further includes a dual affine classifier and a second fusion network. The feature decoding module 930 may further include a classification sub-module and a second fusion sub-module. The classification sub-module is used for processing the normalized features by adopting the dual affine classifier to obtain a third labeling matrix. The second fusion sub-module is used for fusing the second labeling matrix and the third labeling matrix by adopting the second fusion network to obtain the labeling matrix predicted value obtained by decoding the semantic features.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of the personal information of the user involved all comply with the provisions of relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated. In the technical solution of the present disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or collected.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
FIG. 10 illustrates a schematic block diagram of an example electronic device 1000 that may be used to implement the named entity recognition method and/or the training method of the named entity recognition model of embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 10, the apparatus 1000 includes a computing unit 1001 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data required for the operation of the device 1000 can also be stored. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
Various components in device 1000 are connected to I/O interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a storage unit 1008 such as a magnetic disk, an optical disk, or the like; and communication unit 1009 such as a network card, modem, wireless communication transceiver, etc. Communication unit 1009 allows device 1000 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 1001 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1001 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1001 performs the respective methods and processes described above, for example, a named entity recognition method and/or a named entity recognition model training method. For example, in some embodiments, the named entity recognition method and/or the named entity recognition model training method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 1000 via ROM 1002 and/or communication unit 1009. When the computer program is loaded into RAM 1003 and executed by the computing unit 1001, one or more steps of the above-described named entity recognition method and/or the training method of the named entity recognition model may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured to perform the named entity recognition method and/or the named entity recognition model training method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that overcomes the defects of difficult management and weak service scalability in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system or a server combined with a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (16)

1. A method of identifying named entities, comprising:
obtaining a marking sequence according to the entity type of the named entity to be identified and the text to be identified;
carrying out semantic coding on the marking sequence to obtain the entity type and the semantic feature of the text to be identified;
decoding the semantic features to obtain a labeling matrix; the marking matrix indicates the target mark and the semantic adjacent relation in the mark sequence; and
Determining a target named entity belonging to the entity category in the text to be identified according to the labeling matrix,
wherein the target mark is a mark corresponding to the target named entity; the semantic adjacency includes adjacency between any two labels in the label sequence corresponding to the target named entity,
wherein the decoding the semantic features to obtain the labeling matrix includes:
embedding the semantic features in at least two dimensions by adopting an embedding network to obtain at least two embedded features; the at least two dimensions at least include a distance dimension, and the feature of the distance dimension is a distance feature between every two words in the entity type and the text to be identified, extracted from the semantic features;
splicing the at least two embedded features to obtain spliced features;
processing the spliced features by adopting a multi-layer perceptron to obtain perceived features;
processing the perceived features by adopting a cavity convolution network to obtain fusion features; and
and decoding the fusion characteristics to obtain a second labeling matrix.
2. The method of claim 1, wherein the embedded network comprises at least two embedded layers corresponding to the at least two dimensions, respectively; the embedding network is adopted to conduct embedding processing of at least two dimensions on the semantic features, and obtaining at least two embedded features comprises:
carrying out normalization processing on the semantic features by adopting a conditional normalization network to obtain normalized features; and
inputting the normalized features into the at least two embedded layers, and outputting, by the at least two embedded layers, the at least two embedded features in one-to-one correspondence.
3. The method of claim 1, wherein the processing the perceived features with a hole convolution network to obtain the fused features comprises:
processing the perceived features by adopting at least two cavity convolution layers with different cavity rates to obtain at least two convolution features; and
and splicing the at least two convolution characteristics to obtain the fusion characteristic.
4. The method of claim 2, wherein the decoding the semantic features to obtain a labeling matrix further comprises:
processing the normalized features by adopting a double affine classifier to obtain a third labeling matrix; and
And fusing the second labeling matrix and the third labeling matrix to obtain a labeling matrix obtained by decoding the semantic features.
5. The method of claim 1, wherein the determining the target named entity in the text to be identified according to the annotation matrix comprises:
generating a marked directed graph according to the semantic adjacency relation indicated by the marking matrix and the marking sequence; and
determining the target named entity according to the position of the target mark indicated by the marking matrix in the mark directed graph,
wherein the target tag includes a tag corresponding to a first word of the target named entity and a tag corresponding to a last word of the target named entity.
6. A training method of a named entity recognition model, wherein the named entity recognition model comprises a coding submodel and a decoding submodel; the method comprises the following steps:
obtaining a marking sequence according to the sample text and the entity type of the named entity in the sample text; the sample text has a true value of a labeling matrix; the true value of the marking matrix indicates the target mark and the semantic adjacent relation in the mark sequence;
Carrying out semantic coding on the marking sequence by adopting the coding submodel to obtain semantic features of the entity type and the sample text;
decoding the semantic features by adopting the decoding sub-model to obtain a labeling matrix predicted value; the marking matrix predicted value indicates a target marking predicted value and a semantic adjacent relation predicted value; and
training the named entity recognition model according to the difference between the true value of the labeling matrix and the predicted value of the labeling matrix,
the target mark corresponds to a target named entity belonging to the entity type in the sample text; the semantic adjacency includes adjacency between any two labels in the label sequence corresponding to the target named entity,
the decoding sub-model comprises an embedding network, a first fusion network and a perceptron classifier which are sequentially connected, wherein the first fusion network comprises a splicing sub-network, a multi-layer perceptron and a hole convolution sub-network; the decoding the semantic features by adopting the decoding sub-model to obtain the labeling matrix predicted value comprises:
embedding the semantic features in at least two dimensions by adopting the embedding network to obtain at least two embedded features; the at least two dimensions at least include a distance dimension, and the feature of the distance dimension is a distance feature between every two words in the entity type and the sample text, extracted from the semantic features;
Splicing the at least two embedded features by adopting the spliced sub-network to obtain spliced features;
processing the spliced features by adopting the multi-layer perceptron to obtain perceived features; and
processing the perceived features by adopting the cavity convolution sub-network to obtain fusion features; and
and decoding the fusion features by adopting the perceptron classifier to obtain a second labeling matrix.
7. The method of claim 6, further comprising:
for each entity type in a plurality of entity types, acquiring a text comprising a named entity of each entity type, and obtaining a plurality of texts for the plurality of entity types; and
and generating the sample text according to the texts.
8. The method of claim 7, wherein each entity type comprises at least one subclass; obtaining text including named entities of each entity type includes:
for each sub-class of the at least one sub-class, text is obtained that includes a named entity of the each sub-class.
9. The method of claim 6, wherein the embedded network comprises a conditional normalization layer and at least two embedded layers corresponding to the at least two dimensions, respectively; the embedding network is adopted to perform embedding processing of at least two dimensions on the semantic features, and the obtaining of at least two embedded features comprises:
Normalizing the semantic features by adopting the condition normalization layer to obtain normalized features; and
inputting the normalized features into the at least two embedded layers, and outputting, by the at least two embedded layers, the at least two embedded features in one-to-one correspondence.
10. The method of claim 6, wherein the hole convolution sub-network comprises at least two hole convolution layers and a splice layer having different hole rates; the step of processing the perceived features by adopting the cavity convolution sub-network to obtain the fusion features comprises the following steps:
processing the perceived features by adopting the at least two cavity convolution layers to obtain at least two convolution features; and
and splicing the at least two convolution features by adopting the splicing layer to obtain the fusion feature.
11. The method of claim 9, wherein the decoding sub-model further comprises a dual affine classifier and a second fusion network; the method for decoding the semantic features by adopting the decoding submodel further comprises the following steps:
processing the normalized features by adopting the double affine classifier to obtain a third labeling matrix; and
And fusing the second labeling matrix and the third labeling matrix by adopting the second fusion network to obtain a labeling matrix predicted value obtained by decoding the semantic features.
12. An apparatus for identifying named entities, comprising:
the marking sequence obtaining module is used for obtaining a marking sequence according to the entity type of the named entity to be identified and the text to be identified;
the semantic coding module is used for carrying out semantic coding on the marking sequence to obtain the entity type and the semantic characteristics of the text to be identified;
the feature decoding module is used for decoding the semantic features to obtain a labeling matrix; the marking matrix indicates the target mark and the semantic adjacent relation in the mark sequence; and
an entity determining module for determining the target named entity belonging to the entity category in the text to be identified according to the labeling matrix,
wherein the target mark is a mark corresponding to the target named entity; the semantic adjacency includes adjacency between any two labels in the label sequence corresponding to the target named entity,
wherein, the feature decoding module includes:
the embedding processing sub-module is used for performing embedding processing of at least two dimensions on the semantic features by adopting an embedding network to obtain at least two embedded features; the at least two dimensions at least include a distance dimension, and the feature of the distance dimension is a distance feature between every two words in the entity type and the text to be identified, extracted from the semantic features;
The splicing unit is used for splicing at least two embedded features to obtain spliced features;
the perception processing unit is used for processing the spliced features by adopting a multi-layer perceptron to obtain perceived features;
the cavity convolution processing unit is used for processing the perceived characteristics by adopting a cavity convolution network to obtain fusion characteristics; and
and the decoding submodule is used for decoding the fusion characteristics to obtain a second labeling matrix.
13. A training device for a named entity recognition model, wherein the named entity recognition model comprises a coding submodel and a decoding submodel; the device comprises:
the marking sequence obtaining module is used for obtaining marking sequences according to the sample text and the entity types of the named entities in the sample text; the sample text has a true value of a labeling matrix; the true value of the marking matrix indicates the target mark and the semantic adjacent relation in the mark sequence;
the semantic coding module is used for carrying out semantic coding on the marking sequence by adopting the coding submodel to obtain semantic features of the entity type and the sample text;
the feature decoding module is used for decoding the semantic features by adopting the decoding sub-model to obtain a labeling matrix predicted value; the marking matrix predicted value indicates a target marking predicted value and a semantic adjacent relation predicted value; and
A model training module for training the named entity recognition model according to the difference between the true value of the labeling matrix and the predicted value of the labeling matrix,
the target mark corresponds to a target named entity belonging to the entity type in the sample text; the semantic adjacency includes adjacency between any two labels in the label sequence corresponding to the target named entity,
the decoding sub-model comprises an embedded network, a first fusion network and a perceptron classifier which are sequentially connected, and the feature decoding module comprises:
the embedding processing sub-module is used for performing embedding processing of at least two dimensions on the semantic features by adopting an embedding network to obtain at least two embedded features; the at least two dimensions at least include a distance dimension, and the feature of the distance dimension is a distance feature between every two words in the entity type and the sample text, extracted from the semantic features;
the splicing unit is used for splicing at least two embedded features by adopting a splicing sub-network to obtain spliced features;
the perception processing unit is used for processing the spliced features by adopting a multi-layer perceptron to obtain perceived features;
The cavity convolution processing unit is used for processing the perceived characteristics by adopting a cavity convolution sub-network to obtain fusion characteristics; and
and the decoding submodule is used for decoding the fusion characteristics by adopting a perceptron classifier to obtain a second labeling matrix.
14. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-11.
15. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-11.
16. A computer program product comprising computer programs/instructions stored on at least one of a readable storage medium and an electronic device, which when executed by a processor, implement the steps of the method according to any one of claims 1-11.
CN202211610737.6A 2022-12-12 2022-12-12 Named entity recognition method and named entity recognition model training method Active CN115983271B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211610737.6A CN115983271B (en) 2022-12-12 2022-12-12 Named entity recognition method and named entity recognition model training method


Publications (2)

Publication Number Publication Date
CN115983271A CN115983271A (en) 2023-04-18
CN115983271B true CN115983271B (en) 2024-04-02

Family

ID=85964041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211610737.6A Active CN115983271B (en) 2022-12-12 2022-12-12 Named entity recognition method and named entity recognition model training method

Country Status (1)

Country Link
CN (1) CN115983271B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116306657B (en) * 2023-05-19 2023-08-22 之江实验室 Entity extraction method and system based on square matrix labeling and double affine layers attention
CN116738215A (en) * 2023-08-11 2023-09-12 之江实验室 Electroencephalogram identity recognition method and device based on steady-state visual evoked potential
CN117933386A (en) * 2024-03-20 2024-04-26 北京观微科技有限公司 Knowledge graph construction method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257417A (en) * 2020-10-29 2021-01-22 重庆紫光华山智安科技有限公司 Multi-task named entity recognition training method, medium and terminal
CN113221565A (en) * 2021-05-07 2021-08-06 北京百度网讯科技有限公司 Entity recognition model training method and device, electronic equipment and storage medium
CN113221576A (en) * 2021-06-01 2021-08-06 复旦大学 Named entity identification method based on sequence-to-sequence architecture
CN114330350A (en) * 2022-01-05 2022-04-12 北京环境特性研究所 Named entity identification method and device, electronic equipment and storage medium
CN114462412A (en) * 2022-02-14 2022-05-10 平安科技(深圳)有限公司 Entity identification method and device, electronic equipment and storage medium
CN114707513A (en) * 2022-03-22 2022-07-05 腾讯科技(深圳)有限公司 Text semantic recognition method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11308283B2 (en) * 2020-01-30 2022-04-19 International Business Machines Corporation Lightweight tagging for disjoint entities


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
kNN-NER: named entity recognition with nearest neighbor search; Shuhe Wang et al.; https://arxiv.org/abs/2203.17103; 2022-03-31; full text *
Military named entity recognition based on transfer representation learning; Liu Weiping et al.; Command Information System and Technology; 2020-04-28 (02); full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant