WO2023173556A1 - Deep learning-based named entity recognition method and apparatus, device, and medium - Google Patents

Deep learning-based named entity recognition method and apparatus, device, and medium

Info

Publication number
WO2023173556A1
Authority
WO
WIPO (PCT)
Prior art keywords
span
candidate
neural network
preset
spans
Prior art date
Application number
PCT/CN2022/090740
Other languages
French (fr)
Chinese (zh)
Inventor
姜鹏
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2023173556A1 publication Critical patent/WO2023173556A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the fields of artificial intelligence technology and natural language processing, and in particular to a named entity recognition method, device, electronic device and computer-readable storage medium based on deep learning.
  • NER Named Entity Recognition
  • NER is a fundamental task in natural language processing, widely used in downstream tasks such as knowledge extraction and knowledge graph construction. Its main task is to extract the entity nouns mentioned in a text; specifically, to identify the start/end index position and the entity category of each entity.
  • Nested entities refer to cases where, among the multiple nouns that make up one entity, individual nouns themselves belong to an entity of another category.
  • embodiments of this application propose a named entity recognition method based on deep learning.
  • the method includes: obtaining a sentence to be processed; identifying multiple candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose lengths are less than or equal to the preset recognition length threshold; screening the candidate spans in the candidate span set to obtain at least one first forward span; predicting the boundary offset value corresponding to the first forward span through a preset first neural network; adjusting the boundary of the first forward span according to the boundary offset value corresponding to the first forward span, and obtaining a target span based on the first forward span after adjusting the boundary; and predicting the entity classification corresponding to the target span through a preset second neural network.
  • an embodiment of the present application proposes a named entity recognition device based on deep learning.
  • the device includes:
  • an acquisition module, used to obtain the sentence to be processed;
  • a candidate span determination module, configured to traverse the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose lengths are less than or equal to the preset recognition length threshold;
  • a screening module, used to screen the candidate spans in the candidate span set to obtain at least one first forward span;
  • a first prediction module configured to predict the boundary offset value corresponding to the first forward span through a preset first neural network
  • a target span determination module configured to perform boundary adjustment on the first forward span according to the boundary offset value corresponding to the first forward span, and obtain the target span based on the first forward span after adjusting the boundary.
  • the second prediction module is used to predict the entity classification corresponding to the target span through a preset second neural network.
  • an electronic device including:
  • the memory stores a computer program, and the computer program is executed by the at least one processor, so that the at least one processor can execute a named entity recognition method based on deep learning;
  • the named entity recognition method based on deep learning includes:
  • Multiple candidate spans are identified from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose lengths are less than or equal to the preset recognition length threshold;
  • embodiments of the present application propose a computer-readable storage medium that stores a computer program.
  • the computer program is executed by a processor, a named entity recognition method based on deep learning is implemented;
  • the named entity recognition method based on deep learning includes:
  • Multiple candidate spans are identified from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose lengths are less than or equal to the preset recognition length threshold;
  • the embodiments of this application propose a named entity recognition method, device, electronic device and computer-readable storage medium based on deep learning. First, multiple candidate spans are identified from the sentence to be processed based on different preset recognition lengths, so that all possible candidate spans whose length does not exceed the preset recognition length threshold are identified and formed into a candidate span set, which solves the problem that long-span nested entities cannot be identified.
  • the candidate spans in the candidate span set are then screened, with the purpose of eliminating low-quality candidate spans and obtaining at least one first forward span, thereby reducing subsequent computational overhead; the boundary offset value corresponding to the first forward span is predicted through the preset first neural network; the boundary of the first forward span is adjusted according to this boundary offset value, and the target span is obtained based on the first forward span after adjusting the boundary; finally, the entity classification corresponding to the target span is predicted through the preset second neural network.
  • in this way, the span boundary can be fine-tuned based on the predicted boundary offset value, so that the final target span overlaps the real span as much as possible, reaching or approaching the ideal state of complete overlap, thereby improving the accuracy of entity recognition.
  • Figure 1 is a schematic diagram of entity distribution in an exemplary text sentence provided by this application.
  • Figure 2 is a schematic flowchart of a named entity recognition method based on deep learning provided by an embodiment of the present application
  • Figure 3 is a schematic diagram of the sub-steps of step S130 in Figure 2;
  • Figure 4 is a schematic diagram of the sub-steps of step S133 in Figure 3;
  • Figure 5 is a schematic structural diagram of a third neural network provided by an embodiment of the present application.
  • Figure 6 is a schematic diagram of the sub-steps of step S1333 in Figure 4;
  • Figure 7 is a schematic diagram of the sub-steps of step S140 in Figure 2;
  • Figure 8 is a schematic structural diagram of a second neural network provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of the sub-steps of step S160 in Figure 2;
  • Figure 10 is a schematic structural diagram of a deep learning-based named entity recognition device provided by an embodiment of the present application.
  • NLP Natural Language Processing
  • Natural Language Processing is a form of artificial intelligence that specializes in analyzing human language. It works roughly as follows: it receives natural language, which has evolved through natural human use and which humans use every day to communicate; it then analyzes the natural language with probability-based algorithms and outputs the results.
  • NER Named Entity Recognition
  • Embedding is a vector representation, which refers to using a low-dimensional vector to represent an object.
  • the object can be a word, a product, a movie, and so on. The property of an embedding vector is that objects whose vectors are close in distance have similar meanings; for example, the distance between embedding(Avengers) and embedding(Iron Man) will be very small, while the distance between embedding(Avengers) and embedding(Gone with the Wind) will be larger.
  • Embedding is essentially a mapping, a mapping from semantic space to vector space, while maintaining the relationship between the original sample in the semantic space as much as possible in the vector space.
  • Embedding can encode objects as low-dimensional vectors while retaining their meaning. It is often used in machine learning: in the process of building a machine learning model, an object is encoded into a low-dimensional dense vector and then passed to a DNN to improve efficiency.
  • Nested entities refer to cases where, among the multiple nouns that make up one entity, individual nouns themselves belong to an entity of another category.
  • the main purpose of the embodiments of this application is to propose a named entity recognition method, device, electronic device and computer-readable storage medium based on deep learning, aiming to solve the problem of identifying nested entities with a large span.
  • AI Artificial Intelligence
  • Artificial intelligence uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometric technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • the named entity recognition method provided by the embodiments of this application relates to the technical fields of artificial intelligence and natural language processing.
  • the named entity recognition method provided by the embodiment of the present application can be applied to the terminal or the server, or can be software running in the terminal or the server.
  • the terminal can be a smartphone, a tablet, a laptop, a desktop computer, etc.
  • the server can be configured as an independent physical server, or as a server cluster or distributed system composed of multiple physical servers.
  • the server can also be configured as a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.
  • Server software can be an application that implements named entity recognition methods, etc., but is not limited to the above forms.
  • the application may be used in a variety of general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics devices, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices.
  • the application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • the present application may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communications network.
  • program modules may be located in both local and remote computer storage media including storage devices.
  • Figure 2 shows a schematic flowchart of a deep learning-based named entity recognition method proposed in an embodiment of the present application.
  • the named entity identification method provided by the embodiment of this application includes but is not limited to the following steps:
  • Step S110 Obtain the sentence to be processed.
  • the sentence to be processed here is a sentence composed of multiple words, so the sentence to be processed is also regarded as a word sequence.
  • Step S120 Identify multiple candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple spans. candidate spans whose length is less than or equal to the preset recognition length threshold.
  • For example, assume the preset recognition length threshold is L; the preset recognition lengths are then set to 1, 2, ..., L, and for each preset recognition length the words in the sentence are traversed to extract all possible candidate spans.
  • In the first round, the sentence to be processed is traversed based on the preset recognition length of 1, obtaining multiple candidate spans of length 1; in the second round, the sentence is traversed based on the preset recognition length of 2, obtaining multiple candidate spans of length 2; and so on, until the last round traverses the sentence based on the preset recognition length L and obtains multiple candidate spans of length L.
  • In this way, all possible spans with lengths less than or equal to L are obtained, and the obtained spans form the candidate span set.
  • Each span in the candidate span set is a candidate span.
  • Thus, the embodiments of the present application can identify entities with a maximum length of L; the purpose of setting the recognition length threshold L is to avoid the computational overhead that an unlimited length would cause.
  • Those skilled in the art can set the value of L flexibly according to actual needs. For example, for the case shown in Figure 1, when L is set to 7, the long-span ORG entity in Figure 1 can be identified. It can be seen that this application can, to a certain extent, solve the problem of identifying nested entities with a large span.
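  • As a minimal, non-limiting sketch of the span enumeration in step S120 (function and variable names are illustrative assumptions, not taken from this application):

```python
# Enumerate all candidate spans of length 1..L from a tokenized sentence,
# following the round-by-round traversal described for step S120.
def enumerate_candidate_spans(tokens, max_len):
    """Return all (start, end) index pairs with end inclusive and length <= max_len."""
    spans = []
    for length in range(1, max_len + 1):               # preset recognition lengths 1..L
        for start in range(len(tokens) - length + 1):  # traverse the sentence
            spans.append((start, start + length - 1))
    return spans

# Example: a 10-token sentence with a preset recognition length threshold L = 7
tokens = "the Department of Homeland Security announced new travel rules today".split()
candidate_span_set = enumerate_candidate_spans(tokens, max_len=7)
print(len(candidate_span_set))  # 49 candidate spans
```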
  • Step S130 Screen the candidate spans in the candidate span set to obtain at least one first forward span.
  • the candidate spans obtained in step S120 can be screened first to eliminate some candidate spans with lower quality and reduce the number of spans to be processed in subsequent steps.
  • Step S130 can be implemented through the following steps S131-S133:
  • Step S131 Obtain a preset real span set;
  • Step S132 Perform an IOU calculation on each candidate span and the real span set to obtain the IOU value corresponding to the candidate span;
  • Step S133 Determine the first forward span from the candidate spans in the candidate span set according to the IOU value corresponding to each candidate span in the candidate span set.
  • the real span set is formed by collecting multiple real spans. To screen all candidate spans, an IOU calculation can be performed between each candidate span and the real span set to obtain the IOU value corresponding to the candidate span. Then, based on the IOU value corresponding to each candidate span, all candidate spans are divided into two main categories, positive spans and negative spans; the negative spans are eliminated and only the positive spans are retained.
  • the IOU calculation between the candidate span and the real span set can be achieved through the following formula (1): IoU(A, B) = |A ∩ B| / |A ∪ B|, in which:
  • A represents the candidate span
  • B represents the real span set
  • IoU(A,B) represents the IOU value of the candidate span.
  • IoU(A,B) is the ratio of the intersection to the union of the candidate span and the real span set; obviously, the higher the overlap between the two, the larger the value. If the degree of overlap between the candidate span and the real span set is high, the candidate span is of higher quality and can be treated as a positive span; otherwise, the candidate span is of low quality and is treated as a negative span.
  • Specifically, the K candidate spans with the largest IOU values can be selected from all candidate spans in the candidate span set as the first forward spans, where, assuming the candidate span set contains N candidate spans, 0 < K ≤ N.
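  • A hedged sketch of this IOU-based screening follows (treating a span as the set of token indices it covers is one reading of formula (1), assumed here; all names are illustrative):

```python
# Screen candidate spans: compute each candidate's IOU against the real span
# set, then keep the K candidates with the largest IOU values.
def span_tokens(span):
    start, end = span
    return set(range(start, end + 1))  # token indices covered by the span

def iou(candidate, real_spans):
    a = span_tokens(candidate)
    b = set().union(*(span_tokens(s) for s in real_spans))  # real span set, pooled
    return len(a & b) / len(a | b)  # formula (1): intersection over union

def top_k_by_iou(candidate_span_set, real_spans, k):
    ranked = sorted(candidate_span_set, key=lambda s: iou(s, real_spans), reverse=True)
    return ranked[:k]  # the K candidate spans with the largest IOU values

print(top_k_by_iou([(0, 2), (1, 4), (6, 8)], real_spans=[(1, 3)], k=2))
```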
  • Step S133 can be implemented through the following steps:
  • Step S1331 Obtain the candidate spans whose IOU values are greater than the preset IOU threshold from the candidate span set, and use the obtained candidate spans as second forward spans;
  • Step S1332 Obtain the embedding vector corresponding to each second forward span.
  • the second forward span is composed of multiple tokens
  • the embedding vector of the second forward span is formed by splicing the embedding vectors of multiple tokens.
  • the embedding vector of a token is expressed by the following formula (2): h_i = E(t_i) + P_i, where h_i represents the embedding vector of the i-th token, E(t_i) represents the word embedding vector of the i-th token, and P_i represents the position embedding vector of the i-th token.
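  • A minimal PyTorch sketch of formula (2) and of splicing the token embeddings into a span embedding (the vocabulary size, maximum position and dimensions are assumptions for illustration):

```python
import torch
import torch.nn as nn

word_emb = nn.Embedding(30000, 128)  # E: word embedding table, E(t_i)
pos_emb = nn.Embedding(512, 128)     # P: position embedding table, P_i

def span_embedding(token_ids, positions):
    h = word_emb(token_ids) + pos_emb(positions)  # formula (2): h_i = E(t_i) + P_i
    return h.reshape(-1)                          # splice the token embeddings

vec = span_embedding(torch.tensor([101, 205, 97]), torch.tensor([7, 8, 9]))
print(vec.shape)  # torch.Size([384]) for a 3-token second forward span
```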
  • Step S1333 Input the embedding vector corresponding to the second forward span to a preset third neural network, so that the third neural network outputs the forward sample prediction probability corresponding to the second forward span.
  • the third neural network can adopt the architecture of BI-LSTM network + fully connected network.
  • the BI-LSTM network is used to extract features from the embedding vector corresponding to the second forward span, and the fully connected network then performs a probability calculation on the extracted features to obtain the forward sample prediction probability of the second forward span.
  • the forward sample prediction probability represents the third neural network's prediction probability that the second forward span belongs to the forward sample.
  • Step S1334 Use the second forward span whose forward sample prediction probability is greater than the preset forward sample prediction probability threshold as the first forward span.
  • a forward sample prediction probability threshold is set in advance; when the forward sample prediction probability output by the third neural network is greater than this preset threshold, the corresponding second forward span is determined to be a first forward span.
  • In this way, the candidate spans are screened twice, first by IOU value and then by network prediction, and the first forward spans finally obtained are spans of higher quality with a higher degree of overlap with the real spans.
  • In some embodiments, the third neural network in step S1333 includes at least two layers of first BI-LSTM networks and a first fully connected network, wherein the at least two first BI-LSTM network layers are connected in sequence, and the first fully connected network is connected to the last first BI-LSTM network layer.
  • Figure 5 shows a schematic structural diagram of a third neural network provided by an embodiment of the present application.
  • the third neural network includes two first BI-LSTM network layers and one first fully connected network layer; the two first BI-LSTM network layers are stacked in sequence, and the first fully connected network is connected to the last first BI-LSTM network layer.
  • step S1333 can be implemented through the following steps:
  • Step S1333a Input the embedding vector corresponding to the second forward span into the first-layer first BI-LSTM network of the third neural network;
  • Step S1333b The last-layer first BI-LSTM network of the third neural network outputs the feature vector corresponding to the second forward span;
  • Step S1333c The first fully connected network of the third neural network processes the feature vector of the second forward span and outputs the forward sample prediction probability corresponding to the second forward span.
  • Specifically, the sigmoid function can be used to output the forward sample prediction probability corresponding to the second forward span.
  • the third neural network adopts a multi-layer BI-LSTM network, which can enhance the feature extraction capability of the third neural network so that the extracted features of the second forward span are more accurate.
  • the fully connected network uses the sigmoid function to perform probability calculation on the extracted features to obtain the forward sample prediction probability of the second forward span.
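  • A hedged PyTorch sketch of this third neural network (the hidden sizes, the pooling of the last time step, and all names are assumptions for illustration only):

```python
import torch
import torch.nn as nn

class ThirdNetwork(nn.Module):
    """Two stacked BI-LSTM layers plus a fully connected sigmoid head (Figure 5)."""
    def __init__(self, emb_dim=128, hidden=64):
        super().__init__()
        # num_layers=2 stacks two BI-LSTM layers connected in sequence
        self.bilstm = nn.LSTM(emb_dim, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, 1)  # connected to the last BI-LSTM layer

    def forward(self, span_embeddings):          # (batch, span_len, emb_dim)
        out, _ = self.bilstm(span_embeddings)
        feature = out[:, -1, :]                  # feature vector of each span
        return torch.sigmoid(self.fc(feature))  # forward sample prediction probability

probs = ThirdNetwork()(torch.randn(4, 5, 128))  # 4 second forward spans of length 5
```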
  • Step S140 Predict the boundary offset value corresponding to the first forward span through a preset first neural network.
  • Although a span with a high degree of overlap with the real entity span can be obtained through step S130, in most cases the first forward span obtained in step S130 only partially overlaps the real entity span.
  • the main purpose of step S140 is therefore to predict the boundary offset value corresponding to the first forward span, and to fine-tune the boundary of the first forward span obtained in step S130 based on the predicted boundary offset value, so that its overlap with the real span is as large as possible, ideally complete overlap.
  • the first neural network may use a regression algorithm model to predict the correct boundary of the first forward span.
  • step S140 can be performed as follows:
  • Step S141 Move the boundaries of the first forward span according to a plurality of preset boundary movement units to obtain a plurality of third forward spans.
  • In this way, the span boundaries can be expanded or contracted.
  • For example, for span (7,9), when adjusting the left boundary, the left boundary can be moved 0, 1 or 2 units to the left or to the right, yielding (5,9), (6,9), (7,9), (8,9) and (9,9).
  • If a move causes the left boundary to become less than 0 or to exceed the right boundary, the original span is used in its place.
  • Similarly, the right boundary can be moved while the position of the left boundary remains unchanged. In this way, by moving the boundaries of the first forward span, multiple third forward spans are obtained.
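  • A sketch of this boundary-movement step for the left boundary (the movement units 0, 1 and 2 follow the example above; the fallback rule is as described, and names are illustrative):

```python
# Move the left boundary of a first forward span by each preset movement unit,
# in both directions, replacing invalid results with the original span.
def move_left_boundary(span, units=(0, 1, 2)):
    left, right = span
    moved = []
    for u in units:
        for new_left in (left - u, left + u):
            if new_left < 0 or new_left > right:  # invalid move: keep original span
                moved.append(span)
            else:
                moved.append((new_left, right))
    return sorted(set(moved))

print(move_left_boundary((7, 9)))  # [(5, 9), (6, 9), (7, 9), (8, 9), (9, 9)]
```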
  • Step S142 Splice the token feature vectors corresponding to the plurality of third forward spans to obtain a spliced feature vector.
  • Step S143 Use the regression algorithm model to calculate the boundary offset value corresponding to the first forward span through the following formula (3): offset = W_3 · GELU(W_2 · h + b_2) + b_3, where:
  • offset represents the boundary offset value corresponding to the first forward span;
  • GELU(·) represents the activation function in the regression algorithm model;
  • h represents the spliced feature vector corresponding to the first forward span;
  • W_2 and W_3 represent the first and second weight matrices, and b_2 and b_3 represent the first and second bias parameters.
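  • A minimal PyTorch sketch of formula (3); the two-layer regression head follows the variable definitions above, while the input/hidden dimensions and the two-value output (one offset per boundary) are assumptions:

```python
import torch
import torch.nn as nn

class OffsetRegressor(nn.Module):
    """offset = W_3 * GELU(W_2 * h + b_2) + b_3, applied to the spliced feature h."""
    def __init__(self, in_dim=640, hidden=256):
        super().__init__()
        self.w2 = nn.Linear(in_dim, hidden)  # W_2 and b_2
        self.w3 = nn.Linear(hidden, 2)       # W_3 and b_3
        self.gelu = nn.GELU()

    def forward(self, h):                       # h: spliced feature vector
        return self.w3(self.gelu(self.w2(h)))  # boundary offset values

offsets = OffsetRegressor()(torch.randn(640))  # e.g. [left_offset, right_offset]
```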
  • Step S150 Adjust the boundary of the first forward span according to the boundary offset value corresponding to the first forward span, and obtain a target span based on the first forward span after adjusting the boundary.
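  • An illustrative sketch of step S150 (rounding the offsets to whole token positions and clamping to valid boundaries are assumptions; the application does not specify these rules):

```python
# Adjust the boundaries of a first forward span by the predicted offsets to
# obtain the target span, keeping the boundaries inside the sentence.
def adjust_boundary(span, offsets, sentence_len):
    left, right = span
    left = min(max(left + round(offsets[0]), 0), right)
    right = min(max(right + round(offsets[1]), left), sentence_len - 1)
    return (left, right)

print(adjust_boundary((6, 9), (-1.2, 0.8), sentence_len=12))  # (5, 10)
```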
  • Step S160 Predict the entity classification corresponding to the target span through a preset second neural network.
  • entity classification prediction can be performed on the target span.
  • the second neural network includes at least two layers of second BI-LSTM networks and a second fully connected network, wherein the at least two second BI-LSTM network layers are connected in sequence, and the second fully connected network is connected to the last second BI-LSTM network layer.
  • Figure 8 shows a schematic structural diagram of a second neural network provided by an embodiment of the present application.
  • the second neural network includes two second BI-LSTM network layers and one second fully connected network layer; the two second BI-LSTM network layers are stacked in sequence, and the second fully connected network is connected to the last second BI-LSTM network layer.
  • step S160 can be implemented through the following steps:
  • Step S161 Input the target span into the first-layer second BI-LSTM network of the second neural network;
  • Step S162 The last-layer second BI-LSTM network of the second neural network outputs the feature vector corresponding to the target span;
  • Step S163 The second fully connected network of the second neural network uses the softmax function to process the feature vector of the target span, and outputs the entity classification corresponding to the target span.
  • the second neural network adopts a multi-layer BI-LSTM network, which can enhance the feature extraction capability of the second neural network so that the extracted features of the target span are more accurate.
  • the fully connected network uses the softmax function to calculate, from the extracted features, the probability that the target span corresponds to each entity classification, and then determines the entity classification of the target span based on the calculated probabilities; since this is a multi-class probability calculation, the softmax function is used.
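  • A hedged PyTorch sketch of this second neural network (the number of entity classes and all sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class SecondNetwork(nn.Module):
    """Two stacked BI-LSTM layers plus a fully connected softmax head (Figure 8)."""
    def __init__(self, emb_dim=128, hidden=64, num_classes=5):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, target_span_embeddings):          # (batch, span_len, emb_dim)
        out, _ = self.bilstm(target_span_embeddings)
        feature = out[:, -1, :]                          # feature vector of the target span
        return torch.softmax(self.fc(feature), dim=-1)   # probability per entity class

probs = SecondNetwork()(torch.randn(2, 6, 128))
entity_class = probs.argmax(dim=-1)  # predicted entity classification per span
```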
  • the embodiment of this application proposes a named entity recognition method based on deep learning. First, multiple candidate spans are identified from the sentence to be processed based on different preset recognition lengths, so that all possible candidate spans whose length does not exceed the preset recognition length threshold are identified and formed into a candidate span set, which solves the problem that long-span nested entities cannot be identified.
  • the candidate spans in the candidate span set are then screened, with the purpose of eliminating low-quality candidate spans and obtaining at least one first forward span, thereby reducing subsequent computational overhead; the boundary offset value corresponding to the first forward span is predicted through the preset first neural network; the boundary of the first forward span is adjusted according to this boundary offset value, and the target span is obtained based on the first forward span after adjusting the boundary; finally, the entity classification corresponding to the target span is predicted through the preset second neural network.
  • in this way, the span boundary can be fine-tuned based on the predicted boundary offset value, so that the final target span overlaps the real span as much as possible, reaching or approaching the ideal state of complete overlap, thereby improving the accuracy of entity recognition.
  • This embodiment of the present application proposes a named entity recognition device based on deep learning.
  • the device includes:
  • an acquisition module, used to obtain the sentence to be processed;
  • a candidate span determination module, configured to traverse the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose lengths are less than or equal to the preset recognition length threshold;
  • a screening module, used to screen the candidate spans in the candidate span set to obtain at least one first forward span;
  • a first prediction module configured to predict the boundary offset value corresponding to the first forward span through a preset first neural network
  • a target span determination module configured to perform boundary adjustment on the first forward span according to the boundary offset value corresponding to the first forward span, and obtain the target span based on the first forward span after adjusting the boundary.
  • the second prediction module is used to predict the entity classification corresponding to the target span through a preset second neural network.
  • the screening module may specifically include:
  • the IOU calculation unit is used to obtain the preset real span set, perform IOU calculation on the candidate span and the real span set, and obtain the IOU value corresponding to the candidate span;
  • a first screening unit is configured to determine the first forward span from the candidate spans in the candidate span set according to the IOU value corresponding to each candidate span in the candidate span set.
  • In some embodiments, the first screening unit is specifically configured to: obtain the candidate spans whose IOU values are greater than the preset IOU threshold from the candidate span set, and use the obtained candidate spans as second forward spans; obtain the embedding vector corresponding to each second forward span; input the embedding vector corresponding to the second forward span into a preset third neural network, so that the third neural network outputs the forward sample prediction probability corresponding to the second forward span; and use the second forward spans whose forward sample prediction probabilities are greater than the preset forward sample prediction probability threshold as first forward spans.
  • In some embodiments, the third neural network includes at least two layers of first BI-LSTM networks and a first fully connected network, wherein the at least two first BI-LSTM network layers are connected in sequence, and the first fully connected network is connected to the last first BI-LSTM network layer; inputting the embedding vector corresponding to the second forward span into the preset third neural network so that the third neural network outputs the forward sample prediction probability corresponding to the second forward span includes: inputting the embedding vector corresponding to the second forward span into the first-layer first BI-LSTM network of the third neural network; outputting, by the last-layer first BI-LSTM network of the third neural network, the feature vector corresponding to the second forward span; and processing, by the first fully connected network of the third neural network using the sigmoid function, the feature vector of the second forward span, and outputting the forward sample prediction probability corresponding to the second forward span.
  • the second forward span includes multiple tokens
  • the embedding vector corresponding to the second forward span is formed by splicing the embedding vectors of multiple tokens.
  • the embedding vector of a token is expressed by the following formula: h_i = E(t_i) + P_i, where h_i represents the embedding vector of the i-th token, E(t_i) represents the word embedding vector of the i-th token, and P_i represents the position embedding vector of the i-th token.
  • In some embodiments, the first neural network is a regression algorithm model; predicting the boundary offset value corresponding to the first forward span through the preset first neural network includes: moving the boundaries of the first forward span according to multiple preset boundary movement units to obtain multiple third forward spans; splicing the token feature vectors corresponding to the multiple third forward spans to obtain a spliced feature vector; and using the regression algorithm model to calculate the boundary offset value corresponding to the first forward span through the following formula: offset = W_3 · GELU(W_2 · h + b_2) + b_3, where:
  • offset represents the boundary offset value corresponding to the first forward span;
  • GELU(·) represents the activation function in the regression algorithm model;
  • h represents the spliced feature vector corresponding to the first forward span;
  • W_2 represents the first weight matrix;
  • W_3 represents the second weight matrix;
  • b_2 represents the first bias parameter;
  • b_3 represents the second bias parameter.
  • In some embodiments, the second neural network includes at least two layers of second BI-LSTM networks and a second fully connected network, wherein the at least two second BI-LSTM network layers are connected in sequence, and the second fully connected network is connected to the last second BI-LSTM network layer; predicting the entity classification corresponding to the target span through the preset second neural network includes: inputting the target span into the first-layer second BI-LSTM network of the second neural network; outputting, by the last-layer second BI-LSTM network of the second neural network, the feature vector corresponding to the target span; and processing, by the second fully connected network of the second neural network using the softmax function, the feature vector of the target span, and outputting the entity classification corresponding to the target span.
  • the embodiment of the present application also provides an electronic device, including:
  • the memory stores a computer program, and the computer program is executed by the at least one processor so that the at least one processor can execute a deep learning-based named entity recognition method, wherein the deep learning-based named entity recognition method includes: obtaining a sentence to be processed; identifying multiple candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose lengths are less than or equal to the preset recognition length threshold; screening the candidate spans in the candidate span set to obtain at least one first forward span; predicting the boundary offset value corresponding to the first forward span through a preset first neural network; adjusting the boundary of the first forward span according to the boundary offset value corresponding to the first forward span, and obtaining a target span based on the first forward span after adjusting the boundary; and predicting the entity classification corresponding to the target span through a preset second neural network.
  • Embodiments of the present application also propose a computer-readable storage medium that stores a computer program.
  • When the computer program is executed by a processor, it implements a deep learning-based named entity recognition method, wherein the deep learning-based named entity recognition method includes: obtaining a sentence to be processed; identifying multiple candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose lengths are less than or equal to the preset recognition length threshold; screening the candidate spans in the candidate span set to obtain at least one first forward span; predicting the boundary offset value corresponding to the first forward span through a preset first neural network; adjusting the boundary of the first forward span according to the boundary offset value corresponding to the first forward span, and obtaining a target span based on the first forward span after adjusting the boundary; and predicting the entity classification corresponding to the target span through a preset second neural network.
  • the above computer-readable storage medium may be non-volatile or volatile.
  • the devices, equipment and computer-readable storage media provided by the embodiments of the present application correspond to the methods; therefore, they also have beneficial technical effects similar to those of the corresponding methods. Since the beneficial technical effects of the methods have been described in detail above, the beneficial technical effects of the corresponding devices, equipment and computer storage media are not repeated here.
  • a typical implementation device is a computer.
  • the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, which implement the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-permanent storage in computer-readable media, random access memory (RAM) and/or non-volatile memory in the form of read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and can implement information storage by any method or technology.
  • Information may be computer-readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to the technical fields of artificial intelligence and natural language processing, and provides a deep learning-based named entity recognition method and apparatus, a device, and a medium. The method comprises: recognizing a plurality of candidate spans from a sentence to be processed, so as to recognize all possible candidate spans with the length not exceeding a preset recognition length threshold, and then forming a candidate span set, thereby solving the problem that a nested entity with a relatively long span cannot be recognized; screening the candidate spans in the candidate span set to remove low-quality candidate spans, so as to obtain at least one first forward span, thereby reducing the subsequent calculation overhead; predicting a boundary deviation value corresponding to the first forward span by means of a first neural network to obtain a target span; and predicting an entity classification corresponding to the target span by means of a second neural network. In this case, the span boundary can be finely adjusted on the basis of the predicted boundary deviation value, such that the final target span overlaps a real span as much as possible to reach or close to the ideal state of complete overlapping, thereby improving the entity recognition accuracy.

Description

Named entity recognition method, device, equipment and medium based on deep learning
This application claims priority to the Chinese patent application filed with the China Patent Office on March 15, 2022, with application number 202210255150.1 and entitled "Named entity recognition method, device, equipment and medium based on deep learning", the entire content of which is incorporated herein by reference.
Technical Field
This application relates to the fields of artificial intelligence technology and natural language processing, and in particular to a named entity recognition method and apparatus, an electronic device and a computer-readable storage medium based on deep learning.
Background
Named Entity Recognition (NER) is a fundamental task in natural language processing, widely used in downstream tasks such as knowledge extraction and knowledge graph construction. Its main task is to extract the entity nouns mentioned in a text; specifically, to identify the start/end index position and the entity category of each entity.
Conventional entity recognition uses a sequence labeling model in deep learning to label each semantic unit in a text sentence, thereby obtaining a unique label for each semantic unit, and entity fragments are obtained by combining the labels. In practical tasks, some text sentences contain nested entities; nested entities here refer to cases where, among the multiple nouns that make up one entity, individual nouns themselves belong to an entity of another category.
Technical Problem
The following are technical problems in the prior art that the inventor has recognized: conventional sequence labeling models cannot solve the problem of nested entity recognition. For the recognition of nested entities, the related art has proposed a method that changes the target of the sequence classification task from a single label to multiple labels, as well as entity recognition methods based on machine reading comprehension (MRC) and hypergraph-based entity recognition methods, but these methods still cannot solve the problem of identifying nested entities with a large span.
Technical Solution
In a first aspect, embodiments of this application propose a named entity recognition method based on deep learning. The method includes: obtaining a sentence to be processed; identifying multiple candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose lengths are less than or equal to the preset recognition length threshold; screening the candidate spans in the candidate span set to obtain at least one first forward span; predicting the boundary offset value corresponding to the first forward span through a preset first neural network; adjusting the boundary of the first forward span according to the boundary offset value corresponding to the first forward span, and obtaining a target span based on the first forward span after adjusting the boundary; and predicting the entity classification corresponding to the target span through a preset second neural network.
In a second aspect, an embodiment of the present application proposes a named entity recognition device based on deep learning. The device includes:
an acquisition module, used to obtain the sentence to be processed;
a candidate span determination module, configured to traverse the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose lengths are less than or equal to the preset recognition length threshold;
a screening module, used to screen the candidate spans in the candidate span set to obtain at least one first forward span;
a first prediction module, configured to predict the boundary offset value corresponding to the first forward span through a preset first neural network;
a target span determination module, configured to perform boundary adjustment on the first forward span according to the boundary offset value corresponding to the first forward span, and obtain a target span based on the first forward span after adjusting the boundary;
a second prediction module, used to predict the entity classification corresponding to the target span through a preset second neural network.
In a third aspect, embodiments of the present application provide an electronic device, including:
at least one processor;
and a memory communicatively connected to the at least one processor;
wherein the memory stores a computer program, and the computer program is executed by the at least one processor so that the at least one processor can execute a named entity recognition method based on deep learning;
wherein the named entity recognition method based on deep learning includes:
obtaining a sentence to be processed;
identifying multiple candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose lengths are less than or equal to the preset recognition length threshold;
screening the candidate spans in the candidate span set to obtain at least one first forward span;
predicting the boundary offset value corresponding to the first forward span through a preset first neural network;
adjusting the boundary of the first forward span according to the boundary offset value corresponding to the first forward span, and obtaining a target span based on the first forward span after adjusting the boundary;
predicting the entity classification corresponding to the target span through a preset second neural network.
In a fourth aspect, embodiments of the present application propose a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements a named entity recognition method based on deep learning;
wherein the named entity recognition method based on deep learning includes:
obtaining a sentence to be processed;
identifying multiple candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose lengths are less than or equal to the preset recognition length threshold;
screening the candidate spans in the candidate span set to obtain at least one first forward span;
predicting the boundary offset value corresponding to the first forward span through a preset first neural network;
adjusting the boundary of the first forward span according to the boundary offset value corresponding to the first forward span, and obtaining a target span based on the first forward span after adjusting the boundary;
predicting the entity classification corresponding to the target span through a preset second neural network.
Beneficial Effects
The embodiments of this application propose a named entity recognition method, device, electronic device and computer-readable storage medium based on deep learning. First, multiple candidate spans are identified from the sentence to be processed based on different preset recognition lengths, so that all possible candidate spans whose length does not exceed the preset recognition length threshold are identified and formed into a candidate span set, which solves the problem that long-span nested entities cannot be identified. The candidate spans in the candidate span set are then screened, with the purpose of eliminating low-quality candidate spans and obtaining at least one first forward span, thereby reducing subsequent computational overhead; the boundary offset value corresponding to the first forward span is predicted through the preset first neural network; the boundary of the first forward span is adjusted according to this boundary offset value, and the target span is obtained based on the first forward span after adjusting the boundary; and the entity classification corresponding to the target span is predicted through the preset second neural network. In this way, the span boundary can be fine-tuned based on the predicted boundary offset value, so that the final target span overlaps the real span as much as possible, reaching or approaching the ideal state of complete overlap, thereby improving the accuracy of entity recognition.
Description of the drawings
Figure 1 is a schematic diagram of the entity distribution in an exemplary text sentence provided by the present application;
Figure 2 is a schematic flowchart of a deep learning-based named entity recognition method provided by an embodiment of the present application;
Figure 3 is a schematic diagram of the sub-steps of step S130 in Figure 2;
Figure 4 is a schematic diagram of the sub-steps of step S133 in Figure 3;
Figure 5 is a schematic structural diagram of a third neural network provided by an embodiment of the present application;
Figure 6 is a schematic diagram of the sub-steps of step S1333 in Figure 4;
Figure 7 is a schematic diagram of the sub-steps of step S140 in Figure 2;
Figure 8 is a schematic structural diagram of a second neural network provided by an embodiment of the present application;
Figure 9 is a schematic diagram of the sub-steps of step S160 in Figure 2;
Figure 10 is a schematic structural diagram of a deep learning-based named entity recognition apparatus provided by an embodiment of the present application.
Embodiments of the invention
To make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present application, not to limit it.
It should be noted that, unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field to which this application belongs. The terms used herein are only for the purpose of describing the embodiments of the present application and are not intended to limit the present application.
First, several terms involved in the present application are explained:
Natural Language Processing (NLP): NLP is a branch of artificial intelligence that analyzes human language. Roughly, it works as follows: it receives natural language, which has evolved through everyday human use for communication, analyzes it with probability-based algorithms, and outputs the result.
Named Entity Recognition (NER): NER is a key foundational task in NLP. The concept can be understood literally: it identifies entities with specific meanings in text, mainly including person names, place names, organization names, proper nouns, and the like.
Embedding: an embedding is a vector representation, i.e., a low-dimensional vector used to represent an object, where the object can be a word, a product, a movie, and so on. An embedding vector has the property that objects whose vectors are close together have similar meanings; for example, embedding(The Avengers) and embedding(Iron Man) will be very close, while embedding(The Avengers) and embedding(Gone with the Wind) will be farther apart. An embedding is essentially a mapping from a semantic space to a vector space that preserves, as far as possible, the relationships the original samples had in the semantic space; for example, two semantically close words are also relatively close in the vector space. Because embeddings can encode objects as low-dimensional vectors while retaining their meaning, they are widely used in machine learning: during model construction, an object is encoded as a low-dimensional dense vector and then passed to a DNN to improve efficiency.
Conventional entity recognition uses a sequence labeling model in deep learning to label each semantic unit in a text sentence, thereby obtaining a unique label per semantic unit, and entity fragments are then obtained by combining the labels. In practical tasks, text sentences often contain nested entities, where a nested entity means that among the several words making up one entity, some individual words themselves constitute an entity of another category.
For example, see Figure 1. In the text sentence shown in Figure 1, "The US Supreme Court will hear arguments from both sides on Friday and Florida's Leon County Circuit Court will consider the arguments on disputed state ballots on Saturday.", two types of entities are annotated: ORG (organization) and GPE (geopolitical). Here "Florida" and "Leon County" are both GPE entities while also being part of the ORG entity "Florida's Leon County Circuit Court"; that is, nested entities exist, and the enclosing entity has a noticeably long span.
For the recognition of nested entities, related work has proposed changing the target of the sequence classification task from single-label to multi-label, as well as machine-reading-comprehension (MRC)-based and hypergraph-based entity recognition methods, but these methods still cannot solve the problem of recognizing nested entities with long spans.
The main purpose of the embodiments of the present application is to provide a deep learning-based named entity recognition method, apparatus, electronic device, and computer-readable storage medium, aiming to solve the problem of recognizing nested entities with long spans.
The embodiments of the present application may acquire and process relevant data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
The named entity recognition method provided by the embodiments of this application relates to the technical fields of artificial intelligence and natural language processing. It can be applied in a terminal, on a server, or as software running in a terminal or on a server. In some embodiments, the terminal can be a smartphone, tablet, laptop, desktop computer, etc.; the server can be configured as an independent physical server, as a server cluster or distributed system composed of multiple physical servers, or as a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms; the software can be an application implementing the named entity recognition method, but is not limited to the above forms.
The application may be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices. The application may be described in the general context of computer-executable instructions, such as program modules, executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so on that perform specific tasks or implement specific abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
Please refer to Figure 2, which shows a schematic flowchart of a deep learning-based named entity recognition method proposed by an embodiment of the present application. As shown in Figure 2, the method includes, but is not limited to, the following steps:
Step S110: obtain the sentence to be processed.
It can be understood that the sentence to be processed here is a sentence composed of multiple words, so it can also be regarded as a word sequence.
Step S120: identify multiple candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein each preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose length is less than or equal to the preset recognition length threshold.
Illustratively, with a preset recognition length threshold L, the preset recognition lengths are determined as 1, 2, ..., L, and the words of the sentence are traversed under each preset recognition length to extract all possible candidate spans. For example, the first round traverses the sentence to be processed with a preset recognition length of 1, obtaining multiple candidate spans of length 1; the second round traverses it with a preset recognition length of 2, obtaining multiple candidate spans of length 2; and so on, until the last round traverses the sentence with a preset recognition length of L, obtaining multiple candidate spans of length L. In this way, all possible spans of length at most L are obtained and assembled into the candidate span set, each span in which is a candidate span.
It should be understood that the embodiments of the present application can recognize entities with a maximum length of L, and the purpose of setting the recognition length threshold L is to avoid the computational overhead of unbounded lengths. In a concrete implementation, those skilled in the art can set the value of L flexibly according to actual needs. For example, for the example shown in Figure 1, setting L to 7 makes the long-span ORG entity in Figure 1 recognizable. It can thus be seen that the present application can, to a certain extent, solve the problem of recognizing nested entities with long spans.
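To make the enumeration in step S120 concrete, the following is a minimal Python sketch; the function name and the inclusive (start, end) boundary convention, which matches the (7,9)-style notation used later in this description, are illustrative assumptions rather than part of the patent.

```python
def enumerate_candidate_spans(tokens, max_len):
    """Return all candidate spans over `tokens` with length 1..max_len.

    Spans are inclusive (start, end) token-index pairs.
    """
    spans = []
    for length in range(1, max_len + 1):  # preset recognition lengths 1, 2, ..., L
        for start in range(len(tokens) - length + 1):
            spans.append((start, start + length - 1))
    return spans


# For the Figure 1 sentence with L = 7, every span of at most 7 tokens is
# produced, including long entities such as "Florida's Leon County Circuit Court".
candidates = enumerate_candidate_spans(
    "The US Supreme Court will hear arguments".split(), 7)
```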
Step S130: screen the candidate spans in the candidate span set to obtain at least one first positive span.
It can be understood that, to save computational overhead in subsequent steps, the candidate spans obtained in step S120 can first be screened to discard some lower-quality candidates and reduce the number of spans the subsequent steps must process.
Specifically, referring to Figure 3, step S130 can be implemented through the following steps S131-S133:
Step S131: obtain a preset real span set;
Step S132: perform an IoU calculation between a candidate span and the real span set to obtain the IoU value corresponding to that candidate span;
Step S133: determine the first positive span from the candidate spans in the candidate span set according to the IoU value corresponding to each candidate span in the candidate span set.
It can be understood that the real span set is formed by collecting multiple real spans. To screen all candidate spans, an IoU calculation can be performed between each candidate span and the real span set to obtain the candidate span's IoU value; based on these IoU values, all candidate spans are divided into positive spans and negative spans, the negative spans are discarded, and only the positive spans are kept.
Specifically, the IoU calculation between a candidate span and the real span set can be implemented through the following formula (1):

IoU(A, B) = |A ∩ B| / |A ∪ B|    (1)

where A denotes the candidate span, B denotes the real span set, and IoU(A, B) denotes the IoU value of the candidate span.
It can be understood that IoU(A, B) is the ratio of the intersection to the union of the candidate span and the real span set; clearly, the higher the overlap between the two, the larger the score. If a candidate span overlaps heavily with the real span set, its quality is high and it can serve as a positive span; otherwise its quality is low and it can serve as a negative span.
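As a hedged illustration, one literal reading of formula (1) computes the token-level IoU between a candidate span and the union of the real spans; the helper below is a sketch under that assumption and is not taken from the patent.

```python
def span_iou(candidate, real_spans):
    """Token-level IoU between one candidate span A and the real span set B.

    `candidate` is an inclusive (start, end) pair; `real_spans` is a list of
    such pairs. Returns |A intersect B| / |A union B| as in formula (1).
    """
    a = set(range(candidate[0], candidate[1] + 1))
    b = set()
    for start, end in real_spans:
        b.update(range(start, end + 1))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# (7,9) against a real span (8,9): intersection {8,9}, union {7,8,9} -> 2/3.
iou = span_iou((7, 9), [(8, 9)])
```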
As an optional implementation, according to the IoU value corresponding to each candidate span in the candidate span set, the K candidate spans with the largest IoU values can be selected from all candidate spans in the set as the first positive spans; here it is assumed the candidate span set has N candidate spans, with 0 < K < N.
As another optional implementation, referring to Figure 4, step S133 can be implemented through the following steps:
Step S1331: obtain from the candidate span set the candidate spans whose IoU value is greater than a preset IoU threshold, and take the obtained candidate spans as second positive spans.
It can be understood that after the IoU value of each candidate span is calculated, all candidate spans are sorted by IoU value in descending order, and then, based on a preset selection count or selection ratio, the top K candidates are selected as the second positive spans; here it is assumed the candidate span set has N candidate spans, with 0 < K < N.
Step S1332: obtain the embedding vector corresponding to each second positive span.
It can be understood that a second positive span is composed of multiple tokens, and its embedding vector is formed by concatenating the embedding vectors of those tokens.
It can be understood that, in entity recognition, besides the lexical meaning of a token itself, its position in the sentence also matters; for this reason, the second positive span in the embodiments of this application also incorporates the position information of each token. Specifically, a token's embedding vector is expressed by the following formula (2):

h_i = E(t_i) + P_i    (2)

where h_i denotes the embedding vector of the i-th token, E(t_i) denotes the word embedding vector of the i-th token, and P_i denotes the position embedding vector of the i-th token.
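A minimal PyTorch sketch of formula (2) follows; the vocabulary size, maximum position, and embedding dimension are illustrative assumptions.

```python
import torch.nn as nn


class TokenEmbedder(nn.Module):
    """Sketch of formula (2): h_i = E(t_i) + P_i, summed per token."""

    def __init__(self, vocab_size=30000, max_pos=512, dim=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)  # E(t_i)
        self.pos_emb = nn.Embedding(max_pos, dim)      # P_i

    def forward(self, token_ids, positions):
        # token_ids, positions: (batch, span_len); a span's embedding is the
        # sequence of its per-token vectors h_i, concatenated along the span.
        return self.word_emb(token_ids) + self.pos_emb(positions)
```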
Step S1333: input the embedding vector corresponding to the second positive span into a preset third neural network, so that the third neural network outputs the positive-sample prediction probability corresponding to the second positive span.
Illustratively, the third neural network can adopt a BI-LSTM network plus fully connected network architecture: the BI-LSTM network extracts features from the embedding vector corresponding to the second positive span, and the fully connected network then computes a probability from the extracted features, yielding the positive-sample prediction probability of the second positive span. Here the positive-sample prediction probability represents the third neural network's predicted probability that the second positive span is a positive sample.
Step S1334: take the second positive spans whose positive-sample prediction probability is greater than a preset positive-sample prediction probability threshold as the first positive spans.
It can be understood that a positive-sample prediction probability threshold is set in advance; when the positive-sample prediction probability output by the third neural network exceeds this threshold, the corresponding second positive span is determined to be a first positive span.
It can be understood that, in this embodiment, the candidate spans are doubly screened through the IoU value and the network prediction, so the resulting first positive spans are of relatively high quality, with a high degree of overlap with the real spans.
As a specific example, the third neural network in step S1333 includes at least two first BI-LSTM network layers and a first fully connected network, wherein the first BI-LSTM layers are connected in sequence and the first fully connected network is connected to the last first BI-LSTM layer. Please refer to Figure 5, which shows a schematic structural diagram of a third neural network provided by an embodiment of the present application. In the example shown in Figure 5, the third neural network includes two first BI-LSTM layers stacked in sequence and one first fully connected network connected to the last first BI-LSTM layer.
Referring to Figure 6, based on the third neural network provided in the above example, step S1333 can be implemented through the following steps:
Step S1333a: input the embedding vector corresponding to the second positive span into the first-layer first BI-LSTM network of the third neural network;
Step S1333b: output, from the last-layer first BI-LSTM network of the third neural network, the feature vector corresponding to the second positive span;
Step S1333c: process the feature vector of the second positive span with the first fully connected network of the third neural network, and output the positive-sample prediction probability corresponding to the second positive span.
Since the goal here is a binary prediction distinguishing whether a second positive span is a positive or a negative sample, the sigmoid function can be used to output the positive-sample prediction probability corresponding to the second positive span.
It can be understood that using a multi-layer BI-LSTM enhances the feature extraction capability of the third neural network, making the extracted features of the second positive span more accurate. After the multi-layer BI-LSTM extracts features from the embedding vector of the second positive span, the fully connected network applies the sigmoid function to the extracted features to compute the second positive span's positive-sample prediction probability.
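The following PyTorch sketch mirrors the third neural network described above (two stacked BI-LSTM layers plus a fully connected layer with a sigmoid output); the hidden sizes, the use of the final time step as the span feature vector, and the class name are assumptions made for illustration, not the patent's definitive implementation.

```python
import torch
import torch.nn as nn


class SpanFilterNet(nn.Module):
    """Sketch of the third neural network: 2 x BI-LSTM + FC + sigmoid."""

    def __init__(self, dim=128, hidden=64):
        super().__init__()
        # num_layers=2 stacks the two first BI-LSTM layers (steps S1333a/b).
        self.bilstm = nn.LSTM(dim, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, 1)  # the first fully connected network

    def forward(self, span_embeddings):
        # span_embeddings: (batch, span_len, dim), built per formula (2)
        out, _ = self.bilstm(span_embeddings)
        feature = out[:, -1, :]  # span feature vector (step S1333b)
        return torch.sigmoid(self.fc(feature)).squeeze(-1)  # step S1333c


# Spans whose probability exceeds the preset threshold become first positive
# spans (step S1334); the 0.5 value here is an assumed threshold.
# keep = probabilities > 0.5
```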
Step S140: predict the boundary offset value corresponding to the first positive span through a preset first neural network.
It can be understood that, although step S130 yields spans with a high degree of overlap with the real entity spans, in most cases a first positive span obtained through step S130 only partially overlaps the real entity span, even if the overlapping part is large. For example, in the example shown in Figure 1, step S130 yields the first positive span "from both sides" with span boundary (7,9), while the real entity span is "both sides" with boundary (8,9). The main purpose of step S140 is therefore to predict the boundary offset value corresponding to the first positive span, so that the boundary of the first positive span obtained in step S130 can be fine-tuned based on the predicted offset, making its overlap with the real span as large as possible, ideally complete.
To predict the boundary offset value corresponding to a span, the first neural network can adopt a regression algorithm model, which predicts the correct boundary of the first positive span.
On the basis that the first neural network adopts a regression algorithm model, referring to Figure 7, step S140 can specifically be implemented through the following steps:
Step S141: move the boundary of the first positive span according to multiple preset boundary movement units to obtain multiple third positive spans.
For example, considering that a boundary may be shifted left or right, the span can be expanded. Taking span (7,9) as an example, when computing the left boundary, the left boundary can be moved 0, 1, or 2 units to the left or to the right, yielding (5,9), (6,9), (7,9), (8,9), (9,9). In this process the left boundary may become less than 0 or exceed the right boundary; in such cases the original span is used instead. Likewise, when processing the right boundary, the left boundary is kept fixed and the right boundary is moved. In this way, by moving the boundary of the first positive span, multiple third positive spans are obtained.
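A sketch of the left-boundary expansion just described, with a maximum shift of 2 units as in the (7,9) example (the shift range and helper name are assumptions); the right boundary would be handled symmetrically with the left boundary kept fixed.

```python
def expand_left_boundary(span, max_shift=2):
    """Step S141 for the left boundary: shift it 0..max_shift units left and
    right while keeping the right boundary fixed; invalid results (left < 0
    or left > right) are replaced by the original span."""
    left, right = span
    variants = []
    for delta in range(-max_shift, max_shift + 1):
        new_left = left + delta
        if new_left < 0 or new_left > right:
            variants.append(span)  # fall back to the original span
        else:
            variants.append((new_left, right))
    return variants


# expand_left_boundary((7, 9)) -> [(5, 9), (6, 9), (7, 9), (8, 9), (9, 9)]
```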
Step S142: concatenate the token feature vectors corresponding to the multiple third positive spans to obtain a concatenated feature vector.
Step S143: compute, with the regression algorithm model, the boundary offset value corresponding to the first positive span through the following formula (3):

offset = W_3 · GELU(W_2 h + b_2) + b_3    (3)

where offset denotes the boundary offset value corresponding to the first positive span, GELU(·) denotes the activation function in the regression algorithm model, h denotes the concatenated feature vector corresponding to the first positive span, W_2 denotes a first weight matrix, W_3 denotes a second weight matrix, b_2 denotes a first bias parameter, and b_3 denotes a second bias parameter.
Step S150: adjust the boundary of the first positive span according to its corresponding boundary offset value, and obtain the target span based on the first positive span with the adjusted boundary.
Continuing with span (7,9) as an example, the regression algorithm model predicts offset = (0.63, -0.15), so the new boundary is (7.63, 8.85), which after rounding to integers becomes (8,9); in this way the correct span boundary is obtained.
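The regression head of formula (3) and the boundary adjustment of step S150 can be sketched as follows; the feature dimensions are assumptions, and rounding is done half-up so that (7.63, 8.85) becomes (8, 9) as in the example above.

```python
import torch.nn as nn


class OffsetRegressor(nn.Module):
    """Sketch of formula (3): offset = W_3 * GELU(W_2 h + b_2) + b_3."""

    def __init__(self, in_dim=1280, hidden=256):
        super().__init__()
        self.w2 = nn.Linear(in_dim, hidden)  # W_2, b_2
        self.w3 = nn.Linear(hidden, 2)       # W_3, b_3 -> (left, right) offsets
        self.act = nn.GELU()

    def forward(self, h):
        # h: the concatenated token feature vector from step S142
        return self.w3(self.act(self.w2(h)))


def adjust_span(span, offset):
    """Step S150: add the predicted offsets and round half-up to integers,
    e.g. (7, 9) + (0.63, -0.15) -> (7.63, 8.85) -> (8, 9)."""
    return (int(span[0] + offset[0] + 0.5), int(span[1] + offset[1] + 0.5))
```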
Step S160: predict the entity classification corresponding to the target span through a preset second neural network.
After the target span is obtained by adjusting the first positive span, the entity classification of the target span can be predicted.
As an example, the second neural network includes at least two second BI-LSTM network layers and a second fully connected network, wherein the second BI-LSTM layers are connected in sequence and the second fully connected network is connected to the last second BI-LSTM layer. Please refer to Figure 8, which shows a schematic structural diagram of a second neural network provided by an embodiment of the present application. In the example shown in Figure 8, the second neural network includes two second BI-LSTM layers stacked in sequence and one second fully connected network connected to the last second BI-LSTM layer.
Based on the second neural network provided in the above example, referring to Figure 9, step S160 can be implemented through the following steps:
Step S161: input the target span into the first-layer second BI-LSTM network of the second neural network;
Step S162: output, from the last-layer second BI-LSTM network of the second neural network, the feature vector corresponding to the target span;
Step S163: process the feature vector of the target span with the second fully connected network of the second neural network using the softmax function, and output the entity classification corresponding to the target span.
It can be understood that using a multi-layer BI-LSTM enhances the feature extraction capability of the second neural network, making the extracted features of the target span more accurate. After the multi-layer BI-LSTM extracts the target span's features, the fully connected network applies the softmax function to the extracted features to compute the probability of the target span belonging to each entity class, and the entity classification of the target span is then determined from the computed probabilities. It can be understood that the softmax function is used because a multi-class probability computation is required here.
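For completeness, here is a hedged sketch of the second neural network (two stacked BI-LSTM layers plus a fully connected softmax classifier, steps S161-S163); the hidden size and the number of entity classes are assumptions.

```python
import torch
import torch.nn as nn


class SpanClassifierNet(nn.Module):
    """Sketch of the second neural network: 2 x BI-LSTM + FC + softmax."""

    def __init__(self, dim=128, hidden=64, num_classes=5):
        super().__init__()
        self.bilstm = nn.LSTM(dim, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes)  # second fully connected network

    def forward(self, target_span_embeddings):
        out, _ = self.bilstm(target_span_embeddings)
        logits = self.fc(out[:, -1, :])       # target span feature vector (S162)
        return torch.softmax(logits, dim=-1)  # per-class probabilities (S163)
```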
The embodiments of the present application provide a deep learning-based named entity recognition method. First, multiple candidate spans are identified from the sentence to be processed based on different preset recognition lengths, so that all possible candidate spans whose length does not exceed the preset recognition length threshold are identified and assembled into a candidate span set; this addresses the problem that nested entities with long spans cannot be recognized. The candidate spans in the set are then screened to discard low-quality candidates and obtain at least one first positive span, reducing subsequent computational overhead. A preset first neural network predicts the boundary offset value corresponding to the first positive span; the boundary of the first positive span is adjusted according to that offset value, and a target span is obtained from the adjusted span; finally, a preset second neural network predicts the entity classification corresponding to the target span. In this way, span boundaries can be fine-tuned based on the predicted boundary offset values so that the final target span overlaps the real span as much as possible, reaching or approaching the ideal state of complete overlap, thereby improving the accuracy of entity recognition.
Please refer to Figure 10. An embodiment of the present application provides a deep learning-based named entity recognition apparatus, the apparatus including:
an acquisition module, configured to obtain the sentence to be processed;
a candidate span determination module, configured to traverse the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein each preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose length is less than or equal to the preset recognition length threshold;
a screening module, configured to screen the candidate spans in the candidate span set to obtain at least one first positive span;
a first prediction module, configured to predict the boundary offset value corresponding to the first positive span through a preset first neural network;
a target span determination module, configured to adjust the boundary of the first positive span according to its corresponding boundary offset value, and obtain the target span based on the first positive span with the adjusted boundary;
a second prediction module, configured to predict the entity classification corresponding to the target span through a preset second neural network.
As an example, the screening module may specifically include:
an IoU calculation unit, configured to obtain a preset real span set, perform an IoU calculation between a candidate span and the real span set, and obtain the IoU value corresponding to the candidate span;
a first screening unit, configured to determine the first positive span from the candidate spans in the candidate span set according to the IoU value corresponding to each candidate span in the candidate span set.
As an example, the first screening unit is specifically configured to: obtain from the candidate span set the candidate spans whose IoU value is greater than a preset IoU threshold, and take the obtained candidate spans as second positive spans; obtain the embedding vector corresponding to each second positive span; input the embedding vector corresponding to the second positive span into a preset third neural network, so that the third neural network outputs the positive-sample prediction probability corresponding to the second positive span; and take the second positive spans whose positive-sample prediction probability is greater than a preset positive-sample prediction probability threshold as the first positive spans.
As an example, the third neural network includes at least two first BI-LSTM network layers and a first fully connected network, wherein the first BI-LSTM layers are connected in sequence and the first fully connected network is connected to the last first BI-LSTM layer; inputting the embedding vector corresponding to the second positive span into the preset third neural network so that the third neural network outputs the positive-sample prediction probability corresponding to the second positive span includes: inputting the embedding vector corresponding to the second positive span into the first-layer first BI-LSTM network of the third neural network; outputting, from the last-layer first BI-LSTM network of the third neural network, the feature vector corresponding to the second positive span; and processing the feature vector of the second positive span with the first fully connected network of the third neural network using the sigmoid function, and outputting the positive-sample prediction probability corresponding to the second positive span.
As an example, the second positive span includes multiple tokens, and the embedding vector corresponding to the second positive span is formed by concatenating the embedding vectors of the tokens, where a token's embedding vector is expressed by the following formula:

h_i = E(t_i) + P_i;

where h_i denotes the embedding vector of the i-th token, E(t_i) denotes the word embedding vector of the i-th token, and P_i denotes the position embedding vector of the i-th token.
As an example, the first neural network is a regression algorithm model; predicting the boundary offset value corresponding to the first positive span through the preset first neural network includes: moving the boundary of the first positive span according to multiple preset boundary movement units to obtain multiple third positive spans; concatenating the token feature vectors corresponding to the multiple third positive spans to obtain a concatenated feature vector; and computing, with the regression algorithm model, the boundary offset value corresponding to the first positive span through the following formula:

offset = W_3 · GELU(W_2 h + b_2) + b_3;

where offset denotes the boundary offset value corresponding to the first positive span, GELU(·) denotes the activation function in the regression algorithm model, h denotes the concatenated feature vector corresponding to the first positive span, W_2 denotes a first weight matrix, W_3 denotes a second weight matrix, b_2 denotes a first bias parameter, and b_3 denotes a second bias parameter.
As an example, the second neural network includes at least two second BI-LSTM network layers and a second fully connected network, wherein the second BI-LSTM layers are connected in sequence and the second fully connected network is connected to the last second BI-LSTM layer; predicting the entity classification corresponding to the target span through the preset second neural network includes: inputting the target span into the first-layer second BI-LSTM network of the second neural network; outputting, from the last-layer second BI-LSTM network of the second neural network, the feature vector corresponding to the target span; and processing the feature vector of the target span with the second fully connected network of the second neural network using the softmax function, and outputting the entity classification corresponding to the target span.
An embodiment of the present application also provides an electronic device, including:
at least one processor;
and a memory communicatively connected to the at least one processor;
wherein the memory stores a computer program which is executed by the at least one processor to enable the at least one processor to perform a deep learning-based named entity recognition method; wherein the deep learning-based named entity recognition method includes: obtaining a sentence to be processed; identifying multiple candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein each preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose length is less than or equal to the preset recognition length threshold; screening the candidate spans in the candidate span set to obtain at least one first positive span; predicting the boundary offset value corresponding to the first positive span through a preset first neural network; adjusting the boundary of the first positive span according to its corresponding boundary offset value, and obtaining a target span based on the first positive span with the adjusted boundary; and predicting the entity classification corresponding to the target span through a preset second neural network.
An embodiment of the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements a deep learning-based named entity recognition method; wherein the deep learning-based named entity recognition method includes: obtaining a sentence to be processed; identifying multiple candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein each preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose length is less than or equal to the preset recognition length threshold; screening the candidate spans in the candidate span set to obtain at least one first positive span; predicting the boundary offset value corresponding to the first positive span through a preset first neural network; adjusting the boundary of the first positive span according to its corresponding boundary offset value, and obtaining a target span based on the first positive span with the adjusted boundary; and predicting the entity classification corresponding to the target span through a preset second neural network.
The above embodiments can be used in combination, and modules with the same name in different embodiments may be the same or different.
The above computer-readable storage medium may be non-volatile or volatile.
Specific embodiments of the present application have been described above; other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the figures do not necessarily require the specific order shown, or a sequential order, to achieve the desired results. Multitasking and parallel processing are also possible, or may be advantageous, in certain implementations.
Each embodiment in this application is described in a progressive manner; the same or similar parts between the embodiments can be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus, device, and computer-readable storage medium embodiments are basically similar to the method embodiments, their descriptions are relatively brief; for relevant details, refer to the description of the method embodiments.
The apparatus, device, and computer-readable storage medium provided by the embodiments of the present application correspond to the method; therefore, they also have beneficial technical effects similar to those of the corresponding method. Since the beneficial technical effects of the method have been described in detail above, the beneficial technical effects of the corresponding apparatus, device, and computer storage medium are not repeated here.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described in terms of functions divided into various units. Of course, when implementing the embodiments of the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the embodiments may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
This specification is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-persistent storage in computer-readable media, in the form of random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. Information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
The above descriptions are only embodiments of the present application and are not intended to limit it. Those skilled in the art may make various changes and variations to the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included within the scope of the claims of the present application.

Claims (20)

  1. A deep learning-based named entity recognition method, wherein the method comprises:
    obtaining a sentence to be processed;
    identifying multiple candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein each preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose length is less than or equal to the preset recognition length threshold;
    screening the candidate spans in the candidate span set to obtain at least one first positive span;
    predicting a boundary offset value corresponding to the first positive span through a preset first neural network;
    adjusting the boundary of the first positive span according to the boundary offset value corresponding to the first positive span, and obtaining a target span based on the first positive span with the adjusted boundary;
    predicting an entity classification corresponding to the target span through a preset second neural network.
  2. The method according to claim 1, wherein screening the candidate spans in the candidate span set to obtain at least one first positive span comprises:
    obtaining a preset real span set;
    performing an IOU calculation between a candidate span and the real span set to obtain an IOU value corresponding to the candidate span;
    determining the first positive span from the candidate spans in the candidate span set according to the IOU value corresponding to each candidate span in the candidate span set.
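For illustration only, the IOU screening of claim 2 amounts to a one-dimensional intersection-over-union between token spans; taking the maximum over the real span set is an assumption about how the single per-candidate IOU value is reduced:

    # Hypothetical sketch of the 1-D span IOU used to screen candidates.
    def span_iou(a, b):
        """IOU of two half-open token spans given as (start, end) pairs."""
        inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
        union = (a[1] - a[0]) + (b[1] - b[0]) - inter
        return inter / union if union else 0.0

    def candidate_iou(candidate, real_spans):
        # Assumed reduction: score each candidate by its best-matching real span.
        return max(span_iou(candidate, r) for r in real_spans)

    print(candidate_iou((2, 5), [(1, 4), (6, 8)]))  # overlap 2, union 4 -> 0.5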
  3. The method according to claim 2, wherein determining the first positive span from the candidate spans in the candidate span set according to the IOU value corresponding to each candidate span in the candidate span set comprises:
    obtaining, from the candidate span set, the candidate spans whose IOU values are greater than a preset IOU threshold, and taking the obtained candidate spans as second positive spans;
    obtaining an embedding vector corresponding to each second positive span;
    inputting the embedding vector corresponding to the second positive span into a preset third neural network, so that the third neural network outputs a positive-sample prediction probability corresponding to the second positive span;
    taking, as the first positive span, a second positive span whose positive-sample prediction probability is greater than a preset positive-sample prediction probability threshold.
  4. The method according to claim 3, wherein the third neural network comprises at least two first BI-LSTM network layers and a first fully connected network, the at least two first BI-LSTM network layers being connected in sequence and the first fully connected network being connected to the last first BI-LSTM network layer;
    wherein inputting the embedding vector corresponding to the second positive span into the preset third neural network, so that the third neural network outputs the positive-sample prediction probability corresponding to the second positive span, comprises:
    inputting the embedding vector corresponding to the second positive span into the first of the first BI-LSTM network layers of the third neural network;
    outputting, by the last of the first BI-LSTM network layers of the third neural network, a feature vector corresponding to the second positive span;
    processing, by the first fully connected network of the third neural network, the feature vector of the second positive span with a sigmoid function, and outputting the positive-sample prediction probability corresponding to the second positive span.
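A hypothetical PyTorch rendering of the third neural network of claims 3 and 4 (at least two bidirectional LSTM layers in sequence, followed by a fully connected layer with a sigmoid output) might look as follows; all layer sizes are assumptions:

    import torch
    import torch.nn as nn

    class SpanFilter(nn.Module):
        """Assumed layout: stacked BI-LSTM layers, then a sigmoid-activated FC layer."""
        def __init__(self, emb_dim=128, hidden=64, num_layers=2):
            super().__init__()
            # num_layers=2 realizes "at least two first BI-LSTM layers in sequence".
            self.bilstm = nn.LSTM(emb_dim, hidden, num_layers=num_layers,
                                  bidirectional=True, batch_first=True)
            self.fc = nn.Linear(2 * hidden, 1)  # the first fully connected network

        def forward(self, span_embeddings):     # (batch, span_len, emb_dim)
            feats, _ = self.bilstm(span_embeddings)
            pooled = feats[:, -1, :]            # last-step feature vector of the span
            return torch.sigmoid(self.fc(pooled)).squeeze(-1)

    probs = SpanFilter()(torch.randn(4, 6, 128))  # 4 second positive spans, 6 tokens each
    print(probs.shape)                            # torch.Size([4]): one probability per span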
  5. The method according to claim 4, wherein the second positive span comprises a plurality of tokens, the embedding vector corresponding to the second positive span is formed by concatenating the embedding vectors of the plurality of tokens, and each token embedding vector is expressed by the following formula:
    h_i = E(t_i) + P_i;
    wherein h_i denotes the embedding vector of the i-th token, E(t_i) denotes the word embedding vector of the i-th token, and P_i denotes the position embedding vector of the i-th token.
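The formula of claim 5 sums a learned word embedding and a learned position embedding per token; a minimal sketch, assuming the vocabulary size, maximum length, dimension, and token ids, is:

    import torch
    import torch.nn as nn

    vocab_size, max_len, dim = 30000, 512, 128   # assumed sizes
    word_emb = nn.Embedding(vocab_size, dim)     # E(t_i)
    pos_emb = nn.Embedding(max_len, dim)         # P_i

    token_ids = torch.tensor([[101, 2054, 2003]])             # one 3-token span (ids assumed)
    positions = torch.arange(token_ids.size(1)).unsqueeze(0)  # positions 0, 1, 2
    h = word_emb(token_ids) + pos_emb(positions)              # h_i = E(t_i) + P_i
    print(h.shape)                                            # torch.Size([1, 3, 128])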
  6. The method according to claim 1, wherein the first neural network is a regression algorithm model;
    wherein predicting, by the preset first neural network, the boundary offset value corresponding to the first positive span comprises:
    moving the boundary of the first positive span according to a plurality of preset boundary movement units to obtain a plurality of third positive spans;
    concatenating the token feature vectors corresponding to the plurality of third positive spans to obtain a concatenated feature vector;
    calculating, by the regression algorithm model, the boundary offset value corresponding to the first positive span with the following formula:
    offset = W_2 · GELU(W_1·h + b_1) + b_2;
    wherein offset denotes the boundary offset value corresponding to the first positive span, GELU(·) denotes the activation function in the regression algorithm model, h denotes the concatenated feature vector corresponding to the first positive span, W_1 denotes a first weight matrix, W_2 denotes a second weight matrix, b_1 denotes a first bias parameter, and b_2 denotes a second bias parameter.
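The regression of claim 6 is a two-layer feed-forward mapping with a GELU activation. In the following hypothetical sketch, the concatenated feature dimension and the two-valued output (one offset each for the left and right boundaries) are assumptions:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class OffsetRegressor(nn.Module):
        """offset = W_2 * GELU(W_1 * h + b_1) + b_2, following claim 6's formula."""
        def __init__(self, feat_dim=384, hidden=128):
            super().__init__()
            self.w1 = nn.Linear(feat_dim, hidden)  # W_1 and b_1
            self.w2 = nn.Linear(hidden, 2)         # W_2 and b_2; 2 = assumed (left, right)

        def forward(self, h):                      # h: concatenated span feature vector
            return self.w2(F.gelu(self.w1(h)))

    offsets = OffsetRegressor()(torch.randn(4, 384))
    print(offsets.shape)  # torch.Size([4, 2]); rounding would give token-level shifts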
  7. The method according to claim 1, wherein the second neural network comprises at least two second BI-LSTM network layers and a second fully connected network, the at least two second BI-LSTM network layers being connected in sequence and the second fully connected network being connected to the last second BI-LSTM network layer;
    wherein predicting, by the preset second neural network, the entity classification corresponding to the target span comprises:
    inputting the target span into the first of the second BI-LSTM network layers of the second neural network;
    outputting, by the last of the second BI-LSTM network layers of the second neural network, a feature vector corresponding to the target span;
    processing, by the second fully connected network of the second neural network, the feature vector of the target span with a softmax function, and outputting the entity classification corresponding to the target span.
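Analogously, a sketch of the second neural network of claim 7 reuses the stacked BI-LSTM layout, with the second fully connected layer ending in a softmax over an assumed entity label set:

    import torch
    import torch.nn as nn

    class SpanClassifier(nn.Module):
        """Assumed layout: stacked BI-LSTM layers, then a softmax-activated FC layer."""
        def __init__(self, emb_dim=128, hidden=64, num_labels=5):
            super().__init__()
            self.bilstm = nn.LSTM(emb_dim, hidden, num_layers=2,
                                  bidirectional=True, batch_first=True)
            self.fc = nn.Linear(2 * hidden, num_labels)  # the second fully connected network

        def forward(self, target_span):          # (batch, span_len, emb_dim)
            feats, _ = self.bilstm(target_span)
            return torch.softmax(self.fc(feats[:, -1, :]), dim=-1)

    dist = SpanClassifier()(torch.randn(2, 4, 128))
    print(dist.sum(dim=-1))  # each row sums to 1: a distribution over entity labels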
  8. A deep learning-based named entity recognition apparatus, wherein the apparatus comprises:
    an acquisition module, configured to obtain a sentence to be processed;
    a candidate span determination module, configured to traverse the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein each preset recognition length is less than a preset recognition length threshold, and the candidate span set comprises a plurality of candidate spans whose lengths are less than or equal to the preset recognition length threshold;
    a screening module, configured to screen the candidate spans in the candidate span set to obtain at least one first positive span;
    a first prediction module, configured to predict, by a preset first neural network, a boundary offset value corresponding to the first positive span;
    a target span determination module, configured to adjust the boundary of the first positive span according to the boundary offset value corresponding to the first positive span, and obtain a target span based on the boundary-adjusted first positive span;
    a second prediction module, configured to predict, by a preset second neural network, an entity classification corresponding to the target span.
  9. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores a computer program that is executed by the at least one processor to enable the at least one processor to perform a deep learning-based named entity recognition method;
    wherein the deep learning-based named entity recognition method comprises:
    obtaining a sentence to be processed;
    identifying a plurality of candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein each preset recognition length is less than a preset recognition length threshold, and the candidate span set comprises a plurality of candidate spans whose lengths are less than or equal to the preset recognition length threshold;
    screening the candidate spans in the candidate span set to obtain at least one first positive span;
    predicting, by a preset first neural network, a boundary offset value corresponding to the first positive span;
    adjusting the boundary of the first positive span according to the boundary offset value corresponding to the first positive span, and obtaining a target span based on the boundary-adjusted first positive span;
    predicting, by a preset second neural network, an entity classification corresponding to the target span.
  10. The electronic device according to claim 9, wherein screening the candidate spans in the candidate span set to obtain at least one first positive span comprises:
    obtaining a preset real span set;
    performing an IOU calculation between a candidate span and the real span set to obtain an IOU value corresponding to the candidate span;
    determining the first positive span from the candidate spans in the candidate span set according to the IOU value corresponding to each candidate span in the candidate span set.
  11. The electronic device according to claim 10, wherein determining the first positive span from the candidate spans in the candidate span set according to the IOU value corresponding to each candidate span in the candidate span set comprises:
    obtaining, from the candidate span set, the candidate spans whose IOU values are greater than a preset IOU threshold, and taking the obtained candidate spans as second positive spans;
    obtaining an embedding vector corresponding to each second positive span;
    inputting the embedding vector corresponding to the second positive span into a preset third neural network, so that the third neural network outputs a positive-sample prediction probability corresponding to the second positive span;
    taking, as the first positive span, a second positive span whose positive-sample prediction probability is greater than a preset positive-sample prediction probability threshold.
  12. The electronic device according to claim 11, wherein the third neural network comprises at least two first BI-LSTM network layers and a first fully connected network, the at least two first BI-LSTM network layers being connected in sequence and the first fully connected network being connected to the last first BI-LSTM network layer;
    wherein inputting the embedding vector corresponding to the second positive span into the preset third neural network, so that the third neural network outputs the positive-sample prediction probability corresponding to the second positive span, comprises:
    inputting the embedding vector corresponding to the second positive span into the first of the first BI-LSTM network layers of the third neural network;
    outputting, by the last of the first BI-LSTM network layers of the third neural network, a feature vector corresponding to the second positive span;
    processing, by the first fully connected network of the third neural network, the feature vector of the second positive span with a sigmoid function, and outputting the positive-sample prediction probability corresponding to the second positive span.
  13. The electronic device according to claim 12, wherein the second positive span comprises a plurality of tokens, the embedding vector corresponding to the second positive span is formed by concatenating the embedding vectors of the plurality of tokens, and each token embedding vector is expressed by the following formula:
    h_i = E(t_i) + P_i;
    wherein h_i denotes the embedding vector of the i-th token, E(t_i) denotes the word embedding vector of the i-th token, and P_i denotes the position embedding vector of the i-th token.
  14. The electronic device according to claim 9, wherein the first neural network is a regression algorithm model;
    wherein predicting, by the preset first neural network, the boundary offset value corresponding to the first positive span comprises:
    moving the boundary of the first positive span according to a plurality of preset boundary movement units to obtain a plurality of third positive spans;
    concatenating the token feature vectors corresponding to the plurality of third positive spans to obtain a concatenated feature vector;
    calculating, by the regression algorithm model, the boundary offset value corresponding to the first positive span with the following formula:
    offset = W_2 · GELU(W_1·h + b_1) + b_2;
    wherein offset denotes the boundary offset value corresponding to the first positive span, GELU(·) denotes the activation function in the regression algorithm model, h denotes the concatenated feature vector corresponding to the first positive span, W_1 denotes a first weight matrix, W_2 denotes a second weight matrix, b_1 denotes a first bias parameter, and b_2 denotes a second bias parameter.
  15. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements a deep learning-based named entity recognition method;
    wherein the deep learning-based named entity recognition method comprises:
    obtaining a sentence to be processed;
    identifying a plurality of candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein each preset recognition length is less than a preset recognition length threshold, and the candidate span set comprises a plurality of candidate spans whose lengths are less than or equal to the preset recognition length threshold;
    screening the candidate spans in the candidate span set to obtain at least one first positive span;
    predicting, by a preset first neural network, a boundary offset value corresponding to the first positive span;
    adjusting the boundary of the first positive span according to the boundary offset value corresponding to the first positive span, and obtaining a target span based on the boundary-adjusted first positive span;
    predicting, by a preset second neural network, an entity classification corresponding to the target span.
  16. The computer-readable storage medium according to claim 15, wherein screening the candidate spans in the candidate span set to obtain at least one first positive span comprises:
    obtaining a preset real span set;
    performing an IOU calculation between a candidate span and the real span set to obtain an IOU value corresponding to the candidate span;
    determining the first positive span from the candidate spans in the candidate span set according to the IOU value corresponding to each candidate span in the candidate span set.
  17. The computer-readable storage medium according to claim 16, wherein determining the first positive span from the candidate spans in the candidate span set according to the IOU value corresponding to each candidate span in the candidate span set comprises:
    obtaining, from the candidate span set, the candidate spans whose IOU values are greater than a preset IOU threshold, and taking the obtained candidate spans as second positive spans;
    obtaining an embedding vector corresponding to each second positive span;
    inputting the embedding vector corresponding to the second positive span into a preset third neural network, so that the third neural network outputs a positive-sample prediction probability corresponding to the second positive span;
    taking, as the first positive span, a second positive span whose positive-sample prediction probability is greater than a preset positive-sample prediction probability threshold.
  18. The computer-readable storage medium according to claim 17, wherein the third neural network comprises at least two first BI-LSTM network layers and a first fully connected network, the at least two first BI-LSTM network layers being connected in sequence and the first fully connected network being connected to the last first BI-LSTM network layer;
    wherein inputting the embedding vector corresponding to the second positive span into the preset third neural network, so that the third neural network outputs the positive-sample prediction probability corresponding to the second positive span, comprises:
    inputting the embedding vector corresponding to the second positive span into the first of the first BI-LSTM network layers of the third neural network;
    outputting, by the last of the first BI-LSTM network layers of the third neural network, a feature vector corresponding to the second positive span;
    processing, by the first fully connected network of the third neural network, the feature vector of the second positive span with a sigmoid function, and outputting the positive-sample prediction probability corresponding to the second positive span.
  19. The computer-readable storage medium according to claim 18, wherein the second positive span comprises a plurality of tokens, the embedding vector corresponding to the second positive span is formed by concatenating the embedding vectors of the plurality of tokens, and each token embedding vector is expressed by the following formula:
    h_i = E(t_i) + P_i;
    wherein h_i denotes the embedding vector of the i-th token, E(t_i) denotes the word embedding vector of the i-th token, and P_i denotes the position embedding vector of the i-th token.
  20. The computer-readable storage medium according to claim 15, wherein the first neural network is a regression algorithm model;
    wherein predicting, by the preset first neural network, the boundary offset value corresponding to the first positive span comprises:
    moving the boundary of the first positive span according to a plurality of preset boundary movement units to obtain a plurality of third positive spans;
    concatenating the token feature vectors corresponding to the plurality of third positive spans to obtain a concatenated feature vector;
    calculating, by the regression algorithm model, the boundary offset value corresponding to the first positive span with the following formula:
    offset = W_2 · GELU(W_1·h + b_1) + b_2;
    wherein offset denotes the boundary offset value corresponding to the first positive span, GELU(·) denotes the activation function in the regression algorithm model, h denotes the concatenated feature vector corresponding to the first positive span, W_1 denotes a first weight matrix, W_2 denotes a second weight matrix, b_1 denotes a first bias parameter, and b_2 denotes a second bias parameter.
PCT/CN2022/090740 2022-03-15 2022-04-29 Deep learning-based named entity recognition method and apparatus, device, and medium WO2023173556A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210255150.1A CN114611517B (en) 2022-03-15 2022-03-15 Named entity recognition method, device, equipment and medium based on deep learning
CN202210255150.1 2022-03-15

Publications (1)

Publication Number Publication Date
WO2023173556A1 true WO2023173556A1 (en) 2023-09-21

Family

ID=81862881

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/090740 WO2023173556A1 (en) 2022-03-15 2022-04-29 Deep learning-based named entity recognition method and apparatus, device, and medium

Country Status (2)

Country Link
CN (1) CN114611517B (en)
WO (1) WO2023173556A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502740A (en) * 2019-07-03 2019-11-26 平安科技(深圳)有限公司 Question sentence Entity recognition and link method, device, computer equipment and storage medium
CN110956193A (en) * 2018-09-27 2020-04-03 英特尔公司 Methods, systems, articles, and apparatus for improved boundary offset detection
US20210157975A1 (en) * 2017-10-17 2021-05-27 Handycontract Llc Device, system, and method for extracting named entities from sectioned documents
US20210248324A1 (en) * 2020-02-10 2021-08-12 International Business Machines Corporation Extracting relevant sentences from text corpus

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110032737B (en) * 2019-04-10 2022-03-22 贵州大学 Boundary combination named entity recognition method based on neural network
CN112989105B (en) * 2019-12-16 2024-04-26 黑盒子科技(北京)有限公司 Music structure analysis method and system
CN113742567B (en) * 2020-05-29 2023-08-22 北京达佳互联信息技术有限公司 Recommendation method and device for multimedia resources, electronic equipment and storage medium
CN112446216B (en) * 2021-02-01 2021-05-04 华东交通大学 Method and device for identifying nested named entities fusing with core word information
CN113221539B (en) * 2021-07-08 2021-09-24 华东交通大学 Method and system for identifying nested named entities integrated with syntactic information
CN113569062A (en) * 2021-09-26 2021-10-29 深圳索信达数据技术有限公司 Knowledge graph completion method and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210157975A1 (en) * 2017-10-17 2021-05-27 Handycontract Llc Device, system, and method for extracting named entities from sectioned documents
CN110956193A (en) * 2018-09-27 2020-04-03 英特尔公司 Methods, systems, articles, and apparatus for improved boundary offset detection
CN110502740A (en) * 2019-07-03 2019-11-26 平安科技(深圳)有限公司 Question sentence Entity recognition and link method, device, computer equipment and storage medium
US20210248324A1 (en) * 2020-02-10 2021-08-12 International Business Machines Corporation Extracting relevant sentences from text corpus

Also Published As

Publication number Publication date
CN114611517B (en) 2023-07-25
CN114611517A (en) 2022-06-10

Similar Documents

Publication Publication Date Title
Song et al. Hierarchical LSTM with adjusted temporal attention for video captioning
US20210151034A1 (en) Methods and systems for multimodal content analytics
CN114612759B (en) Video processing method, video query method, model training method and model training device
US20230162481A1 (en) Pre-training of computer vision foundational models
US20210142210A1 (en) Multi-task segmented learning models
US11948078B2 (en) Joint representation learning from images and text
CN116304745B (en) Text topic matching method and system based on deep semantic information
CN112200031A (en) Network model training method and equipment for generating image corresponding word description
CN112084301B (en) Training method and device for text correction model, text correction method and device
CN110347853B (en) Image hash code generation method based on recurrent neural network
Han et al. L-Net: lightweight and fast object detector-based ShuffleNetV2
CN116975350A (en) Image-text retrieval method, device, equipment and storage medium
CN115062709A (en) Model optimization method, device, equipment, storage medium and program product
Gu et al. A robust attention-enhanced network with transformer for visual tracking
CN110867225A (en) Character-level clinical concept extraction named entity recognition method and system
Ludwig et al. Deep embedding for spatial role labeling
WO2023173556A1 (en) Deep learning-based named entity recognition method and apparatus, device, and medium
CN113095072A (en) Text processing method and device
WO2023173552A1 (en) Establishment method for target detection model, application method for target detection model, and device, apparatus and medium
CN115934883A (en) Entity relation joint extraction method based on semantic enhancement and multi-feature fusion
CN113704466B (en) Text multi-label classification method and device based on iterative network and electronic equipment
CN113095066A (en) Text processing method and device
CN116227598B (en) Event prediction method, device and medium based on dual-stage attention mechanism
CN117540007B (en) Multi-mode emotion analysis method, system and equipment based on similar mode completion
US20240126990A1 (en) Deep learning for multimedia classification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22931583

Country of ref document: EP

Kind code of ref document: A1