WO2023173556A1 - Deep learning-based named entity recognition method and apparatus, device, and medium - Google Patents

Deep learning-based named entity recognition method and apparatus, device, and medium

Info

Publication number
WO2023173556A1
Authority
WO
WIPO (PCT)
Prior art keywords
span
candidate
neural network
preset
spans
Prior art date
Application number
PCT/CN2022/090740
Other languages
French (fr)
Chinese (zh)
Inventor
姜鹏
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2023173556A1 publication Critical patent/WO2023173556A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the fields of artificial intelligence technology and natural language processing, and in particular to a named entity recognition method, device, electronic device and computer-readable storage medium based on deep learning.
  • NER Named Entity Recognition
  • NER is a fundamental task in natural language processing, widely used in downstream tasks such as knowledge extraction and knowledge graph construction. Its main task is to extract the entity nouns mentioned in a text; specifically, to identify the start/end index position and the entity category of each entity.
  • Nested entities refer to cases where, among the multiple nouns that make up one entity, individual nouns themselves belong to an entity of another category.
  • embodiments of this application propose a named entity recognition method based on deep learning.
  • the method includes: obtaining a sentence to be processed; identifying multiple candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose lengths are less than or equal to the preset recognition length threshold; screening the candidate spans in the candidate span set to obtain at least one first forward span; predicting the boundary offset value corresponding to the first forward span through a preset first neural network; adjusting the boundary of the first forward span according to the boundary offset value corresponding to the first forward span, and obtaining a target span based on the first forward span after adjusting the boundary; and predicting the entity classification corresponding to the target span through a preset second neural network.
  • an embodiment of the present application proposes a named entity recognition device based on deep learning.
  • the device includes:
  • an acquisition module, used to obtain the sentence to be processed;
  • a candidate span determination module, configured to traverse the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose lengths are less than or equal to the preset recognition length threshold;
  • a screening module, used to screen the candidate spans in the candidate span set to obtain at least one first forward span;
  • a first prediction module configured to predict the boundary offset value corresponding to the first forward span through a preset first neural network
  • a target span determination module configured to perform boundary adjustment on the first forward span according to the boundary offset value corresponding to the first forward span, and obtain the target span based on the first forward span after adjusting the boundary.
  • the second prediction module is used to predict the entity classification corresponding to the target span through a preset second neural network.
  • an electronic device including:
  • the memory stores a computer program, and the computer program is executed by the at least one processor, so that the at least one processor can execute a named entity recognition method based on deep learning;
  • the named entity recognition method based on deep learning includes:
  • Multiple candidate spans are identified from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose lengths are less than or equal to the preset recognition length threshold;
  • embodiments of the present application propose a computer-readable storage medium that stores a computer program.
  • the computer program is executed by a processor, a named entity recognition method based on deep learning is implemented;
  • the named entity recognition method based on deep learning includes:
  • Multiple candidate spans are identified from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose lengths are less than or equal to the preset recognition length threshold;
  • the embodiments of this application propose a named entity recognition method, device, electronic device and computer-readable storage medium based on deep learning. First, multiple candidate spans are identified from the sentence to be processed based on different preset recognition lengths, so that all possible candidate spans whose length does not exceed the preset recognition length threshold are identified and formed into a candidate span set, which solves the problem that long-span nested entities cannot be identified.
  • the candidate spans in the candidate span set are then screened, with the purpose of eliminating low-quality candidate spans and obtaining at least one first forward span, thereby reducing subsequent computational overhead; the boundary offset value corresponding to the first forward span is predicted through the preset first neural network; the boundary of the first forward span is adjusted according to this boundary offset value, and the target span is obtained based on the first forward span after adjusting the boundary; finally, the entity classification corresponding to the target span is predicted through the preset second neural network.
  • in this way, the span boundary can be fine-tuned based on the predicted boundary offset value, so that the final target span overlaps the real span as much as possible, reaching or approaching the ideal state of complete overlap, thereby improving the accuracy of entity recognition.
  • Figure 1 is a schematic diagram of entity distribution in an exemplary text sentence provided by this application.
  • Figure 2 is a schematic flowchart of a named entity recognition method based on deep learning provided by an embodiment of the present application
  • Figure 3 is a schematic diagram of the sub-steps of step S130 in Figure 2;
  • Figure 4 is a schematic diagram of the sub-steps of step S133 in Figure 3;
  • Figure 5 is a schematic structural diagram of a third neural network provided by an embodiment of the present application.
  • Figure 6 is a schematic diagram of the sub-steps of step S1333 in Figure 4;
  • Figure 7 is a schematic diagram of the sub-steps of step S140 in Figure 2;
  • Figure 8 is a schematic structural diagram of a second neural network provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of the sub-steps of step S160 in Figure 2;
  • Figure 10 is a schematic structural diagram of a deep learning-based named entity recognition device provided by an embodiment of the present application.
  • NLP Natural Language Processing
  • Natural Language Processing is a form of artificial intelligence that specializes in analyzing human language. It works roughly as follows: it receives natural language, which has evolved through natural human use and which humans use every day to communicate; it then analyzes the natural language with probability-based algorithms and outputs the results.
  • NER Named Entity Recognition
  • Embedding is a vector representation, which refers to using a low-dimensional vector to represent an object.
  • the object can be a word, a product, a movie, and so on. The property of an embedding vector is that objects whose vectors are close in distance have similar meanings; for example, the distance between embedding(Avengers) and embedding(Iron Man) will be very small, while the distance between embedding(Avengers) and embedding(Gone with the Wind) will be larger.
  • Embedding is essentially a mapping, a mapping from semantic space to vector space, while maintaining the relationship between the original sample in the semantic space as much as possible in the vector space.
  • Embedding can encode objects as low-dimensional vectors while retaining their meaning. It is often used in machine learning: in the process of building a machine learning model, an object is encoded into a low-dimensional dense vector and then passed to a DNN to improve efficiency.
  • Nested entities refer to cases where, among the multiple nouns that make up one entity, individual nouns themselves belong to an entity of another category.
  • the main purpose of the embodiments of this application is to propose a named entity recognition method, device, electronic device and computer-readable storage medium based on deep learning, aiming to solve the problem of identifying nested entities with a large span.
  • AI Artificial Intelligence
  • Artificial intelligence uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometric technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • the named entity recognition method provided by the embodiments of this application relates to the technical fields of artificial intelligence and natural language processing.
  • the named entity recognition method provided by the embodiment of the present application can be applied to the terminal or the server, or can be software running in the terminal or the server.
  • the terminal can be a smartphone, a tablet, a laptop, a desktop computer, etc.
  • the server can be configured as an independent physical server, or as a server cluster or distributed system composed of multiple physical servers.
  • the server can also be configured as a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.
  • Server software can be an application that implements named entity recognition methods, etc., but is not limited to the above forms.
  • the application may be used in a variety of general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics devices, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices.
  • the application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • the present application may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communications network.
  • program modules may be located in both local and remote computer storage media including storage devices.
  • Figure 2 shows a schematic flowchart of a deep learning-based named entity recognition method proposed in an embodiment of the present application.
  • the named entity identification method provided by the embodiment of this application includes but is not limited to the following steps:
  • Step S110 Obtain the sentence to be processed.
  • the sentence to be processed here is a sentence composed of multiple words, so the sentence to be processed is also regarded as a word sequence.
  • Step S120 Identify multiple candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple spans. candidate spans whose length is less than or equal to the preset recognition length threshold.
  • For example, assume the preset recognition length threshold is L; the preset recognition lengths are then set to 1, 2, ..., L, and for each preset recognition length the words in the sentence are traversed to extract all possible candidate spans.
  • In the first round, the sentence to be processed is traversed based on the preset recognition length of 1, obtaining multiple candidate spans of length 1; in the second round, the sentence is traversed based on the preset recognition length of 2, obtaining multiple candidate spans of length 2; and so on, until the last round traverses the sentence based on the preset recognition length L and obtains multiple candidate spans of length L.
  • In this way, all possible spans with lengths less than or equal to L are obtained, and the obtained spans form the candidate span set.
  • Each span in the candidate span set is a candidate span.
  • Thus, the embodiments of the present application can identify entities with a maximum length of L; the purpose of setting the recognition length threshold L is to avoid the computational overhead that an unlimited length would cause.
  • Those skilled in the art can set the value of L flexibly according to actual needs. For example, for the case shown in Figure 1, when L is set to 7, the long-span ORG entity in Figure 1 can be identified. It can be seen that this application can, to a certain extent, solve the problem of identifying nested entities with a large span.
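  • As a minimal, non-limiting sketch of the span enumeration in step S120 (function and variable names are illustrative assumptions, not taken from this application):

```python
# Enumerate all candidate spans of length 1..L from a tokenized sentence,
# following the round-by-round traversal described for step S120.
def enumerate_candidate_spans(tokens, max_len):
    """Return all (start, end) index pairs with end inclusive and length <= max_len."""
    spans = []
    for length in range(1, max_len + 1):               # preset recognition lengths 1..L
        for start in range(len(tokens) - length + 1):  # traverse the sentence
            spans.append((start, start + length - 1))
    return spans

# Example: a 10-token sentence with a preset recognition length threshold L = 7
tokens = "the Department of Homeland Security announced new travel rules today".split()
candidate_span_set = enumerate_candidate_spans(tokens, max_len=7)
print(len(candidate_span_set))  # 49 candidate spans
```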
  • Step S130 Screen the candidate spans in the candidate span set to obtain at least one first forward span.
  • the candidate spans obtained in step S120 can be screened first to eliminate some candidate spans with lower quality and reduce the number of spans to be processed in subsequent steps.
  • Step S130 can be implemented through the following steps S131-S133:
  • Step S131 Obtain a preset real span set;
  • Step S132 Perform an IOU calculation on each candidate span and the real span set to obtain the IOU value corresponding to the candidate span;
  • Step S133 Determine the first forward span from the candidate spans in the candidate span set according to the IOU value corresponding to each candidate span in the candidate span set.
  • the real span set is formed by collecting multiple real spans. To screen all candidate spans, an IOU calculation can be performed between each candidate span and the real span set to obtain the IOU value corresponding to the candidate span. Then, based on the IOU value corresponding to each candidate span, all candidate spans are divided into two main categories, positive spans and negative spans; the negative spans are eliminated and only the positive spans are retained.
  • the IOU calculation between the candidate span and the real span set can be achieved through the following formula (1): IoU(A, B) = |A ∩ B| / |A ∪ B|, in which:
  • A represents the candidate span
  • B represents the real span set
  • IoU(A,B) represents the IOU value of the candidate span.
  • IoU(A,B) is the ratio of the intersection to the union of the candidate span and the real span set; obviously, the higher the overlap between the two, the larger the value. If the degree of overlap between the candidate span and the real span set is high, the candidate span is of higher quality and can be treated as a positive span; otherwise, the candidate span is of low quality and is treated as a negative span.
  • Specifically, the K candidate spans with the largest IOU values can be selected from all candidate spans in the candidate span set as the first forward spans, where, assuming the candidate span set contains N candidate spans, 0 < K ≤ N.
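  • A hedged sketch of this IOU-based screening follows (treating a span as the set of token indices it covers is one reading of formula (1), assumed here; all names are illustrative):

```python
# Screen candidate spans: compute each candidate's IOU against the real span
# set, then keep the K candidates with the largest IOU values.
def span_tokens(span):
    start, end = span
    return set(range(start, end + 1))  # token indices covered by the span

def iou(candidate, real_spans):
    a = span_tokens(candidate)
    b = set().union(*(span_tokens(s) for s in real_spans))  # real span set, pooled
    return len(a & b) / len(a | b)  # formula (1): intersection over union

def top_k_by_iou(candidate_span_set, real_spans, k):
    ranked = sorted(candidate_span_set, key=lambda s: iou(s, real_spans), reverse=True)
    return ranked[:k]  # the K candidate spans with the largest IOU values

print(top_k_by_iou([(0, 2), (1, 4), (6, 8)], real_spans=[(1, 3)], k=2))
```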
  • Step S133 can be implemented through the following steps:
  • Step S1331 Obtain the candidate spans whose IOU values are greater than the preset IOU threshold from the candidate span set, and use the obtained candidate spans as second forward spans;
  • Step S1332 Obtain the embedding vector corresponding to each second forward span.
  • the second forward span is composed of multiple tokens
  • the embedding vector of the second forward span is formed by splicing the embedding vectors of multiple tokens.
  • the embedding vector of a token is expressed by the following formula (2): h_i = E(t_i) + P_i, where h_i represents the embedding vector of the i-th token, E(t_i) represents the word embedding vector of the i-th token, and P_i represents the position embedding vector of the i-th token.
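  • A minimal PyTorch sketch of formula (2) and of splicing the token embeddings into a span embedding (the vocabulary size, maximum position and dimensions are assumptions for illustration):

```python
import torch
import torch.nn as nn

word_emb = nn.Embedding(30000, 128)  # E: word embedding table, E(t_i)
pos_emb = nn.Embedding(512, 128)     # P: position embedding table, P_i

def span_embedding(token_ids, positions):
    h = word_emb(token_ids) + pos_emb(positions)  # formula (2): h_i = E(t_i) + P_i
    return h.reshape(-1)                          # splice the token embeddings

vec = span_embedding(torch.tensor([101, 205, 97]), torch.tensor([7, 8, 9]))
print(vec.shape)  # torch.Size([384]) for a 3-token second forward span
```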
  • Step S1333 Input the embedding vector corresponding to the second forward span to a preset third neural network, so that the third neural network outputs the forward sample prediction probability corresponding to the second forward span.
  • the third neural network can adopt the architecture of BI-LSTM network + fully connected network.
  • the BI-LSTM network is used to extract features from the embedding vector corresponding to the second forward span, and the fully connected network then performs a probability calculation on the extracted features to obtain the forward sample prediction probability of the second forward span.
  • the forward sample prediction probability represents the third neural network's prediction probability that the second forward span belongs to the forward sample.
  • Step S1334 Use the second forward span whose forward sample prediction probability is greater than the preset forward sample prediction probability threshold as the first forward span.
  • a forward sample prediction probability threshold is set in advance; when the forward sample prediction probability output by the third neural network is greater than this preset threshold, the corresponding second forward span is determined to be a first forward span.
  • In this way, the candidate spans are screened twice, first by IOU value and then by network prediction, and the first forward spans finally obtained are spans of higher quality with a higher degree of overlap with the real spans.
  • In some embodiments, the third neural network in step S1333 includes at least two layers of first BI-LSTM networks and a first fully connected network, wherein the at least two first BI-LSTM network layers are connected in sequence, and the first fully connected network is connected to the last first BI-LSTM network layer.
  • Figure 5 shows a schematic structural diagram of a third neural network provided by an embodiment of the present application.
  • the third neural network includes two first BI-LSTM network layers and one first fully connected network layer; the two first BI-LSTM network layers are stacked in sequence, and the first fully connected network is connected to the last first BI-LSTM network layer.
  • step S1333 can be implemented through the following steps:
  • Step S1333a Input the embedding vector corresponding to the second forward span into the first-layer first BI-LSTM network of the third neural network;
  • Step S1333b The last-layer first BI-LSTM network of the third neural network outputs the feature vector corresponding to the second forward span;
  • Step S1333c The first fully connected network of the third neural network processes the feature vector of the second forward span and outputs the forward sample prediction probability corresponding to the second forward span.
  • Specifically, the sigmoid function can be used to output the forward sample prediction probability corresponding to the second forward span.
  • the third neural network adopts a multi-layer BI-LSTM network, which can enhance the feature extraction capability of the third neural network so that the extracted features of the second forward span are more accurate.
  • the fully connected network uses the sigmoid function to perform probability calculation on the extracted features to obtain the forward sample prediction probability of the second forward span.
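  • A hedged PyTorch sketch of this third neural network (the hidden sizes, the pooling of the last time step, and all names are assumptions for illustration only):

```python
import torch
import torch.nn as nn

class ThirdNetwork(nn.Module):
    """Two stacked BI-LSTM layers plus a fully connected sigmoid head (Figure 5)."""
    def __init__(self, emb_dim=128, hidden=64):
        super().__init__()
        # num_layers=2 stacks two BI-LSTM layers connected in sequence
        self.bilstm = nn.LSTM(emb_dim, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, 1)  # connected to the last BI-LSTM layer

    def forward(self, span_embeddings):          # (batch, span_len, emb_dim)
        out, _ = self.bilstm(span_embeddings)
        feature = out[:, -1, :]                  # feature vector of each span
        return torch.sigmoid(self.fc(feature))  # forward sample prediction probability

probs = ThirdNetwork()(torch.randn(4, 5, 128))  # 4 second forward spans of length 5
```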
  • Step S140 Predict the boundary offset value corresponding to the first forward span through a preset first neural network.
  • Although a span with a high degree of overlap with the real entity span can be obtained through step S130, in most cases the first forward span obtained in step S130 only partially overlaps the real entity span.
  • the main purpose of step S140 is therefore to predict the boundary offset value corresponding to the first forward span, and to fine-tune the boundary of the first forward span obtained in step S130 based on the predicted boundary offset value, so that its overlap with the real span is as large as possible, ideally complete overlap.
  • the first neural network may use a regression algorithm model to predict the correct boundary of the first forward span.
  • step S140 can be performed as follows:
  • Step S141 Move the boundaries of the first forward span according to a plurality of preset boundary movement units to obtain a plurality of third forward spans.
  • In this way, the span boundaries can be expanded or contracted.
  • For example, for span (7,9), when adjusting the left boundary, the left boundary can be moved 0, 1 or 2 units to the left or to the right, yielding (5,9), (6,9), (7,9), (8,9) and (9,9).
  • If a move causes the left boundary to become less than 0 or to exceed the right boundary, the original span is used in its place.
  • Similarly, the right boundary can be moved while the position of the left boundary remains unchanged. In this way, by moving the boundaries of the first forward span, multiple third forward spans are obtained.
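  • A sketch of this boundary-movement step for the left boundary (the movement units 0, 1 and 2 follow the example above; the fallback rule is as described, and names are illustrative):

```python
# Move the left boundary of a first forward span by each preset movement unit,
# in both directions, replacing invalid results with the original span.
def move_left_boundary(span, units=(0, 1, 2)):
    left, right = span
    moved = []
    for u in units:
        for new_left in (left - u, left + u):
            if new_left < 0 or new_left > right:  # invalid move: keep original span
                moved.append(span)
            else:
                moved.append((new_left, right))
    return sorted(set(moved))

print(move_left_boundary((7, 9)))  # [(5, 9), (6, 9), (7, 9), (8, 9), (9, 9)]
```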
  • Step S142 Splice the token feature vectors corresponding to the plurality of third forward spans to obtain a spliced feature vector.
  • Step S143 Use the regression algorithm model to calculate the boundary offset value corresponding to the first forward span through the following formula (3): offset = W_3 · GELU(W_2 · h + b_2) + b_3, where:
  • offset represents the boundary offset value corresponding to the first forward span;
  • GELU(·) represents the activation function in the regression algorithm model;
  • h represents the spliced feature vector corresponding to the first forward span;
  • W_2 and W_3 represent the first and second weight matrices, and b_2 and b_3 represent the first and second bias parameters.
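  • A minimal PyTorch sketch of formula (3); the two-layer regression head follows the variable definitions above, while the input/hidden dimensions and the two-value output (one offset per boundary) are assumptions:

```python
import torch
import torch.nn as nn

class OffsetRegressor(nn.Module):
    """offset = W_3 * GELU(W_2 * h + b_2) + b_3, applied to the spliced feature h."""
    def __init__(self, in_dim=640, hidden=256):
        super().__init__()
        self.w2 = nn.Linear(in_dim, hidden)  # W_2 and b_2
        self.w3 = nn.Linear(hidden, 2)       # W_3 and b_3
        self.gelu = nn.GELU()

    def forward(self, h):                       # h: spliced feature vector
        return self.w3(self.gelu(self.w2(h)))  # boundary offset values

offsets = OffsetRegressor()(torch.randn(640))  # e.g. [left_offset, right_offset]
```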
  • Step S150 Adjust the boundary of the first forward span according to the boundary offset value corresponding to the first forward span, and obtain a target span based on the first forward span after adjusting the boundary.
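  • An illustrative sketch of step S150 (rounding the offsets to whole token positions and clamping to valid boundaries are assumptions; the application does not specify these rules):

```python
# Adjust the boundaries of a first forward span by the predicted offsets to
# obtain the target span, keeping the boundaries inside the sentence.
def adjust_boundary(span, offsets, sentence_len):
    left, right = span
    left = min(max(left + round(offsets[0]), 0), right)
    right = min(max(right + round(offsets[1]), left), sentence_len - 1)
    return (left, right)

print(adjust_boundary((6, 9), (-1.2, 0.8), sentence_len=12))  # (5, 10)
```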
  • Step S160 Predict the entity classification corresponding to the target span through a preset second neural network.
  • entity classification prediction can be performed on the target span.
  • the second neural network includes at least two layers of second BI-LSTM networks and a second fully connected network, wherein the at least two second BI-LSTM network layers are connected in sequence, and the second fully connected network is connected to the last second BI-LSTM network layer.
  • Figure 8 shows a schematic structural diagram of a second neural network provided by an embodiment of the present application.
  • the second neural network includes two second BI-LSTM network layers and one second fully connected network layer; the two second BI-LSTM network layers are stacked in sequence, and the second fully connected network is connected to the last second BI-LSTM network layer.
  • step S160 can be implemented through the following steps:
  • Step S161 Input the target span into the first-layer second BI-LSTM network of the second neural network;
  • Step S162 The last-layer second BI-LSTM network of the second neural network outputs the feature vector corresponding to the target span;
  • Step S163 The second fully connected network of the second neural network uses the softmax function to process the feature vector of the target span, and outputs the entity classification corresponding to the target span.
  • the second neural network adopts a multi-layer BI-LSTM network, which can enhance the feature extraction capability of the second neural network so that the extracted features of the target span are more accurate.
  • the fully connected network uses the softmax function to calculate, from the extracted features, the probability that the target span corresponds to each entity classification, and then determines the entity classification of the target span based on the calculated probabilities; since this is a multi-class probability calculation, the softmax function is used.
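  • A hedged PyTorch sketch of this second neural network (the number of entity classes and all sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class SecondNetwork(nn.Module):
    """Two stacked BI-LSTM layers plus a fully connected softmax head (Figure 8)."""
    def __init__(self, emb_dim=128, hidden=64, num_classes=5):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, target_span_embeddings):          # (batch, span_len, emb_dim)
        out, _ = self.bilstm(target_span_embeddings)
        feature = out[:, -1, :]                          # feature vector of the target span
        return torch.softmax(self.fc(feature), dim=-1)   # probability per entity class

probs = SecondNetwork()(torch.randn(2, 6, 128))
entity_class = probs.argmax(dim=-1)  # predicted entity classification per span
```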
  • the embodiment of this application proposes a named entity recognition method based on deep learning. First, multiple candidate spans are identified from the sentence to be processed based on different preset recognition lengths, so that all possible candidate spans whose length does not exceed the preset recognition length threshold are identified and formed into a candidate span set, which solves the problem that long-span nested entities cannot be identified.
  • the candidate spans in the candidate span set are then screened, with the purpose of eliminating low-quality candidate spans and obtaining at least one first forward span, thereby reducing subsequent computational overhead; the boundary offset value corresponding to the first forward span is predicted through the preset first neural network; the boundary of the first forward span is adjusted according to this boundary offset value, and the target span is obtained based on the first forward span after adjusting the boundary; finally, the entity classification corresponding to the target span is predicted through the preset second neural network.
  • in this way, the span boundary can be fine-tuned based on the predicted boundary offset value, so that the final target span overlaps the real span as much as possible, reaching or approaching the ideal state of complete overlap, thereby improving the accuracy of entity recognition.
  • This embodiment of the present application proposes a named entity recognition device based on deep learning.
  • the device includes:
  • an acquisition module, used to obtain the sentence to be processed;
  • a candidate span determination module, configured to traverse the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose lengths are less than or equal to the preset recognition length threshold;
  • a screening module, used to screen the candidate spans in the candidate span set to obtain at least one first forward span;
  • a first prediction module configured to predict the boundary offset value corresponding to the first forward span through a preset first neural network
  • a target span determination module configured to perform boundary adjustment on the first forward span according to the boundary offset value corresponding to the first forward span, and obtain the target span based on the first forward span after adjusting the boundary.
  • the second prediction module is used to predict the entity classification corresponding to the target span through a preset second neural network.
  • the screening module may specifically include:
  • the IOU calculation unit is used to obtain the preset real span set, perform IOU calculation on the candidate span and the real span set, and obtain the IOU value corresponding to the candidate span;
  • a first screening unit is configured to determine the first forward span from the candidate spans in the candidate span set according to the IOU value corresponding to each candidate span in the candidate span set.
  • In some embodiments, the first screening unit is specifically configured to: obtain the candidate spans whose IOU values are greater than the preset IOU threshold from the candidate span set, and use the obtained candidate spans as second forward spans; obtain the embedding vector corresponding to each second forward span; input the embedding vector corresponding to the second forward span into a preset third neural network, so that the third neural network outputs the forward sample prediction probability corresponding to the second forward span; and use the second forward spans whose forward sample prediction probabilities are greater than the preset forward sample prediction probability threshold as first forward spans.
  • In some embodiments, the third neural network includes at least two layers of first BI-LSTM networks and a first fully connected network, wherein the at least two first BI-LSTM network layers are connected in sequence, and the first fully connected network is connected to the last first BI-LSTM network layer; inputting the embedding vector corresponding to the second forward span into the preset third neural network so that the third neural network outputs the forward sample prediction probability corresponding to the second forward span includes: inputting the embedding vector corresponding to the second forward span into the first-layer first BI-LSTM network of the third neural network; outputting, by the last-layer first BI-LSTM network of the third neural network, the feature vector corresponding to the second forward span; and processing, by the first fully connected network of the third neural network using the sigmoid function, the feature vector of the second forward span, and outputting the forward sample prediction probability corresponding to the second forward span.
  • the second forward span includes multiple tokens
  • the embedding vector corresponding to the second forward span is formed by splicing the embedding vectors of multiple tokens.
  • the embedding vector of a token is expressed by the following formula: h_i = E(t_i) + P_i, where h_i represents the embedding vector of the i-th token, E(t_i) represents the word embedding vector of the i-th token, and P_i represents the position embedding vector of the i-th token.
  • In some embodiments, the first neural network is a regression algorithm model; predicting the boundary offset value corresponding to the first forward span through the preset first neural network includes: moving the boundaries of the first forward span according to multiple preset boundary movement units to obtain multiple third forward spans; splicing the token feature vectors corresponding to the multiple third forward spans to obtain a spliced feature vector; and using the regression algorithm model to calculate the boundary offset value corresponding to the first forward span through the following formula: offset = W_3 · GELU(W_2 · h + b_2) + b_3, where:
  • offset represents the boundary offset value corresponding to the first forward span;
  • GELU(·) represents the activation function in the regression algorithm model;
  • h represents the spliced feature vector corresponding to the first forward span;
  • W_2 represents the first weight matrix;
  • W_3 represents the second weight matrix;
  • b_2 represents the first bias parameter;
  • b_3 represents the second bias parameter.
  • In some embodiments, the second neural network includes at least two layers of second BI-LSTM networks and a second fully connected network, wherein the at least two second BI-LSTM network layers are connected in sequence, and the second fully connected network is connected to the last second BI-LSTM network layer; predicting the entity classification corresponding to the target span through the preset second neural network includes: inputting the target span into the first-layer second BI-LSTM network of the second neural network; outputting, by the last-layer second BI-LSTM network of the second neural network, the feature vector corresponding to the target span; and processing, by the second fully connected network of the second neural network using the softmax function, the feature vector of the target span, and outputting the entity classification corresponding to the target span.
  • the embodiment of the present application also provides an electronic device, including:
  • the memory stores a computer program, and the computer program is executed by the at least one processor so that the at least one processor can execute a deep learning-based named entity recognition method, wherein the deep learning-based named entity recognition method includes: obtaining a sentence to be processed; identifying multiple candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose lengths are less than or equal to the preset recognition length threshold; screening the candidate spans in the candidate span set to obtain at least one first forward span; predicting the boundary offset value corresponding to the first forward span through a preset first neural network; adjusting the boundary of the first forward span according to the boundary offset value corresponding to the first forward span, and obtaining a target span based on the first forward span after adjusting the boundary; and predicting the entity classification corresponding to the target span through a preset second neural network.
  • Embodiments of the present application also propose a computer-readable storage medium that stores a computer program.
  • When the computer program is executed by a processor, it implements a deep learning-based named entity recognition method, wherein the deep learning-based named entity recognition method includes: obtaining a sentence to be processed; identifying multiple candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose lengths are less than or equal to the preset recognition length threshold; screening the candidate spans in the candidate span set to obtain at least one first forward span; predicting the boundary offset value corresponding to the first forward span through a preset first neural network; adjusting the boundary of the first forward span according to the boundary offset value corresponding to the first forward span, and obtaining a target span based on the first forward span after adjusting the boundary; and predicting the entity classification corresponding to the target span through a preset second neural network.
  • the above computer-readable storage medium may be non-volatile or volatile.
  • the devices, equipment and computer-readable storage media provided by the embodiments of the present application correspond to the methods; therefore, they also have beneficial technical effects similar to those of the corresponding methods. Since the beneficial technical effects of the methods have been described in detail above, the beneficial technical effects of the corresponding devices, equipment and computer storage media are not repeated here.
  • a typical implementation device is a computer.
  • the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, which implement the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-permanent storage in computer-readable media, random access memory (RAM) and/or non-volatile memory in the form of read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and can implement information storage by any method or technology.
  • Information may be computer-readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to the technical fields of artificial intelligence and natural language processing, and provides a deep learning-based named entity recognition method and apparatus, a device, and a medium. The method comprises: recognizing a plurality of candidate spans from a sentence to be processed, so as to recognize all possible candidate spans with the length not exceeding a preset recognition length threshold, and then forming a candidate span set, thereby solving the problem that a nested entity with a relatively long span cannot be recognized; screening the candidate spans in the candidate span set to remove low-quality candidate spans, so as to obtain at least one first forward span, thereby reducing the subsequent calculation overhead; predicting a boundary deviation value corresponding to the first forward span by means of a first neural network to obtain a target span; and predicting an entity classification corresponding to the target span by means of a second neural network. In this case, the span boundary can be finely adjusted on the basis of the predicted boundary deviation value, such that the final target span overlaps a real span as much as possible to reach or close to the ideal state of complete overlapping, thereby improving the entity recognition accuracy.

Description

Named entity recognition method, device, equipment and medium based on deep learning
This application claims priority to the Chinese patent application filed with the China Patent Office on March 15, 2022, with application number 202210255150.1 and entitled "Named entity recognition method, device, equipment and medium based on deep learning", the entire content of which is incorporated herein by reference.
Technical Field
This application relates to the fields of artificial intelligence technology and natural language processing, and in particular to a named entity recognition method and apparatus, an electronic device and a computer-readable storage medium based on deep learning.
Background
Named Entity Recognition (NER) is a fundamental task in natural language processing, widely used in downstream tasks such as knowledge extraction and knowledge graph construction. Its main task is to extract the entity nouns mentioned in a text; specifically, to identify the start/end index position and the entity category of each entity.
Conventional entity recognition uses a sequence labeling model in deep learning to label each semantic unit in a text sentence, thereby obtaining a unique label for each semantic unit, and entity fragments are obtained by combining the labels. In practical tasks, some text sentences contain nested entities; nested entities here refer to cases where, among the multiple nouns that make up one entity, individual nouns themselves belong to an entity of another category.
Technical Problem
The following are technical problems in the prior art that the inventor has recognized: conventional sequence labeling models cannot solve the problem of nested entity recognition. For the recognition of nested entities, the related art has proposed a method that changes the target of the sequence classification task from a single label to multiple labels, as well as entity recognition methods based on machine reading comprehension (MRC) and hypergraph-based entity recognition methods, but these methods still cannot solve the problem of identifying nested entities with a large span.
Technical Solution
In a first aspect, embodiments of this application propose a named entity recognition method based on deep learning. The method includes: obtaining a sentence to be processed; identifying multiple candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose lengths are less than or equal to the preset recognition length threshold; screening the candidate spans in the candidate span set to obtain at least one first forward span; predicting the boundary offset value corresponding to the first forward span through a preset first neural network; adjusting the boundary of the first forward span according to the boundary offset value corresponding to the first forward span, and obtaining a target span based on the first forward span after adjusting the boundary; and predicting the entity classification corresponding to the target span through a preset second neural network.
In a second aspect, an embodiment of the present application proposes a named entity recognition device based on deep learning. The device includes:
an acquisition module, used to obtain the sentence to be processed;
a candidate span determination module, configured to traverse the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose lengths are less than or equal to the preset recognition length threshold;
a screening module, used to screen the candidate spans in the candidate span set to obtain at least one first forward span;
a first prediction module, configured to predict the boundary offset value corresponding to the first forward span through a preset first neural network;
a target span determination module, configured to perform boundary adjustment on the first forward span according to the boundary offset value corresponding to the first forward span, and obtain a target span based on the first forward span after adjusting the boundary;
a second prediction module, used to predict the entity classification corresponding to the target span through a preset second neural network.
In a third aspect, embodiments of the present application provide an electronic device, including:
at least one processor;
and a memory communicatively connected to the at least one processor;
wherein the memory stores a computer program, and the computer program is executed by the at least one processor so that the at least one processor can execute a named entity recognition method based on deep learning;
wherein the named entity recognition method based on deep learning includes:
obtaining a sentence to be processed;
identifying multiple candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose lengths are less than or equal to the preset recognition length threshold;
screening the candidate spans in the candidate span set to obtain at least one first forward span;
predicting the boundary offset value corresponding to the first forward span through a preset first neural network;
adjusting the boundary of the first forward span according to the boundary offset value corresponding to the first forward span, and obtaining a target span based on the first forward span after adjusting the boundary;
predicting the entity classification corresponding to the target span through a preset second neural network.
In a fourth aspect, embodiments of the present application propose a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements a named entity recognition method based on deep learning;
wherein the named entity recognition method based on deep learning includes:
obtaining a sentence to be processed;
identifying multiple candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein the preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose lengths are less than or equal to the preset recognition length threshold;
screening the candidate spans in the candidate span set to obtain at least one first forward span;
predicting the boundary offset value corresponding to the first forward span through a preset first neural network;
adjusting the boundary of the first forward span according to the boundary offset value corresponding to the first forward span, and obtaining a target span based on the first forward span after adjusting the boundary;
predicting the entity classification corresponding to the target span through a preset second neural network.
Beneficial Effects
The embodiments of this application propose a named entity recognition method, device, electronic device and computer-readable storage medium based on deep learning. First, multiple candidate spans are identified from the sentence to be processed based on different preset recognition lengths, so that all possible candidate spans whose length does not exceed the preset recognition length threshold are identified and formed into a candidate span set, which solves the problem that long-span nested entities cannot be identified. The candidate spans in the candidate span set are then screened, with the purpose of eliminating low-quality candidate spans and obtaining at least one first forward span, thereby reducing subsequent computational overhead; the boundary offset value corresponding to the first forward span is predicted through the preset first neural network; the boundary of the first forward span is adjusted according to this boundary offset value, and the target span is obtained based on the first forward span after adjusting the boundary; and the entity classification corresponding to the target span is predicted through the preset second neural network. In this way, the span boundary can be fine-tuned based on the predicted boundary offset value, so that the final target span overlaps the real span as much as possible, reaching or approaching the ideal state of complete overlap, thereby improving the accuracy of entity recognition.
Description of the drawings
Figure 1 is a schematic diagram of the entity distribution in an exemplary text sentence provided by the present application;
Figure 2 is a schematic flowchart of a deep learning-based named entity recognition method provided by an embodiment of the present application;
Figure 3 is a schematic diagram of the sub-steps of step S130 in Figure 2;
Figure 4 is a schematic diagram of the sub-steps of step S133 in Figure 3;
Figure 5 is a schematic structural diagram of a third neural network provided by an embodiment of the present application;
Figure 6 is a schematic diagram of the sub-steps of step S1333 in Figure 4;
Figure 7 is a schematic diagram of the sub-steps of step S140 in Figure 2;
Figure 8 is a schematic structural diagram of a second neural network provided by an embodiment of the present application;
Figure 9 is a schematic diagram of the sub-steps of step S160 in Figure 2;
Figure 10 is a schematic structural diagram of a deep learning-based named entity recognition apparatus provided by an embodiment of the present application.
Embodiments of the invention
To make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present application, not to limit it.
It should be noted that, unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field to which this application belongs. The terms used herein are only for the purpose of describing the embodiments of the present application and are not intended to limit the present application.
First, several terms involved in the present application are explained:
Natural Language Processing (NLP): NLP is a branch of artificial intelligence that analyzes human language. Roughly, it works as follows: it receives natural language, which has evolved through everyday human use for communication, analyzes it with probability-based algorithms, and outputs the result.
Named Entity Recognition (NER): NER is a key foundational task in NLP. The concept can be understood literally: it identifies entities with specific meanings in text, mainly including person names, place names, organization names, proper nouns, and the like.
Embedding: an embedding is a vector representation, i.e., a low-dimensional vector used to represent an object, where the object can be a word, a product, a movie, and so on. An embedding vector has the property that objects whose vectors are close together have similar meanings; for example, embedding(The Avengers) and embedding(Iron Man) will be very close, while embedding(The Avengers) and embedding(Gone with the Wind) will be farther apart. An embedding is essentially a mapping from a semantic space to a vector space that preserves, as far as possible, the relationships the original samples had in the semantic space; for example, two semantically close words are also relatively close in the vector space. Because embeddings can encode objects as low-dimensional vectors while retaining their meaning, they are widely used in machine learning: during model construction, an object is encoded as a low-dimensional dense vector and then passed to a DNN to improve efficiency.
Conventional entity recognition uses a sequence labeling model in deep learning to label each semantic unit in a text sentence, thereby obtaining a unique label per semantic unit, and entity fragments are then obtained by combining the labels. In practical tasks, text sentences often contain nested entities, where a nested entity means that among the several words making up one entity, some individual words themselves constitute an entity of another category.
For example, see Figure 1. In the text sentence shown in Figure 1, "The US Supreme Court will hear arguments from both sides on Friday and Florida's Leon County Circuit Court will consider the arguments on disputed state ballots on Saturday.", two types of entities are annotated: ORG (organization) and GPE (geopolitical). Here "Florida" and "Leon County" are both GPE entities while also being part of the ORG entity "Florida's Leon County Circuit Court"; that is, nested entities exist, and the enclosing entity has a noticeably long span.
For the recognition of nested entities, related work has proposed changing the target of the sequence classification task from single-label to multi-label, as well as machine-reading-comprehension (MRC)-based and hypergraph-based entity recognition methods, but these methods still cannot solve the problem of recognizing nested entities with long spans.
The main purpose of the embodiments of the present application is to provide a deep learning-based named entity recognition method, apparatus, electronic device, and computer-readable storage medium, aiming to solve the problem of recognizing nested entities with long spans.
The embodiments of the present application may acquire and process relevant data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
The named entity recognition method provided by the embodiments of this application relates to the technical fields of artificial intelligence and natural language processing. It can be applied in a terminal, on a server, or as software running in a terminal or on a server. In some embodiments, the terminal can be a smartphone, tablet, laptop, desktop computer, etc.; the server can be configured as an independent physical server, as a server cluster or distributed system composed of multiple physical servers, or as a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms; the software can be an application implementing the named entity recognition method, but is not limited to the above forms.
The application may be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices. The application may be described in the general context of computer-executable instructions, such as program modules, executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so on that perform specific tasks or implement specific abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
Please refer to Figure 2, which shows a schematic flowchart of a deep learning-based named entity recognition method proposed by an embodiment of the present application. As shown in Figure 2, the method includes, but is not limited to, the following steps:
Step S110: obtain the sentence to be processed.
It can be understood that the sentence to be processed here is a sentence composed of multiple words, so it can also be regarded as a word sequence.
Step S120: identify multiple candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein each preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose length is less than or equal to the preset recognition length threshold.
Illustratively, with a preset recognition length threshold L, the preset recognition lengths are determined as 1, 2, ..., L, and the words of the sentence are traversed under each preset recognition length to extract all possible candidate spans. For example, the first round traverses the sentence to be processed with a preset recognition length of 1, obtaining multiple candidate spans of length 1; the second round traverses it with a preset recognition length of 2, obtaining multiple candidate spans of length 2; and so on, until the last round traverses the sentence with a preset recognition length of L, obtaining multiple candidate spans of length L. In this way, all possible spans of length at most L are obtained and assembled into the candidate span set, each span in which is a candidate span.
It should be understood that the embodiments of the present application can recognize entities with a maximum length of L, and the purpose of setting the recognition length threshold L is to avoid the computational overhead of unbounded lengths. In a concrete implementation, those skilled in the art can set the value of L flexibly according to actual needs. For example, for the example shown in Figure 1, setting L to 7 makes the long-span ORG entity in Figure 1 recognizable. It can thus be seen that the present application can, to a certain extent, solve the problem of recognizing nested entities with long spans.
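To make the enumeration in step S120 concrete, the following is a minimal Python sketch; the function name and the inclusive (start, end) boundary convention, which matches the (7,9)-style notation used later in this description, are illustrative assumptions rather than part of the patent.

```python
def enumerate_candidate_spans(tokens, max_len):
    """Return all candidate spans over `tokens` with length 1..max_len.

    Spans are inclusive (start, end) token-index pairs.
    """
    spans = []
    for length in range(1, max_len + 1):  # preset recognition lengths 1, 2, ..., L
        for start in range(len(tokens) - length + 1):
            spans.append((start, start + length - 1))
    return spans


# For the Figure 1 sentence with L = 7, every span of at most 7 tokens is
# produced, including long entities such as "Florida's Leon County Circuit Court".
candidates = enumerate_candidate_spans(
    "The US Supreme Court will hear arguments".split(), 7)
```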
Step S130: screen the candidate spans in the candidate span set to obtain at least one first positive span.
It can be understood that, to save computational overhead in subsequent steps, the candidate spans obtained in step S120 can first be screened to discard some lower-quality candidates and reduce the number of spans the subsequent steps must process.
Specifically, referring to Figure 3, step S130 can be implemented through the following steps S131-S133:
Step S131: obtain a preset real span set;
Step S132: perform an IoU calculation between a candidate span and the real span set to obtain the IoU value corresponding to that candidate span;
Step S133: determine the first positive span from the candidate spans in the candidate span set according to the IoU value corresponding to each candidate span in the candidate span set.
It can be understood that the real span set is formed by collecting multiple real spans. To screen all candidate spans, an IoU calculation can be performed between each candidate span and the real span set to obtain the candidate span's IoU value; based on these IoU values, all candidate spans are divided into positive spans and negative spans, the negative spans are discarded, and only the positive spans are kept.
Specifically, the IoU calculation between a candidate span and the real span set can be implemented through the following formula (1):

IoU(A, B) = |A ∩ B| / |A ∪ B|    (1)

where A denotes the candidate span, B denotes the real span set, and IoU(A, B) denotes the IoU value of the candidate span.
It can be understood that IoU(A, B) is the ratio of the intersection to the union of the candidate span and the real span set; clearly, the higher the overlap between the two, the larger the score. If a candidate span overlaps heavily with the real span set, its quality is high and it can serve as a positive span; otherwise its quality is low and it can serve as a negative span.
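As a hedged illustration, one literal reading of formula (1) computes the token-level IoU between a candidate span and the union of the real spans; the helper below is a sketch under that assumption and is not taken from the patent.

```python
def span_iou(candidate, real_spans):
    """Token-level IoU between one candidate span A and the real span set B.

    `candidate` is an inclusive (start, end) pair; `real_spans` is a list of
    such pairs. Returns |A intersect B| / |A union B| as in formula (1).
    """
    a = set(range(candidate[0], candidate[1] + 1))
    b = set()
    for start, end in real_spans:
        b.update(range(start, end + 1))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# (7,9) against a real span (8,9): intersection {8,9}, union {7,8,9} -> 2/3.
iou = span_iou((7, 9), [(8, 9)])
```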
As an optional implementation, according to the IoU value corresponding to each candidate span in the candidate span set, the K candidate spans with the largest IoU values can be selected from all candidate spans in the set as the first positive spans; here it is assumed the candidate span set has N candidate spans, with 0 < K < N.
As another optional implementation, referring to Figure 4, step S133 can be implemented through the following steps:
Step S1331: obtain from the candidate span set the candidate spans whose IoU value is greater than a preset IoU threshold, and take the obtained candidate spans as second positive spans.
It can be understood that after the IoU value of each candidate span is calculated, all candidate spans are sorted by IoU value in descending order, and then, based on a preset selection count or selection ratio, the top K candidates are selected as the second positive spans; here it is assumed the candidate span set has N candidate spans, with 0 < K < N.
Step S1332: obtain the embedding vector corresponding to each second positive span.
It can be understood that a second positive span is composed of multiple tokens, and its embedding vector is formed by concatenating the embedding vectors of those tokens.
It can be understood that, in entity recognition, besides the lexical meaning of a token itself, its position in the sentence also matters; for this reason, the second positive span in the embodiments of this application also incorporates the position information of each token. Specifically, a token's embedding vector is expressed by the following formula (2):

h_i = E(t_i) + P_i    (2)

where h_i denotes the embedding vector of the i-th token, E(t_i) denotes the word embedding vector of the i-th token, and P_i denotes the position embedding vector of the i-th token.
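A minimal PyTorch sketch of formula (2) follows; the vocabulary size, maximum position, and embedding dimension are illustrative assumptions.

```python
import torch.nn as nn


class TokenEmbedder(nn.Module):
    """Sketch of formula (2): h_i = E(t_i) + P_i, summed per token."""

    def __init__(self, vocab_size=30000, max_pos=512, dim=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)  # E(t_i)
        self.pos_emb = nn.Embedding(max_pos, dim)      # P_i

    def forward(self, token_ids, positions):
        # token_ids, positions: (batch, span_len); a span's embedding is the
        # sequence of its per-token vectors h_i, concatenated along the span.
        return self.word_emb(token_ids) + self.pos_emb(positions)
```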
Step S1333: input the embedding vector corresponding to the second positive span into a preset third neural network, so that the third neural network outputs the positive-sample prediction probability corresponding to the second positive span.
Illustratively, the third neural network can adopt a BI-LSTM network plus fully connected network architecture: the BI-LSTM network extracts features from the embedding vector corresponding to the second positive span, and the fully connected network then computes a probability from the extracted features, yielding the positive-sample prediction probability of the second positive span. Here the positive-sample prediction probability represents the third neural network's predicted probability that the second positive span is a positive sample.
Step S1334: take the second positive spans whose positive-sample prediction probability is greater than a preset positive-sample prediction probability threshold as the first positive spans.
It can be understood that a positive-sample prediction probability threshold is set in advance; when the positive-sample prediction probability output by the third neural network exceeds this threshold, the corresponding second positive span is determined to be a first positive span.
It can be understood that, in this embodiment, the candidate spans are doubly screened through the IoU value and the network prediction, so the resulting first positive spans are of relatively high quality, with a high degree of overlap with the real spans.
As a specific example, the third neural network in step S1333 includes at least two first BI-LSTM network layers and a first fully connected network, wherein the first BI-LSTM layers are connected in sequence and the first fully connected network is connected to the last first BI-LSTM layer. Please refer to Figure 5, which shows a schematic structural diagram of a third neural network provided by an embodiment of the present application. In the example shown in Figure 5, the third neural network includes two first BI-LSTM layers stacked in sequence and one first fully connected network connected to the last first BI-LSTM layer.
Referring to Figure 6, based on the third neural network provided in the above example, step S1333 can be implemented through the following steps:
Step S1333a: input the embedding vector corresponding to the second positive span into the first-layer first BI-LSTM network of the third neural network;
Step S1333b: output, from the last-layer first BI-LSTM network of the third neural network, the feature vector corresponding to the second positive span;
Step S1333c: process the feature vector of the second positive span with the first fully connected network of the third neural network, and output the positive-sample prediction probability corresponding to the second positive span.
Since the goal here is a binary prediction distinguishing whether a second positive span is a positive or a negative sample, the sigmoid function can be used to output the positive-sample prediction probability corresponding to the second positive span.
It can be understood that using a multi-layer BI-LSTM enhances the feature extraction capability of the third neural network, making the extracted features of the second positive span more accurate. After the multi-layer BI-LSTM extracts features from the embedding vector of the second positive span, the fully connected network applies the sigmoid function to the extracted features to compute the second positive span's positive-sample prediction probability.
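The following PyTorch sketch mirrors the third neural network described above (two stacked BI-LSTM layers plus a fully connected layer with a sigmoid output); the hidden sizes, the use of the final time step as the span feature vector, and the class name are assumptions made for illustration, not the patent's definitive implementation.

```python
import torch
import torch.nn as nn


class SpanFilterNet(nn.Module):
    """Sketch of the third neural network: 2 x BI-LSTM + FC + sigmoid."""

    def __init__(self, dim=128, hidden=64):
        super().__init__()
        # num_layers=2 stacks the two first BI-LSTM layers (steps S1333a/b).
        self.bilstm = nn.LSTM(dim, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, 1)  # the first fully connected network

    def forward(self, span_embeddings):
        # span_embeddings: (batch, span_len, dim), built per formula (2)
        out, _ = self.bilstm(span_embeddings)
        feature = out[:, -1, :]  # span feature vector (step S1333b)
        return torch.sigmoid(self.fc(feature)).squeeze(-1)  # step S1333c


# Spans whose probability exceeds the preset threshold become first positive
# spans (step S1334); the 0.5 value here is an assumed threshold.
# keep = probabilities > 0.5
```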
Step S140: predict the boundary offset value corresponding to the first positive span through a preset first neural network.
It can be understood that, although step S130 yields spans with a high degree of overlap with the real entity spans, in most cases a first positive span obtained through step S130 only partially overlaps the real entity span, even if the overlapping part is large. For example, in the example shown in Figure 1, step S130 yields the first positive span "from both sides" with span boundary (7,9), while the real entity span is "both sides" with boundary (8,9). The main purpose of step S140 is therefore to predict the boundary offset value corresponding to the first positive span, so that the boundary of the first positive span obtained in step S130 can be fine-tuned based on the predicted offset, making its overlap with the real span as large as possible, ideally complete.
To predict the boundary offset value corresponding to a span, the first neural network can adopt a regression algorithm model, which predicts the correct boundary of the first positive span.
On the basis that the first neural network adopts a regression algorithm model, referring to Figure 7, step S140 can specifically be implemented through the following steps:
Step S141: move the boundary of the first positive span according to multiple preset boundary movement units to obtain multiple third positive spans.
For example, considering that a boundary may be shifted left or right, the span can be expanded. Taking span (7,9) as an example, when computing the left boundary, the left boundary can be moved 0, 1, or 2 units to the left or to the right, yielding (5,9), (6,9), (7,9), (8,9), (9,9). In this process the left boundary may become less than 0 or exceed the right boundary; in such cases the original span is used instead. Likewise, when processing the right boundary, the left boundary is kept fixed and the right boundary is moved. In this way, by moving the boundary of the first positive span, multiple third positive spans are obtained.
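A sketch of the left-boundary expansion just described, with a maximum shift of 2 units as in the (7,9) example (the shift range and helper name are assumptions); the right boundary would be handled symmetrically with the left boundary kept fixed.

```python
def expand_left_boundary(span, max_shift=2):
    """Step S141 for the left boundary: shift it 0..max_shift units left and
    right while keeping the right boundary fixed; invalid results (left < 0
    or left > right) are replaced by the original span."""
    left, right = span
    variants = []
    for delta in range(-max_shift, max_shift + 1):
        new_left = left + delta
        if new_left < 0 or new_left > right:
            variants.append(span)  # fall back to the original span
        else:
            variants.append((new_left, right))
    return variants


# expand_left_boundary((7, 9)) -> [(5, 9), (6, 9), (7, 9), (8, 9), (9, 9)]
```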
Step S142: concatenate the token feature vectors corresponding to the multiple third positive spans to obtain a concatenated feature vector.
Step S143: compute, with the regression algorithm model, the boundary offset value corresponding to the first positive span through the following formula (3):

offset = W_3 · GELU(W_2 h + b_2) + b_3    (3)

where offset denotes the boundary offset value corresponding to the first positive span, GELU(·) denotes the activation function in the regression algorithm model, h denotes the concatenated feature vector corresponding to the first positive span, W_2 denotes a first weight matrix, W_3 denotes a second weight matrix, b_2 denotes a first bias parameter, and b_3 denotes a second bias parameter.
Step S150: adjust the boundary of the first positive span according to its corresponding boundary offset value, and obtain the target span based on the first positive span with the adjusted boundary.
Continuing with span (7,9) as an example, the regression algorithm model predicts offset = (0.63, -0.15), so the new boundary is (7.63, 8.85), which after rounding to integers becomes (8,9); in this way the correct span boundary is obtained.
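The regression head of formula (3) and the boundary adjustment of step S150 can be sketched as follows; the feature dimensions are assumptions, and rounding is done half-up so that (7.63, 8.85) becomes (8, 9) as in the example above.

```python
import torch.nn as nn


class OffsetRegressor(nn.Module):
    """Sketch of formula (3): offset = W_3 * GELU(W_2 h + b_2) + b_3."""

    def __init__(self, in_dim=1280, hidden=256):
        super().__init__()
        self.w2 = nn.Linear(in_dim, hidden)  # W_2, b_2
        self.w3 = nn.Linear(hidden, 2)       # W_3, b_3 -> (left, right) offsets
        self.act = nn.GELU()

    def forward(self, h):
        # h: the concatenated token feature vector from step S142
        return self.w3(self.act(self.w2(h)))


def adjust_span(span, offset):
    """Step S150: add the predicted offsets and round half-up to integers,
    e.g. (7, 9) + (0.63, -0.15) -> (7.63, 8.85) -> (8, 9)."""
    return (int(span[0] + offset[0] + 0.5), int(span[1] + offset[1] + 0.5))
```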
Step S160: predict the entity classification corresponding to the target span through a preset second neural network.
After the target span is obtained by adjusting the first positive span, the entity classification of the target span can be predicted.
As an example, the second neural network includes at least two second BI-LSTM network layers and a second fully connected network, wherein the second BI-LSTM layers are connected in sequence and the second fully connected network is connected to the last second BI-LSTM layer. Please refer to Figure 8, which shows a schematic structural diagram of a second neural network provided by an embodiment of the present application. In the example shown in Figure 8, the second neural network includes two second BI-LSTM layers stacked in sequence and one second fully connected network connected to the last second BI-LSTM layer.
Based on the second neural network provided in the above example, referring to Figure 9, step S160 can be implemented through the following steps:
Step S161: input the target span into the first-layer second BI-LSTM network of the second neural network;
Step S162: output, from the last-layer second BI-LSTM network of the second neural network, the feature vector corresponding to the target span;
Step S163: process the feature vector of the target span with the second fully connected network of the second neural network using the softmax function, and output the entity classification corresponding to the target span.
It can be understood that using a multi-layer BI-LSTM enhances the feature extraction capability of the second neural network, making the extracted features of the target span more accurate. After the multi-layer BI-LSTM extracts the target span's features, the fully connected network applies the softmax function to the extracted features to compute the probability of the target span belonging to each entity class, and the entity classification of the target span is then determined from the computed probabilities. It can be understood that the softmax function is used because a multi-class probability computation is required here.
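For completeness, here is a hedged sketch of the second neural network (two stacked BI-LSTM layers plus a fully connected softmax classifier, steps S161-S163); the hidden size and the number of entity classes are assumptions.

```python
import torch
import torch.nn as nn


class SpanClassifierNet(nn.Module):
    """Sketch of the second neural network: 2 x BI-LSTM + FC + softmax."""

    def __init__(self, dim=128, hidden=64, num_classes=5):
        super().__init__()
        self.bilstm = nn.LSTM(dim, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes)  # second fully connected network

    def forward(self, target_span_embeddings):
        out, _ = self.bilstm(target_span_embeddings)
        logits = self.fc(out[:, -1, :])       # target span feature vector (S162)
        return torch.softmax(logits, dim=-1)  # per-class probabilities (S163)
```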
The embodiments of the present application provide a deep learning-based named entity recognition method. First, multiple candidate spans are identified from the sentence to be processed based on different preset recognition lengths, so that all possible candidate spans whose length does not exceed the preset recognition length threshold are identified and assembled into a candidate span set; this addresses the problem that nested entities with long spans cannot be recognized. The candidate spans in the set are then screened to discard low-quality candidates and obtain at least one first positive span, reducing subsequent computational overhead. A preset first neural network predicts the boundary offset value corresponding to the first positive span; the boundary of the first positive span is adjusted according to that offset value, and a target span is obtained from the adjusted span; finally, a preset second neural network predicts the entity classification corresponding to the target span. In this way, span boundaries can be fine-tuned based on the predicted boundary offset values so that the final target span overlaps the real span as much as possible, reaching or approaching the ideal state of complete overlap, thereby improving the accuracy of entity recognition.
Please refer to Figure 10. An embodiment of the present application provides a deep learning-based named entity recognition apparatus, the apparatus including:
an acquisition module, configured to obtain the sentence to be processed;
a candidate span determination module, configured to traverse the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein each preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose length is less than or equal to the preset recognition length threshold;
a screening module, configured to screen the candidate spans in the candidate span set to obtain at least one first positive span;
a first prediction module, configured to predict the boundary offset value corresponding to the first positive span through a preset first neural network;
a target span determination module, configured to adjust the boundary of the first positive span according to its corresponding boundary offset value, and obtain the target span based on the first positive span with the adjusted boundary;
a second prediction module, configured to predict the entity classification corresponding to the target span through a preset second neural network.
As an example, the screening module may specifically include:
an IoU calculation unit, configured to obtain a preset real span set, perform an IoU calculation between a candidate span and the real span set, and obtain the IoU value corresponding to the candidate span;
a first screening unit, configured to determine the first positive span from the candidate spans in the candidate span set according to the IoU value corresponding to each candidate span in the candidate span set.
As an example, the first screening unit is specifically configured to: obtain from the candidate span set the candidate spans whose IoU value is greater than a preset IoU threshold, and take the obtained candidate spans as second positive spans; obtain the embedding vector corresponding to each second positive span; input the embedding vector corresponding to the second positive span into a preset third neural network, so that the third neural network outputs the positive-sample prediction probability corresponding to the second positive span; and take the second positive spans whose positive-sample prediction probability is greater than a preset positive-sample prediction probability threshold as the first positive spans.
As an example, the third neural network includes at least two first BI-LSTM network layers and a first fully connected network, wherein the first BI-LSTM layers are connected in sequence and the first fully connected network is connected to the last first BI-LSTM layer; inputting the embedding vector corresponding to the second positive span into the preset third neural network so that the third neural network outputs the positive-sample prediction probability corresponding to the second positive span includes: inputting the embedding vector corresponding to the second positive span into the first-layer first BI-LSTM network of the third neural network; outputting, from the last-layer first BI-LSTM network of the third neural network, the feature vector corresponding to the second positive span; and processing the feature vector of the second positive span with the first fully connected network of the third neural network using the sigmoid function, and outputting the positive-sample prediction probability corresponding to the second positive span.
As an example, the second positive span includes multiple tokens, and the embedding vector corresponding to the second positive span is formed by concatenating the embedding vectors of the tokens, where a token's embedding vector is expressed by the following formula:

h_i = E(t_i) + P_i;

where h_i denotes the embedding vector of the i-th token, E(t_i) denotes the word embedding vector of the i-th token, and P_i denotes the position embedding vector of the i-th token.
As an example, the first neural network is a regression algorithm model; predicting the boundary offset value corresponding to the first positive span through the preset first neural network includes: moving the boundary of the first positive span according to multiple preset boundary movement units to obtain multiple third positive spans; concatenating the token feature vectors corresponding to the multiple third positive spans to obtain a concatenated feature vector; and computing, with the regression algorithm model, the boundary offset value corresponding to the first positive span through the following formula:

offset = W_3 · GELU(W_2 h + b_2) + b_3;

where offset denotes the boundary offset value corresponding to the first positive span, GELU(·) denotes the activation function in the regression algorithm model, h denotes the concatenated feature vector corresponding to the first positive span, W_2 denotes a first weight matrix, W_3 denotes a second weight matrix, b_2 denotes a first bias parameter, and b_3 denotes a second bias parameter.
As an example, the second neural network includes at least two second BI-LSTM network layers and a second fully connected network, wherein the second BI-LSTM layers are connected in sequence and the second fully connected network is connected to the last second BI-LSTM layer; predicting the entity classification corresponding to the target span through the preset second neural network includes: inputting the target span into the first-layer second BI-LSTM network of the second neural network; outputting, from the last-layer second BI-LSTM network of the second neural network, the feature vector corresponding to the target span; and processing the feature vector of the target span with the second fully connected network of the second neural network using the softmax function, and outputting the entity classification corresponding to the target span.
An embodiment of the present application also provides an electronic device, including:
at least one processor;
and a memory communicatively connected to the at least one processor;
wherein the memory stores a computer program which is executed by the at least one processor to enable the at least one processor to perform a deep learning-based named entity recognition method; wherein the deep learning-based named entity recognition method includes: obtaining a sentence to be processed; identifying multiple candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein each preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose length is less than or equal to the preset recognition length threshold; screening the candidate spans in the candidate span set to obtain at least one first positive span; predicting the boundary offset value corresponding to the first positive span through a preset first neural network; adjusting the boundary of the first positive span according to its corresponding boundary offset value, and obtaining a target span based on the first positive span with the adjusted boundary; and predicting the entity classification corresponding to the target span through a preset second neural network.
An embodiment of the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements a deep learning-based named entity recognition method; wherein the deep learning-based named entity recognition method includes: obtaining a sentence to be processed; identifying multiple candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein each preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose length is less than or equal to the preset recognition length threshold; screening the candidate spans in the candidate span set to obtain at least one first positive span; predicting the boundary offset value corresponding to the first positive span through a preset first neural network; adjusting the boundary of the first positive span according to its corresponding boundary offset value, and obtaining a target span based on the first positive span with the adjusted boundary; and predicting the entity classification corresponding to the target span through a preset second neural network.
The above embodiments can be used in combination, and modules with the same name in different embodiments may be the same or different.
The above computer-readable storage medium may be non-volatile or volatile.
Specific embodiments of the present application have been described above; other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the figures do not necessarily require the specific order shown, or a sequential order, to achieve the desired results. Multitasking and parallel processing are also possible, or may be advantageous, in certain implementations.
Each embodiment in this application is described in a progressive manner; the same or similar parts between the embodiments can be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus, device, and computer-readable storage medium embodiments are basically similar to the method embodiments, their descriptions are relatively brief; for relevant details, refer to the description of the method embodiments.
The apparatus, device, and computer-readable storage medium provided by the embodiments of the present application correspond to the method; therefore, they also have beneficial technical effects similar to those of the corresponding method. Since the beneficial technical effects of the method have been described in detail above, the beneficial technical effects of the corresponding apparatus, device, and computer storage medium are not repeated here.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described in terms of functions divided into various units. Of course, when implementing the embodiments of the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the embodiments may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
This specification is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-persistent storage in computer-readable media, in the form of random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. Information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
The above descriptions are only embodiments of the present application and are not intended to limit it. Those skilled in the art may make various changes and variations to the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included within the scope of the claims of the present application.

Claims (20)

  1. A deep learning-based named entity recognition method, wherein the method comprises:
    obtaining a sentence to be processed;
    identifying multiple candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein each preset recognition length is less than a preset recognition length threshold, and the candidate span set includes multiple candidate spans whose length is less than or equal to the preset recognition length threshold;
    screening the candidate spans in the candidate span set to obtain at least one first positive span;
    predicting a boundary offset value corresponding to the first positive span through a preset first neural network;
    adjusting the boundary of the first positive span according to the boundary offset value corresponding to the first positive span, and obtaining a target span based on the first positive span with the adjusted boundary;
    predicting an entity classification corresponding to the target span through a preset second neural network.
  2. The method according to claim 1, wherein screening the candidate spans in the candidate span set to obtain at least one first positive span comprises:
    obtaining a preset real span set;
    performing an IOU calculation between a candidate span and the real span set to obtain an IOU value corresponding to the candidate span;
    determining the first positive span from the candidate spans in the candidate span set according to the IOU value corresponding to each candidate span in the candidate span set.
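For illustration only, the IOU screening of claim 2 amounts to a one-dimensional intersection-over-union between token spans; taking the maximum over the real span set is an assumption about how the single per-candidate IOU value is reduced:

    # Hypothetical sketch of the 1-D span IOU used to screen candidates.
    def span_iou(a, b):
        """IOU of two half-open token spans given as (start, end) pairs."""
        inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
        union = (a[1] - a[0]) + (b[1] - b[0]) - inter
        return inter / union if union else 0.0

    def candidate_iou(candidate, real_spans):
        # Assumed reduction: score each candidate by its best-matching real span.
        return max(span_iou(candidate, r) for r in real_spans)

    print(candidate_iou((2, 5), [(1, 4), (6, 8)]))  # overlap 2, union 4 -> 0.5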
  3. The method according to claim 2, wherein determining the first positive span from the candidate spans in the candidate span set according to the IOU value corresponding to each candidate span in the candidate span set comprises:
    obtaining, from the candidate span set, the candidate spans whose IOU values are greater than a preset IOU threshold, and taking the obtained candidate spans as second positive spans;
    obtaining an embedding vector corresponding to each second positive span;
    inputting the embedding vector corresponding to the second positive span into a preset third neural network, so that the third neural network outputs a positive-sample prediction probability corresponding to the second positive span;
    taking, as the first positive span, a second positive span whose positive-sample prediction probability is greater than a preset positive-sample prediction probability threshold.
  4. The method according to claim 3, wherein the third neural network comprises at least two first BI-LSTM network layers and a first fully connected network, the at least two first BI-LSTM network layers being connected in sequence and the first fully connected network being connected to the last first BI-LSTM network layer;
    wherein inputting the embedding vector corresponding to the second positive span into the preset third neural network, so that the third neural network outputs the positive-sample prediction probability corresponding to the second positive span, comprises:
    inputting the embedding vector corresponding to the second positive span into the first of the first BI-LSTM network layers of the third neural network;
    outputting, by the last of the first BI-LSTM network layers of the third neural network, a feature vector corresponding to the second positive span;
    processing, by the first fully connected network of the third neural network, the feature vector of the second positive span with a sigmoid function, and outputting the positive-sample prediction probability corresponding to the second positive span.
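A hypothetical PyTorch rendering of the third neural network of claims 3 and 4 (at least two bidirectional LSTM layers in sequence, followed by a fully connected layer with a sigmoid output) might look as follows; all layer sizes are assumptions:

    import torch
    import torch.nn as nn

    class SpanFilter(nn.Module):
        """Assumed layout: stacked BI-LSTM layers, then a sigmoid-activated FC layer."""
        def __init__(self, emb_dim=128, hidden=64, num_layers=2):
            super().__init__()
            # num_layers=2 realizes "at least two first BI-LSTM layers in sequence".
            self.bilstm = nn.LSTM(emb_dim, hidden, num_layers=num_layers,
                                  bidirectional=True, batch_first=True)
            self.fc = nn.Linear(2 * hidden, 1)  # the first fully connected network

        def forward(self, span_embeddings):     # (batch, span_len, emb_dim)
            feats, _ = self.bilstm(span_embeddings)
            pooled = feats[:, -1, :]            # last-step feature vector of the span
            return torch.sigmoid(self.fc(pooled)).squeeze(-1)

    probs = SpanFilter()(torch.randn(4, 6, 128))  # 4 second positive spans, 6 tokens each
    print(probs.shape)                            # torch.Size([4]): one probability per span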
  5. The method according to claim 4, wherein the second positive span comprises a plurality of tokens, the embedding vector corresponding to the second positive span is formed by concatenating the embedding vectors of the plurality of tokens, and each token embedding vector is expressed by the following formula:
    h_i = E(t_i) + P_i;
    wherein h_i denotes the embedding vector of the i-th token, E(t_i) denotes the word embedding vector of the i-th token, and P_i denotes the position embedding vector of the i-th token.
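The formula of claim 5 sums a learned word embedding and a learned position embedding per token; a minimal sketch, assuming the vocabulary size, maximum length, dimension, and token ids, is:

    import torch
    import torch.nn as nn

    vocab_size, max_len, dim = 30000, 512, 128   # assumed sizes
    word_emb = nn.Embedding(vocab_size, dim)     # E(t_i)
    pos_emb = nn.Embedding(max_len, dim)         # P_i

    token_ids = torch.tensor([[101, 2054, 2003]])             # one 3-token span (ids assumed)
    positions = torch.arange(token_ids.size(1)).unsqueeze(0)  # positions 0, 1, 2
    h = word_emb(token_ids) + pos_emb(positions)              # h_i = E(t_i) + P_i
    print(h.shape)                                            # torch.Size([1, 3, 128])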
  6. The method according to claim 1, wherein the first neural network is a regression algorithm model;
    wherein predicting, by the preset first neural network, the boundary offset value corresponding to the first positive span comprises:
    moving the boundary of the first positive span according to a plurality of preset boundary movement units to obtain a plurality of third positive spans;
    concatenating the token feature vectors corresponding to the plurality of third positive spans to obtain a concatenated feature vector;
    calculating, by the regression algorithm model, the boundary offset value corresponding to the first positive span with the following formula:
    offset = W_2 · GELU(W_1·h + b_1) + b_2;
    wherein offset denotes the boundary offset value corresponding to the first positive span, GELU(·) denotes the activation function in the regression algorithm model, h denotes the concatenated feature vector corresponding to the first positive span, W_1 denotes a first weight matrix, W_2 denotes a second weight matrix, b_1 denotes a first bias parameter, and b_2 denotes a second bias parameter.
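The regression of claim 6 is a two-layer feed-forward mapping with a GELU activation. In the following hypothetical sketch, the concatenated feature dimension and the two-valued output (one offset each for the left and right boundaries) are assumptions:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class OffsetRegressor(nn.Module):
        """offset = W_2 * GELU(W_1 * h + b_1) + b_2, following claim 6's formula."""
        def __init__(self, feat_dim=384, hidden=128):
            super().__init__()
            self.w1 = nn.Linear(feat_dim, hidden)  # W_1 and b_1
            self.w2 = nn.Linear(hidden, 2)         # W_2 and b_2; 2 = assumed (left, right)

        def forward(self, h):                      # h: concatenated span feature vector
            return self.w2(F.gelu(self.w1(h)))

    offsets = OffsetRegressor()(torch.randn(4, 384))
    print(offsets.shape)  # torch.Size([4, 2]); rounding would give token-level shifts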
  7. The method according to claim 1, wherein the second neural network comprises at least two second BI-LSTM network layers and a second fully connected network, the at least two second BI-LSTM network layers being connected in sequence and the second fully connected network being connected to the last second BI-LSTM network layer;
    wherein predicting, by the preset second neural network, the entity classification corresponding to the target span comprises:
    inputting the target span into the first of the second BI-LSTM network layers of the second neural network;
    outputting, by the last of the second BI-LSTM network layers of the second neural network, a feature vector corresponding to the target span;
    processing, by the second fully connected network of the second neural network, the feature vector of the target span with a softmax function, and outputting the entity classification corresponding to the target span.
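Analogously, a sketch of the second neural network of claim 7 reuses the stacked BI-LSTM layout, with the second fully connected layer ending in a softmax over an assumed entity label set:

    import torch
    import torch.nn as nn

    class SpanClassifier(nn.Module):
        """Assumed layout: stacked BI-LSTM layers, then a softmax-activated FC layer."""
        def __init__(self, emb_dim=128, hidden=64, num_labels=5):
            super().__init__()
            self.bilstm = nn.LSTM(emb_dim, hidden, num_layers=2,
                                  bidirectional=True, batch_first=True)
            self.fc = nn.Linear(2 * hidden, num_labels)  # the second fully connected network

        def forward(self, target_span):          # (batch, span_len, emb_dim)
            feats, _ = self.bilstm(target_span)
            return torch.softmax(self.fc(feats[:, -1, :]), dim=-1)

    dist = SpanClassifier()(torch.randn(2, 4, 128))
    print(dist.sum(dim=-1))  # each row sums to 1: a distribution over entity labels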
  8. A deep learning-based named entity recognition apparatus, wherein the apparatus comprises:
    an acquisition module, configured to obtain a sentence to be processed;
    a candidate span determination module, configured to traverse the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein each preset recognition length is less than a preset recognition length threshold, and the candidate span set comprises a plurality of candidate spans whose lengths are less than or equal to the preset recognition length threshold;
    a screening module, configured to screen the candidate spans in the candidate span set to obtain at least one first positive span;
    a first prediction module, configured to predict, by a preset first neural network, a boundary offset value corresponding to the first positive span;
    a target span determination module, configured to adjust the boundary of the first positive span according to the boundary offset value corresponding to the first positive span, and obtain a target span based on the boundary-adjusted first positive span;
    a second prediction module, configured to predict, by a preset second neural network, an entity classification corresponding to the target span.
  9. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores a computer program that is executed by the at least one processor to enable the at least one processor to perform a deep learning-based named entity recognition method;
    wherein the deep learning-based named entity recognition method comprises:
    obtaining a sentence to be processed;
    identifying a plurality of candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein each preset recognition length is less than a preset recognition length threshold, and the candidate span set comprises a plurality of candidate spans whose lengths are less than or equal to the preset recognition length threshold;
    screening the candidate spans in the candidate span set to obtain at least one first positive span;
    predicting, by a preset first neural network, a boundary offset value corresponding to the first positive span;
    adjusting the boundary of the first positive span according to the boundary offset value corresponding to the first positive span, and obtaining a target span based on the boundary-adjusted first positive span;
    predicting, by a preset second neural network, an entity classification corresponding to the target span.
  10. The electronic device according to claim 9, wherein screening the candidate spans in the candidate span set to obtain at least one first positive span comprises:
    obtaining a preset real span set;
    performing an IOU calculation between a candidate span and the real span set to obtain an IOU value corresponding to the candidate span;
    determining the first positive span from the candidate spans in the candidate span set according to the IOU value corresponding to each candidate span in the candidate span set.
  11. The electronic device according to claim 10, wherein determining the first positive span from the candidate spans in the candidate span set according to the IOU value corresponding to each candidate span in the candidate span set comprises:
    obtaining, from the candidate span set, the candidate spans whose IOU values are greater than a preset IOU threshold, and taking the obtained candidate spans as second positive spans;
    obtaining an embedding vector corresponding to each second positive span;
    inputting the embedding vector corresponding to the second positive span into a preset third neural network, so that the third neural network outputs a positive-sample prediction probability corresponding to the second positive span;
    taking, as the first positive span, a second positive span whose positive-sample prediction probability is greater than a preset positive-sample prediction probability threshold.
  12. The electronic device according to claim 11, wherein the third neural network comprises at least two first BI-LSTM network layers and a first fully connected network, the at least two first BI-LSTM network layers being connected in sequence and the first fully connected network being connected to the last first BI-LSTM network layer;
    wherein inputting the embedding vector corresponding to the second positive span into the preset third neural network, so that the third neural network outputs the positive-sample prediction probability corresponding to the second positive span, comprises:
    inputting the embedding vector corresponding to the second positive span into the first of the first BI-LSTM network layers of the third neural network;
    outputting, by the last of the first BI-LSTM network layers of the third neural network, a feature vector corresponding to the second positive span;
    processing, by the first fully connected network of the third neural network, the feature vector of the second positive span with a sigmoid function, and outputting the positive-sample prediction probability corresponding to the second positive span.
  13. The electronic device according to claim 12, wherein the second positive span comprises a plurality of tokens, the embedding vector corresponding to the second positive span is formed by concatenating the embedding vectors of the plurality of tokens, and each token embedding vector is expressed by the following formula:
    h_i = E(t_i) + P_i;
    wherein h_i denotes the embedding vector of the i-th token, E(t_i) denotes the word embedding vector of the i-th token, and P_i denotes the position embedding vector of the i-th token.
  14. The electronic device according to claim 9, wherein the first neural network is a regression algorithm model;
    wherein predicting, by the preset first neural network, the boundary offset value corresponding to the first positive span comprises:
    moving the boundary of the first positive span according to a plurality of preset boundary movement units to obtain a plurality of third positive spans;
    concatenating the token feature vectors corresponding to the plurality of third positive spans to obtain a concatenated feature vector;
    calculating, by the regression algorithm model, the boundary offset value corresponding to the first positive span with the following formula:
    offset = W_2 · GELU(W_1·h + b_1) + b_2;
    wherein offset denotes the boundary offset value corresponding to the first positive span, GELU(·) denotes the activation function in the regression algorithm model, h denotes the concatenated feature vector corresponding to the first positive span, W_1 denotes a first weight matrix, W_2 denotes a second weight matrix, b_1 denotes a first bias parameter, and b_2 denotes a second bias parameter.
  15. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements a deep learning-based named entity recognition method;
    wherein the deep learning-based named entity recognition method comprises:
    obtaining a sentence to be processed;
    identifying a plurality of candidate spans from the sentence to be processed based on different preset recognition lengths to obtain a candidate span set, wherein each preset recognition length is less than a preset recognition length threshold, and the candidate span set comprises a plurality of candidate spans whose lengths are less than or equal to the preset recognition length threshold;
    screening the candidate spans in the candidate span set to obtain at least one first positive span;
    predicting, by a preset first neural network, a boundary offset value corresponding to the first positive span;
    adjusting the boundary of the first positive span according to the boundary offset value corresponding to the first positive span, and obtaining a target span based on the boundary-adjusted first positive span;
    predicting, by a preset second neural network, an entity classification corresponding to the target span.
  16. The computer-readable storage medium according to claim 15, wherein screening the candidate spans in the candidate span set to obtain at least one first positive span comprises:
    obtaining a preset real span set;
    performing an IOU calculation between a candidate span and the real span set to obtain an IOU value corresponding to the candidate span;
    determining the first positive span from the candidate spans in the candidate span set according to the IOU value corresponding to each candidate span in the candidate span set.
  17. The computer-readable storage medium according to claim 16, wherein determining the first positive span from the candidate spans in the candidate span set according to the IOU value corresponding to each candidate span in the candidate span set comprises:
    obtaining, from the candidate span set, the candidate spans whose IOU values are greater than a preset IOU threshold, and taking the obtained candidate spans as second positive spans;
    obtaining an embedding vector corresponding to each second positive span;
    inputting the embedding vector corresponding to the second positive span into a preset third neural network, so that the third neural network outputs a positive-sample prediction probability corresponding to the second positive span;
    taking, as the first positive span, a second positive span whose positive-sample prediction probability is greater than a preset positive-sample prediction probability threshold.
  18. The computer-readable storage medium according to claim 17, wherein the third neural network comprises at least two first BI-LSTM network layers and a first fully connected network, the at least two first BI-LSTM network layers being connected in sequence and the first fully connected network being connected to the last first BI-LSTM network layer;
    wherein inputting the embedding vector corresponding to the second positive span into the preset third neural network, so that the third neural network outputs the positive-sample prediction probability corresponding to the second positive span, comprises:
    inputting the embedding vector corresponding to the second positive span into the first of the first BI-LSTM network layers of the third neural network;
    outputting, by the last of the first BI-LSTM network layers of the third neural network, a feature vector corresponding to the second positive span;
    processing, by the first fully connected network of the third neural network, the feature vector of the second positive span with a sigmoid function, and outputting the positive-sample prediction probability corresponding to the second positive span.
  19. The computer-readable storage medium according to claim 18, wherein the second positive span comprises a plurality of tokens, the embedding vector corresponding to the second positive span is formed by concatenating the embedding vectors of the plurality of tokens, and each token embedding vector is expressed by the following formula:
    h_i = E(t_i) + P_i;
    wherein h_i denotes the embedding vector of the i-th token, E(t_i) denotes the word embedding vector of the i-th token, and P_i denotes the position embedding vector of the i-th token.
  20. The computer-readable storage medium according to claim 15, wherein the first neural network is a regression algorithm model;
    wherein predicting, by the preset first neural network, the boundary offset value corresponding to the first positive span comprises:
    moving the boundary of the first positive span according to a plurality of preset boundary movement units to obtain a plurality of third positive spans;
    concatenating the token feature vectors corresponding to the plurality of third positive spans to obtain a concatenated feature vector;
    calculating, by the regression algorithm model, the boundary offset value corresponding to the first positive span with the following formula:
    offset = W_2 · GELU(W_1·h + b_1) + b_2;
    wherein offset denotes the boundary offset value corresponding to the first positive span, GELU(·) denotes the activation function in the regression algorithm model, h denotes the concatenated feature vector corresponding to the first positive span, W_1 denotes a first weight matrix, W_2 denotes a second weight matrix, b_1 denotes a first bias parameter, and b_2 denotes a second bias parameter.
PCT/CN2022/090740 2022-03-15 2022-04-29 Deep learning-based named entity recognition method and apparatus, device, and medium WO2023173556A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210255150.1A CN114611517B (en) 2022-03-15 2022-03-15 Named entity recognition method, device, equipment and medium based on deep learning
CN202210255150.1 2022-03-15

Publications (1)

Publication Number Publication Date
WO2023173556A1 true WO2023173556A1 (en) 2023-09-21

Family

ID=81862881

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/090740 WO2023173556A1 (en) 2022-03-15 2022-04-29 Deep learning-based named entity recognition method and apparatus, device, and medium

Country Status (2)

Country Link
CN (1) CN114611517B (en)
WO (1) WO2023173556A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502740A (en) * 2019-07-03 2019-11-26 平安科技(深圳)有限公司 Question sentence Entity recognition and link method, device, computer equipment and storage medium
CN110956193A (en) * 2018-09-27 2020-04-03 英特尔公司 Methods, systems, articles, and apparatus for improved boundary offset detection
US20210157975A1 (en) * 2017-10-17 2021-05-27 Handycontract Llc Device, system, and method for extracting named entities from sectioned documents
US20210248324A1 (en) * 2020-02-10 2021-08-12 International Business Machines Corporation Extracting relevant sentences from text corpus

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110032737B (en) * 2019-04-10 2022-03-22 贵州大学 Boundary combination named entity recognition method based on neural network
CN112989105B (en) * 2019-12-16 2024-04-26 黑盒子科技(北京)有限公司 Music structure analysis method and system
CN113742567B (en) * 2020-05-29 2023-08-22 北京达佳互联信息技术有限公司 Recommendation method and device for multimedia resources, electronic equipment and storage medium
CN112446216B (en) * 2021-02-01 2021-05-04 华东交通大学 Method and device for identifying nested named entities fusing with core word information
CN113221539B (en) * 2021-07-08 2021-09-24 华东交通大学 Method and system for identifying nested named entities integrated with syntactic information
CN113569062A (en) * 2021-09-26 2021-10-29 深圳索信达数据技术有限公司 Knowledge graph completion method and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210157975A1 (en) * 2017-10-17 2021-05-27 Handycontract Llc Device, system, and method for extracting named entities from sectioned documents
CN110956193A (en) * 2018-09-27 2020-04-03 英特尔公司 Methods, systems, articles, and apparatus for improved boundary offset detection
CN110502740A (en) * 2019-07-03 2019-11-26 平安科技(深圳)有限公司 Question sentence Entity recognition and link method, device, computer equipment and storage medium
US20210248324A1 (en) * 2020-02-10 2021-08-12 International Business Machines Corporation Extracting relevant sentences from text corpus

Also Published As

Publication number Publication date
CN114611517B (en) 2023-07-25
CN114611517A (en) 2022-06-10

Similar Documents

Publication Publication Date Title
Song et al. Hierarchical LSTM with adjusted temporal attention for video captioning
US20210151034A1 (en) Methods and systems for multimodal content analytics
CN114612759B (en) Video processing method, video query method, model training method and model training device
US20230162481A1 (en) Pre-training of computer vision foundational models
US20210142210A1 (en) Multi-task segmented learning models
US11948078B2 (en) Joint representation learning from images and text
CN116304745B (en) Text topic matching method and system based on deep semantic information
CN112200031A (en) Network model training method and equipment for generating image corresponding word description
CN112084301B (en) Training method and device for text correction model, text correction method and device
CN110347853B (en) Image hash code generation method based on recurrent neural network
Han et al. L-Net: lightweight and fast object detector-based ShuffleNetV2
CN116975350A (en) Image-text retrieval method, device, equipment and storage medium
CN115062709A (en) Model optimization method, device, equipment, storage medium and program product
Gu et al. A robust attention-enhanced network with transformer for visual tracking
CN110867225A (en) Character-level clinical concept extraction named entity recognition method and system
Ludwig et al. Deep embedding for spatial role labeling
WO2023173556A1 (en) Deep learning-based named entity recognition method and apparatus, device, and medium
CN113095072A (en) Text processing method and device
WO2023173552A1 (en) Establishment method for target detection model, application method for target detection model, and device, apparatus and medium
CN115934883A (en) Entity relation joint extraction method based on semantic enhancement and multi-feature fusion
CN113704466B (en) Text multi-label classification method and device based on iterative network and electronic equipment
CN113095066A (en) Text processing method and device
CN116227598B (en) Event prediction method, device and medium based on dual-stage attention mechanism
CN117540007B (en) Multi-mode emotion analysis method, system and equipment based on similar mode completion
US20240126990A1 (en) Deep learning for multimedia classification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22931583

Country of ref document: EP

Kind code of ref document: A1