CN116432648A - Named entity recognition method and recognition device, electronic equipment and storage medium - Google Patents

Named entity recognition method and recognition device, electronic equipment and storage medium

Info

Publication number
CN116432648A
Authority
CN
China
Prior art keywords
named entity
sample
initial
classification
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310284774.0A
Other languages
Chinese (zh)
Inventor
李泽远
王健宗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202310284774.0A priority Critical patent/CN116432648A/en
Publication of CN116432648A publication Critical patent/CN116432648A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Character Discrimination (AREA)

Abstract

The embodiments of the present application provide a named entity recognition method and apparatus, an electronic device, and a storage medium, belonging to the technical field of artificial intelligence. The method comprises the following steps: acquiring an initial text to be recognized; performing word segmentation processing on the initial text to obtain a word segmentation sequence; inputting the word segmentation sequence into a pre-constructed named entity recognition model to obtain a target prediction probability of each text word segment under each initial named entity label, where the named entity recognition model comprises a first classification layer and a second classification layer, the first classification layer is used for performing initial classification prediction processing on the text word segments to obtain a first prediction result, and the second classification layer is used for performing calibration classification prediction processing on the text word segments to obtain a second prediction result; and determining a target named entity label of each text word segment according to the target prediction probability, so as to determine the named entities of the initial text. The method and apparatus can improve the accuracy of named entity recognition.

Description

Named entity recognition method and recognition device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a named entity recognition method and device, an electronic device, and a storage medium.
Background
Currently, in practical applications of named entity recognition (Named Entity Recognition, NER) models, calibration is often poor because NER models built with deep neural networks frequently overfit. Existing model calibration methods mainly adopt MC Dropout and Deep Ensemble: MC Dropout is simple to implement but has a long inference time, while Deep Ensemble requires extremely high memory resources. As a result, the calibration of named entity recognition models is poor, which in turn affects the accuracy of named entity recognition. Therefore, how to improve the accuracy of named entity recognition has become a technical problem to be solved.
Disclosure of Invention
The embodiments of the present application mainly aim to provide a named entity recognition method and recognition apparatus, an electronic device, and a storage medium, with the aim of improving the accuracy of named entity recognition.
To achieve the above object, a first aspect of the embodiments of the present application provides a named entity recognition method, where the method includes:
acquiring an initial text to be identified and a plurality of preset initial named entity tags;
performing word segmentation processing on the initial text to obtain a word segmentation sequence, wherein the word segmentation sequence comprises a plurality of text word segments;
inputting the word segmentation sequence into a pre-constructed named entity recognition model to obtain a target prediction probability of each text word segment under each initial named entity label; the named entity recognition model comprises a first classification layer and a second classification layer, wherein the first classification layer is used for performing initial classification prediction processing on the text word segments to obtain a first prediction result, the second classification layer is used for performing calibration classification prediction processing on the text word segments to obtain a second prediction result, and the target prediction probability is determined according to the first prediction result and the second prediction result;
determining a target named entity label of each text word segment according to the target prediction probability;
and determining the named entity of the initial text according to the target named entity label of each text word.
In some embodiments, the named entity recognition model is trained by:
constructing a training sample set, wherein the training sample set comprises a plurality of sample texts and word segmentation tag data of each sample text, and each piece of word segmentation tag data comprises a plurality of sample word segments of the sample text and an initial sample named entity tag of each sample word segment;
constructing an initial recognition model based on the BERT model structure, wherein the initial recognition model comprises an encoding layer, the first classification layer and the second classification layer;
inputting the word segmentation tag data of each sample text into the initial recognition model;
encoding each sample word segment of the word segmentation tag data through the encoding layer to obtain a word segmentation feature vector;
performing the initial classification prediction processing on the word segmentation feature vector through the first classification layer to obtain a first classification prediction probability of the sample word segment under each initial sample named entity label;
performing the calibration classification prediction processing on the word segmentation feature vector through the second classification layer to obtain a second classification prediction probability of the sample word segment under each initial sample named entity label;
determining a target sample named entity tag of the sample word segment according to the first classification prediction probability and the second classification prediction probability;
and taking the initial sample named entity tag corresponding to the word segmentation tag data as the expected output of the initial recognition model, and training the named entity recognition model according to the initial sample named entity tag and the target sample named entity tag.
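The training sample set described above can be sketched as a simple data structure. The example sentence, entity classes and BIO-style tags below are invented for illustration; the patent does not fix a particular tagging scheme.

```python
# Each training sample pairs a sample text with its word segmentation tag
# data: sample word segments and their initial sample named entity tags
# (BIO scheme assumed here; this is an illustrative choice, not the patent's).
training_sample_set = [
    {
        "sample_text": "Zhang San visited Shenzhen",
        "word_segmentation_tag_data": [
            ("Zhang San", "B-PER"),
            ("visited", "O"),
            ("Shenzhen", "B-LOC"),
        ],
    },
]

# The set of initial sample named entity tags is induced from the data.
initial_sample_tags = sorted({tag for sample in training_sample_set
                              for _, tag in sample["word_segmentation_tag_data"]})
```

In training, each `(segment, tag)` pair supplies one expected output for the initial recognition model.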
In some embodiments, the training the named entity recognition model according to the initial sample named entity tag and the target sample named entity tag by using the initial sample named entity tag corresponding to the word segmentation tag data as the expected output of the initial recognition model includes:
determining a total loss value according to the first classification prediction probability and the second classification prediction probability;
and adjusting the model parameters of the initial recognition model according to the initial sample named entity tag and the target sample named entity tag of the sample word segment, and continuing to train the adjusted initial recognition model based on the training sample set until the total loss value meets a preset training end condition, so as to obtain the named entity recognition model.
In some embodiments, the performing, by the first classification layer, the initial classification prediction processing on the word segmentation feature vector to obtain a first classification prediction probability of the sample word segment under each of the initial sample named entity labels includes:
constructing the first classification layer based on a Softmax function;
performing the initial classification prediction processing on the word segmentation feature vector through the first classification layer to obtain a prediction regression value;
and normalizing the prediction regression value to obtain the first classification prediction probability of the sample word segment under each initial sample named entity label.
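The first classification layer described above can be sketched as a linear map producing prediction regression values, followed by softmax normalization. This is a minimal stand-in: the weight shapes and toy inputs are invented for illustration.

```python
import numpy as np

def softmax(z):
    # Subtract the row max before exponentiating for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def first_classification_layer(features, W, b):
    """Initial classification prediction: a linear map produces the
    prediction regression values, which softmax normalizes into one
    probability per initial sample named entity label."""
    regression_values = features @ W + b      # shape (n_segments, n_labels)
    return softmax(regression_values)

# Toy example: 2 word-segment feature vectors, 3 candidate entity labels.
rng = np.random.default_rng(0)
feats = rng.normal(size=(2, 4))
W, b = rng.normal(size=(4, 3)), np.zeros(3)
p1 = first_classification_layer(feats, W, b)
```

Each row of `p1` is a valid probability distribution over the candidate labels.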
In some embodiments, the performing, by the second classification layer, the calibration classification prediction processing on the word segmentation feature vector to obtain a second classification prediction probability of the sample word segment under each of the initial sample named entity labels includes:
constructing the second classification layer based on a gaussian process;
and performing the calibration classification prediction processing on the word segmentation feature vector through the second classification layer to obtain the second classification prediction probability of the sample word segment under each initial sample named entity label.
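One common way to build a Gaussian-process classification layer is to approximate the GP with random Fourier features and a trainable output layer (as in SNGP-style models). The patent only states that the second layer is constructed based on a Gaussian process, so this particular approximation, and all dimensions below, are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class RandomFeatureGPLayer:
    """Second classification layer built on a Gaussian process, approximated
    with random Fourier features (an RBF-kernel approximation)."""

    def __init__(self, in_dim, n_labels, n_features=128, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen random projection defining the kernel approximation.
        self.omega = rng.normal(size=(in_dim, n_features))
        self.phase = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        # Trainable output weights (plays the role of the GP posterior mean).
        self.beta = rng.normal(scale=0.1, size=(n_features, n_labels))

    def __call__(self, features):
        # Map word segmentation feature vectors into the random feature space.
        phi = np.sqrt(2.0 / self.omega.shape[1]) * np.cos(features @ self.omega + self.phase)
        # Second classification prediction probability per word segment.
        return softmax(phi @ self.beta)

layer = RandomFeatureGPLayer(in_dim=4, n_labels=3)
p2 = layer(np.random.default_rng(1).normal(size=(2, 4)))
```

Because the projection is frozen and only `beta` is trained, the layer adds little inference cost compared with MC Dropout or Deep Ensemble.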
In some embodiments, the determining the total loss value from the first classification prediction probability and the second classification prediction probability comprises:
performing cross entropy calculation according to the first classification prediction probability and the initial sample named entity label to obtain a first loss value;
performing Laplace calculation according to the second classification prediction probability and the model parameters of the initial recognition model to obtain a second loss value;
performing KL divergence calculation according to the first classification prediction probability and the second classification prediction probability to obtain a third loss value;
and determining the total loss value according to the first loss value, the second loss value and the third loss value.
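The three loss terms above can be combined as sketched below. The exact form of the Laplace calculation and the term weights are not given in the text, so a negative-log-likelihood-plus-L2-prior stand-in and equal weights are assumptions.

```python
import numpy as np

def total_loss(p1, p2, onehot, params, lam=1e-4, w2=1.0, w3=1.0):
    """Combine the three loss values into the total loss value."""
    eps = 1e-12
    # First loss value: cross entropy between the first classification
    # prediction probabilities and the initial sample named entity labels.
    l1 = -np.mean(np.sum(onehot * np.log(p1 + eps), axis=1))
    # Second loss value: Laplace-style term combining the second probabilities
    # with the model parameters (NLL plus an L2 prior; an assumed form).
    l2 = -np.mean(np.sum(onehot * np.log(p2 + eps), axis=1)) \
         + lam * sum(np.sum(w ** 2) for w in params)
    # Third loss value: KL divergence between the two predictive distributions.
    l3 = np.mean(np.sum(p1 * (np.log(p1 + eps) - np.log(p2 + eps)), axis=1))
    return l1 + w2 * l2 + w3 * l3

p1 = np.array([[0.8, 0.1, 0.1]])
p2 = np.array([[0.6, 0.3, 0.1]])
onehot = np.array([[1.0, 0.0, 0.0]])
loss = total_loss(p1, p2, onehot, params=[np.ones((2, 2))])
```

The KL term pulls the two classification heads toward agreement, which is one plausible reading of how the calibration layer influences the first layer during training.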
In some embodiments, the determining the target sample named entity tag of the sample word segment according to the first classification prediction probability and the second classification prediction probability includes:
performing average value calculation on the first classification prediction probability and the second classification prediction probability to obtain an initial sample prediction probability of the sample word segment under each initial sample named entity label;
and performing numerical comparison on the initial sample prediction probabilities of the sample word segment under the initial sample named entity labels to determine a target sample prediction probability of the sample word segment, and determining the target sample named entity tag of the sample word segment according to the target sample prediction probability.
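The averaging and numerical-comparison steps above reduce to a mean followed by an argmax per word segment. The label names and probabilities below are invented for illustration.

```python
import numpy as np

def target_labels(p1, p2, label_names):
    """Average the two layers' probabilities per word segment (initial
    sample prediction probability), then take the label with the highest
    averaged probability as the target sample named entity tag."""
    avg = (p1 + p2) / 2.0                  # initial sample prediction probability
    best = avg.argmax(axis=1)              # numerical comparison across labels
    return [label_names[i] for i in best], avg.max(axis=1)

p1 = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
p2 = np.array([[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]])
labels, probs = target_labels(p1, p2, ["B-PER", "I-PER", "O"])
# labels → ['B-PER', 'O']; probs → [0.6, 0.6]
```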
To achieve the above object, a second aspect of the embodiments of the present application proposes a named entity recognition device, the device comprising:
the text and label acquisition module is used for acquiring an initial text to be identified and a plurality of preset initial named entity labels;
the word segmentation processing module is used for carrying out word segmentation processing on the initial text to obtain a word segmentation sequence, wherein the word segmentation sequence comprises a plurality of text word segments;
the model input module is used for inputting the word segmentation sequence into a pre-constructed named entity recognition model to obtain a target prediction probability of each text word segment under each initial named entity label; the named entity recognition model comprises a first classification layer and a second classification layer, wherein the first classification layer is used for performing initial classification prediction processing on the text word segments to obtain a first prediction result, the second classification layer is used for performing calibration classification prediction processing on the text word segments to obtain a second prediction result, and the target prediction probability is determined according to the first prediction result and the second prediction result;
the label determining module is used for determining a target named entity label of each text word according to the target prediction probability;
and the named entity determining module is used for determining the named entity of the initial text according to the target named entity label of each text word.
To achieve the above object, a third aspect of the embodiments of the present application proposes an electronic device, which includes a memory and a processor, the memory storing a computer program, the processor implementing the method according to the first aspect when executing the computer program.
To achieve the above object, a fourth aspect of the embodiments of the present application proposes a storage medium, which is a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method described in the first aspect.
According to the named entity recognition method and recognition apparatus, the electronic device and the storage medium, an initial text to be recognized and a plurality of preset initial named entity tags are obtained, so that the named entities corresponding to the initial text can be matched against the preset initial named entity tags. Word segmentation processing is performed on the initial text to obtain a word segmentation sequence, where the word segmentation sequence comprises a plurality of text word segments. The word segmentation sequence is input into a pre-constructed named entity recognition model to obtain the target prediction probability of each text word segment under each initial named entity label. In order to improve the calibration capability of the named entity recognition model and the recognition accuracy of named entities in the initial text, the named entity recognition model comprises a first classification layer and a second classification layer, where the first classification layer is used for performing initial classification prediction processing on the text word segments to obtain a first prediction result, the second classification layer is used for performing calibration classification prediction processing on the text word segments to obtain a second prediction result, and the target prediction probability is determined according to the first prediction result and the second prediction result. Finally, the target named entity label of each text word segment is determined according to the target prediction probability, and the named entities of the initial text are determined according to the target named entity label of each text word segment. The method and apparatus can improve the accuracy of named entity recognition.
Drawings
FIG. 1 is a flowchart of a named entity recognition method provided in an embodiment of the present application;
FIG. 2 is a training flow chart of a named entity recognition model provided by an embodiment of the present application;
fig. 3 is a flowchart of step S208 in fig. 2;
fig. 4 is a flowchart of step S205 in fig. 2;
fig. 5 is a flowchart of step S206 in fig. 2;
fig. 6 is a flowchart of step S207 in fig. 2;
fig. 7 is a flowchart of step S301 in fig. 3;
FIG. 8 is a schematic structural diagram of a named entity recognition device according to an embodiment of the present disclosure;
fig. 9 is a schematic hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
It should be noted that although functional blocks are divided in the device diagrams and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the block division in the device diagrams or the order in the flowcharts. The terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific sequence or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.
First, several terms referred to in this application are explained:
Artificial intelligence (artificial intelligence, AI): a new technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence; research in this field includes robotics, speech recognition, image recognition, natural language processing and expert systems. Artificial intelligence can simulate the information processes of human consciousness and thinking. Artificial intelligence is also a theory, method, technology and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain optimal results.
Natural language processing (natural language processing, NLP): NLP is a branch of artificial intelligence and an interdisciplinary field of computer science and linguistics, often referred to as computational linguistics, which processes, understands and applies human languages (e.g., Chinese, English). Natural language processing includes parsing, semantic analysis, discourse understanding and the like. Natural language processing is commonly used in technical fields such as machine translation, handwritten and printed character recognition, speech recognition and text-to-speech conversion, information intent recognition, information extraction and filtering, text classification and clustering, public opinion analysis and opinion mining, and involves data mining, machine learning, knowledge acquisition, knowledge engineering, artificial intelligence research, linguistic research related to language computation, and the like.
BERT (Bidirectional Encoder Representation from Transformers) model: a model built on the Transformer that further increases the generalization ability of word vector models and fully describes character-level, word-level, sentence-level and even inter-sentence relational features. BERT has three types of embeddings: Token Embeddings, Segment Embeddings and Position Embeddings. Token Embeddings are the word vectors; the first token is the CLS token, which can be used for subsequent classification tasks. Segment Embeddings are used to distinguish two sentences, because pre-training involves not only language modeling but also classification tasks that take sentence pairs as input. As for Position Embeddings, the position vectors here are not the trigonometric-function encodings of the original Transformer; instead, BERT learns them through training: it directly trains a position encoding to retain position information, randomly initializing a vector for each position and adding it to model training, finally obtaining an encoding that contains position information. To combine the position embeddings with the word embeddings, BERT adds them element-wise.
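The combination of the three BERT embedding types can be sketched as table lookups followed by element-wise addition. The vocabulary size, dimensions and token ids below are invented for illustration; a real BERT would also apply layer normalization and Transformer layers afterwards.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, max_len, n_seg, dim = 100, 16, 2, 8

# The three embedding tables. Position embeddings are learned parameters,
# randomly initialized and then trained, rather than fixed trigonometric
# encodings as in the original Transformer.
token_emb = rng.normal(size=(vocab, dim))
segment_emb = rng.normal(size=(n_seg, dim))
position_emb = rng.normal(size=(max_len, dim))

token_ids = np.array([2, 17, 5])       # e.g. [CLS], w1, w2 (ids illustrative)
segment_ids = np.array([0, 0, 0])      # all tokens belong to sentence A
positions = np.arange(len(token_ids))

# BERT combines the three embeddings by element-wise addition.
x = token_emb[token_ids] + segment_emb[segment_ids] + position_emb[positions]
```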
Softmax function: also called the normalized exponential function, it is the generalization of the binary classification function sigmoid to multiple classes, and its purpose is to present multi-classification results in the form of probabilities. The function maps the outputs of multiple neurons into the (0, 1) interval for multi-class classification.
Gaussian Process (GP): a Gaussian process is a collection of random variables, any finite linear combination of which follows a normal distribution; every finite-dimensional distribution is a joint normal distribution, and the probability density function over a continuous index set is the Gaussian measure of all the random variables. A Gaussian process can therefore be regarded as an infinite-dimensional generalization of the joint normal distribution.
Named entity recognition (Named Entity Recognition, NER for short) is a basic key task in natural language processing and an important basic tool for numerous NLP tasks such as information extraction, dialogue systems, knowledge graphs, syntactic analysis and machine translation. However, corpora for named entity recognition are currently small, and NER models built with deep neural networks frequently overfit, resulting in poor calibration. Therefore, in practical applications of NER models, especially in named entity extraction tasks in the medical field, the prediction results of the model are required to be both accurate and well calibrated.
Model calibration refers to how accurately the confidence score provided by a model reflects its prediction uncertainty. For example, when performing NER on medical diagnoses, the model is expected to withhold a prediction when its confidence is low, or to return low-confidence prediction results to an expert for decision, thereby saving manpower and material resources. Therefore, performing uncertainty calibration on the NER model, so that the model is not forced to assign an unreasonable named entity type in cases where it would likely answer incorrectly, is an important topic in building named entity recognition.
Existing methods that use uncertainty to calibrate models mainly adopt MC Dropout and Deep Ensemble. MC Dropout applies dropout to the model parameters in the model inference stage; it is simple to implement but has a long inference time. Deep Ensemble improves model robustness by training multiple networks and aggregating their prediction results; it has good calibration ability but requires extremely high memory resources in use. Consequently, the calibration of named entity recognition models is poor, which in turn affects the accuracy of named entity recognition. Therefore, how to improve the accuracy of named entity recognition has become a technical problem to be solved.
Based on the above, the embodiments of the present application provide a named entity recognition method and recognition apparatus, an electronic device, and a storage medium, aiming to improve the accuracy of named entity recognition.
The embodiments of the present application can acquire and process related data based on artificial intelligence technology. Artificial intelligence (Artificial Intelligence, AI) is a theory, method, technology and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
The named entity recognition method provided by the embodiments of the present application relates to the technical field of artificial intelligence. The method may be applied to a terminal, to a server side, or to software running in a terminal or server side. In some embodiments, the terminal may be a smartphone, tablet, notebook computer, desktop computer, or the like; the server side may be configured as an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), big data and artificial intelligence platforms; the software may be an application that implements the named entity recognition method, but is not limited to the above forms.
The subject application is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In the embodiments of the present application, when related processing is required according to user information, user behavior data, user history data, user location information, and other data related to user identity or characteristics, permission or consent of the user is obtained first, and the collection, use, processing, and the like of the data comply with related laws and regulations and standards of related countries and regions. In addition, when the embodiment of the application needs to acquire the sensitive personal information of the user, the independent permission or independent consent of the user is acquired through a popup window or a jump to a confirmation page or the like, and after the independent permission or independent consent of the user is explicitly acquired, necessary user related data for enabling the embodiment of the application to normally operate is acquired.
Referring to fig. 1, fig. 1 is an optional flowchart of a named entity recognition method according to an embodiment of the present application, where the method in fig. 1 may include, but is not limited to, steps S101 to S105.
Step S101, acquiring an initial text to be identified and a plurality of preset initial named entity tags;
step S102, performing word segmentation processing on an initial text to obtain a word segmentation sequence, wherein the word segmentation sequence comprises a plurality of text word segments;
step S103, inputting the word segmentation sequence into a pre-constructed named entity recognition model to obtain target prediction probability of each text word under each initial named entity label; the named entity recognition model comprises a first classification layer and a second classification layer, wherein the first classification layer is used for carrying out initial classification prediction processing on text segmentation to obtain a first prediction result, the second classification layer is used for carrying out calibration classification prediction processing on the text segmentation to obtain a second prediction result, and the target prediction probability is determined according to the first prediction result and the second prediction result;
Step S104, determining a target named entity label of each text word segment according to the target prediction probability;
step S105, determining the named entity of the initial text according to the target named entity label of each text word.
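Steps S101 to S105 can be sketched end to end as below. The tokenizer, the trained model and the label set are stand-ins: a real implementation would use an actual word segmentation tool and the trained named entity recognition model, and the "O" (non-entity) label convention is an assumption.

```python
import numpy as np

def recognize_named_entities(initial_text, tokenizer, model, initial_labels):
    """Segment the initial text (S102), obtain per-segment target prediction
    probabilities from the model (S103), pick the highest-probability label
    per word segment (S104), and keep the segments whose target label is an
    entity tag (S105)."""
    word_segments = tokenizer(initial_text)
    probs = model(word_segments)               # shape (n_segments, n_labels)
    best = probs.argmax(axis=1)
    return [(seg, initial_labels[i])
            for seg, i in zip(word_segments, best)
            if initial_labels[i] != "O"]

# Toy stand-ins for the tokenizer and trained model.
labels = ["O", "B-LOC"]
toy_tokenizer = lambda text: text.split()
toy_model = lambda segs: np.array([[0.9, 0.1], [0.2, 0.8]])
entities = recognize_named_entities("visited Shenzhen", toy_tokenizer, toy_model, labels)
# entities → [('Shenzhen', 'B-LOC')]
```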
In steps S101 to S105 illustrated in this embodiment of the present application, an initial text to be recognized and a plurality of preset initial named entity tags are obtained, so that the named entities corresponding to the initial text can be matched against the preset initial named entity tags. Word segmentation processing is performed on the initial text to obtain a word segmentation sequence comprising a plurality of text word segments. The word segmentation sequence is input into a pre-constructed named entity recognition model to obtain the target prediction probability of each text word segment under each initial named entity label. In order to improve the calibration capability of the named entity recognition model and the recognition accuracy of named entities in the initial text, the named entity recognition model comprises a first classification layer and a second classification layer, where the first classification layer performs initial classification prediction processing on the text word segments to obtain a first prediction result, the second classification layer performs calibration classification prediction processing on the text word segments to obtain a second prediction result, and the target prediction probability is determined according to the first prediction result and the second prediction result. Finally, the target named entity label of each text word segment is determined according to the target prediction probability, and the named entities of the initial text are determined according to the target named entity labels. These steps can improve the accuracy of named entity recognition.
It should be noted that the application scenario of the embodiments of the present application may include a client device and a server device, where the client device is configured to send the initial text to be identified that it obtains to the server device, and the server device is configured to execute the named entity recognition method provided by the embodiments of the present application: after obtaining the initial text to be identified sent by the client device, the server device performs named entity recognition on the initial text.
In step S101 of some embodiments, when the user needs to determine the named entities included in an initial text, the user may input the initial text to be recognized in the text input field of the client device; alternatively, the initial text may be obtained in other ways, for example, a voice input by the user at the client device may be converted into a text sentence using a voice recognition technology, and the text sentence is taken as the initial text to be recognized; no limitation is made here. After the client device acquires the initial text to be recognized input by the user, the initial text is sent to the server device.
It should be noted that, if the server device is used as the execution body of the named entity recognition method provided in the embodiment of the present application, the server device may directly use the initial text sent by the client device as the initial text to be recognized, and in addition, the server device may also obtain the initial text to be recognized in other manners, and no limitation is made on the specific manner in which the server device obtains the initial text to be recognized.
It should be noted that, in practical application, the named entity recognition method provided in the embodiment of the present application may also be applied to a client device, and no specific limitation is made to a specific application scenario.
It should be noted that, the initial named entity tag is used to represent a category of named entity, for example, in the medical field, the initial named entity tag may be "disorder", "drug", "treatment scheme", or the like.
It should be noted that the initial text to be identified may relate to different application fields. For example, in the medical field, the input initial text is a medical text, such as "What are the symptoms of interstitial pneumonia?".
In step S102 of some embodiments, after the initial text to be recognized is obtained, word segmentation processing is performed on the initial text to obtain a word segmentation sequence, where the word segmentation sequence includes a plurality of text word segments. For example, when the initial text is "I am ill", the word segmentation sequence obtained by performing word segmentation processing on the initial text may be "I / am / ill".
The word segmentation of the initial text may be performed by using a tokenizer, or a dictionary-based word segmentation algorithm or a statistics-based word segmentation algorithm may be used, which is not limited here.
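As a minimal illustration of a dictionary-based segmentation algorithm (forward maximum matching; the toy dictionary and maximum word length are hypothetical), one might sketch:

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching: at each position, take the
    longest dictionary word that matches; fall back to one character."""
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fall back to a single character
        for length in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens
```

For instance, with the toy dictionary {"ab", "cd"}, the text "abcd" is segmented into ["ab", "cd"].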
In step S103 of some embodiments, after the word segmentation sequence corresponding to the initial text is obtained, the word segmentation sequence is input into a pre-constructed named entity recognition model, so as to obtain a target prediction probability of each text word under each initial named entity label, where the target prediction probability is used to characterize whether each text word belongs to a named entity.
It should be noted that, in order to improve the calibration capability of the model itself while maintaining the high-precision recognition capability of the named entity recognition model, the named entity recognition model in the embodiment of the present application includes a first classification layer and a second classification layer. The first classification layer is used for carrying out initial classification prediction processing on text segmentation to obtain a first prediction result, namely the classification layer is used for enabling the named entity recognition model to have high-precision recognition capability. The second classification layer is used for carrying out calibration classification prediction processing on the text segmentation to obtain a second prediction result, namely the classification layer is used for improving the calibration capability of the model, so that the target prediction probability is determined according to the obtained first prediction result and second prediction result.
Referring to fig. 2, in some embodiments, before step S103, the named entity identifying method of the embodiments of the present application further includes: a named entity recognition model is pre-constructed and used for recognizing target prediction probability of each text word in the initial text under each initial named entity label. Specifically, the specific training process of the named entity recognition model may include, but is not limited to, steps S201 to S208:
Step S201, a training sample set is constructed, wherein the training sample set comprises a plurality of sample texts and word segmentation tag data of each sample text, and each word segmentation tag data comprises a plurality of sample segmentation words of the sample text and initial sample naming entity tags of each sample segmentation word;
step S202, constructing an initial recognition model based on a Bert model structure, wherein the initial recognition model comprises a coding layer, a first classification layer and a second classification layer;
step S203, inputting word segmentation tag data of each sample text into an initial recognition model;
step S204, performing coding processing on each sample word of the word segmentation tag data through a coding layer to obtain a word segmentation feature vector;
step S205, carrying out initial classification prediction processing on the segmentation feature vector through a first classification layer to obtain a first classification prediction probability of the sample segmentation under each initial sample named entity label;
step S206, performing calibration classification prediction processing on the segmentation feature vectors through the second classification layer to obtain second classification prediction probability of the sample segmentation under each initial sample named entity label;
step S207, determining a target sample named entity label of the sample word according to the first classification prediction probability and the second classification prediction probability;
Step S208, taking the initial sample named entity label corresponding to the word segmentation label data as the expected output of the initial recognition model, and training the named entity recognition model according to the initial sample named entity label and the target sample named entity label.
In step S201 of some embodiments, before the named entity recognition of the initial text, the named entity recognition model needs to be constructed first. Specifically, a training sample set is constructed, wherein the training sample set comprises a plurality of sample texts and word segmentation tag data of the sample texts, and each word segmentation tag data comprises a plurality of sample segmentation words of the sample texts and initial sample naming entity tags of each sample segmentation word. The initial sample naming entity label of each sample word is used for representing the real attribute identification of each sample word.
The sample text may be obtained by identifying text input in history, or may be obtained by compiling a web crawler or script program to perform targeted crawling of data.
For a training sample set D, it can be expressed as

D = {(x_i, y_i)}, i = 1, 2, …, N

where N represents the number of sample texts, x_i represents the i-th sample text, x_i = {w_1, w_2, …, w_t} indicates that the i-th sample text consists of t sample words, w represents a sample word separated from the sample text, and y_i ∈ {1, 2, …, K} represents the initial sample named entity tag, K being the number of preset entity tags. For example, if a sample text is "What are the symptoms of interstitial pneumonia?" and "interstitial pneumonia" is a sample entity in the sample text, the initial sample named entity label corresponding to the sample entity may be of the "disorder" type.
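As an illustration, a miniature training sample set of this form might be organized as follows (the texts, tags, and tag numbering are entirely hypothetical):

```python
# Hypothetical miniature training sample set D = {(x_i, y_i)}:
# each sample text x_i is a list of sample words, and y_i holds one
# initial sample named entity tag (an integer in 1..K) per sample word.
K = 3  # assumed tag set, e.g. 1 = "disorder", 2 = "drug", 3 = "other"
training_set = [
    (["interstitial pneumonia", "symptoms"], [1, 3]),
    (["aspirin", "dosage"], [2, 3]),
]

N = len(training_set)  # number of sample texts
# every sample word must carry exactly one tag, and tags must lie in 1..K
assert all(len(x) == len(y) for x, y in training_set)
assert all(1 <= tag <= K for _, y in training_set for tag in y)
```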
In steps S202 to S204 of some embodiments, after the word segmentation tag data of each sample text are input into the initial recognition model constructed based on the Bert model structure, encoding processing is performed on each sample word of the word segmentation tag data through the encoding layer to obtain word segmentation feature vectors. Specifically, each sample word of the word segmentation tag data is encoded by the initial recognition model to obtain a word segmentation feature vector h_j for each sample word, where h_j ∈ R^{D×1}.
In step S205 and step S206 of some embodiments, in order to maintain the high-precision recognition capability of the trained named entity recognition model, initial classification prediction processing is performed on the word segmentation feature vectors through the first classification layer to obtain a first classification prediction probability of the sample word under each initial sample named entity label. Then, calibration classification prediction processing is performed on the word segmentation feature vectors through the second classification layer to obtain a second classification prediction probability of the sample word under each initial sample named entity label. When a named entity recognition model with both high accuracy and a good calibration effect is trained, the first classification prediction probability and the second classification prediction probability obtained for the same word should have high similarity.
In step S207 and step S208 of some embodiments, a target sample named entity tag of the sample word is determined according to the first classification prediction probability and the second classification prediction probability; the target sample named entity tag represents the predicted tag of each sample word. An initial recognition model is constructed based on the Bert model structure, the word segmentation tag data of each sample text are taken as input data of the initial recognition model, and the initial sample named entity tags corresponding to the word segmentation tag data are taken as the expected output of the initial recognition model. The named entity recognition model is then trained by comparing the target sample named entity tags obtained after model processing with the initial sample named entity tags.
After a plurality of sample texts are obtained, word segmentation processing is performed on each sample text to obtain a plurality of sample word segments corresponding to each sample text. The specific word segmentation processing method is the same as the word segmentation processing of the initial text, and is not repeated here.
Referring to fig. 3, in some embodiments, step S208 may include, but is not limited to, steps S301 to S302:
Step S301, determining a total loss value according to the first classification prediction probability and the second classification prediction probability;
step S302, model parameters of an initial recognition model are adjusted according to initial sample named entity tags and target sample named entity tags of sample segmentation, and training of the adjusted initial recognition model is continued based on a training sample set until the total loss value meets a preset training ending condition, so that a named entity recognition model is obtained.
In step S301 of some embodiments, in the optimization process of the model, a total loss value is determined according to the first classification prediction probability and the second classification prediction probability.
In step S302 of some embodiments, model parameters of an initial recognition model are adjusted according to initial sample named entity tags and target sample named entity tags of sample segmentation, and the adjusted initial recognition model is continuously trained based on a training sample set until a total loss value meets a preset training ending condition, that is, the performance of the initial recognition model at the moment can be considered to meet requirements, and then a named entity recognition model can be determined according to the model parameters and a network structure of the initial recognition model.
It should be noted that the preset training ending condition may be that the total loss value of the model is smaller than a preset loss value threshold, or that the accuracy with which the target sample named entity tags match the initial sample named entity tags is greater than or equal to a preset accuracy threshold.
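The preset training ending condition described above might be checked as follows (the threshold values are hypothetical):

```python
def should_stop(total_loss, accuracy,
                loss_threshold=0.01, acc_threshold=0.95):
    """Preset training-ending condition: stop when the total loss value
    falls below a loss threshold, or when the tag accuracy reaches an
    accuracy threshold (both threshold values are assumed here)."""
    return total_loss < loss_threshold or accuracy >= acc_threshold
```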
Referring to fig. 4, in some embodiments, step S205 may include, but is not limited to, steps S401 to S403:
step S401, constructing a first classification layer based on a Softmax function;
step S402, carrying out initial classification prediction processing on the segmentation feature vector through a first classification layer to obtain a prediction regression value;
step S403, carrying out normalization processing on the predictive regression value to obtain a first classification predictive probability of the sample word under each initial sample named entity label.
In steps S401 to S403 of some embodiments, a first classification layer is constructed based on a Softmax function, and initial classification prediction processing is performed on the word segmentation feature vector through the first classification layer to obtain a prediction regression value S_i, which represents the score of each sample word at the first classification layer, where S_i = W_i h_j + ε, W_i ∈ R^{K×D} represents the weights in the first classification layer, and ε represents a preset bias value in the first classification layer. In order to limit the obtained prediction regression values to a certain range (e.g., [0, 1] or [-1, 1]) and thereby reduce the adverse effects caused by singular sample data, the prediction regression values are normalized to obtain the first classification prediction probability of the sample word under each initial sample named entity label.
It should be noted that the weights and bias values in the first classification layer constructed based on the Softmax function may also be learned and adjusted during training: after being randomly initialized, they are updated by stochastic gradient descent.
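As a minimal numpy illustration of such a first classification layer (the dimensions, the vector-valued bias, and the numerical-stability shift are assumptions):

```python
import numpy as np

def softmax_head(h, W, eps):
    """First classification layer: prediction regression values
    S = W h + eps are normalized with Softmax into a per-tag
    probability distribution.
    h: (D,) word segmentation feature vector; W: (K, D) weights;
    eps: (K,) preset bias values."""
    s = W @ h + eps          # prediction regression values S_i
    s = s - s.max()          # shift for numerical stability
    p = np.exp(s)
    return p / p.sum()       # first classification prediction probability
```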
Referring to fig. 5, in some embodiments, step S206 may include, but is not limited to, steps S501 to S502:
step S501, constructing a second classification layer based on a Gaussian process;
step S502, performing calibration classification prediction processing on the segmentation feature vector through the second classification layer to obtain second classification prediction probability of the sample segmentation under each initial sample named entity label.
In steps S501 to S502 of some embodiments, in order to improve the calibration capability of the model, a second classification layer is constructed based on the Gaussian process (GP), and calibration classification prediction processing is performed on the obtained word segmentation feature vector h_j of each sample word through the second classification layer to obtain a second classification prediction probability of the sample word under each initial sample named entity label.
The output of the second classification layer, g_j = g(h_j), obeys a Gaussian prior, where g ~ GP(0, K_ij) and K_ij is an N × N covariance matrix representing the correlation among the input sample texts. Because it is difficult to compute the Gaussian posterior on a large-scale dataset, a neural network layer constructed using Random Fourier Features (RFF) is used to reduce the computational complexity of the second classification layer. Specifically, the low-rank approximation K_ij = ΦΦ^T is used, so that the definition of the Gaussian prior distribution corresponding to the second classification layer can be transformed as shown in equations (1) and (2):

g_j ≈ Φ_j^T β, β ~ N(0, I)  (1)

Φ_j = sqrt(2 / D_L) · cos(W_L h_j + b_L)  (2)

where the dimension of Φ is D_L × N, L represents the number of the distribution layer, W_L represents a fixed weight matrix sampled from the normal distribution N(0, 1), and b_L represents a fixed offset vector. Since the dimension of the word segmentation feature vector h_j is R^{D×1}, the dimension of W_L h_j becomes D_L × 1, and correspondingly the dimension of the resulting Φ_j is D_L × 1.
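A minimal numpy sketch of such an RFF second classification layer follows (sampling the offsets b_L uniformly from [0, 2π) is a standard RFF choice and an assumption here, as are the concrete dimensions):

```python
import numpy as np

def rff_features(h, W_L, b_L):
    """Random Fourier features Phi = sqrt(2/D_L) * cos(W_L h + b_L),
    giving the low-rank covariance approximation K ≈ Phi Phi^T.
    W_L: (D_L, D) fixed weights sampled from N(0, 1);
    b_L: (D_L,) fixed offsets, assumed sampled from Uniform[0, 2*pi)."""
    D_L = W_L.shape[0]
    return np.sqrt(2.0 / D_L) * np.cos(W_L @ h + b_L)

def gp_head(h, W_L, b_L, beta):
    """Second classification layer output g(h) = beta @ Phi(h), with
    beta: (K, D_L) trainable weights under a N(0, I) prior."""
    return beta @ rff_features(h, W_L, b_L)
```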
In step S207 of some embodiments, in order to improve the calibration capability of the model itself while maintaining the high-precision recognition capability of the named entity recognition model, the target sample named entity tag of the sample segmentation is determined according to the obtained first classification prediction probability and second classification prediction probability.
Referring to fig. 6, in some embodiments, step S207 may include, but is not limited to, steps S601 to S602:
step S601, carrying out mean value calculation on the first classification prediction probability and the second classification prediction probability to obtain initial sample prediction probability of the sample segmentation under each initial sample named entity label;
step S602, comparing the initial sample prediction probabilities of the sample word under each initial sample named entity label, determining the target sample prediction probability of the sample word, and determining the target sample named entity label of the sample word according to the target sample prediction probability.
In steps S601 to S602 of some embodiments, a first classification prediction probability S_i is obtained from the first classification layer and a second classification prediction probability g(h_i) is obtained from the second classification layer. Mean value calculation is performed on the first classification prediction probability and the second classification prediction probability to obtain the initial sample prediction probability probs of the sample word under each initial sample named entity label, where probs = (S_i + g(h_i)) / 2. The initial sample prediction probabilities of the sample word under the initial sample named entity labels are then compared numerically to determine the target sample prediction probability of the sample word, and the target sample named entity label of the sample word is determined according to the target sample prediction probability.
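The mean value calculation and numerical comparison can be sketched as follows (a minimal illustration with hypothetical probability values; it assumes both heads already output per-tag probabilities):

```python
import numpy as np

def combined_probs(S_i, g_i):
    """Average the first and second classification prediction
    probabilities, probs = (S_i + g(h_i)) / 2, then pick the tag with
    the highest probability as the target sample named entity tag."""
    probs = (np.asarray(S_i, dtype=float) + np.asarray(g_i, dtype=float)) / 2.0
    return probs, int(np.argmax(probs))

# Hypothetical per-tag probabilities from the two classification layers:
probs, tag = combined_probs([0.7, 0.2, 0.1], [0.5, 0.4, 0.1])
```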
Specifically, referring to fig. 7, in some embodiments, step S301 may include, but is not limited to, steps S701 to S704:
step S701, performing cross entropy calculation according to the first classification prediction probability and the initial sample named entity label to obtain a first loss value;
step S702, carrying out Laplacian calculation according to the second classification prediction probability and the model parameters of the initial recognition model to obtain a second loss value;
step S703, performing KL divergence calculation according to the first classification prediction probability and the second classification prediction probability to obtain a third loss value;
Step S704, determining a total loss value according to the first loss value, the second loss value and the third loss value.
In steps S701 to S704 of some embodiments, the first classification layer constructed based on the Softmax function can effectively improve the recognition accuracy of the named entity recognition model in the model training process, and the second classification layer constructed based on the Gaussian process can improve the calibration capability of the named entity recognition model in the model training process, so that the prediction scores output by the model better reflect the prediction uncertainty and the model is prevented from forcibly assigning an unreasonable entity type when its prediction is likely to be wrong. Specifically, cross entropy calculation is performed according to the first classification prediction probability and the initial sample named entity label to obtain a first loss value loss1. Laplacian calculation is performed according to the second classification prediction probability and the model parameters of the initial recognition model to obtain a second loss value loss2. KL divergence calculation is performed according to the first classification prediction probability and the second classification prediction probability to obtain a third loss value loss3. Finally, the first loss value loss1, the second loss value loss2 and the third loss value loss3 are summed to obtain the total loss value.
It should be noted that, for the first loss value loss1, the first classification layer based on the Softmax function may solve the loss value by a cross entropy method, so the first loss value loss1 may be expressed as shown in equation (3):

loss1 = -Σ_k y_k log(S_k)  (3)

where k represents the serial number of the selected initial named entity tag, y_k represents the corresponding initial named entity tag, and S_k represents the corresponding first classification prediction probability.
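A minimal numpy sketch of this cross entropy computation for loss1 (the one-hot tag encoding and the clipping guard against log(0) are illustration-level assumptions):

```python
import numpy as np

def loss1_cross_entropy(y_onehot, S):
    """First loss value: cross entropy between the one-hot initial
    sample named entity tag y and the first classification prediction
    probability S, i.e. loss1 = -sum_k y_k * log(S_k)."""
    S = np.clip(np.asarray(S, dtype=float), 1e-12, 1.0)  # guard log(0)
    return -float(np.sum(np.asarray(y_onehot, dtype=float) * np.log(S)))
```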
It should be noted that, for the second loss value loss2, the second classification layer constructed based on the Gaussian process may also solve the loss value by a cross entropy method, and the second loss value loss2 may be expressed as shown in equations (4) and (5):

loss2 = -log p(D|β) + (1/2) · ||β||_2^2  (4)

-log p(D|β) = -Σ_k y_k log(g_k)  (5)

where k represents the serial number of the selected initial named entity tag, y_k represents the corresponding initial named entity tag, -log p(D|β) represents the cross entropy term, and, since β ~ N(0, 1), the regularization term ||β||_2^2 is the 2-norm of β squared, arising from the Gaussian prior. It should be noted that, since the classification likelihood function is not conjugate to the Gaussian prior, the uncertainty of the linear weights in the RFF layer can be estimated using a Laplace approximation. Letting β* represent the maximum a posteriori estimate (MAP), the Laplace posterior solution is shown in equation (6):

p(β|D) ≈ N(β*, Σ*), with Σ*^{-1} = I + Σ_i p_i(1 - p_i) Φ_i Φ_i^T  (6)

where I represents an identity matrix. During the training of the model, β* is updated by stochastic gradient descent along with the loss function.
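An illustrative numpy sketch of the Laplace posterior solution for the RFF weights follows (it assumes the standard precision form I + sum_i p_i(1 - p_i) Phi_i Phi_i^T for a Laplace approximation over binary-style probabilities; all variable names are hypothetical):

```python
import numpy as np

def laplace_precision(Phis, probs):
    """Laplace-approximated posterior precision for the RFF weights:
    Sigma^{-1} = I + sum_i p_i * (1 - p_i) * Phi_i Phi_i^T,
    accumulated over the training samples.
    Phis: iterable of (D_L,) feature vectors; probs: predicted p_i."""
    D_L = Phis[0].shape[0]
    precision = np.eye(D_L)  # identity matrix I
    for phi, p in zip(Phis, probs):
        precision += p * (1.0 - p) * np.outer(phi, phi)
    return precision
```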
It should be noted that, in order to balance the first classification layer constructed based on the Softmax function and the second classification layer constructed based on the Gaussian process so that the prediction probabilities obtained by the two classification layers are similar, KL divergence calculation is performed according to the first classification prediction probability and the second classification prediction probability to obtain the third loss value loss3, so that model training can achieve both high accuracy and a better calibration effect.
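The KL divergence balancing term and the summed total loss might be sketched as follows (a simplified illustration; the direction of the KL divergence and the equal weighting of the three loss terms are assumptions):

```python
import numpy as np

def loss3_kl(p, q):
    """Third loss value: KL divergence KL(p || q) between the first and
    second classification prediction probabilities, encouraging the two
    classification layers to produce similar distributions."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1.0)
    q = np.clip(np.asarray(q, dtype=float), 1e-12, 1.0)
    return float(np.sum(p * np.log(p / q)))

def total_loss(loss1, loss2, loss3):
    """Total loss value: the sum of the three loss terms."""
    return loss1 + loss2 + loss3
```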
The solution of the first loss value and the second loss value is not limited to the cross entropy loss function; other loss functions may also be used to solve the loss values, which is not limited here.
In step S104 of some embodiments, after the target prediction probabilities output by the model are obtained, a target named entity tag for each text word is determined according to the target prediction probabilities. For example, for medical text, it is determined whether the text word "interstitial pneumonia" belongs to the "disorder" category or the "drug" category.
In step S105 of some embodiments, the named entity of the initial text is determined from the target named entity tag of each text word; for example, for the medical text "What are the symptoms of interstitial pneumonia?", the resulting named entities include "interstitial pneumonia" and "symptoms". Then, the client device or the server device may further perform related operations such as searching according to the determined named entities and return related search results to the user.
Referring to fig. 8, an embodiment of the present application further provides a named entity recognition device, which may implement the named entity recognition method, where the device includes:
a text and tag obtaining module 810, configured to obtain an initial text to be identified and a plurality of preset initial named entity tags;
the word segmentation processing module 820 is configured to perform word segmentation processing on the initial text to obtain a word segmentation sequence, where the word segmentation sequence includes a plurality of text word segments;
the model input module 830 is configured to input the word segmentation sequence into a pre-constructed named entity recognition model, so as to obtain a target prediction probability of each text word under each initial named entity label; the named entity recognition model comprises a first classification layer and a second classification layer, wherein the first classification layer is used for carrying out initial classification prediction processing on text segmentation to obtain a first prediction result, the second classification layer is used for carrying out calibration classification prediction processing on the text segmentation to obtain a second prediction result, and the target prediction probability is determined according to the first prediction result and the second prediction result;
the tag determining module 840 is configured to determine a target named entity tag of each text word according to the target prediction probability;
the named entity determining module 850 is configured to determine the named entity of the initial text according to the target named entity tag of each text word.
The specific implementation manner of the named entity recognition device is basically the same as the specific embodiment of the named entity recognition method, and is not described herein.
The embodiment of the application also provides electronic equipment, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the named entity identification method when executing the computer program. The electronic equipment can be any intelligent terminal including a tablet personal computer, a vehicle-mounted computer and the like.
Referring to fig. 9, fig. 9 illustrates a hardware structure of an electronic device according to another embodiment, the electronic device includes:
the processor 901 may be implemented by a general purpose central processing unit (Central Processing Unit, CPU), a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc. for executing related programs to implement the technical solutions provided by the embodiments of the present application;
the Memory 902 may be implemented in the form of a Read Only Memory (ROM), a static storage device, a dynamic storage device, or a Random Access Memory (RAM). The memory 902 may store an operating system and other application programs; when the technical solutions provided in the embodiments of the present application are implemented by software or firmware, the relevant program code is stored in the memory 902, and the processor 901 invokes it to execute the named entity recognition method of the embodiments of the present application;
An input/output interface 903 for inputting and outputting information;
the communication interface 904 is configured to implement communication interaction between the device and other devices, and may implement communication in a wired manner (e.g. USB, network cable, etc.), or may implement communication in a wireless manner (e.g. mobile network, WIFI, bluetooth, etc.);
a bus 905 that transfers information between the various components of the device (e.g., the processor 901, the memory 902, the input/output interface 903, and the communication interface 904);
wherein the processor 901, the memory 902, the input/output interface 903 and the communication interface 904 are communicatively coupled to each other within the device via a bus 905.
The embodiment of the application also provides a storage medium, which is a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program realizes the named entity identification method when being executed by a processor.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
According to the named entity identification method, the identification device, the electronic equipment and the storage medium, the initial text to be identified and a plurality of preset initial named entity tags are obtained, so that the named entity corresponding to the initial text is matched out according to the preset initial named entity tags. And performing word segmentation processing on the initial text to obtain a word segmentation sequence, wherein the word segmentation sequence comprises a plurality of text word segments. Inputting the word segmentation sequence into a pre-constructed named entity recognition model to obtain target prediction probability of each text word under each initial named entity label. In order to improve the calibration capability of a named entity recognition model and improve the recognition accuracy of named entities of an initial text, the named entity recognition model comprises a first classification layer and a second classification layer, wherein the first classification layer is used for carrying out initial classification prediction processing on text segmentation to obtain a first prediction result, the second classification layer is used for carrying out calibration classification prediction processing on the text segmentation to obtain a second prediction result, and the target prediction probability is determined according to the first prediction result and the second prediction result. Determining a target named entity label of each text word according to the target prediction probability; and determining the named entity of the initial text according to the target named entity label of each text word. 
The model is constructed by firstly constructing a training sample set, wherein the training sample set comprises a plurality of sample texts and word segmentation tag data of the sample texts, and each word segmentation tag data comprises a plurality of sample segmentation words of the sample texts and initial sample naming entity tags of each sample segmentation word. An initial recognition model is constructed based on the Bert model structure, and the initial recognition model comprises a coding layer, a first classification layer and a second classification layer. Carrying out coding processing on each sample word segmentation of the word segmentation tag data through a coding layer to obtain a word segmentation feature vector; the first classification layer constructed based on the Softmax function carries out initial classification prediction processing and normalization processing on the segmentation feature vector to obtain a first classification prediction probability; and carrying out calibration classification prediction processing on the segmentation feature vector based on a second classification layer constructed in a Gaussian process to obtain second classification prediction probability of the sample segmentation under each initial sample named entity label, carrying out mean value calculation and numerical comparison on the first classification prediction probability and the second classification prediction probability to determine target sample prediction probability of the sample segmentation, and determining target sample named entity labels of the sample segmentation according to the target sample prediction probability. 
During parameter optimization, cross entropy is computed from the first classification prediction probability and the initial sample named entity tag to obtain a first loss value; a Laplacian calculation is performed on the second classification prediction probability and the model parameters of the initial recognition model to obtain a second loss value; a KL divergence is computed between the first and second classification prediction probabilities to obtain a third loss value; and a total loss value is determined from the first, second and third loss values. Finally, the model parameters of the initial recognition model are adjusted according to the initial sample named entity tags and the target sample named entity tags of the sample word segments, and the adjusted model is trained further on the training sample set until the total loss value satisfies a preset training end condition, yielding the named entity recognition model. By combining a Softmax classification layer and a GP classification layer on top of the Bert encoding of the initial text, and using the KL divergence to measure the similarity of the two layers' outputs, the calibration of the model is greatly improved while its high precision is maintained, producing a simple and reliable medical named entity recognition model and thereby improving the accuracy of named entity recognition.
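The three-part total loss can be sketched numerically. This is a hedged illustration: the patent does not give the exact form of the Laplacian calculation over the model parameters, so it is approximated here as a simple L2 penalty, and any weighting between the three loss values is omitted.

```python
import numpy as np

def cross_entropy(p, y):
    # First loss value: cross entropy of the Softmax head against the gold tags.
    return -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))

def kl_divergence(p, q):
    # Third loss value: KL divergence between the two heads' distributions.
    return np.mean(np.sum(p * np.log((p + 1e-12) / (q + 1e-12)), axis=1))

# Hypothetical predictions for 2 word segments over 3 labels, plus gold tag ids.
p_first = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]])
p_second = np.array([[0.7, 0.2, 0.1], [0.3, 0.6, 0.1]])
gold = np.array([0, 1])

loss_ce = cross_entropy(p_first, gold)
loss_kl = kl_divergence(p_first, p_second)

# Second loss value: stand-in for the patent's Laplacian calculation over the
# model parameters, sketched as an L2 penalty on a toy parameter vector.
params = np.array([0.5, -0.3, 0.1])
loss_reg = 0.5 * np.sum(params ** 2)

total_loss = loss_ce + loss_reg + loss_kl  # weights omitted for simplicity
```

The KL term is what couples the two heads: minimizing it pulls the sharp Softmax distribution toward the calibrated Gaussian-process distribution, which is the mechanism the patent credits for the improved calibration.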
The embodiments described herein serve to describe the technical solutions of the embodiments of the present application more clearly and do not limit those solutions; as those skilled in the art will appreciate, with the evolution of technology and the emergence of new application scenarios, the technical solutions provided by the embodiments of the present application remain equally applicable to similar technical problems.
Those skilled in the art will appreciate that the technical solutions shown in the figures do not limit the embodiments of the present application, which may include more or fewer steps than shown, combine certain steps, or use different steps.
The apparatus embodiments described above are merely illustrative; the units illustrated as separate components may or may not be physically separate, that is, they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
The terms "first," "second," "third," "fourth," and the like in the description of the present application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in this application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, "A and/or B" may represent: only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of" and similar expressions mean any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c may be single or plural.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the above-described division of units is merely a logical function division, and there may be another division manner in actual implementation, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in part or all of the technical solution or in part in the form of a software product stored in a storage medium, including multiple instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods of the various embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing a program.
The preferred embodiments of the present application are described above with reference to the accompanying drawings, which does not limit the scope of the claims of the embodiments of the present application. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.

Claims (10)

1. A named entity recognition method, the method comprising:
acquiring an initial text to be identified and a plurality of preset initial named entity tags;
performing word segmentation processing on the initial text to obtain a word segmentation sequence, wherein the word segmentation sequence comprises a plurality of text word segments;
inputting the word segmentation sequence into a pre-constructed named entity recognition model to obtain target prediction probability of each text word under each initial named entity label; the named entity recognition model comprises a first classification layer and a second classification layer, wherein the first classification layer is used for carrying out initial classification prediction processing on the text segmentation to obtain a first prediction result, the second classification layer is used for carrying out calibration classification prediction processing on the text segmentation to obtain a second prediction result, and the target prediction probability is determined according to the first prediction result and the second prediction result;
determining a target named entity label of each text word according to the target prediction probability;
and determining the named entity of the initial text according to the target named entity label of each text word.
2. The method of claim 1, wherein the named entity recognition model is trained by:
constructing a training sample set, wherein the training sample set comprises a plurality of sample texts and word segmentation tag data of each sample text, and each word segmentation tag data comprises a plurality of sample segmentation words of the sample texts and initial sample named entity tags of each sample segmentation word;
constructing an initial recognition model based on a Bert model structure, wherein the initial recognition model comprises a coding layer, the first classification layer and the second classification layer;
inputting the word segmentation tag data of each sample text into the initial recognition model;
performing, by the encoding layer, encoding processing on each sample word of the word segmentation tag data to obtain a word segmentation feature vector;
performing, by the first classification layer, the initial classification prediction processing on the word segmentation feature vector to obtain a first classification prediction probability of the sample word under each initial sample named entity label;
performing the calibration classification prediction processing on the word segmentation feature vector through the second classification layer to obtain a second classification prediction probability of the sample word under each initial sample named entity label;
determining a target sample named entity tag of the sample word according to the first classification prediction probability and the second classification prediction probability;
and taking the initial sample named entity label corresponding to the word segmentation label data as expected output of the initial recognition model, and training the named entity recognition model according to the initial sample named entity label and the target sample named entity label.
3. The method according to claim 2, wherein said training the named entity recognition model based on the initial sample named entity tags and the target sample named entity tags with the initial sample named entity tags corresponding to the word segmentation tag data as a desired output of the initial recognition model comprises:
determining a total loss value according to the first classification prediction probability and the second classification prediction probability;
and adjusting model parameters of the initial recognition model according to the initial sample named entity tag and the target sample named entity tag of the sample segmentation, and continuously training the adjusted initial recognition model based on the training sample set until the total loss value meets a preset training ending condition to obtain the named entity recognition model.
4. The method according to claim 2, wherein said performing, by the first classification layer, the initial classification prediction process on the word segmentation feature vector to obtain a first classification prediction probability of the sample segmentation under each of the initial sample named entity labels, includes:
constructing the first classification layer based on a Softmax function;
performing, by the first classification layer, the initial classification prediction processing on the word segmentation feature vector to obtain a prediction regression value;
and normalizing the predictive regression value to obtain a first classification predictive probability of the sample word under each initial sample named entity label.
5. The method according to claim 2, wherein said performing, by the second classification layer, the calibrated classification prediction process on the word segmentation feature vector to obtain a second classification prediction probability of the sample segmentation under each of the initial sample named entity tags, includes:
constructing the second classification layer based on a gaussian process;
and carrying out calibration classification prediction processing on the word segmentation feature vector through the second classification layer to obtain second classification prediction probability of the sample word under each initial sample named entity label.
6. A method according to claim 3, wherein said determining a total loss value from said first classification prediction probability and said second classification prediction probability comprises:
performing cross entropy calculation according to the first classification prediction probability and the initial sample named entity label to obtain a first loss value;
carrying out Laplacian calculation according to the second classification prediction probability and the model parameters of the initial recognition model to obtain a second loss value;
carrying out KL divergence calculation according to the first classification prediction probability and the second classification prediction probability to obtain a third loss value;
and determining a total loss value according to the first loss value, the second loss value and the third loss value.
7. The method according to any one of claims 2 to 6, wherein said determining a target sample named entity tag of the sample word from the first and second classification prediction probabilities comprises:
performing average value calculation on the first classification prediction probability and the second classification prediction probability to obtain an initial sample prediction probability of the sample word under each initial sample named entity label;
and carrying out numerical comparison on the initial sample prediction probabilities of the sample word under each initial sample named entity label, determining the target sample prediction probability of the sample word, and determining the target sample named entity label of the sample word according to the target sample prediction probability.
8. A named entity recognition device, the device comprising:
the text and label acquisition module is used for acquiring an initial text to be identified and a plurality of preset initial named entity labels;
the word segmentation processing module is used for carrying out word segmentation processing on the initial text to obtain a word segmentation sequence, wherein the word segmentation sequence comprises a plurality of text word segments;
the model input module is used for inputting the word segmentation sequence into a pre-constructed named entity recognition model to obtain target prediction probability of each text word under each initial named entity label; the named entity recognition model comprises a first classification layer and a second classification layer, wherein the first classification layer is used for carrying out initial classification prediction processing on the text segmentation to obtain a first prediction result, the second classification layer is used for carrying out calibration classification prediction processing on the text segmentation to obtain a second prediction result, and the target prediction probability is determined according to the first prediction result and the second prediction result;
the label determining module is used for determining a target named entity label of each text word according to the target prediction probability;
and the named entity determining module is used for determining the named entity of the initial text according to the target named entity label of each text word.
9. An electronic device, comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1 to 7.
CN202310284774.0A 2023-03-15 2023-03-15 Named entity recognition method and recognition device, electronic equipment and storage medium Pending CN116432648A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310284774.0A CN116432648A (en) 2023-03-15 2023-03-15 Named entity recognition method and recognition device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN116432648A true CN116432648A (en) 2023-07-14

Family

ID=87093548




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination