CN118133830A - Named entity recognition method, named entity recognition device, named entity recognition equipment and named entity recognition computer readable storage medium - Google Patents

Named entity recognition method, named entity recognition device, named entity recognition equipment and named entity recognition computer readable storage medium

Info

Publication number
CN118133830A
Authority
CN
China
Prior art keywords
text
named entity
entity recognition
identified
named
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410532791.6A
Other languages
Chinese (zh)
Inventor
刘晓华
刘泽恩
张程剀
左赛
陈小梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yiyong Technology Co ltd
Original Assignee
Beijing Yiyong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yiyong Technology Co ltd filed Critical Beijing Yiyong Technology Co ltd
Priority to CN202410532791.6A priority Critical patent/CN118133830A/en
Publication of CN118133830A publication Critical patent/CN118133830A/en
Pending legal-status Critical Current

Landscapes

  • Character Discrimination (AREA)

Abstract

Provided are a named entity recognition method, apparatus, device, and computer-readable storage medium. The named entity recognition method includes the following steps: receiving, using an input unit, text to be identified that includes one or more named entities; embedding the text to be identified using a large language model (LLM) in an encoder to extract a corresponding feature vector sequence; processing the feature vector sequence with a bi-directional gated recurrent unit (GRU) model in the encoder, so as to capture forward context information of the text to be identified using a forward GRU component of the bi-directional GRU model and to capture backward context information of the text to be identified using a backward GRU component of the bi-directional GRU model, the forward context information or the backward context information including short-term and long-term dependencies over time of tokens in the text to be identified; and annotating, using an output unit, the one or more named entities in the text to be identified based on the captured forward and backward context information.

Description

Named entity recognition method, named entity recognition device, named entity recognition equipment and named entity recognition computer readable storage medium
Technical Field
The present disclosure relates to the field of text processing, and in particular, to a method, apparatus, device, and computer readable storage medium for identifying named entities based on a large language model.
Background
In natural language processing, the Named Entity Recognition (NER) task plays a vital role. It involves not only identifying critical information in text but also precisely determining the specific locations of those entities in the text. For example, in the field of medical text processing, it is of great importance to accurately identify and delimit various types of entities (e.g., drugs, diseases, and symptoms) in medical record text, as well as the exact positions of these entities in that text.
However, existing NER methods face significant challenges in this regard.
Traditional NER methods, such as schemes based on a pre-trained BERT model combined with Conditional Random Fields (CRF), have achieved some success in medical text analysis. These methods can identify entities present in text with a certain degree of accuracy and determine the specific locations of these entities in the text. However, the main challenge of these traditional approaches is their reliance on diverse training data. This reliance on the diversity of annotated data limits their generalization and scope of application when processing complex text specific to the medical field, especially when faced with data from multiple hospitals and multiple disease types.
Large language models, such as the GPT family of models, demonstrate their excellent capabilities in text generation and understanding. These models are able to understand complex contexts and capture subtle semantic differences, which is particularly important when processing medical text with rich semantic hierarchy. However, these large generative models face unique challenges when dealing with specific NER tasks.
For example, because large language models like GPT are generative in nature, they may "create" entities that are not present in the original text during entity recognition. This tendency to generate creatively, rather than strictly adhere to the information already in the text, constitutes a significant problem for medical text analysis, which requires high accuracy and reliability. Furthermore, even when these models are able to identify entities that do exist in the text, they have difficulty accurately producing the starting position of each entity in the text.
Thus, under the current circumstances, there is an urgent need to develop a new NER solution that combines the deep text understanding capabilities of large models with the accuracy of traditional NER methods.
Disclosure of Invention
The present disclosure has been made in order to solve the above-described problems. The present disclosure not only takes advantage of the advanced understanding capabilities of large models in processing complex text, but also integrates the advantages of traditional methods in accurately identifying the starting location of an entity. This fused, innovative approach effectively compensates for the shortcomings of each technique and provides an efficient and accurate solution for identifying and delimiting entities in text, thereby greatly improving the accuracy and efficiency of text analysis.
In one aspect, the disclosure provides a named entity recognition method based on a large language model, including: receiving, using an input unit, text to be identified that includes one or more named entities; embedding the text to be identified using a large language model (LLM) in an encoder so as to extract a corresponding feature vector sequence; processing the feature vector sequence with a bi-directional gated recurrent unit (GRU) model in the encoder to capture forward context information of the text to be identified using a forward GRU component in the bi-directional GRU model and to capture backward context information of the text to be identified using a backward GRU component in the bi-directional GRU model, the forward context information or the backward context information including short-term and long-term dependencies over time of tokens in the text to be identified; and annotating, using an output unit, the one or more named entities in the text to be identified based on the captured forward context information and backward context information.
In some embodiments, the named entity recognition method further comprises: receiving a prompt text by using the input unit and splicing the text to be recognized and the prompt text to obtain a spliced text; and performing embedded processing on the spliced text by using the LLM to extract a corresponding feature vector sequence.
In some embodiments, the named entity recognition method further comprises: masking the hint text from the concatenated text using a masking layer in the output unit such that the output unit labels only the one or more named entities in the text to be identified.
In some embodiments, the named entity recognition method further comprises: the parameters of the encoder are low-rank decomposed using a low-rank adaptive LORA unit in the encoder to reduce the amount of parameters of the encoder that need to be trained.
In some embodiments, the LLM is a GPT model or a BERT model.
In some embodiments, the text to be identified is medical record text associated with a disease.
In some embodiments, the named entity recognition method further comprises: the named entity recognition device is pre-trained and fine-tuned before receiving the text to be recognized.
In some embodiments, pre-training the named entity recognition device includes: training the LLM using data related to a target application scenario such that the LLM captures language features and entity types related to the target application scenario.
In some embodiments, fine tuning the named entity recognition device includes: training the named entity recognition means using a training dataset comprising labeling information of named entities to adjust part of the parameters of the encoder.
In some embodiments, pre-training the named entity recognition device includes optimizing a full weight parameter $W \in \mathbb{R}^{d \times k}$ of the named entity recognition device, wherein d represents the output dimension of the previous layer; fine-tuning the named entity recognition device includes decomposing the full weight parameter W and updating the partial parameters $A \in \mathbb{R}^{r \times k}$ and $B \in \mathbb{R}^{d \times r}$ using the following formula:

$$W + \Delta W = W + BA, \qquad r \ll \min(d, k)$$

wherein, in the updating process, the pre-trained weight parameter W is frozen and only the parameters A and B are updated, and k represents the input dimension of the next layer.
In some embodiments, the partial parameters of the encoder include parameters of a low rank adaptation (LORA) unit in the encoder and parameters of the bi-directional GRU model, excluding parameters of the LLM.
In some embodiments, fine-tuning the named entity recognition device further comprises: creating a semantically similar training prompt for each piece of training data in the training data set; and training the named entity recognition device using each piece of training data in the training data set together with its corresponding training prompt.
Another aspect of the present disclosure provides a named entity recognition device based on a large language model, including: an input unit configured to receive text to be identified comprising one or more named entities; an encoder comprising a large language model (LLM) and a bi-directional gated recurrent unit (GRU) model, the LLM configured to embed the text to be identified to extract a corresponding sequence of feature vectors, the bi-directional GRU model configured to process the sequence of feature vectors to capture forward context information of the text to be identified using a forward GRU component in the bi-directional GRU model and to capture backward context information of the text to be identified using a backward GRU component in the bi-directional GRU model, the forward context information or the backward context information comprising short-term and long-term dependencies of tokens in the text to be identified over time; and an output unit configured to annotate the one or more named entities in the text to be identified based on the captured forward context information and backward context information.
In some embodiments, the input unit further receives a prompt text and concatenates the text to be recognized and the prompt text to obtain a concatenated text; and the LLM performs embedded processing on the spliced text to extract a corresponding feature vector sequence.
In some embodiments, the output unit further comprises a masking layer configured to mask the hint text from the concatenated text such that the output unit labels only the one or more named entities in the text to be identified.
In some embodiments, the encoder further comprises a low-rank adaptation LORA unit configured to low-rank decompose parameters of the encoder to reduce an amount of parameters required to train the named entity recognition device.
In some embodiments, the large language model is a GPT model or a BERT model.
In some embodiments, the text to be identified is medical record text associated with a disease.
Another aspect of the present disclosure also provides an apparatus for named entity recognition, comprising: a processor; and a memory storing one or more computer program modules. The one or more computer program modules are configured to perform any of the named entity recognition methods described previously when executed by the processor.
Yet another aspect of the present disclosure provides a non-transitory computer-readable storage medium having instructions stored thereon that, when executed by a processor, cause the processor to perform any of the named entity recognition methods described above.
Drawings
Fig. 1 illustrates a block diagram of an example of a named entity recognition device, according to an embodiment of the present disclosure.
Fig. 2 shows a schematic diagram of an example input of an input unit of a named entity recognition device according to an embodiment of the present disclosure.
Fig. 3 shows an example of an output unit including a mask layer and a schematic diagram of NER labeling results according to an embodiment of the present disclosure.
Fig. 4 shows a schematic diagram of a single gated recurrent unit (GRU) in a named entity recognition device according to an embodiment of the present disclosure.
Fig. 5 shows a schematic diagram of an encoder including a large language model LLM and a bi-directional GRU model according to an embodiment of the present disclosure.
Fig. 6 shows a schematic diagram of an encoder including a low rank adaptive LORA unit according to an embodiment of the present disclosure.
Fig. 7 shows a schematic diagram of an example of low-rank adaptation processing of parameters of a named entity recognition device according to an embodiment of the disclosure.
FIG. 8 illustrates an example flow chart of a named entity recognition method according to an embodiment of this disclosure.
Fig. 9 shows a schematic diagram of a named entity recognition device according to an embodiment of the principles of the present disclosure.
Fig. 10 shows a schematic diagram of a computer-readable storage medium according to an embodiment of the disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the embodiments of the present disclosure have been illustrated in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather, these embodiments are provided so that this disclosure will be more thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. In addition, method embodiments may include other steps and/or omit certain steps.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below.
It should be appreciated that references to "first," "second," etc. in this disclosure are only for distinguishing between different devices, modules, or units and not for limiting the order or interdependence of the functions performed by these devices, modules, or units.
It should be noted that references to "one" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The core concept of the present disclosure is to change the conventional way in which the large language model (LLM) is applied. That is, in the network structure designed with NER as the main task of the present application, the purpose is not to exploit the text generation capability of the LLM, but to make full use of the deep understanding capability of the LLM to accurately identify and delimit the various entities in the text to be identified.
For example, by combining the advanced text understanding capabilities of LLM with classical NER techniques, the present disclosure aims to significantly improve the accuracy and efficiency of key information identification in text to be identified (e.g., medical history).
In particular, the present disclosure proposes a novel neural network architecture that can be specifically used for NER in various fields (e.g., oncological case history text). This architecture uniquely combines the advantages of the large language model LLM and the traditional NER approach, with innovative design for each layer or unit to improve accuracy and efficiency of entity recognition.
Fig. 1 illustrates a block diagram of an example of a named entity recognition device 1000, according to an embodiment of the disclosure.
As shown in fig. 1, the named entity recognition device 1000 includes an input unit 1100, an encoder 1200, and an output unit 1300.
The input unit 1100 is configured to receive text to be recognized that includes one or more named entities.
The encoder 1200 includes a large language model LLM and a bi-directional gated recurrent unit (GRU) model, Bi-GRU. The LLM is configured to perform embedding on the text to be recognized to extract a corresponding feature vector sequence. For example, the LLM in the encoder 1200 may include a GPT model or a BERT model, which are well known to those skilled in the art.
The Bi-GRU is configured to further process the feature vector sequence obtained through the embedded processing of the LLM to capture forward context information of the text to be recognized using a forward GRU component in the Bi-GRU and to capture backward context information of the text to be recognized using a backward GRU component in the Bi-GRU.
The forward context information or the backward context information as described above includes a short-term dependency and a long-term dependency of the tokens in the text to be recognized in time.
For example, a reset gate in the GRU (described in detail later) is used primarily to capture short-term dependencies in time among the tokens in the text to be recognized, e.g., dependencies within a window of about 3 to 5 tokens. An update gate in the GRU (described in detail later) is used primarily to capture long-term dependencies in time among the tokens in the text to be recognized, e.g., dependencies between tokens that are 10 or more positions apart.
The output unit 1300 is configured to annotate the one or more named entities in the text to be identified based on the captured forward context information and the backward context information.
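For illustration only, the following is a minimal Python sketch of the three-part layout described above (input token ids, an LLM-style embedding stage followed by a Bi-GRU, and a token-level labeling head). The use of nn.Embedding as a stand-in for the LLM and the chosen dimensions are assumptions of the sketch, not the claimed implementation.

```python
# Minimal sketch of the input unit / encoder / output unit layout (assumptions only).
import torch
import torch.nn as nn

class NERDeviceSketch(nn.Module):
    def __init__(self, vocab_size=30000, llm_dim=768, gru_dim=256, num_labels=7):
        super().__init__()
        # Stand-in for the LLM embedding stage of the encoder.
        self.llm_embedding = nn.Embedding(vocab_size, llm_dim)
        # Bi-directional GRU: forward and backward components in one module.
        self.bigru = nn.GRU(llm_dim, gru_dim, batch_first=True, bidirectional=True)
        # Output unit: per-token emission scores over entity labels.
        self.classifier = nn.Linear(2 * gru_dim, num_labels)

    def forward(self, token_ids):                 # token_ids: (batch, seq_len)
        x = self.llm_embedding(token_ids)         # feature vector sequence
        h, _ = self.bigru(x)                      # forward + backward context
        return self.classifier(h)                 # (batch, seq_len, num_labels)

# Usage example: scores = NERDeviceSketch()(torch.randint(0, 30000, (1, 12)))
```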
Fig. 2 shows a schematic diagram of an example input of an input unit 1100 of a named entity recognition device 1000 according to an embodiment of the disclosure.
As shown in fig. 2, the input unit 1100 may receive a piece of medical record text as the text to be recognized. The medical record text reads, for example: "The patient was admitted to hospital on January 1, 2023 due to stomach discomfort, and was preliminarily diagnosed with gastric cancer". In this case, the NER task of the named entity recognition device 1000 is to recognize various named entities from the above medical record text.
For example, assume that a predetermined dictionary includes named entity types such as "disease name", "symptom", "medicine", "date", "treatment means", and the like. After the medical record text described above is input, the named entity recognition device 1000 is expected to recognize and label "gastric cancer" as a "disease name" class named entity, "stomach discomfort" as a "symptom" class named entity, and "January 1, 2023" as a "date" class named entity.
In some embodiments, the input unit 1100 is capable of receiving prompt text in addition to text to be recognized as described above.
For example, the prompt text "The patient was diagnosed with gastric cancer due to stomach pain" is also optionally shown (indicated by the dashed arrow) in fig. 2. The prompt text is generally designed as a sentence similar to the text to be recognized, in order to enhance the recognition capability of the named entity recognition device 1000 for a particular entity type.
Taking the prompt text "The patient was diagnosed with gastric cancer due to stomach pain" in fig. 2 as an example, the named entity recognition device 1000 can recognize "gastric cancer" as a "disease name" class named entity with a higher probability, because the prompt contains the same word "gastric cancer" as the text to be recognized.
In the case where the text input to the input unit 1100 of the named entity recognition device 1000 includes both the text to be recognized and the prompt text, the input unit 1100 splices the text to be recognized and the prompt text to obtain a spliced text in the form of "text to be recognized" + "#" + "prompt text". For example, the text finally output from the input unit 1100 and provided to the encoder 1200 would be: "The patient was admitted to hospital on January 1, 2023 due to stomach discomfort, and was preliminarily diagnosed with gastric cancer # The patient was diagnosed with gastric cancer due to stomach pain".
In this case, the LLM in the encoder 1200 of the named entity recognition device 1000 performs embedding on the spliced text to extract the corresponding feature vector sequence.
It should be noted that, although the above refers to an example of embedding the concatenation of the text to be recognized and the prompt text, this is done only to enhance the recognition capability of the named entity recognition device 1000 for a specific entity type, and is not intended to also recognize named entities in the prompt text.
In order to ensure that the named entity recognition device 1000 focuses on NER processing of the text to be recognized, the present application also proposes using an input masking technique to distinguish the two parts of text when the input provided to the input unit 1100 includes prompt text.
Fig. 3 shows an example of an output unit 1300 including a mask layer and a schematic diagram of NER labeling results according to an embodiment of the present disclosure.
As shown, the output unit 1300 may include a mask layer and a conditional random field (CRF) layer. The mask layer is configured to mask the prompt text out of the spliced text, so that the CRF layer in the output unit 1300 labels only the one or more named entities in the text to be identified.
For example, as shown in the lower half of fig. 3, during NER the prompt-text portion of the spliced text "The patient was admitted to hospital on January 1, 2023 due to stomach discomfort, and was preliminarily diagnosed with gastric cancer # The patient was diagnosed with gastric cancer due to stomach pain" is masked by the mask layer (displayed on a gray background), so that only the text to be recognized, "The patient was admitted to hospital on January 1, 2023 due to stomach discomfort, and was preliminarily diagnosed with gastric cancer", is provided to the CRF layer, and the CRF can then label each token of the text to be recognized.
For example, the CRF may label "stomach discomfort" as a "symptom" class named entity, "January 1, 2023" as a "date" class named entity, and "gastric cancer" as a "disease name" class named entity.
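For illustration only, the following is a minimal sketch of the input-masking idea: the prompt tokens spliced after the "#" separator are excluded from labeling, so that only tokens of the text to be recognized are passed to the labeling layer. Whitespace tokenization is an illustrative simplification, not the claimed implementation.

```python
# Build a spliced input and a label mask that hides the separator and prompt tokens.
text_to_recognize = ("The patient was admitted to hospital on January 1, 2023 due to "
                     "stomach discomfort, and was preliminarily diagnosed with gastric cancer")
prompt_text = "The patient was diagnosed with gastric cancer due to stomach pain"
spliced_text = text_to_recognize + " # " + prompt_text

tokens = spliced_text.split()
# 1 = token is labeled by the output unit, 0 = masked out (separator and prompt).
label_mask = ([1] * len(text_to_recognize.split())
              + [0] * (1 + len(prompt_text.split())))
assert len(tokens) == len(label_mask)
```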
It should be noted that the above labeling of various named entities with the CRF layer is exemplary and not limiting. In practical applications, those skilled in the art may also adopt other techniques to identify and label entities according to different scenarios, including but not limited to SPAN classification layers or pointer networks.
CRF is a sequence labeling algorithm that can take into account the dependencies between words in text; for example, it handles well languages with complex dependency relationships, such as Chinese. For example, CRF may be employed for labeling in NER tasks where the text to be recognized is short.
However, for NER tasks in which the text to be recognized is long, the CRF is limited, and better labeling performance can be obtained with a SPAN classifier or a pointer network. This is because, in a SPAN-based model, attention is paid not only to the contextual representation of each word but also to the links between words. That is, the SPAN classifier considers the entire entity rather than just the context information of a single word, enabling more accurate recognition and segmentation of named entities.
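For illustration only, the following is a minimal sketch of a pointer-network-style span head of the kind mentioned above, which predicts start and end scores per entity type for each token. It is an assumed alternative to the CRF layer, not the claimed design.

```python
# Sketch of a span/pointer head: start and end logits per entity type for each token.
import torch.nn as nn

class SpanPointerHead(nn.Module):
    def __init__(self, hidden_dim=512, num_types=5):
        super().__init__()
        self.start_head = nn.Linear(hidden_dim, num_types)
        self.end_head = nn.Linear(hidden_dim, num_types)

    def forward(self, h):                      # h: (batch, seq_len, hidden_dim)
        # An entity span of a given type is decoded by pairing a start token
        # with a matching end token for that type.
        return self.start_head(h), self.end_head(h)
```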
As described above, one of the core points of the present disclosure is to enhance the accuracy and efficiency of key information identification in text to be identified (e.g., medical records) by combining the advanced text understanding capabilities of the LLM with classical NER techniques. That is, the combined use of the LLM and the Bi-GRU model in the encoder 1200 is one of the core points by which the present disclosure solves the technical problem.
LLM is an artificial intelligence model aimed at understanding and generating human language. They train on a large amount of text data and can perform a wide range of tasks including text summarization, translation, emotion analysis, and so forth. LLMs are characterized by a large scale, containing billions of parameters that can help them learn complex patterns in linguistic data. These models are typically based on deep learning architecture, which helps them to achieve impressive performance on various NLP tasks.
Taking GPT as an example, GPT-3 is an LLM trained on roughly 45 TB of training data, which means that it has built strong in-context learning ability from a very large corpus and has absorbed the essence of human written language, so that it has very strong advanced text understanding capability.
With respect to NER, it is desirable to have mechanisms that can store important early information in a memory cell and mechanisms that can skip the information of unimportant tokens. Many approaches have been proposed in academia to address this type of problem, the earliest being long short-term memory (LSTM).
The gated recurrent unit (GRU) is a somewhat simplified variant of the LSTM, generally providing a comparable effect but with a simpler structure and a faster computation speed.
Fig. 4 shows a schematic diagram of a single gated recurrent unit (GRU) in a named entity recognition device according to an embodiment of the present disclosure.
As shown in FIG. 4, a single GRU unit consists of a reset gate $R_t$ and an update gate $Z_t$. The reset gate controls how much of the past state is retained, and the update gate controls how much of the new state is a copy of the old state. The reset gate $R_t$ and the update gate $Z_t$ both take the input of the current time step $X_t$ and the hidden state of the previous time step $H_{t-1}$ as inputs, and the outputs of the two gates are given by two fully connected layers with a sigmoid activation function.

The reset gate $R_t$ is calculated by the following equation (1):

$$R_t = \sigma\left(X_t W_{xr} + H_{t-1} W_{hr} + b_r\right) \qquad (1)$$

wherein $W_{xr}$ and $W_{hr}$ are weight parameters, and $b_r$ is a bias parameter.

For example, equation (1) shows that the input of the current time step $X_t$ and the hidden state of the previous time step $H_{t-1}$ are linearly transformed, summed, and then passed through a sigmoid activation function to form the reset gate $R_t$. Because of the sigmoid, the output value of the reset gate $R_t$ lies between 0 and 1 and is used to select how much information is retained.
Similarly, the update gate $Z_t$ is calculated by the following equation (2):

$$Z_t = \sigma\left(X_t W_{xz} + H_{t-1} W_{hz} + b_z\right) \qquad (2)$$

wherein $W_{xz}$ and $W_{hz}$ are also weight parameters, and $b_z$ is a bias parameter.

For example, equation (2) shows that the input of the current time step $X_t$ and the hidden state of the previous time step $H_{t-1}$ are linearly transformed, summed, and then passed through a sigmoid activation function to form the update gate $Z_t$. Because of the sigmoid, the output value of the update gate $Z_t$ also lies between 0 and 1 and is used to control how much of the new state is a copy of the old state.
The candidate hidden state $\tilde{H}_t$ represents the updated value and is jointly determined by the reset gate $R_t$, the hidden state of the previous time step $H_{t-1}$, and the input of the current time step $X_t$, as shown in equation (3):

$$\tilde{H}_t = \tanh\left(X_t W_{xh} + (R_t \odot H_{t-1}) W_{hh} + b_h\right) \qquad (3)$$

wherein $W_{xh}$ and $W_{hh}$ are likewise weight parameters, and $b_h$ is a bias parameter. Here, the tanh nonlinear activation function is used to ensure that the values in the candidate hidden state remain in the interval (-1, 1).
The final output of the current time step, the hidden state $H_t$, is given by equation (4) and is jointly determined by the candidate hidden state $\tilde{H}_t$, the update gate $Z_t$, and the hidden state of the previous time step $H_{t-1}$:

$$H_t = Z_t \odot H_{t-1} + (1 - Z_t) \odot \tilde{H}_t \qquad (4)$$
From the above calculations, it can be seen that with only two gates the GRU achieves an effect approaching that of the LSTM while being faster to compute.
For example, a GRU can better capture dependencies between tokens in sequences that are long in time step distance, because the reset gates therein help capture short-term dependencies in the sequence, and the update gates help capture long-term dependencies in the sequence.
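For illustration only, the following is a minimal NumPy sketch of a single GRU time step implementing equations (1) to (4) above; the dimensions in the small demo are arbitrary assumptions.

```python
# One GRU time step following equations (1)-(4).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    W_xr, W_hr, b_r, W_xz, W_hz, b_z, W_xh, W_hh, b_h = params
    r_t = sigmoid(x_t @ W_xr + h_prev @ W_hr + b_r)                # reset gate, eq. (1)
    z_t = sigmoid(x_t @ W_xz + h_prev @ W_hz + b_z)                # update gate, eq. (2)
    h_tilde = np.tanh(x_t @ W_xh + (r_t * h_prev) @ W_hh + b_h)    # candidate state, eq. (3)
    return z_t * h_prev + (1.0 - z_t) * h_tilde                    # new hidden state, eq. (4)

# Tiny demo with illustrative dimensions:
d, h = 4, 3
rng = np.random.default_rng(0)
params = [rng.normal(size=s) for s in
          [(d, h), (h, h), (h,), (d, h), (h, h), (h,), (d, h), (h, h), (h,)]]
h_t = gru_step(rng.normal(size=(1, d)), np.zeros((1, h)), params)
```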
By bi-directional GRU model, or Bi-GRU, we mean a GRU model made up of two GRU components: a forward GRU component that receives the input in its original order, and a backward GRU component that processes the input in reverse order. Such a bi-directional structure can capture both past and future information, thereby modeling the temporal relationships in sequence data more fully.
Fig. 5 shows a schematic diagram of an encoder including a large language model LLM and a bi-directional GRU model according to an embodiment of the present disclosure.
For example, in the example of fig. 5, the Bi-GRU includes a forward GRU component composed of 4 forward GRUs and a backward GRU component composed of 4 backward GRUs. The Bi-GRU receives the input $X = \{x_1, x_2, x_3, x_4, \ldots\}$ from the LLM and generates the hidden states $H = \{h_1, h_2, h_3, h_4, \ldots\}$ through the forward and backward GRU components.
For example, in the Bi-GRU model, each GRU unit has an update gate and a reset gate as described above to control the flow of information. Through these gating mechanisms, bi-GRU models can adaptively learn interactions between long-term dependencies and multivariate variables in time series data. For example, in the input example shown in fig. 2, the Bi-GRU model may help the model understand the semantic association between "stomach discomfort", "stomach pain" and "gastric cancer" diagnosis.
During training, the difference between the predicted result and the real label can be measured using an appropriate loss function (e.g., a mean square error function) and the connection weights in the network are updated by a back propagation algorithm. Through repeated iterative training, the Bi-GRU model can gradually learn the characteristics and modes of the time series data, so that accurate multivariable time series prediction is realized.
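For illustration only, the following is a minimal sketch of one such training iteration. The use of a cross-entropy loss for token classification and the model interface are assumptions of the sketch (the embodiment above mentions a mean square error function as one possible loss); it is not the claimed implementation.

```python
# One training iteration: compare predicted token scores with gold labels and
# update the connection weights by back-propagation.
import torch.nn as nn
import torch.nn.functional as F

def train_step(model, optimizer, token_ids, gold_labels):
    model.train()
    scores = model(token_ids)                                  # (batch, seq_len, num_labels)
    loss = F.cross_entropy(scores.view(-1, scores.size(-1)),   # per-token classification loss
                           gold_labels.view(-1))
    optimizer.zero_grad()
    loss.backward()                                            # back-propagation
    optimizer.step()
    return loss.item()
```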
Thus, by combining LLM with Bi-GRU, the advanced text understanding capabilities of LLM can be advantageously combined with the efficient contextual information acquisition capabilities of GRU, resulting in a more efficient and accurate NER device.
In some embodiments, to better adapt the NER apparatus 1000 as described above to a particular scenario or field-specific task, the present disclosure also proposes that a low rank adaptation unit, i.e. a LORA unit as shown in fig. 6, may be included in the encoder 1200. The LORA unit may be configured to low-rank adapt the parameters of the encoder 1200 to fine tune some of the parameters of the named entity recognition device.
The overall training process for the named entity recognition device 1000 as described above will be described below in connection with fig. 6 and 7.
For example, training of the named entity recognition device 1000 may be split into two stages: pre-training and fine-tuning.
The pre-training of the named entity recognition device is actually a preliminary training of LLM. For example, pre-training may include: the LLM is trained using data related to a target application scenario such that the LLM is capable of capturing language features and entity types related to the target application scenario.
For example, a suitable base model, such as LLAMA 2 70B or QWEN B, may first be selected from among open-source large models as the initial text embedding and understanding framework. Continued pre-training can then be performed on the selected base model using specialized medical record data, so that the model better adapts to the linguistic features and entity types of the medical field.
In the present disclosure, the purpose of the LLM in the encoder 1200 is to map the input data into a high-dimensional space for processing. However, given that the named entity recognition device 1000 is to be applied to a narrow subtask (e.g., NER on oncology medical records), a particularly complex large model LLM is not actually necessary, and it may be sufficient to make fine adjustments only within a certain subspace, i.e., without fully optimizing all of the parameters.
When only a subspace of the parameters is optimized, the rank of the subspace parameter matrix that achieves a certain level (e.g., 90% accuracy) of the performance of full-parameter optimization is referred to as the intrinsic rank of the corresponding task.
When NER is performed for a particular task, the LORA unit may be used to fine-tune the parameters within a particular subspace of the named entity recognition device 1000 to accommodate that task.
Thus, the weight matrices in the model may actually have a low intrinsic rank. Moreover, the simpler the downstream task, the lower the corresponding intrinsic rank. Therefore, some dense layers in the neural network can be trained indirectly by optimizing the rank-decomposition matrices of the changes of those dense layers during adaptation, so that optimizing only these rank-decomposition matrices is enough to achieve the fine-tuning effect.
For example, fine tuning the named entity recognition device 1000 may include: training the named entity recognition means using a training dataset comprising labeling information of named entities to adjust part of the parameters of the encoder.
For example, a training dataset that includes labeling information for named entities can be hundreds or thousands of oncology medical records that contain labeling information.
In some embodiments, fine-tuning the named entity recognition device 1000 may further comprise: creating a semantically similar training prompt for each piece of training data in the training data set; and training the named entity recognition device using each piece of training data in the training data set together with its corresponding training prompt.
For example, the training prompts can include prompt text that is semantically similar to each training tumor medical record. That is, the fine-tuning process focuses not only on the original training data, but also seeks, for each piece of training data, a semantically similar sentence to serve as the prompt. This approach helps create a richer context, helping the model better understand and identify the various entities in the medical text.
Further, the partial parameters discussed herein generally refer to the parameters of the low-rank adaptation (LORA) unit in the encoder and the parameters of the bi-directional GRU model, excluding the parameters of the LLM.
For example, low-rank adaptation of a part of the parameters of the named entity recognition device 1000 is one of the main means of fine tuning.
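For illustration only, the following is a minimal sketch of this partial-parameter idea: all LLM parameters are frozen, and only the LORA matrices and the Bi-GRU/output-layer parameters remain trainable. The parameter-name patterns (for example, "lora_", "bigru", "classifier") are assumptions of the sketch, not names defined by the embodiment.

```python
# Freeze everything except the (assumed) LoRA matrices and Bi-GRU / output head.
import torch.nn as nn

def freeze_for_finetuning(model: nn.Module):
    for name, param in model.named_parameters():
        param.requires_grad = ("lora_" in name
                               or name.startswith("bigru")
                               or name.startswith("classifier"))
```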
Fig. 7 shows a schematic diagram of an example of low-rank adaptation processing of parameters of a named entity recognition device according to an embodiment of the disclosure.
As shown in FIG. 7, LORA assumes that the weight update process also has a low intrinsic rank. For the pre-trained full weight matrix $W_0 \in \mathbb{R}^{d \times k}$ (where d is the output dimension of the upper layer), the update can be represented using the low-rank decomposition shown in equation (5):

$$W_0 + \Delta W = W_0 + BA, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k) \qquad (5)$$

wherein, in the updating process, the pre-trained weight parameter $W_0$ is frozen and only the parameters A and B are updated, and k represents the input dimension of the next layer.
For example, at the beginning of training, random Gaussian initialization is used for A, i.e. $A \sim \mathcal{N}(0, \sigma^2)$, and zero initialization is used for B, i.e. $B = 0$. A and B may then be optimized, for example, using an Adam optimizer.
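For illustration only, the following is a minimal sketch of a linear layer with the LoRA decomposition of equation (5): the pre-trained weight $W_0$ is frozen, A is initialized with random Gaussian values, B is initialized to zero, and only A and B are trained. The dimensions and the initialization scale are illustrative assumptions.

```python
# Linear layer with a frozen base weight and a trainable low-rank update B @ A.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_out, k_in, r=8):
        super().__init__()
        self.W0 = nn.Parameter(torch.randn(d_out, k_in), requires_grad=False)  # frozen
        self.A = nn.Parameter(torch.randn(r, k_in) * 0.01)   # random Gaussian init
        self.B = nn.Parameter(torch.zeros(d_out, r))          # zero init

    def forward(self, x):                                     # x: (..., k_in)
        return x @ (self.W0 + self.B @ self.A).T
```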
With only the partial parameters $\Delta W = BA$ being updated, the named entity recognition device 1000, which has billions of parameters, can be fine-tuned to a very precise degree using only hundreds or thousands of samples, because the pre-trained model has a very small intrinsic dimension, i.e., there exists a very low-dimensional set of parameters whose adjustment can be as effective as fine-tuning in the full parameter space.
The disclosure also provides a named entity recognition method based on the large language model. Fig. 8 illustrates an example flowchart of a named entity recognition method 2000, according to an embodiment of this disclosure.
As shown in fig. 8, the named entity recognition method 2000 based on the large language model may include the steps of:
s2100, receiving text to be identified comprising one or more named entities by using an input unit;
S2200, performing embedded processing on the text to be identified by using a large language model LLM in the encoder so as to extract a corresponding feature vector sequence;
S2300, processing the feature vector sequence with a bi-directional gated recurrent unit (GRU) model in the encoder to capture forward context information of the text to be recognized using a forward GRU component in the bi-directional GRU model and to capture backward context information of the text to be recognized using a backward GRU component in the bi-directional GRU model, the forward context information or the backward context information including short-term and long-term dependencies in time of the tokens in the text to be recognized; and
S2400, annotating, using an output unit, the one or more named entities in the text to be identified based on the captured forward context information and backward context information.
The various method steps illustrated in fig. 8 may be advantageously performed by the named entity recognition device 1000 as discussed above with respect to fig. 1.
For example, the input unit in step S2100 may be similar to the input unit 1100 in the named entity recognition device 1000 discussed with respect to fig. 1, and the encoders in steps S2200 and S2300 may be similar to the encoder 1200 in the named entity recognition device 1000 discussed with respect to fig. 1. In addition, the LLM and Bi-directional GRU models in the encoder in steps S2200 and S2300 may also be similar to the LLM and Bi-GRU models in the encoder 1200 discussed with respect to fig. 1. Likewise, the output unit in step S2400 may be similar to the output unit 1300 discussed with respect to fig. 1.
Furthermore, the functions and features of the various units in the named entity recognition device 1000, the operations performed, and the various algorithms discussed above with respect to fig. 1-7 are equally applicable to the named entity recognition method 2000 shown in fig. 8 and variants thereof, unless the context clearly indicates otherwise or clearly inapplicable.
In some examples, named entity recognition method 2000 may further include the steps of: receiving a prompt text by using the input unit and splicing the text to be recognized and the prompt text to obtain a spliced text; and performing embedded processing on the spliced text by using the LLM to extract a corresponding feature vector sequence.
In some examples, named entity recognition method 2000 may further include the steps of: masking the hint text from the concatenated text using a masking layer in the output unit such that the output unit labels only the one or more named entities in the text to be identified.
In some examples, named entity recognition method 2000 may further include the steps of: the parameters of the encoder are low-rank decomposed using a low-rank adaptive LORA unit in the encoder to reduce the amount of parameters of the encoder that need to be trained.
In some examples, the LLM used in named entity recognition method 2000 may be a GPT model or a BERT model.
In some examples, the text to be identified received in the named entity recognition method 2000 can be medical record text associated with a disease.
In some examples, named entity recognition method 2000 may further include the steps of: the named entity recognition device is pre-trained and fine-tuned before receiving the text to be recognized.
In some examples, in named entity recognition method 2000, pre-training the named entity recognition device includes: training the LLM using data related to a target application scenario to enable the LLM to capture language features and entity types related to the target application scenario.
In some examples, in named entity recognition method 2000, fine tuning the named entity recognition device includes: training the named entity recognition means using a training dataset comprising labeling information of named entities to adjust part of the parameters of the encoder.
In some examples, in named entity recognition method 2000, pre-training the named entity recognition device includes optimizing a full weight parameter $W \in \mathbb{R}^{d \times k}$ of the named entity recognition device, wherein d represents the output dimension of the previous layer; fine-tuning the named entity recognition device includes decomposing the full weight parameter W and updating the partial parameters $A \in \mathbb{R}^{r \times k}$ and $B \in \mathbb{R}^{d \times r}$ using the following formula:

$$W + \Delta W = W + BA, \qquad r \ll \min(d, k)$$

wherein, in the updating process, the pre-trained weight parameter W is frozen and only the parameters A and B are updated, and k represents the input dimension of the next layer.
In some examples, in named entity recognition method 2000, the partial parameters of the encoder include parameters of a low rank adaptive LORA unit in the encoder and parameters of the bi-directional GRU model, excluding parameters of the LLM.
In some examples, in the named entity recognition method 2000, fine-tuning the named entity recognition device further includes: creating a semantically similar training prompt for each piece of training data in the training data set; and training the named entity recognition device using each piece of training data in the training data set together with its corresponding training prompt.
The embodiment of the disclosure also provides named entity recognition equipment. Fig. 9 shows a schematic diagram of a named entity recognition device according to an embodiment of the principles of the present disclosure.
As shown in fig. 9, the named entity recognition device 3000 according to the present embodiment includes a processor 3100, a storage portion 3200, a communication portion 3300, an input/output portion 3400, and a display portion 3500, which are coupled to an I/O interface 3600.
The processor 3100 is a program control device such as a microprocessor, for example, which operates according to a program installed in the named entity recognition device 3000. The storage section 3200 is, for example, a storage element such as ROM or RAM. A program to be executed by the processor 3100 or the like may be stored in the storage section 3200. The communication section 3300 is, for example, a communication interface such as a wireless LAN module. The input/output portion 3400 is, for example, an input/output port such as an HDMI (registered trademark) (high definition multimedia interface) port, a USB (universal serial bus) port, or an AUX (auxiliary) port. The display portion 3500 is, for example, a display such as a liquid crystal display or an organic EL (electroluminescence) display.
The named entity recognition device 3000 shown in fig. 9 may be used to implement the named entity recognition method proposed by the present disclosure.
For example, named entity recognition methods according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the named entity recognition method described above. In such an embodiment, the computer program may be downloaded and installed from the network through the communication section 3300 or installed from the storage section 3200. The functions defined in the named entity recognition method provided by the embodiments of the present disclosure may be performed when the computer program is executed by the named entity recognition device 3000. The named entity recognition method is described in detail above with reference to the accompanying drawings, and is not described in detail herein.
Embodiments of the present disclosure also provide a non-transitory computer readable storage medium. Fig. 10 shows a schematic diagram of a computer-readable storage medium 4000 in accordance with an embodiment of the principles of the present disclosure. The computer readable storage medium 4000 stores therein computer program instructions 4100, wherein the computer program instructions 4100, when executed by a processor, perform the named entity recognition method provided by the embodiments of the present disclosure.
In the above description, the present disclosure has been described based on the embodiments. The present embodiment is merely illustrative, and it will be understood by those skilled in the art that the combination of the constituent elements and processes of the present embodiment may be modified in various ways, and such modifications are also within the scope of the present disclosure.

Claims (20)

1. A named entity recognition method based on a large language model comprises the following steps:
receiving text to be identified comprising one or more named entities using an input unit;
Using a large language model LLM in the encoder to carry out embedded processing on the text to be identified so as to extract a corresponding feature vector sequence;
Processing the feature vector sequence by a bi-directional gated recurrent unit model in the encoder to capture forward context information of the text to be identified using a forward gated recurrent unit component in the bi-directional gated recurrent unit model, and to capture backward context information of the text to be identified using a backward gated recurrent unit component in the bi-directional gated recurrent unit model, the forward context information or the backward context information comprising short-term and long-term dependencies in time of tokens in the text to be identified; and
The one or more named entities in the text to be identified are annotated based on the captured forward context information and the backward context information using an output unit.
2. The named entity recognition method of claim 1, further comprising:
Receiving a prompt text by using the input unit and splicing the text to be recognized and the prompt text to obtain a spliced text;
And performing embedded processing on the spliced text by using the LLM to extract a corresponding feature vector sequence.
3. The named entity recognition method of claim 2, further comprising:
Masking the hint text from the concatenated text using a masking layer in the output unit such that the output unit labels only the one or more named entities in the text to be identified.
4. A named entity recognition method as claimed in any one of claims 1 to 3, further comprising:
the parameters of the encoder are low-rank decomposed using a low-rank adaptive LORA unit in the encoder to reduce the amount of parameters of the encoder that need to be trained.
5. A named entity recognition method as claimed in any one of claims 1 to 3, wherein the LLM is a GPT model or a BERT model.
6. A named entity recognition method as claimed in any one of claims 1 to 3, wherein the text to be recognized is medical record text associated with a disease.
7. A named entity recognition method as claimed in any one of claims 1 to 3, further comprising:
the named entity recognition device is pre-trained and fine-tuned before receiving the text to be recognized.
8. The named entity recognition method of claim 7, wherein pre-training the named entity recognition device comprises:
training the LLM using data related to a target application scenario such that the LLM captures language features and entity types related to the target application scenario.
9. The named entity recognition method of claim 7, wherein fine tuning the named entity recognition device comprises:
Training the named entity recognition means using a training dataset comprising labeling information of named entities to adjust part of the parameters of the encoder.
10. The named entity recognition method of claim 7, wherein,
Pre-training the named entity recognition device includes optimizing a full weight parameter $W \in \mathbb{R}^{d \times k}$ of the named entity recognition device, wherein d represents the output dimension of the previous layer;
Fine-tuning the named entity recognition device includes decomposing the full weight parameter W and updating the partial parameters $A \in \mathbb{R}^{r \times k}$ and $B \in \mathbb{R}^{d \times r}$ using the following formula:

$$W + \Delta W = W + BA, \qquad r \ll \min(d, k)$$

wherein, in the updating process, the pre-trained weight parameter W is frozen and only the parameters A and B are updated, and k represents the input dimension of the next layer.
11. The named entity recognition method of claim 9, wherein,
The partial parameters of the encoder include parameters of a low-rank adaptive LORA unit in the encoder and parameters of the bi-directional gated recurrent unit model, excluding parameters of the LLM.
12. The named entity recognition method of claim 9, wherein fine tuning the named entity recognition device further comprises:
Creating a semantically similar training prompt for each piece of training data in the training data set; and
Training the named entity recognition device using each piece of training data in the training data set together with the corresponding training prompt.
13. A named entity recognition device based on a large language model, comprising:
an input unit configured to receive text to be identified comprising one or more named entities;
An encoder comprising a large language model LLM and a bi-directional gated recurrent unit model, the LLM configured to embed the text to be identified to extract a corresponding sequence of feature vectors, the bi-directional gated recurrent unit model configured to process the sequence of feature vectors to capture forward context information of the text to be identified using a forward gated recurrent unit component in the bi-directional gated recurrent unit model, and to capture backward context information of the text to be identified using a backward gated recurrent unit component in the bi-directional gated recurrent unit model, the forward context information or the backward context information comprising short-term and long-term dependencies over time of tokens in the text to be identified; and
An output unit configured to annotate the one or more named entities in the text to be identified based on the captured forward context information and the backward context information.
14. The named entity recognition device of claim 13, wherein,
The input unit also receives a prompt text and splices the text to be recognized and the prompt text to obtain a spliced text; and
And the LLM performs embedded processing on the spliced text to extract a corresponding feature vector sequence.
15. The named entity recognition device of claim 14, wherein,
The output unit further includes a masking layer configured to mask the hint text from the concatenated text such that the output unit labels only the one or more named entities in the text to be identified.
16. The named entity recognition device of any one of claims 13 to 15, wherein,
The encoder also includes a low-rank adaptation (LORA) unit configured to low-rank decompose parameters of the encoder to reduce an amount of parameters required to train the named entity recognition device.
17. The named entity recognition device of any of claims 13 to 15, wherein the large language model is a GPT model or a BERT model.
18. The named entity recognition device of any one of claims 13 to 15, wherein the text to be recognized is medical record text associated with a disease.
19. An apparatus for named entity recognition, comprising:
a processor;
A memory storing one or more computer program modules;
Wherein the one or more computer program modules are configured to, when executed by the processor, perform the named entity recognition method of any of claims 1 to 12.
20. A non-transitory computer readable storage medium having instructions stored thereon that, when executed by a processor, cause the processor to perform the named entity recognition method of any of claims 1 to 12.
CN202410532791.6A 2024-04-30 2024-04-30 Named entity recognition method, named entity recognition device, named entity recognition equipment and named entity recognition computer readable storage medium Pending CN118133830A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410532791.6A CN118133830A (en) 2024-04-30 2024-04-30 Named entity recognition method, named entity recognition device, named entity recognition equipment and named entity recognition computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410532791.6A CN118133830A (en) 2024-04-30 2024-04-30 Named entity recognition method, named entity recognition device, named entity recognition equipment and named entity recognition computer readable storage medium

Publications (1)

Publication Number Publication Date
CN118133830A true CN118133830A (en) 2024-06-04

Family

ID=91235892

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410532791.6A Pending CN118133830A (en) 2024-04-30 2024-04-30 Named entity recognition method, named entity recognition device, named entity recognition equipment and named entity recognition computer readable storage medium

Country Status (1)

Country Link
CN (1) CN118133830A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126068A (en) * 2019-12-25 2020-05-08 中电云脑(天津)科技有限公司 Chinese named entity recognition method and device and electronic equipment
CN114266252A (en) * 2021-11-18 2022-04-01 青岛海尔科技有限公司 Named entity recognition method, device, equipment and storage medium
US20230015606A1 (en) * 2020-10-14 2023-01-19 Tencent Technology (Shenzhen) Company Limited Named entity recognition method and apparatus, device, and storage medium
CN116341557A (en) * 2023-05-29 2023-06-27 华北理工大学 Diabetes medical text named entity recognition method
CN116994694A (en) * 2023-09-27 2023-11-03 之江实验室 Patient medical record data screening method, device and medium based on information extraction
CN117610567A (en) * 2023-11-17 2024-02-27 昆明理工大学 Named entity recognition algorithm based on ERNIE3.0_Att_IDCNN_BiGRU_CRF

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126068A (en) * 2019-12-25 2020-05-08 中电云脑(天津)科技有限公司 Chinese named entity recognition method and device and electronic equipment
US20230015606A1 (en) * 2020-10-14 2023-01-19 Tencent Technology (Shenzhen) Company Limited Named entity recognition method and apparatus, device, and storage medium
CN114266252A (en) * 2021-11-18 2022-04-01 青岛海尔科技有限公司 Named entity recognition method, device, equipment and storage medium
CN116341557A (en) * 2023-05-29 2023-06-27 华北理工大学 Diabetes medical text named entity recognition method
CN116994694A (en) * 2023-09-27 2023-11-03 之江实验室 Patient medical record data screening method, device and medium based on information extraction
CN117610567A (en) * 2023-11-17 2024-02-27 昆明理工大学 Named entity recognition algorithm based on ERNIE3.0_Att_IDCNN_BiGRU_CRF

Similar Documents

Publication Publication Date Title
Zhou et al. A comprehensive survey on pretrained foundation models: A history from bert to chatgpt
CN108733792B (en) Entity relation extraction method
US11354506B2 (en) Coreference-aware representation learning for neural named entity recognition
Zhang et al. A hybrid model based on neural networks for biomedical relation extraction
Gallant et al. Representing objects, relations, and sequences
Liu et al. Multi-timescale long short-term memory neural network for modelling sentences and documents
CN108628935B (en) Question-answering method based on end-to-end memory network
CN110334339A (en) It is a kind of based on location aware from the sequence labelling model and mask method of attention mechanism
CN117076653A (en) Knowledge base question-answering method based on thinking chain and visual lifting context learning
CN111222318B (en) Trigger word recognition method based on double-channel bidirectional LSTM-CRF network
Tang et al. Deep sequential fusion LSTM network for image description
CN109299479A (en) Translation memory is incorporated to the method for neural machine translation by door control mechanism
Wan et al. A self-attention based neural architecture for Chinese medical named entity recognition
Zhou et al. Learning with annotation of various degrees
Deng et al. Self-attention-based BiGRU and capsule network for named entity recognition
CN117291265B (en) Knowledge graph construction method based on text big data
CN113657105A (en) Medical entity extraction method, device, equipment and medium based on vocabulary enhancement
CN116384371A (en) Combined entity and relation extraction method based on BERT and dependency syntax
Han et al. A survey of unsupervised dependency parsing
CN114065769A (en) Method, device, equipment and medium for training emotion reason pair extraction model
Jiang et al. A BERT-Bi-LSTM-Based knowledge graph question answering method
CN117312559A (en) Method and system for extracting aspect-level emotion four-tuple based on tree structure information perception
CN118133830A (en) Named entity recognition method, named entity recognition device, named entity recognition equipment and named entity recognition computer readable storage medium
Zhou et al. An image captioning model based on bidirectional depth residuals and its application
CN114358021A (en) Task type dialogue statement reply generation method based on deep learning and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination