CN113177411A - Training method of named entity recognition model and named entity recognition method

Info

Publication number
CN113177411A
Authority
CN
China
Prior art keywords
training
named entity
entity recognition
sample
recognition model
Prior art date
Legal status
Pending
Application number
CN202110349239.XA
Other languages
Chinese (zh)
Inventor
韩瑞峰
杨红飞
Current Assignee
Hangzhou Firestone Technology Co ltd
Original Assignee
Hangzhou Firestone Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Firestone Technology Co ltd filed Critical Hangzhou Firestone Technology Co ltd
Priority to CN202110349239.XA
Publication of CN113177411A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The application relates to a training method for a named entity recognition model and a named entity recognition method. A training set is acquired, where the training set consists of labeled samples from fields similar to the target field. The named entity recognition model is trained with the training set, and each training round comprises: inputting a training query set and a training support set into a BERT layer of the named entity recognition model to obtain emission scores and transition scores for the training query set samples; inputting these emission and transition scores into a CRF layer of the named entity recognition model to obtain the loss of the model; and adjusting the parameters of the model according to the loss. By taking labeled samples of a source field as the training set, existing labeling resources are exploited and the knowledge learned by the named entity recognition model is transferred to a scenario with only a small number of labeled samples. This addresses the problems that labeled samples are costly to acquire, that work cannot proceed until a sufficient number of labeled samples of the target field has been obtained, and that development efficiency is therefore low.

Description

Training method of named entity recognition model and named entity recognition method
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a training method for a named entity recognition model and a method for named entity recognition.
Background
A named entity is an entity word with a specific meaning or strong representativeness in a certain field, such as an event name, place name, or person name in the news field. Named Entity Recognition (NER) refers to identifying entities with specific meanings in text, mainly person names, place names, organization names, proper nouns, and the like. Training a named entity recognition model with supervised learning depends on a large-scale labeled corpus, which is usually produced by manual labeling and is therefore expensive to obtain. A usable named entity recognition model requires a large amount of labeled corpus for training; before that corpus is obtained, work cannot proceed, and development cost is shifted to corpus labeling, so overall development efficiency is low.
At present, no effective solution has been proposed for the high labeling cost and low development efficiency caused by the dependence of supervised named entity recognition model training on large-scale labeled corpora.
Disclosure of Invention
The embodiments of the present application provide a training method for a named entity recognition model and a named entity recognition method, aiming at least to solve the problems of high labeling cost and low development efficiency caused by the dependence of supervised named entity recognition model training on large-scale labeled corpora.
In a first aspect, an embodiment of the present application provides a training method for a named entity recognition model, where the named entity recognition model includes a BERT-CRF model, and the training method includes:
acquiring a training set, wherein the training set comprises a plurality of batches of meta-training data, each batch of meta-training data comprises a training support set and a training query set, and the training set consists of labeled samples from fields similar to the target field;
training the named entity recognition model with each batch of the meta-training data, wherein each training round comprises: inputting the training query set and the training support set into a BERT layer of the BERT-CRF model in the named entity recognition model to obtain emission scores and transition scores for the training query set samples; inputting the emission and transition scores of the training query set samples into a CRF layer of the BERT-CRF model to obtain the loss of the named entity recognition model; and adjusting the parameters of the named entity recognition model according to the loss.
In some embodiments, during each training round, the training method further includes:
obtaining a test set, wherein the test set comprises a test support set and a test query set, and the test set consists of labeled samples of the target field;
inputting the test support set and the test query set into the BERT layer of the BERT-CRF model in the named entity recognition model to obtain emission scores and transition scores for the test query set samples;
inputting the emission and transition scores of the test query set samples into the CRF layer of the BERT-CRF model to obtain test category labels for the test query set samples;
and judging the accuracy on the test set according to the test category labels and the true category labels of the test query set samples, stopping training if the accuracy is greater than or equal to a preset value, and continuing to train the named entity recognition model with the next batch of meta-training data if the accuracy is below the preset value.
In some embodiments, inputting the training query set and the training support set into the BERT layer of the BERT-CRF model in the named entity recognition model and obtaining the emission scores of the training query set samples comprises:
inputting the training query set and the training support set into the BERT layer to obtain feature representation vectors for the training support set samples and the training query set samples;
obtaining predefined anchor class feature representation vectors, which are adjusted through backpropagation of the loss, wherein the number of anchor class feature representation vectors is greater than or equal to the total number of classes in the source and target fields;
obtaining a feature representation vector for each class label in the training support set from the feature representation vectors of the training support set samples, and computing the singular value decomposition of the difference vectors between the class-label feature representation vectors and the anchor class feature representation vectors to obtain a feature mapping function;
and computing, by means of the feature mapping function, the similarity between the feature representation vectors of the training query set samples and the anchor class feature representation vectors to obtain the emission scores of the training query set samples.
In some embodiments, obtaining the feature representation vector of each class label in the training support set from the feature representation vectors of the training support set samples includes:
obtaining, for the class label of each word in the training support set samples, the feature representation vector of the corresponding word in each training support set sample, and taking the average of the feature representation vectors of all words under each class as the feature representation vector of that class label in the training support set.
In some embodiments, inputting the training query set and the training support set into the BERT layer of the BERT-CRF model in the named entity recognition model and obtaining the transition scores of the training query set samples comprises:
obtaining a label-class transition matrix according to the label classes in the training query set samples, adjusting the transition matrix through backpropagation of the loss during training, and obtaining the transition scores of the training query set samples from the transition matrix.
In some embodiments, before the training set and the test set are acquired, the training method further includes:
acquiring, according to the target field, the labeled data sets of a plurality of source fields similar to the target field, wherein the labeled data set of a single field forms one batch of meta-training data and the numbers of the various label types contained in each batch of meta-training data are balanced, and wherein the labeled data sets of the source fields serve as the training set and the labeled data set of the target field serves as the test set.
In some embodiments, after the numbers of the label types contained in each batch of the meta-training data have been balanced, the training method further includes:
saving, in the domain data, the index of each word or subword in the model vocabulary together with the index of its label among all labels of its field; saving the indices of the samples of the meta-training data within the domain data in the metadata indices; and loading each batch of meta-training data through the domain data and the metadata indices, wherein the labeled data set of each batch of meta-training data comprises a plurality of samples, each sample comprises a plurality of words or subwords, and every word or subword carries a label.
In a second aspect, an embodiment of the present application provides a named entity recognition method that performs named entity recognition with a named entity recognition model obtained by the training method of the first aspect, the method comprising:
acquiring a prediction support set and a prediction query set of a target field, wherein the prediction support set is a labeled data set, and the prediction query set is an unlabeled data set;
and inputting the prediction support set and the prediction query set into a named entity recognition model to obtain a prediction category label of the prediction query set sample.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the method for training a named entity recognition model and the method for named entity recognition as described above.
In a fourth aspect, the present application provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for training a named entity recognition model and the method for named entity recognition as described above.
Compared with the related art, in the training method of the named entity recognition model provided by the embodiments of the present application, the named entity recognition model includes a BERT-CRF model. A training set is acquired that comprises a plurality of batches of meta-training data, each batch comprising a training support set and a training query set, and the training set consists of labeled samples from fields similar to the target field. The model is trained with each batch of meta-training data, and each training round comprises: inputting the training query set and the training support set into the BERT layer of the BERT-CRF model to obtain emission and transition scores for the training query set samples; inputting these scores into the CRF layer of the BERT-CRF model to obtain the loss of the model; and adjusting the model parameters according to the loss. When the target field has only a very small number of labeled samples while a source field has a large number, using the labeled samples of the source field as the training set skillfully exploits existing labeling resources and transfers the knowledge learned by the model to the few-label scenario. This addresses the problems that labeled samples are costly to acquire, that work cannot proceed before a sufficient number of labeled samples of the target field is obtained, and that overall development efficiency is low.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow diagram of a method of training a named entity recognition model according to an embodiment of the present application;
FIG. 2 is a flow diagram of testing a named entity recognition model according to an embodiment of the present application;
FIG. 3 is a flow diagram of a method of named entity identification according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. The words "a", "an", "the", and the like in this application do not denote a limitation of quantity and may refer to the singular or the plural. The terms "including", "comprising", "having", and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a list of steps or modules (units) is not limited to the listed steps or units but may include other steps or units not expressly listed or inherent to such a process, method, product, or device. Words such as "connected" and "coupled" are not restricted to physical or mechanical connections and may include electrical connections, whether direct or indirect. "A plurality" herein means two or more. "And/or" describes an association between objects and covers three cases; for example, "A and/or B" may mean: A alone, A and B together, or B alone. The terms "first", "second", "third", and the like merely distinguish similar objects and do not denote a particular order.
Before describing embodiments of the present invention in detail, some of the terms used therein will be explained as follows:
The BERT model is a language model developed and released by Google at the end of 2018; the name stands for Bidirectional Encoder Representations from Transformers.
A CRF (Conditional Random Field) is a probabilistic undirected graphical model for the conditional probability P(y|x) of output variables y given input variables x. The conditional random field models the conditional probability distribution between the input and output variables. Conditional random fields are commonly used to label or analyze sequence data, such as natural language text or biological sequences. When used for sequence labeling, the input and output random variables are two sequences of equal length.
An entity tag indicates whether the corresponding word is an entity and, if so, which kind of entity it is.
This embodiment provides a method for training a named entity recognition model, where the named entity recognition model includes a BERT-CRF model. FIG. 1 is a flowchart of the training method according to an embodiment of the present application; as shown in FIG. 1, the method includes the following steps:
Step S101: acquire a training set, where the training set comprises a plurality of batches of meta-training data, each batch comprising a training support set and a training query set, and the training set consists of labeled samples from fields similar to the target field. In this embodiment, a field similar to the target field is called a source field; the source fields form the training set, and the labeled samples of each source field form one batch of meta-training data. Each batch comprises N training support set samples and M training query set samples, where N and M are arbitrary integers greater than 1, and each training sample in the training set comprises a training sentence and the category labels of the entities in that sentence.
Step S102: train the named entity recognition model with each batch of meta-training data, where each training round comprises: inputting the training query set and the training support set into the BERT layer of the BERT-CRF model in the named entity recognition model to obtain emission scores and transition scores for the training query set samples; inputting these emission and transition scores into the CRF layer of the BERT-CRF model to obtain the loss of the model; and adjusting the parameters of the model according to the loss.
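To make the training round concrete, the following is a minimal sketch in PyTorch. The emission and transition tensors are stand-ins for what the BERT layer would produce, and the forward-algorithm form of the linear-chain CRF negative log-likelihood is the standard one, not a detail specified by this application.

```python
import torch

def crf_nll(emissions, transitions, tags):
    """Negative log-likelihood of one tag sequence under a linear-chain CRF.

    emissions:   [seq_len, num_labels] emission scores of a query sample
    transitions: [num_labels, num_labels] label-transition scores
    tags:        [seq_len] gold label indices
    """
    seq_len, _ = emissions.shape
    # Score of the gold path: emissions plus transitions between adjacent tags.
    gold = emissions[0, tags[0]]
    for t in range(1, seq_len):
        gold = gold + transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    # Log partition function via the forward algorithm.
    alpha = emissions[0]                                    # [num_labels]
    for t in range(1, seq_len):
        alpha = torch.logsumexp(alpha.unsqueeze(1) + transitions, dim=0) + emissions[t]
    return torch.logsumexp(alpha, dim=0) - gold             # loss >= 0

# One training round: compute the loss, backpropagate, adjust parameters.
num_labels = 5
emissions = torch.randn(12, num_labels, requires_grad=True)    # stand-in for BERT-layer output
transitions = torch.randn(num_labels, num_labels, requires_grad=True)
tags = torch.randint(0, num_labels, (12,))

optimizer = torch.optim.Adam([emissions, transitions], lr=1e-3)
loss = crf_nll(emissions, transitions, tags)
loss.backward()
optimizer.step()
```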
Through steps S101 and S102, and in contrast to the related art, where recognizing named entities in a target field requires a large number of labeled target-field samples to train a model that can then recognize entities only in that field, and where acquiring and manually labeling those samples is costly, this training method trains the named entity model with labeled samples from fields similar to the target field. It exploits the abundant labeling resources of those similar fields and transfers the knowledge learned by the model to a target-field scenario with only a small number of labeled samples, which greatly reduces the cost of manual labeling and improves overall development efficiency.
In some embodiments, after the named entity recognition model has been trained with the labeled samples of the source fields, it is tested with the labeled samples of the target field, and the test result determines whether training needs to continue. FIG. 2 is a flowchart of testing the named entity recognition model according to an embodiment of the present application; as shown in FIG. 2, the testing method includes the following steps:
Step S201: during each training round, obtain a test set, where the test set comprises a test support set and a test query set and consists of the very small number of labeled samples of the target field.
Step S202: input the test support set and the test query set into the BERT layer of the BERT-CRF model in the named entity recognition model to obtain emission scores and transition scores for the test query set samples; input these scores into the CRF layer of the BERT-CRF model to obtain test category labels for the test query set samples.
Step S203: judge the accuracy on the test set by comparing the test category labels with the true category labels of the test query set samples; if the accuracy is greater than or equal to a preset value, stop training; if it is below the preset value, continue training the named entity recognition model with the next batch of meta-training data.
Through steps S201 to S203, when the target field has only a very small number of labeled samples, too few to train the named entity recognition model, the model is trained with the large number of labeled samples from fields similar to the target field and tested with the few labeled samples of the target field. If the test accuracy does not meet the requirement, training continues on the source-field samples; once it does, training is complete. This satisfies the need to recognize named entities in unlabeled target-field samples while ensuring the accuracy of the recognition results.
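Below is a sketch of the accuracy check of step S203. `predict_tags` is a hypothetical helper standing in for the BERT and CRF layers of the model, and the preset value of 0.9 is illustrative.

```python
def should_stop_training(model, test_support, test_query, gold_tags, preset=0.9):
    """Return True once test-set accuracy reaches the preset value."""
    pred_tags = predict_tags(model, test_support, test_query)  # hypothetical helper
    total = sum(len(sent) for sent in gold_tags)
    correct = sum(
        int(p == g)
        for pred, gold in zip(pred_tags, gold_tags)
        for p, g in zip(pred, gold)
    )
    # Stop when accuracy meets the preset value; otherwise continue training
    # with the next batch of meta-training data.
    return correct / total >= preset
```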
In some embodiments, inputting the training query set and the training support set into the BERT layer of the BERT-CRF model in the named entity recognition model and obtaining the emission scores of the training query set samples comprises:
inputting the training query set and the training support set into the BERT layer of the BERT-CRF model in the named entity recognition model to obtain feature representation vectors for the training support set samples and the training query set samples. Because a training query set sample may take on different meanings when concatenated with different training support set samples, and to better characterize each word in the samples, each training query set sample is concatenated with each sample of the training support set; the features of every word in the query and support samples are then computed by the unsupervised model (the BERT model), yielding features for every query-support sample pair, and averaged. The feature representation vectors of the query sample and the support sample are then split apart at the lengths they had before concatenation. For example, if a training query set sample has 100 words and a training support set sample has 50 words, the concatenated sentence has 150 words; features are obtained for the 150 words and split at the original lengths: the features of the first 100 words are the feature representation vectors of the training query set sample, and those of the last 50 words are the feature representation vectors of the training support set sample.
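A sketch of this pairing with a Hugging Face BERT encoder follows. The checkpoint name, the single unpadded query-support pair per forward pass, and averaging only the query-side features across pairs are assumptions for illustration.

```python
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def query_support_features(query_text, support_texts):
    """Encode (query, support) pairs and split features at the original lengths."""
    q_len = len(tokenizer.tokenize(query_text))
    query_feats, support_feats = [], []
    for support_text in support_texts:
        enc = tokenizer(query_text, support_text, return_tensors="pt")
        with torch.no_grad():
            hidden = bert(**enc).last_hidden_state[0]  # [CLS] query [SEP] support [SEP]
        query_feats.append(hidden[1 : 1 + q_len])      # query token features
        support_feats.append(hidden[2 + q_len : -1])   # support token features
    # Average the query features over all query-support pairs.
    return torch.stack(query_feats).mean(dim=0), support_feats
```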
Acquiring predefined anchor class feature representation vectors, which are adjusted through backpropagation of the loss, where the number of anchor class feature representation vectors is greater than or equal to the total number of classes in the source and target fields. In this embodiment, the dimension of the anchor class feature representation vectors is the same as that of the feature vector of each word in a sample.
Obtaining a feature representation vector for each class label in the training support set from the feature representation vectors of the training support set samples, and computing the singular value decomposition of the difference vectors between the class-label feature representation vectors and the anchor class feature representation vectors to obtain a feature mapping function. In this embodiment, for the class label of each word in the training support set samples, the feature representation vector of the corresponding word in each sample is found, and the average of the feature representation vectors of all words under that class label is taken as the class label's feature representation vector; the average is divided by its L1 norm so that the feature representation vector of each class label is normalized. Singular Value Decomposition (SVD) is an algorithm widely used in machine learning: it can be used not only for feature decomposition in dimensionality-reduction algorithms but also in recommendation systems, natural language processing, and other fields.
And, using the feature mapping function, computing the similarity between the feature representation vectors of the training query set samples and the anchor class feature representation vectors to obtain the emission scores of the training query set samples. In this embodiment, after the feature mapping function has mapped the feature representation vectors of the training query set samples and the anchor class feature representation vectors into the same space, the dot-product similarity is computed between the feature representation vector of each word in the training query set samples and every anchor class feature representation vector; the similarities of each word's feature vector against all the anchor class feature representation vectors are that word's emission scores.
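The sketch below walks through this pipeline in PyTorch: class prototypes from the support set, a mapping function from the SVD, and emission scores as dot products in the mapped space. The text does not fully specify how the mapping is built from the SVD factors, so pairing each prototype with one anchor and projecting onto the right singular vectors are assumptions.

```python
import torch

def class_prototypes(token_feats, token_labels, num_classes):
    """Average, then L1-normalize, the feature vectors of all tokens per class.

    Assumes every class label occurs at least once in the support set.
    """
    protos = []
    for c in range(num_classes):
        proto = token_feats[token_labels == c].mean(dim=0)
        protos.append(proto / proto.abs().sum())     # divide by the L1 norm
    return torch.stack(protos)                       # [num_classes, hidden]

def feature_mapping(protos, anchors):
    """SVD of prototype-anchor difference vectors -> projection matrix (assumed form)."""
    diffs = protos - anchors[: protos.shape[0]]      # [num_classes, hidden]
    _, _, vh = torch.linalg.svd(diffs, full_matrices=False)
    return vh                                        # rows span the difference subspace

def emission_scores(query_feats, anchors, mapping):
    """Dot-product similarity of mapped query tokens against mapped anchors."""
    q = query_feats @ mapping.T                      # [q_len, k]
    a = anchors @ mapping.T                          # [num_anchors, k]
    return q @ a.T                                   # [q_len, num_anchors]
```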
In some embodiments, inputting the training query set and the training support set into the BERT layer of the BERT-CRF model in the named entity recognition model and obtaining the transition scores of the training query set samples comprises:
and obtaining a label-class transition matrix according to the label classes in the training query set samples, adjusting the transition matrix through backpropagation of the loss during training, and obtaining the transition scores of the training query set samples from the transition matrix. In this embodiment, named entity recognition has three labeling schemes for label classes: BIO, BMES, and BIOES. Taking BIO as an example, B-X marks the beginning of an entity of type X, I-X marks a token inside an entity of type X, and O marks a token that belongs to no entity type. For the label classes defined by BIO, a transition matrix of 3 rows and 5 columns is preset: the 3 rows correspond to the three BIO label kinds, and the 5 columns correspond to the probabilities of transitioning to "O, a B of the same type, a B of a different type, an I of the same type, and an I of a different type", where same and different type refer to whether the named entity types, such as "person name" and "place name", match. The transition matrix is then instantiated and expanded to the number of label types of the training query set; that is, the abstract kinds become all the concrete types of the training query set. For example, if the training query set has the 2 label types "person name" and "place name", the transition matrix is expanded to 5 rows and 5 columns, where the 5 rows correspond to O, B-person name, I-person name, B-place name, and I-place name, and the 5 columns likewise. The transition scores of the training query set samples can then be read from this expanded matrix.
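The sketch below implements this expansion for an illustrative label set of person and place names. Mapping a transition out of O to the "different type" columns is an assumption, since the text does not say how O-to-entity transitions are typed.

```python
import torch

# Abstract matrix: rows = source label kind (O, B, I); columns = target kind
# (O, same-type B, different-type B, same-type I, different-type I).
# It is adjusted by backpropagation of the loss.
abstract = torch.randn(3, 5, requires_grad=True)

labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]   # illustrative label set

def row_kind(label):
    """Abstract row used by a source label."""
    return {"O": 0, "B": 1, "I": 2}[label[0]]

def col_kind(src, dst):
    """Abstract column used by a (src -> dst) transition."""
    if dst == "O":
        return 0
    same_type = src != "O" and src[2:] == dst[2:]    # compare entity types
    if dst.startswith("B"):
        return 1 if same_type else 2
    return 3 if same_type else 4

# Expand to one row and one column per concrete label of the training query set.
expanded = torch.stack([
    torch.stack([abstract[row_kind(s), col_kind(s, d)] for d in labels])
    for s in labels
])                                                   # [5, 5] transition scores
```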
In some embodiments, before the training set and the test set are obtained, the labeled data sets of several source fields similar to the target field are acquired according to the target field, and the labeled data set of each single field forms one batch of meta-training data, so that the numbers of the various label types contained in each batch of meta-training data are balanced. In this embodiment, to keep the number of each label type in every batch as uniform as possible, a portion of the samples carrying each label type is first added to each batch, and the number of samples of over-represented labels is then reduced, taking care that removing those samples does not affect the sample counts of the other labels.
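A sketch of this balancing heuristic is shown here; the cap `max_per_label` and the sample layout are illustrative assumptions.

```python
from collections import Counter

def balance(samples, max_per_label):
    """Keep a sample only while at least one of its entity labels is still scarce."""
    counts, kept = Counter(), []
    for tokens, labels in samples:                   # one sample: (tokens, labels)
        entity_labels = set(labels) - {"O"}
        # Samples with no entity labels are kept; others are kept only if they
        # contribute at least one label type that is still under the cap.
        if not entity_labels or any(counts[l] < max_per_label for l in entity_labels):
            kept.append((tokens, labels))
            counts.update(entity_labels)
    return kept
```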
In some embodiments, after the number of each label type contained in each batch of meta-training data has been balanced, the index of each word or subword in the model vocabulary, together with the index of its label among all labels of its field, is saved in the domain data (domain_data); the indices of the samples of the meta-training data within the domain data are saved in the metadata indices (meta_data_idx); and each batch of meta-training data is loaded through domain_data and meta_data_idx. The labeled data set of each batch of meta-training data comprises a plurality of samples, each sample comprises a plurality of words or subwords, and every word or subword carries a label. In this embodiment, to speed up the loading of meta-training data during training and to reduce the memory it requires, each batch is loaded through domain_data and meta_data_idx: the indices of the batch in meta_data_idx are used to look up the corresponding data in domain_data and assemble the batch. The wordpiece model is one of the subword models, whose segmentation granularity lies between words and characters; for example, "looking" can be split into the two subwords "look" and "ing", and the resulting subwords can build other words, for instance "look" and "ed" can form "looked". The subword approach thus greatly reduces the size of the model vocabulary while handling related words better.
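A sketch of this two-level storage follows. The names domain_data and meta_data_idx come from the text; the sample contents (BERT-style token ids) are illustrative.

```python
# domain_data: one entry per sample, holding the vocabulary indices of its
# words or subwords and the indices of their labels within the field.
domain_data = [
    ([101, 2769, 102], [0, 1, 0]),           # (token ids, label ids), illustrative
    ([101, 3300, 4638, 102], [0, 2, 2, 0]),
]
# meta_data_idx: per batch, only the indices of its samples in domain_data.
meta_data_idx = [[0, 1], [1]]

def load_batch(batch_no):
    """Materialize one batch of meta-training data from the stored indices."""
    return [domain_data[i] for i in meta_data_idx[batch_no]]

first_batch = load_batch(0)                  # [(token_ids, label_ids), ...]
```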
The embodiment of the present application further provides a named entity recognition method. FIG. 3 is a flowchart of the named entity recognition method according to an embodiment of the present application. As shown in FIG. 3, the method performs named entity recognition with a named entity recognition model obtained by any of the above training methods, and includes the following steps:
step S301, a prediction support set and a prediction query set of a target field are obtained, wherein the prediction support set is a labeled data set, and the prediction query set is an unlabeled data set;
step S302, inputting the prediction support set and the prediction query set into the named entity recognition model to obtain a prediction category label of the prediction query set sample.
Through steps S301 and S302, the named entity recognition model trained with labeled samples from fields similar to the target field is used to recognize named entities in the unlabeled samples of the target field.
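The text does not spell out how the CRF layer decodes at prediction time; the standard Viterbi algorithm, sketched below, is the usual way emission and transition scores are turned into the predicted category labels of the prediction query set.

```python
import torch

def viterbi_decode(emissions, transitions):
    """Best label sequence given [seq_len, L] emissions and [L, L] transitions."""
    seq_len, _ = emissions.shape
    score = emissions[0]                          # best score ending in each label
    backpointers = []
    for t in range(1, seq_len):
        cand = score.unsqueeze(1) + transitions   # [from_label, to_label]
        best, idx = cand.max(dim=0)
        score = best + emissions[t]
        backpointers.append(idx)
    # Backtrace from the best final label.
    path = [int(score.argmax())]
    for idx in reversed(backpointers):
        path.append(int(idx[path[-1]]))
    return list(reversed(path))
```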
It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order different than here.
The present embodiment also provides an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.
In addition, in combination with the training method of the named entity recognition model and the method of named entity recognition in the above embodiments, the embodiments of the present application may provide a storage medium to implement. The storage medium having stored thereon a computer program; the computer program, when executed by a processor, implements a method of training a named entity recognition model and a method of named entity recognition in any of the above embodiments.
In one embodiment, a computer device is provided, which may be a terminal. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of training a named entity recognition model and a method of named entity recognition. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
It will be understood by those skilled in the art that all or part of the processes of the above method embodiments can be implemented by a computer program instructing the related hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), and direct Rambus dynamic RAM (DRDRAM).
It should be understood by those skilled in the art that various features of the above-described embodiments can be combined in any combination, and for the sake of brevity, all possible combinations of features in the above-described embodiments are not described in detail, but rather, all combinations of features which are not inconsistent with each other should be construed as being within the scope of the present disclosure.
The above embodiments express only several implementations of the present application, and although their description is specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within its scope of protection. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method for training a named entity recognition model, wherein the named entity recognition model comprises a BERT-CRF model, the method comprising:
acquiring a training set, wherein the training set comprises a plurality of batches of meta-training data, each batch of meta-training data comprises a training support set and a training query set, and the training set consists of labeled samples from fields similar to the target field;
training the named entity recognition model with each batch of the meta-training data, wherein each training round comprises: inputting the training query set and the training support set into a BERT layer of the BERT-CRF model in the named entity recognition model to obtain emission scores and transition scores for the training query set samples; inputting the emission and transition scores of the training query set samples into a CRF layer of the BERT-CRF model to obtain the loss of the named entity recognition model; and adjusting the parameters of the named entity recognition model according to the loss.
2. The training method of claim 1, wherein during each training round, the training method further comprises:
obtaining a test set, wherein the test set comprises a test support set and a test query set, and the test set consists of labeled samples of the target field;
inputting the test support set and the test query set into the BERT layer of the BERT-CRF model in the named entity recognition model to obtain emission scores and transition scores for the test query set samples;
inputting the emission and transition scores of the test query set samples into the CRF layer of the BERT-CRF model to obtain test category labels for the test query set samples;
and judging the accuracy on the test set according to the test category labels and the true category labels of the test query set samples, stopping training if the accuracy is greater than or equal to a preset value, and continuing to train the named entity recognition model with the next batch of meta-training data if the accuracy is below the preset value.
3. The training method of claim 1, wherein inputting the training query set and the training support set into the BERT layer of the BERT-CRF model in the named entity recognition model and obtaining the emission scores of the training query set samples comprises:
inputting the training query set and the training support set into the BERT layer to obtain feature representation vectors for the training support set samples and the training query set samples;
obtaining predefined anchor class feature representation vectors, which are adjusted through backpropagation of the loss, wherein the number of anchor class feature representation vectors is greater than or equal to the total number of classes in the source and target fields;
obtaining a feature representation vector for each class label in the training support set from the feature representation vectors of the training support set samples, and computing the singular value decomposition of the difference vectors between the class-label feature representation vectors and the anchor class feature representation vectors to obtain a feature mapping function;
and computing, by means of the feature mapping function, the similarity between the feature representation vectors of the training query set samples and the anchor class feature representation vectors to obtain the emission scores of the training query set samples.
4. The training method of claim 3, wherein obtaining the feature representation vector of each class label in the training support set from the feature representation vectors of the training support set samples comprises:
obtaining, for the class label of each word in the training support set samples, the feature representation vector of the corresponding word in each training support set sample, and taking the average of the feature representation vectors of all words under each class as the feature representation vector of that class label in the training support set.
5. The training method of claim 1, wherein inputting the training query set and the training support set into the BERT layer of the BERT-CRF model in the named entity recognition model and obtaining the transition scores of the training query set samples comprises:
obtaining a label-class transition matrix according to the label classes in the training query set samples, adjusting the transition matrix through backpropagation of the loss during training, and obtaining the transition scores of the training query set samples from the transition matrix.
6. The training method of claim 2, wherein before the training set and the test set are obtained, the training method further comprises:
acquiring, according to the target field, the labeled data sets of a plurality of source fields similar to the target field, wherein the labeled data set of a single field forms one batch of meta-training data and the numbers of the various label types contained in each batch of meta-training data are balanced, and wherein the labeled data sets of the source fields serve as the training set and the labeled data set of the target field serves as the test set.
7. The training method of claim 6, wherein after the numbers of the label types contained in each batch of the meta-training data have been balanced, the training method further comprises:
saving, in the domain data, the index of each word or subword in the model vocabulary together with the index of its label among all labels of its field; saving the indices of the samples of the meta-training data within the domain data in the metadata indices; and loading each batch of meta-training data through the domain data and the metadata indices, wherein the labeled data set of each batch of meta-training data comprises a plurality of samples, each sample comprises a plurality of words or subwords, and every word or subword carries a label.
8. A named entity recognition method, wherein the method performs named entity recognition with a named entity recognition model obtained by the training method of any one of claims 1 to 7, the method comprising:
acquiring a prediction support set and a prediction query set of a target field, wherein the prediction support set is a labeled data set, and the prediction query set is an unlabeled data set;
and inputting the prediction support set and the prediction query set into a named entity recognition model to obtain a prediction category label of the prediction query set sample.
9. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the steps of the method according to any of claims 1 to 8.
10. A storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the steps of the method of any one of claims 1 to 8 when executed.
CN202110349239.XA 2021-03-31 2021-03-31 Training method of named entity recognition model and named entity recognition method Pending CN113177411A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110349239.XA CN113177411A (en) 2021-03-31 2021-03-31 Training method of named entity recognition model and named entity recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110349239.XA CN113177411A (en) 2021-03-31 2021-03-31 Training method of named entity recognition model and named entity recognition method

Publications (1)

Publication Number Publication Date
CN113177411A 2021-07-27

Family

ID=76922852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110349239.XA Pending CN113177411A (en) 2021-03-31 2021-03-31 Training method of named entity recognition model and named entity recognition method

Country Status (1)

Country Link
CN (1) CN113177411A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554173A (en) * 2021-08-09 2021-10-26 上海明略人工智能(集团)有限公司 Domain knowledge labeling method, system, electronic device and medium
CN116205235A (en) * 2023-05-05 2023-06-02 北京脉络洞察科技有限公司 Data set dividing method and device and electronic equipment
CN116432656A (en) * 2023-06-13 2023-07-14 河海大学 Small sample named entity identification method for dam emergency response
CN116432656B (en) * 2023-06-13 2023-08-29 河海大学 Small sample named entity identification method for dam emergency response

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination