CN110851597A - Method and device for sentence annotation based on similar entity replacement - Google Patents

Method and device for sentence annotation based on similar entity replacement

Info

Publication number
CN110851597A
Authority
CN
China
Prior art keywords
entity
user
sentence
label sequence
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911032391.4A
Other languages
Chinese (zh)
Inventor
胡伟凤
高雪松
陈维强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Juhaolian Technology Co Ltd
Original Assignee
Qingdao Juhaolian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Juhaolian Technology Co Ltd filed Critical Qingdao Juhaolian Technology Co Ltd
Priority to CN201911032391.4A priority Critical patent/CN110851597A/en
Publication of CN110851597A publication Critical patent/CN110851597A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention discloses a sentence annotation method and device based on similar entity replacement. The method comprises: obtaining a sentence input by a user; determining, according to the sentence input by the user and a named entity recognition model, an entity tag sequence corresponding to the sentence; determining, according to the entity tags in the entity tag sequence, whether a similar entity exists for an entity in the entity tag sequence; and if so, generating an entity tag sequence of a new sentence according to the similar entity and the entity tag sequence. Compared with the existing serial solution of a long short-term memory model and a probability distribution prediction model, the entity tag sequence recognized through the character embedding layer, the first feature learning layer, the second feature learning layer and the probability prediction layer is significantly more accurate; moreover, the replacement of similar entities effectively improves the discovery of new words and expands the data in the model's training set.

Description

Method and device for sentence annotation based on similar entity replacement
Technical Field
Embodiments of the invention relate to the technical field of natural language processing, and in particular to a method and a device for sentence annotation based on similar entity replacement.
Background
Named entity recognition is a basic task in natural language processing and lays the foundation for a series of downstream tasks such as entity linking, relation extraction, semantic search and automatic question answering. The industry widely applies a serial solution that combines a long short-term memory (LSTM) model with a probability distribution prediction model, but training such a model relies on a large amount of manually labeled data; especially in Chinese vertical domains, the industrial effect of the model depends entirely on training with a large amount of domain knowledge. In practical applications, the performance of the system's named entity recognition must consider not only accuracy but also recall, and in vertical domains the ability to find new words that do not appear, or appear only infrequently, in the training set urgently needs to be improved.
Disclosure of Invention
The embodiments of the invention provide a sentence annotation method and device based on similar entity replacement, which are used to improve the discovery of new words and to expand the training-set data.
In a first aspect, an embodiment of the present invention provides a method for sentence annotation based on similar entity replacement, including:
acquiring a sentence input by a user;
determining an entity tag sequence corresponding to the sentence input by the user according to the sentence input by the user and a named entity recognition model; the named entity recognition model comprises a character embedding layer, a first feature learning layer, a second feature learning layer and a probability prediction layer, and is obtained by training on an entity tag sequence training set;
and determining, according to the entity tags in the entity tag sequence, whether a similar entity exists for an entity in the entity tag sequence, and if so, generating an entity tag sequence of a new sentence according to the similar entity and the entity tag sequence.
In this technical solution, compared with the existing LSTM + CRF model, the entity tag sequence recognized through the character embedding layer, the first feature learning layer, the second feature learning layer and the probability prediction layer is significantly more accurate; in addition, the replacement of similar entities can effectively improve the discovery of new words and expand the data in the model's training set.
Optionally, the determining, according to the sentence input by the user and the named entity recognition model, an entity tag sequence corresponding to the sentence input by the user includes:
converting the sentence input by the user into a first embedded space vector through the character embedding layer;
inputting the first embedded space vector to the first feature learning layer, and extracting a first feature of the sentence input by the user;
inputting the first features of the sentences input by the user to the second feature learning layer, and extracting the second features of the sentences input by the user;
and inputting the second characteristic of the sentence input by the user to the probability prediction layer to obtain an entity tag sequence corresponding to the sentence input by the user.
Optionally, the generating an entity tag sequence of a new sentence according to the similar entity and the entity tag sequence includes:
replacing an entity in the entity tag sequence with a similar entity that carries the same entity tag, and using the result as the entity tag sequence of the new sentence.
Optionally, after generating the entity tag sequence of the new sentence, the method further includes:
and putting the entity label sequence of the new sentence into the entity label sequence training set, and retraining the named entity recognition model.
In a second aspect, an embodiment of the present invention provides a device for sentence annotation based on similar entity replacement, including:
the acquiring unit is used for acquiring the sentence input by the user;
the processing unit is used for determining an entity tag sequence corresponding to the sentence input by the user according to the sentence input by the user and a named entity recognition model; the named entity recognition model comprises a character embedding layer, a first feature learning layer, a second feature learning layer and a probability prediction layer, and is obtained by training on an entity tag sequence training set; and for determining, according to the entity tags in the entity tag sequence, whether a similar entity exists for an entity in the entity tag sequence, and if so, generating an entity tag sequence of a new sentence according to the similar entity and the entity tag sequence.
Optionally, the processing unit is specifically configured to:
converting the sentence input by the user into a first embedded space vector through the character embedding layer;
inputting the first embedded space vector to the first feature learning layer, and extracting a first feature of the sentence input by the user;
inputting the first features of the sentences input by the user to the second feature learning layer, and extracting the second features of the sentences input by the user;
and inputting the second characteristic of the sentence input by the user to the probability prediction layer to obtain an entity tag sequence corresponding to the sentence input by the user.
Optionally, the processing unit is specifically configured to:
replacing an entity in the entity tag sequence with a similar entity that carries the same entity tag, and using the result as the entity tag sequence of the new sentence.
Optionally, the processing unit is further configured to:
after generating the entity label sequence of the new sentence, putting the entity label sequence of the new sentence into the entity label sequence training set, and retraining the named entity recognition model.
In a third aspect, an embodiment of the present invention further provides a computing device, including:
a memory for storing program instructions;
and the processor is used for calling the program instructions stored in the memory and executing the above sentence annotation method based on similar entity replacement according to the obtained program.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable non-volatile storage medium including computer-readable instructions which, when read and executed by a computer, cause the computer to perform the above method for sentence annotation based on similar entity replacement.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram of a system architecture according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a method for sentence annotation based on similar entity replacement according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a word segmentation and named entity recognition annotation provided in an embodiment of the present invention;
FIG. 4 is a diagram illustrating a named entity recognition model according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a new word discovery according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of experimental results provided by an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a device for sentence annotation based on similar entity replacement according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 illustrates an exemplary system architecture to which embodiments of the present invention may be applied, which may be a server 100, where the server 100 may include a processor 110, a communication interface 120, and a memory 130.
The communication interface 120 is used to communicate with the smart device, receiving information transmitted by the smart device and transmitting information to it.
The processor 110 is the control center of the server 100: it connects the various parts of the entire server 100 using various interfaces and lines, and performs the various functions of the server 100 and processes data by running or executing the software programs and/or modules stored in the memory 130 and calling the data stored in the memory 130. Optionally, the processor 110 may include one or more processing units.
The memory 130 may be used to store software programs and modules, and the processor 110 executes various functional applications and data processing by running the software programs and modules stored in the memory 130. The memory 130 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like, and the data storage area may store data created according to business processing, and the like. Further, the memory 130 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
It should be noted that the structure shown in fig. 1 is only an example, and the embodiment of the present invention is not limited thereto.
Based on the above description, fig. 2 shows in detail a flow of a method for sentence annotation based on similar entity replacement according to an embodiment of the present invention, where the flow may be performed by a device for sentence annotation based on similar entity replacement; the device may be located in the server 100 shown in fig. 1, or may be the server 100 itself.
As shown in fig. 2, the process specifically includes:
step 201, obtaining a sentence input by a user.
In the embodiment of the invention, word segmentation and named entity recognition are basic tasks in natural language processing and lay the foundation for a series of downstream tasks such as entity linking, relation extraction, semantic search and automatic question answering. Both word segmentation and named entity recognition can be solved as sequence labeling problems (for example, with the BIO tag set: B-begin, I-inside, O-outside). The input and output of word segmentation and of named entity recognition can be as shown in fig. 3; as can be seen from fig. 3, for the input sentence "my air conditioner is suddenly not cooling now", the tag sequence output by the word segmentation model differs from the entity tag sequence output by the named entity recognition model.
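As a purely hypothetical illustration of such BIO labeling (the Chinese wording of the example sentence, the entity type name "DEV" and the tags below are assumptions for illustration, not the actual annotation shown in fig. 3), a character-level named entity annotation could be represented as follows:

```python
# Hypothetical BIO annotation of a sentence similar to the example in fig. 3
# ("my air conditioner is suddenly not cooling now"); tags and entity type are illustrative.
chars = list("我家的空调现在突然不制冷了")
ner_tags = ["O", "O", "O", "B-DEV", "I-DEV", "O", "O", "O", "O", "O", "O", "O", "O"]
assert len(chars) == len(ner_tags)  # one BIO tag per character
for ch, tag in zip(chars, ner_tags):
    print(ch, tag)  # e.g. 空 B-DEV / 调 I-DEV mark the "air conditioner" entity
```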
Step 202, determining an entity tag sequence corresponding to the sentence input by the user according to the sentence input by the user and the named entity recognition model.
In the embodiment of the present invention, the named entity recognition model may include a character embedding layer, a first feature learning layer, a second feature learning layer and a probability prediction layer, and is obtained by training on an entity tag sequence training set. In a specific implementation, the first feature learning layer may be a CNN (Convolutional Neural Network) layer, which may also be called a short-distance feature learning layer; the second feature learning layer may be a bidirectional long short-term memory (Bi-LSTM) layer, which may also be called a long-distance feature learning layer; and the probability prediction layer may be a CRF (Conditional Random Field) layer. The structure of the model may be as shown in fig. 4.
Specifically, when the entity tag sequence corresponding to the sentence input by the user is obtained, the sentence input by the user is first converted into a first embedded space vector through the character embedding layer; the first embedded space vector is then input to the first feature learning layer to extract a first feature of the sentence; the first feature is input to the second feature learning layer to extract a second feature of the sentence; and finally the second feature is input to the probability prediction layer to obtain the entity tag sequence corresponding to the sentence input by the user, as sketched in the code below.
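The following is a minimal PyTorch sketch of how such a character-embedding + CNN + Bi-LSTM + CRF tagger could be assembled; the class name, layer sizes and the use of the third-party pytorch-crf package are illustrative assumptions, not the patented implementation:

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party pytorch-crf package (assumed available)

class CnnBiLstmCrfTagger(nn.Module):
    """Sketch of a character-level NER tagger: embedding -> CNN -> Bi-LSTM -> CRF."""

    def __init__(self, vocab_size, num_tags, emb_dim=128, cnn_filters=128,
                 kernel_size=3, lstm_hidden=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)                 # character embedding layer
        self.cnn = nn.Conv1d(emb_dim, cnn_filters, kernel_size,
                             padding=kernel_size // 2)                     # short-distance (local) features
        self.bilstm = nn.LSTM(cnn_filters, lstm_hidden, batch_first=True,
                              bidirectional=True)                          # long-distance features
        self.emissions = nn.Linear(2 * lstm_hidden, num_tags)              # per-character tag scores
        self.crf = CRF(num_tags, batch_first=True)                         # probability prediction layer

    def _features(self, char_ids):
        x = self.embedding(char_ids)            # (batch, seq_len, emb_dim)
        x = self.cnn(x.transpose(1, 2)).relu()  # (batch, cnn_filters, seq_len)
        x, _ = self.bilstm(x.transpose(1, 2))   # (batch, seq_len, 2 * lstm_hidden)
        return self.emissions(x)                # (batch, seq_len, num_tags)

    def loss(self, char_ids, tags, mask):
        # Negative log-likelihood of the tag sequences under the CRF.
        return -self.crf(self._features(char_ids), tags, mask=mask, reduction='sum')

    def decode(self, char_ids, mask):
        return self.crf.decode(self._features(char_ids), mask=mask)
```

In this sketch the CNN plays the role of the short-distance feature learning layer, the Bi-LSTM the long-distance feature learning layer, and the CRF the probability prediction layer described above.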
Based on the named entity recognition model shown in fig. 4, when named entity recognition is performed, the input Chinese character sequence (sentence) can be converted into embedded space vectors at the character embedding layer:

$x = [x_1, x_2, \dots, x_N], \qquad x_i = E\, w_i \in \mathbb{R}^{D}$

where $w_i \in \{0,1\}^{V}$ is the one-hot vector representation of each character, $E \in \mathbb{R}^{D \times V}$ is the character embedding matrix, $V$ is the size of the dictionary space, $N$ is the input sequence length, and $D$ is the embedding dimension.
Then, the CNN feature extraction layer extracts local features of the text as the input to the Bi-LSTM layer.

Specifically, the CNN output is $c = [c_1, c_2, \dots, c_N]$ with $c_i \in \mathbb{R}^{M}$, where

$c_i = [c_{i,1}, \dots, c_{i,M}], \qquad c_{i,m} = f\left(w_m \cdot x_{i-\lfloor K/2 \rfloor \,:\, i+\lfloor K/2 \rfloor}\right)$

where $x_{i-\lfloor K/2 \rfloor \,:\, i+\lfloor K/2 \rfloor}$ denotes the window of character embeddings from $x_{i-\lfloor K/2 \rfloor}$ to $x_{i+\lfloor K/2 \rfloor}$, $f$ is the ReLU activation function, $M$ is the number of filters, $w_m \in \mathbb{R}^{KD}$ is a filter of the CNN, and $K$ is the window size; that is, the context information $c_i$ of each character is the concatenation of the values of all window filters at the current position.
The Bi-LSTM layer can be used to extract medium- and long-distance context information on both sides of the text; finally the CRF layer performs decoding, taking the features extracted by the Bi-LSTM layer as input and computing the tag of each element in the sequence. That is, for a given input $h = [h_1, h_2, \dots, h_N]$, it computes the output tag sequence $y = [y_1, y_2, \dots, y_N]$, where $y_i \in \mathbb{R}^{L}$ is the one-hot tag of the $i$-th character and $L$ is the size of the tag space. In the probabilistic model (CRF), for a given input $h$, the conditional probability of the output sequence $y$ is

$p(y \mid h; \theta) = \dfrac{\prod_{i=1}^{N} \psi_i(y_{i-1}, y_i, h)}{\sum_{y' \in Y(s)} \prod_{i=1}^{N} \psi_i(y'_{i-1}, y'_i, h)}$

where $Y(s)$ is the set of all possible tag sequences for the input sequence $s$, the potential function is $\psi_i(y', y, h) = \exp\left((h_i^{\top} W)_{y} + T_{y', y}\right)$, and $W \in \mathbb{R}^{2S \times L}$ and $T \in \mathbb{R}^{L \times L}$ are the parameters denoted by $\theta = \{W, T\}$ ($2S$ being the dimension of the Bi-LSTM hidden state).
In the CRF layer, the loss function may be:

$L_{NER} = -\sum_{s \in S} \log p(y_s \mid h_s; \theta)$

where $S$ is the set of training sentences, and $h_s$ and $y_s$ are respectively the hidden representation and the tag sequence of sentence $s$.
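Continuing the hypothetical PyTorch sketch above, a single training step that minimizes $L_{NER}$ could look as follows (the vocabulary size, tag count and optimizer settings are placeholders):

```python
model = CnnBiLstmCrfTagger(vocab_size=5000, num_tags=7)    # placeholder sizes
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(char_ids, tags, mask):
    # char_ids, tags: (batch, seq_len) LongTensors; mask: (batch, seq_len) BoolTensor
    optimizer.zero_grad()
    loss = model.loss(char_ids, tags, mask)   # -sum_s log p(y_s | h_s; theta)
    loss.backward()
    optimizer.step()
    return loss.item()
```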
Under the condition that only a small amount of labeled data exist in a training set, the named entity recognition model provided by the embodiment of the invention can improve the accuracy and the recall rate of the named entity recognition method.
Step 203, determining, according to the entity tags in the entity tag sequence, whether a similar entity exists for an entity in the entity tag sequence, and if so, generating an entity tag sequence of a new sentence according to the similar entity and the entity tag sequence.
Specifically, an entity in the entity tag sequence may be replaced with a similar entity that carries the same entity tag, and the result is used as the entity tag sequence of a new sentence.
With this automatic labeled-data construction method based on similar entity replacement, more pseudo-labeled samples are constructed from a small amount of existing labeled data, which remarkably improves the model's generalization to new words that do not appear, or appear only infrequently, in the training set: if an entity name in a sentence is replaced by another entity of the same type, the new sentence is still syntactically and semantically correct.
Therefore, given the entity recognition tag sequence of a known sentence, the entity tag sequence of a new sentence can be generated, as shown in detail in fig. 5.
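A minimal sketch of this replacement step is given below; the helper name, the entity dictionary and the BIO tag scheme are assumptions for illustration, not the exact procedure of fig. 5:

```python
import random

def replace_same_type_entity(chars, tags, entity_dict, rng=random):
    """Replace one entity span with another entity of the same type (hypothetical sketch).

    chars: list of characters; tags: BIO tags aligned with chars;
    entity_dict: mapping from entity type (e.g. "DEV") to candidate entity strings.
    Returns a new (chars, tags) pair, or the original pair if no replaceable entity is found.
    """
    # Collect entity spans as (start, end_exclusive, type).
    spans = []
    i = 0
    while i < len(tags):
        if tags[i].startswith("B-"):
            etype, start = tags[i][2:], i
            i += 1
            while i < len(tags) and tags[i] == "I-" + etype:
                i += 1
            spans.append((start, i, etype))
        else:
            i += 1
    candidates = [s for s in spans if entity_dict.get(s[2])]
    if not candidates:
        return chars, tags
    start, end, etype = rng.choice(candidates)
    new_entity = rng.choice(entity_dict[etype])
    new_chars = chars[:start] + list(new_entity) + chars[end:]
    new_tags = (tags[:start]
                + ["B-" + etype] + ["I-" + etype] * (len(new_entity) - 1)
                + tags[end:])
    return new_chars, new_tags
```

Called with the hypothetical (chars, ner_tags) pair from the earlier example and an entity dictionary such as {"DEV": ["冰箱", "洗衣机"]}, it returns a new sentence and tag sequence in which the air conditioner entity has been swapped for another device of the same type.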
In addition, after the entity tag sequence of the new sentence is generated, it can be put into the entity tag sequence training set and the named entity recognition model can be retrained, thereby improving the accuracy and recall of named entity recognition.
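A sketch of this augmentation-and-retraining loop, reusing the hypothetical replace_same_type_entity helper above (the dataset format and the model_trainer callable are assumptions, not part of the embodiment):

```python
def augment_and_retrain(train_set, entity_dict, model_trainer, copies_per_sentence=1):
    """train_set: list of (chars, tags) pairs; model_trainer: callable that retrains the NER model."""
    pseudo_labeled = []
    for chars, tags in train_set:
        for _ in range(copies_per_sentence):
            new_chars, new_tags = replace_same_type_entity(chars, tags, entity_dict)
            if new_chars != chars:                 # keep only genuinely new sentences
                pseudo_labeled.append((new_chars, new_tags))
    # Put the entity tag sequences of the new sentences into the training set and retrain.
    return model_trainer(train_set + pseudo_labeled)
```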
In experiments, automatically labeled data and manually labeled data were combined in a 50% + 50% ratio to generate the model's training set. The experimental results shown in fig. 6 indicate that the model trained with the added automatically labeled data clearly improves in accuracy, recall and new-word discovery capability; the advantage in model expressiveness is especially pronounced when the training data is scarce.
The above embodiment shows that: a sentence input by a user is obtained; an entity tag sequence corresponding to the sentence is determined according to the sentence input by the user and a named entity recognition model, where the named entity recognition model comprises a character embedding layer, a first feature learning layer, a second feature learning layer and a probability prediction layer and is obtained by training on an entity tag sequence training set; whether a similar entity exists for an entity in the entity tag sequence is determined according to the entity tags in the entity tag sequence; and if so, an entity tag sequence of a new sentence is generated according to the similar entity and the entity tag sequence. Compared with the existing serial solution of a long short-term memory model and a probability distribution prediction model, the entity tag sequence recognized through the character embedding layer, the first feature learning layer, the second feature learning layer and the probability prediction layer is significantly more accurate; moreover, the replacement of similar entities effectively improves the discovery of new words and expands the data in the model's training set.
Based on the same technical concept, fig. 7 exemplarily shows a structure of an apparatus for sentence annotation based on similar entity replacement, which can perform the flow of sentence annotation based on similar entity replacement; the apparatus may be located in the server 100 shown in fig. 1, or may be the server 100 itself.
As shown in fig. 7, the apparatus specifically includes:
an obtaining unit 701, configured to obtain a sentence input by a user;
a processing unit 702, configured to determine an entity tag sequence corresponding to the sentence input by the user according to the sentence input by the user and a named entity recognition model, where the named entity recognition model comprises a character embedding layer, a first feature learning layer, a second feature learning layer and a probability prediction layer, and is obtained by training on an entity tag sequence training set; and to determine, according to the entity tags in the entity tag sequence, whether a similar entity exists for an entity in the entity tag sequence, and if so, to generate an entity tag sequence of a new sentence according to the similar entity and the entity tag sequence.
Optionally, the processing unit 702 is specifically configured to:
converting the sentence input by the user into a first embedded space vector through the character embedding layer;
inputting the first embedded space vector to the first feature learning layer, and extracting a first feature of the sentence input by the user;
inputting the first features of the sentences input by the user to the second feature learning layer, and extracting the second features of the sentences input by the user;
and inputting the second characteristic of the sentence input by the user to the probability prediction layer to obtain an entity tag sequence corresponding to the sentence input by the user.
Optionally, the processing unit 702 is specifically configured to:
replacing an entity in the entity tag sequence with a similar entity that carries the same entity tag, and using the result as the entity tag sequence of the new sentence.
Optionally, the processing unit 702 is further configured to:
after generating the entity label sequence of the new sentence, putting the entity label sequence of the new sentence into the entity label sequence training set, and retraining the named entity recognition model.
Based on the same technical concept, an embodiment of the present invention further provides a computing device, including:
a memory for storing program instructions;
and the processor is used for calling the program instructions stored in the memory and executing the above sentence annotation method based on similar entity replacement according to the obtained program.
Based on the same technical concept, an embodiment of the present invention further provides a computer-readable non-volatile storage medium, which includes computer-readable instructions, and when the computer reads and executes the computer-readable instructions, the computer is caused to execute the above sentence annotation method based on similar entity replacement.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A method for sentence annotation based on similar entity replacement, characterized by comprising the following steps:
acquiring a sentence input by a user;
determining an entity tag sequence corresponding to the sentence input by the user according to the sentence input by the user and a named entity recognition model; the named entity recognition model comprises a character embedding layer, a first feature learning layer, a second feature learning layer and a probability prediction layer, and is obtained by training on an entity tag sequence training set;
and determining, according to the entity tags in the entity tag sequence, whether a similar entity exists for an entity in the entity tag sequence, and if so, generating an entity tag sequence of a new sentence according to the similar entity and the entity tag sequence.
2. The method of claim 1, wherein determining the entity tag sequence corresponding to the sentence input by the user according to the sentence input by the user and the named entity recognition model comprises:
converting the sentence input by the user into a first embedded space vector through the character embedding layer;
inputting the first embedded space vector to the first feature learning layer, and extracting a first feature of the sentence input by the user;
inputting the first features of the sentences input by the user to the second feature learning layer, and extracting the second features of the sentences input by the user;
and inputting the second characteristic of the sentence input by the user to the probability prediction layer to obtain an entity tag sequence corresponding to the sentence input by the user.
3. The method of claim 1, wherein said generating an entity tag sequence of a new sentence according to the similar entity and the entity tag sequence comprises:
replacing an entity in the entity tag sequence with a similar entity that carries the same entity tag, and using the result as the entity tag sequence of the new sentence.
4. The method of any of claims 1 to 3, further comprising, after generating the entity tag sequence of the new sentence:
and putting the entity label sequence of the new sentence into the entity label sequence training set, and retraining the named entity recognition model.
5. An apparatus for sentence annotation based on similar entity replacement, comprising:
the acquiring unit is used for acquiring the sentence input by the user;
the processing unit is used for determining an entity tag sequence corresponding to the sentence input by the user according to the sentence input by the user and a named entity recognition model; the named entity recognition model comprises a character embedding layer, a first feature learning layer, a second feature learning layer and a probability prediction layer, and is obtained by training on an entity tag sequence training set; and for determining, according to the entity tags in the entity tag sequence, whether a similar entity exists for an entity in the entity tag sequence, and if so, generating an entity tag sequence of a new sentence according to the similar entity and the entity tag sequence.
6. The apparatus as claimed in claim 5, wherein said processing unit is specifically configured to:
converting the sentence input by the user into a first embedded space vector through the character embedding layer;
inputting the first embedded space vector to the first feature learning layer, and extracting a first feature of the sentence input by the user;
inputting the first features of the sentences input by the user to the second feature learning layer, and extracting the second features of the sentences input by the user;
and inputting the second characteristic of the sentence input by the user to the probability prediction layer to obtain an entity tag sequence corresponding to the sentence input by the user.
7. The apparatus as claimed in claim 5, wherein said processing unit is specifically configured to:
replacing an entity in the entity tag sequence with a similar entity that carries the same entity tag, and using the result as the entity tag sequence of the new sentence.
8. The apparatus of any of claims 5 to 7, wherein the processing unit is further to:
after generating the entity label sequence of the new sentence, putting the entity label sequence of the new sentence into the entity label sequence training set, and retraining the named entity recognition model.
9. A computing device, comprising:
a memory for storing program instructions;
a processor for calling program instructions stored in said memory to execute the method of any one of claims 1 to 4 in accordance with the obtained program.
10. A computer-readable non-transitory storage medium including computer-readable instructions which, when read and executed by a computer, cause the computer to perform the method of any one of claims 1 to 4.
CN201911032391.4A 2019-10-28 2019-10-28 Method and device for sentence annotation based on similar entity replacement Pending CN110851597A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911032391.4A CN110851597A (en) 2019-10-28 2019-10-28 Method and device for sentence annotation based on similar entity replacement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911032391.4A CN110851597A (en) 2019-10-28 2019-10-28 Method and device for sentence annotation based on similar entity replacement

Publications (1)

Publication Number Publication Date
CN110851597A true CN110851597A (en) 2020-02-28

Family

ID=69598506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911032391.4A Pending CN110851597A (en) 2019-10-28 2019-10-28 Method and device for sentence annotation based on similar entity replacement

Country Status (1)

Country Link
CN (1) CN110851597A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550227A (en) * 2015-12-07 2016-05-04 中国建设银行股份有限公司 Named entity identification method and device
US20180365211A1 (en) * 2015-12-11 2018-12-20 Beijing Gridsum Technology Co., Ltd. Method and Device for Recognizing Domain Named Entity
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
KR20190103951A (en) * 2019-02-14 2019-09-05 주식회사 머니브레인 Method, computer device and computer readable recording medium for building or updating knowledgebase models for interactive ai agent systen, by labeling identifiable but not-learnable data in training data set
CN109992773A (en) * 2019-03-20 2019-07-09 华南理工大学 Term vector training method, system, equipment and medium based on multi-task learning
CN110263338A (en) * 2019-06-18 2019-09-20 北京明略软件系统有限公司 Replace entity name method, apparatus, storage medium and electronic device
CN110276075A (en) * 2019-06-21 2019-09-24 腾讯科技(深圳)有限公司 Model training method, name entity recognition method, device, equipment and medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766485A (en) * 2020-12-31 2021-05-07 平安科技(深圳)有限公司 Training method, device, equipment and medium for named entity model
CN112766485B (en) * 2020-12-31 2023-10-24 平安科技(深圳)有限公司 Named entity model training method, device, equipment and medium
CN114610852A (en) * 2022-05-10 2022-06-10 天津大学 Course learning-based fine-grained Chinese syntax analysis method and device

Similar Documents

Publication Publication Date Title
CN112685565B (en) Text classification method based on multi-mode information fusion and related equipment thereof
CN109034203B (en) Method, device, equipment and medium for training expression recommendation model and recommending expression
CN111858843B (en) Text classification method and device
CN111985229A (en) Sequence labeling method and device and computer equipment
CN111046656A (en) Text processing method and device, electronic equipment and readable storage medium
CN114596566B (en) Text recognition method and related device
CN111291566A (en) Event subject identification method and device and storage medium
CN110597966A (en) Automatic question answering method and device
CN108205524B (en) Text data processing method and device
CN114757176A (en) Method for obtaining target intention recognition model and intention recognition method
CN112613306A (en) Method, device, electronic equipment and storage medium for extracting entity relationship
CN109783801B (en) Electronic device, multi-label classification method and storage medium
CN113221555A (en) Keyword identification method, device and equipment based on multitask model
CN111858898A (en) Text processing method and device based on artificial intelligence and electronic equipment
CN112328761A (en) Intention label setting method and device, computer equipment and storage medium
CN111742322A (en) System and method for domain and language independent definition extraction using deep neural networks
CN111563380A (en) Named entity identification method and device
CN110851597A (en) Method and device for sentence annotation based on similar entity replacement
CN112667803A (en) Text emotion classification method and device
CN115129862A (en) Statement entity processing method and device, computer equipment and storage medium
CN110852103A (en) Named entity identification method and device
CN112188311B (en) Method and apparatus for determining video material of news
CN112307749A (en) Text error detection method and device, computer equipment and storage medium
CN110888983A (en) Positive and negative emotion analysis method, terminal device and storage medium
CN109558580B (en) Text analysis method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200228

RJ01 Rejection of invention patent application after publication