CN109977402B - Named entity identification method and system - Google Patents

Named entity identification method and system

Info

Publication number
CN109977402B
Authority
CN
China
Prior art keywords
text
character
processed
information
named entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910202512.9A
Other languages
Chinese (zh)
Other versions
CN109977402A (en)
Inventor
张金贺
徐安华
欧阳佑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN201910202512.9A
Publication of CN109977402A
Application granted
Publication of CN109977402B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The application discloses a named entity identification method and a named entity identification system. The method comprises the following steps: preprocessing a text to be processed to obtain a preprocessing result; obtaining, according to the preprocessing result, character-level expression information sensitive to context information corresponding to the text to be processed; creating conditional random field CRF decoding units in one-to-one correspondence with different named entity types, wherein each conditional random field CRF decoding unit separately decodes the character-level expression information sensitive to the context information to generate a label sequence corresponding to each named entity type; and extracting the corresponding named entities according to the respective label sequences. The method and the system solve the low efficiency of prior-art schemes for recognizing overlapping named entities: a sharing mechanism reduces redundant information and inference time, and different entity types can assist one another during recognition, which improves the recognition of each single entity type.

Description

Named entity identification method and system
Technical Field
The present application relates to the field of natural language processing, and in particular, to a named entity recognition method and system.
Background
Natural Language Processing (NLP) is an important direction in computer science and artificial intelligence. It studies theories and methods for effective communication between people and computers using natural language. Applications based on natural language processing, such as intelligent question-answering robots and automatic text summarization, have begun to affect many aspects of people's life and production. As a cornerstone of information extraction, Named Entity Recognition (NER) is used in every mature NLP application. A named entity is an entity identified by a name, such as a person name, a place name, an organization name, or a time. Because NER sits at the base of the pipeline, its quality directly affects the quality of the entire information extraction chain. The problem an NER system must solve is to identify all entities contained in the input text. For example, the text "Zhang Xiaoming was born on September 27, 1961 in Hong Kong, China" contains three entities: Zhang Xiaoming (person name), September 27, 1961 (time), and Hong Kong, China (place).
Traditionally, NER systems have mostly been implemented with Conditional Random Fields (CRFs) based on given feature templates. The CRF algorithm decodes a text by assigning a predicted label to each character. Under the common BIESO label scheme, taking the text "Zhang Xiaoming was born in Hong Kong, China" as an example, the labeled text is shown in FIG. 1, where the three characters of the named entity "Zhang Xiaoming" are labeled B_PER, I_PER, and E_PER, respectively.
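The BIESO scheme described above can be sketched as follows. This is a minimal illustration rather than the application's implementation: the function name `bieso_tags`, the span format, and the Chinese string (an assumed rendering of the FIG. 1 example sentence) are all introduced here for demonstration.

```python
def bieso_tags(length, entities):
    """Return one BIESO tag per character for non-overlapping entity spans.

    entities: list of (start, end_exclusive, type) tuples (hypothetical format).
    """
    tags = ["O"] * length
    for start, end, etype in entities:
        if end - start == 1:
            tags[start] = f"S_{etype}"  # single-character entity
        else:
            tags[start] = f"B_{etype}"  # beginning
            for i in range(start + 1, end - 1):
                tags[i] = f"I_{etype}"  # inside
            tags[end - 1] = f"E_{etype}"  # end
    return tags


# Assumed Chinese rendering of "Zhang Xiaoming was born in Hong Kong, China".
text = "张晓明出生于中国香港"
tags = bieso_tags(len(text), [(0, 3, "PER"), (6, 10, "LOC")])
for ch, tag in zip(text, tags):
    print(ch, tag)
```

Because every character receives exactly one tag, a single sequence of this kind cannot mark a character as belonging to two entities at once.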
Recently, production and everyday use have placed further demands on named entity recognition systems, such as handling overlapping named entities. As shown in FIG. 2, the text "the family goes to Washington, D.C." contains the overlapping entities "Washington, D.C." (place) and "Washington" (person name). Here "Washington" carries two label sequences: (1) B_PER, I_PER, and E_PER; and (2) B_LOC, I_LOC, and I_LOC. However, a CRF algorithm based on feature templates can assign only one label sequence to a text, so it cannot handle texts that contain overlapping entities.
To solve the above problem, one feasible solution is to allocate a separate NER system to each entity type so that a single text sequence can be decoded into multiple label sequences. For the text containing overlapping named entities shown in FIG. 2, two NER systems can be created, each responsible for one entity type, as shown in FIG. 3: NER (person name) recognizes person-name entities in the text and NER (place name) recognizes place-name entities. However, because these sub-NER systems are independent, common knowledge is difficult to share between them, and the system as a whole carries a high degree of redundant information. In practice, therefore, this solution is inefficient.
How to overcome the low efficiency of prior-art schemes for recognizing overlapping named entities and reduce redundant information, thereby improving the recognition of each single entity type, is a problem that urgently needs to be solved.
Disclosure of Invention
The main purpose of the named entity identification method of the present application is to solve the low efficiency of prior-art schemes for recognizing overlapping named entities: a sharing mechanism reduces redundant information and shortens inference time, and different entity types can assist one another during recognition, which improves the recognition of each single entity type.
In order to achieve the above object, an embodiment of the present application provides a named entity identification method, including:
preprocessing a text to be processed to obtain a preprocessing result;
obtaining character-level expression information which is sensitive to context information corresponding to the text to be processed according to the preprocessing result;
creating conditional random field CRF decoding units which correspond to different named entity types one by one, wherein each conditional random field CRF decoding unit decodes the character-level expression information sensitive to the context information respectively to generate a label sequence corresponding to each named entity type;
and extracting corresponding named entities according to the label sequences respectively.
Optionally, the types of the preprocessing result include: a character set corresponding to the text to be processed, a word set obtained by word segmentation of the text to be processed, a sentence set obtained by sentence segmentation of the text to be processed, and a part-of-speech set corresponding to the word set.
Optionally, the obtaining, according to the preprocessing result, character-level expression information sensitive to context information corresponding to the text to be processed includes:
constructing feature information corresponding to the type according to the type of the preprocessing result;
and processing the characteristic information to obtain character-level expression information sensitive to the context information of the text to be processed.
Optionally, the feature information includes: character coding information corresponding to the character set, word segmentation boundary information corresponding to the word set, sentence boundary distance information corresponding to the sentence set, and part-of-speech feature information corresponding to the part-of-speech set.
Optionally, the processing the feature information to obtain character-level expression information sensitive to context information corresponding to the text to be processed includes:
and scanning the feature information in both the forward and backward directions by using a bidirectional long short-term memory (LSTM) recurrent neural network to construct character-level expression information sensitive to the context information of the text to be processed.
An embodiment of the present application further provides a named entity recognition system, including:
the text preprocessing module is used for preprocessing the text to be processed to obtain a preprocessing result;
the encoding module is used for obtaining character-level expression information which is sensitive to the context information corresponding to the text to be processed according to the preprocessing result;
the multitask CRF decoding module is arranged for creating conditional random field CRF decoding units which correspond to different named entity types one by one, and each conditional random field CRF decoding unit decodes the character-level expression information sensitive to the context information to generate a label sequence corresponding to each named entity type;
and the output integration module is arranged for extracting corresponding named entities according to the label sequences respectively.
Optionally, the types of the preprocessing result include: a character set corresponding to the text to be processed, a word set obtained by word segmentation of the text to be processed, a sentence set obtained by sentence segmentation of the text to be processed, and a part-of-speech set corresponding to the word set.
Optionally, the encoding module specifically comprises:
the characteristic extraction module is used for constructing characteristic information corresponding to the type according to the type of the preprocessing result;
and the context expression construction module is configured to process the characteristic information to obtain character-level expression information sensitive to the context information corresponding to the text to be processed.
Optionally, wherein the feature information includes: character coding information corresponding to the character set, word segmentation boundary information corresponding to the word set, sentence boundary distance information corresponding to the sentence set and part of speech characteristic information corresponding to the part of speech set.
Optionally, the context expression building module is specifically configured to:
and scanning the feature information in both the forward and backward directions by using a bidirectional long short-term memory (LSTM) recurrent neural network to construct character-level expression information sensitive to the context information corresponding to the text to be processed.
The technical scheme provided by the application comprises the following steps: preprocessing a text to be processed to obtain a preprocessing result; obtaining character-level expression information sensitive to context information corresponding to the text to be processed according to the preprocessing result; creating conditional random field CRF decoding units which correspond to different named entity types one by one, wherein each conditional random field CRF decoding unit decodes character-level expression information sensitive to the context information to generate a label sequence corresponding to each named entity type; and extracting corresponding named entities according to the label sequences respectively.
The application provides a named entity recognition system based on a multitask learning mechanism to solve the problem of low efficiency in an overlapped named entity recognition scheme in the prior art, redundant information is reduced through a sharing mechanism, inference time is reduced, different types of entities can be mutually assisted during recognition, and therefore the recognition effect of single entities is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application without limiting it. In the drawings:
FIG. 1 is a diagram illustrating a CRF decoding tag sequence in the prior art;
FIG. 2 is a diagram illustrating a tag sequence when exemplary text contains overlapping entities in the prior art;
FIG. 3 is a diagram of a prior art set of independent NER systems;
FIG. 4 is a schematic diagram of a multi-task learning system;
FIG. 5 is a schematic diagram of a named entity recognition system based on multitask learning according to the present application;
FIG. 6 is a flowchart of a named entity recognition method according to embodiment 1 of the present application;
FIG. 7 is a structural diagram of a named entity recognition system according to embodiment 2 of the present application.
The implementation, functional features, and advantages of the present application will be further described with reference to the accompanying drawings.
Detailed Description
The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The multi-task learning mechanism combines a plurality of subtasks for learning, can mine and utilize common knowledge of different subtasks, and can learn to obtain special knowledge of the subtasks at the same time. The multi-task learning mechanism is widely applied to many fields of machine learning, such as images (semantic segmentation + depth prediction), heterogeneous text classification, and the like. Compared with the strategy of learning each subtask independently, the mechanism of multi-task joint learning enables different subtasks to assist each other to obtain better effect. Fig. 4 is a schematic diagram of a multitask learning system.
The named entity identification method and system of the present application are designed on the basis of a multi-task learning mechanism. Each entity-type recognition task is abstracted as a subtask, and the named entity recognition system is modeled as a multi-task learning neural network in which the encoding module is shared among the subtasks and the decoding modules are independent of one another. The multi-task CRFs structure in the decoding stage allows the multi-task model to learn knowledge specific to each named entity type while the sharing mechanism reduces redundant information, which solves the low efficiency of prior-art schemes for recognizing overlapping named entities. FIG. 5 is a schematic diagram of the named entity recognition system based on multi-task learning.
Fig. 6 is a flowchart of a named entity recognition method according to embodiment 1 of the present application, including the following steps:
step 601: preprocessing a text to be processed to obtain a preprocessing result;
The "text to be processed" in this application may be text input by a user and may contain overlapping named entities. For example, the text "the family goes to Washington, D.C." in FIG. 2 includes the two named entities "Washington" and "Washington, D.C.", both of which contain "Washington"; that is, two named entities of different types partially overlap in the text.
In this step 601, the text to be processed is processed to generate various information that can be used for subsequent multitask model input.
In an exemplary embodiment, a character/word vocabulary may first be constructed from the data set, and low-frequency characters/words may be added to a low-frequency character/word vocabulary. For the text d to be processed, the preprocessing stage performs word segmentation, sentence segmentation, and part-of-speech recognition, and replaces low-frequency characters appearing in the text with a single uniform invalid (placeholder) character.
In an exemplary embodiment, after step 601, a preprocessing result { C, W, S, P } may be obtained according to the text d to be processed, where C, W, S, P respectively represent a character set, a word set, a sentence set, and a part-of-speech set. This information can be integrated and input into subsequent multitask models for named entity recognition.
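The low-frequency replacement part of step 601 can be sketched as below. This is an illustration under assumptions: the placeholder token `<UNK>`, the threshold `min_count`, and both function names are hypothetical, and word segmentation, sentence segmentation, and part-of-speech tagging are assumed to be done by an external tool that is not shown.

```python
from collections import Counter

UNK = "<UNK>"  # hypothetical placeholder for low-frequency characters


def build_low_freq_chars(corpus, min_count=2):
    """Collect characters seen fewer than min_count times in the corpus."""
    counts = Counter(ch for doc in corpus for ch in doc)
    return {ch for ch, n in counts.items() if n < min_count}


def replace_low_freq(text, low_freq):
    """Map every low-frequency character to the shared placeholder."""
    return [UNK if ch in low_freq else ch for ch in text]


corpus = ["abca", "abd"]  # toy data set
low_freq = build_low_freq_chars(corpus)
chars = replace_low_freq("abcd", low_freq)
print(chars)
```

The resulting list plays the role of the character set C; the sets W, S, and P would come from the segmentation and tagging tools.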
Step 602: obtaining character-level expression information sensitive to context information corresponding to the text to be processed according to the preprocessing result;
specifically, the step 602 may be implemented by the following specific steps:
step 6021: constructing characteristic information corresponding to the type according to the type of the preprocessing result;
In step 6021, the text information produced by preprocessing is received and constructed into input features. Four character-level features can be constructed from the preprocessed text information: characters, word segmentation boundaries, sentence boundary distances, and part-of-speech features. After discretization and vectorization, these features are input into the subsequent multi-task model. The features are constructed as follows:
Character encoding: each character in the text is converted to its corresponding character code by looking it up in the vocabulary.
Word segmentation boundary: given the word segmentation information of the input text, the word segmentation boundary feature of a character is encoded as (1) 0 if the character appears at the head of a word; (2) 1 if the character appears at the tail of a word; and (3) 2 otherwise.
Sentence boundary distance: given the sentence segmentation information of the input text, the sentence boundary distance feature of a character is defined as $\log_2(d_1)$ and $\log_2(d_2)$, where $d_1$ and $d_2$ are the distances from the character to the beginning and the end of its sentence, respectively.
Part-of-speech feature: given the part-of-speech information of the input text, which covers nouns, verbs, adjectives, pronouns, numerals, quantifiers, and the like, the part-of-speech feature of a character is defined as the part-of-speech code of the word containing the character.
Step 6022: and processing the characteristic information to obtain character-level expression information which is sensitive to the context information corresponding to the text to be processed.
In step 6022, a recurrent neural network of the kind commonly used in language models may be employed to capture the context of each character. Specifically, based on the four character-level features, a bidirectional long short-term memory (LSTM) recurrent neural network scans the text in both the forward and backward directions to construct a character-level representation that is sensitive to context information.
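The bidirectional scanning pattern can be illustrated without a neural network library. The sketch below is not an LSTM: `toy_cell` is a deliberately trivial stand-in (an exponential moving average) used only to show how a forward pass and a backward pass are paired per position. All names are hypothetical.

```python
def scan(inputs, cell, state):
    """Run a recurrent cell over the inputs and return the state at every step."""
    states = []
    for x in inputs:
        state = cell(state, x)
        states.append(state)
    return states


def bidirectional_encode(inputs, cell, init=0.0):
    """Pair each position's forward state with its backward state."""
    forward = scan(inputs, cell, init)
    backward = scan(inputs[::-1], cell, init)[::-1]
    return list(zip(forward, backward))


def toy_cell(state, x):
    """Stand-in for an LSTM cell: an exponential moving average."""
    return 0.5 * state + 0.5 * x


encoded = bidirectional_encode([1.0, 0.0, 0.0, 1.0], toy_cell)
print(encoded)
```

Each position ends up with one value summarizing its left context and one summarizing its right context, which is what makes the representation context-sensitive. A real encoder would replace `toy_cell` with trained LSTM cells over feature vectors.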
Step 603: creating conditional random field CRF decoding units which correspond to different named entity types one by one, wherein each conditional random field CRF decoding unit decodes character-level expression information sensitive to the context information respectively to generate a label sequence corresponding to each named entity type;
In step 603, the application defines the named entity types to be recognized according to design requirements and then assigns a conditional random field CRF decoding unit to each named entity type; for N entity types, these units form a set $\{CRF_1, CRF_2, \dots, CRF_N\}$. In order to exploit as much common knowledge as possible among the different entity types and thereby improve each individual task, all of these conditional random field CRF decoding units receive a common input (the context-sensitive character-level representation information).
The context-sensitive character-level representation information from the previous step is decoded in parallel in this step. Each conditional random field CRF decoding unit outputs a decoded label sequence for the text, $S_i = \{s_1, s_2, \dots, s_{|M|}\}$,
Figure BDA0001991585150000072
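The idea of several CRF decoding units reading one shared representation can be sketched with plain Viterbi decoding. This is a toy under stated assumptions, not the patented model: each unit is assumed to hold its own emission weights and transition scores, all numbers are hand-picked rather than learned, and the names `viterbi`, `decode_all`, and `units` are hypothetical.

```python
def viterbi(emissions, transitions):
    """Highest-scoring label index path for one linear-chain CRF.

    emissions[t][k]: score of label k at position t.
    transitions[j][k]: score of moving from label j to label k.
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        prev = score
        score, pointers = [], []
        for k in range(n_labels):
            best = max(range(n_labels), key=lambda j: prev[j] + transitions[j][k])
            pointers.append(best)
            score.append(prev[best] + transitions[best][k] + emit[k])
        backpointers.append(pointers)
    last = max(range(n_labels), key=lambda k: score[k])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


def decode_all(shared, units):
    """Decode the same shared representation once per entity type."""
    results = {}
    for etype, unit in units.items():
        # Per-unit linear projection of the shared vectors into label scores.
        emissions = [[sum(w * h for w, h in zip(row, vec)) for row in unit["weights"]]
                     for vec in shared]
        path = viterbi(emissions, unit["transitions"])
        results[etype] = [unit["labels"][k] for k in path]
    return results


shared = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy encoder output, one vector per character
units = {
    "PER": {"labels": ["O", "S_PER"],
            "weights": [[0.0, 0.0], [1.0, -2.0]],
            "transitions": [[0.0, 0.0], [0.0, 0.0]]},
    "LOC": {"labels": ["O", "B_LOC", "E_LOC"],
            "weights": [[0.0, 0.0], [-1.0, 1.0], [-1.0, 2.0]],
            "transitions": [[0.0, 0.0, -5.0], [-5.0, -5.0, 1.0], [0.0, 0.0, -5.0]]},
}
sequences = decode_all(shared, units)
print(sequences)
```

The encoder output `shared` is computed once and reused by every unit, which is where the saving over fully separate NER systems would come from. The units are decoded in a loop here; since they do not depend on each other, they could equally run in parallel.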
Step 604: and extracting corresponding named entities according to the label sequences respectively.
In this step, all N label sequences decoded by the different CRF decoding units in the previous step are processed, and a set of named entities that may overlap is extracted. For example, for the sentence "the family goes to Washington, D.C.", $CRF_1$ decodes the label sequence for place-type named entities, from which the place "Washington, D.C." can be extracted in this step; $CRF_2$ decodes the label sequence for person-name-type named entities, from which the person name "Washington" can be extracted in this step.
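The extraction in step 604 can be sketched as follows, assuming BIESO labels per entity type. The function name, the span tuple format, and the Chinese string (an assumed rendering of the FIG. 2 example sentence) are introduced here for illustration only.

```python
def extract_entities(text, tags):
    """Collect (start, end_exclusive, type, surface) spans from one BIESO sequence."""
    entities = []
    start, start_type = None, None
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("_")
        if prefix == "S":
            entities.append((i, i + 1, etype, text[i]))
            start = None
        elif prefix == "B":
            start, start_type = i, etype
        elif prefix == "E":
            # Close a span only if a matching B of the same type opened it.
            if start is not None and etype == start_type:
                entities.append((start, i + 1, etype, text[start:i + 1]))
            start = None
        elif prefix == "O":
            start = None
    return entities


# Assumed Chinese rendering of "the family goes to Washington, D.C.".
text = "一家人去华盛顿特区"
sequences = {
    "PER": ["O"] * 4 + ["B_PER", "I_PER", "E_PER", "O", "O"],
    "LOC": ["O"] * 4 + ["B_LOC", "I_LOC", "I_LOC", "I_LOC", "E_LOC"],
}
found = sorted(e for tags in sequences.values() for e in extract_entities(text, tags))
for entity in found:
    print(entity)
```

Because each type has its own label sequence, the two spans starting at the same character do not conflict, and the merged result contains both.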
The named entity recognition system is trained by a learner. Unlike the strategy of training a multi-task model alternately, one subtask at a time, the named entity recognition system uses a joint optimization mechanism to learn the multi-task CRFs structure jointly. The optimization objective (loss function) is:
$$J(\theta) = \sum_{i=1}^{N} w_i J_i(\theta)$$
where $J_i(\theta)$ is the loss function of the i-th decoding unit and $w_i$ is a weighting factor used to balance the different tasks. Because the subtasks of the present application are all named entity recognition tasks whose loss functions have the same scale, the present application sets the weighting factor $w_i = 1$,
Figure BDA0001991585150000081
Based on the joint optimization target, the parameters in the multi-task CRFs neural network structure can be learned by adopting a back propagation algorithm.
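A short worked consequence of the joint objective, assuming it is the weighted sum of the per-unit losses (as the definitions of $J_i(\theta)$ and $w_i$ suggest) with $w_i = 1$. The split of $\theta$ into shared encoder parameters $\theta_{\mathrm{enc}}$ and per-unit decoder parameters $\theta_k$ is notation introduced here for illustration and does not appear in the application:

```latex
\begin{aligned}
J(\theta) &= \sum_{i=1}^{N} w_i\, J_i(\theta_{\mathrm{enc}}, \theta_i)
           = \sum_{i=1}^{N} J_i(\theta_{\mathrm{enc}}, \theta_i) \qquad (w_i = 1), \\
\frac{\partial J}{\partial \theta_{\mathrm{enc}}} &= \sum_{i=1}^{N} \frac{\partial J_i}{\partial \theta_{\mathrm{enc}}}, \qquad
\frac{\partial J}{\partial \theta_k} = \frac{\partial J_k}{\partial \theta_k} \quad (k = 1, \dots, N).
\end{aligned}
```

Under this reading, back propagation sends the sum of all N task gradients into the shared encoder, while each CRF decoding unit is updated only by its own loss.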
It should be noted that, the present application provides a named entity recognition system based on a multitask learning mechanism to solve the problem of low efficiency in the overlapped named entity recognition scheme in the prior art, and reduces redundant information and inference time through a sharing mechanism, so that mutual assistance can be performed during recognition of different types of entities, thereby improving the recognition effect of single type of entities.
Fig. 7 is a structural diagram of a named entity recognition system in embodiment 2 of the present application, and as shown in fig. 7, the system includes:
the text preprocessing module is used for preprocessing the text to be processed to obtain a preprocessing result;
the encoding module is used for obtaining character-level expression information sensitive to the context information of the text to be processed according to the preprocessing result;
the multitask CRF decoding module is arranged for creating conditional random field CRF decoding units which correspond to different named entity types one by one, and each conditional random field CRF decoding unit decodes the character-level expression information sensitive to the context information to generate a label sequence corresponding to each named entity type;
and the output integration module is set to extract corresponding named entities according to the label sequences respectively.
Wherein the types of the preprocessing result include: a character set corresponding to the text to be processed, a word set obtained by word segmentation of the text to be processed, a sentence set obtained by sentence segmentation of the text to be processed, and a part-of-speech set corresponding to the word set.
Specifically, the encoding module comprises:
the characteristic extraction module is used for constructing characteristic information corresponding to the type according to the type of the preprocessing result;
and the context expression construction module is used for processing the characteristic information to obtain character-level expression information sensitive to the context information corresponding to the text to be processed.
Wherein the feature information includes: character coding information corresponding to the character set, word segmentation boundary information corresponding to the word set, sentence boundary distance information corresponding to the sentence set, and part-of-speech feature information corresponding to the part-of-speech set.
Specifically, the context expression construction module is configured to:
scan the feature information in both the forward and backward directions by using a bidirectional long short-term memory (LSTM) recurrent neural network to construct character-level expression information sensitive to the context information corresponding to the text to be processed.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (10)

1. A named entity recognition method, comprising:
preprocessing a text to be processed to obtain a preprocessing result;
obtaining character-level expression information sensitive to context information corresponding to the text to be processed according to the preprocessing result;
creating conditional random field CRF decoding units corresponding to different named entity types one by one, wherein each conditional random field CRF decoding unit respectively decodes character-level expression information sensitive to the context information, and each conditional random field CRF decoding unit outputs a decoded label sequence for the text
Figure FDA0003799943600000011
and extracting corresponding named entities according to the label sequences respectively, wherein all N label sequences decoded by the different CRF decoding units in the previous step are processed and a set of named entities that may overlap is extracted.
2. The method of claim 1, wherein the types of the preprocessing result comprise: a character set corresponding to the text to be processed, a word set obtained by word segmentation of the text to be processed, a sentence set obtained by sentence segmentation of the text to be processed, and a part-of-speech set corresponding to the word set.
3. The method according to claim 2, wherein the obtaining of the character-level expression information sensitive to the context information corresponding to the text to be processed according to the preprocessing result comprises:
constructing characteristic information corresponding to the type according to the type of the preprocessing result;
and processing the characteristic information to obtain character-level expression information which is sensitive to the context information corresponding to the text to be processed.
4. The method of claim 3, wherein the feature information comprises: character coding information corresponding to the character set, word segmentation boundary information corresponding to the word set, sentence boundary distance information corresponding to the sentence set, and part-of-speech feature information corresponding to the part-of-speech set.
5. The method according to claim 4, wherein the processing the feature information to obtain character-level expression information sensitive to context information corresponding to the text to be processed comprises:
and scanning the feature information in both the forward and backward directions by using a bidirectional long short-term memory (LSTM) recurrent neural network to construct character-level expression information sensitive to the context information of the text to be processed.
6. A named entity recognition system, comprising:
a text preprocessing module configured to preprocess the text to be processed to obtain a preprocessing result;
an encoding module configured to obtain, according to the preprocessing result, character-level expression information sensitive to the context information corresponding to the text to be processed;
a multitask CRF decoding module configured to create conditional random field (CRF) decoding units in one-to-one correspondence with the different named entity types, wherein each CRF decoding unit decodes the context-sensitive character-level expression information and outputs a decoded label sequence for the text
Figure FDA0003799943600000021
and an output integration module configured to extract the corresponding named entities according to the label sequences, processing all N label sequences decoded by the different CRF decoding units to extract a superimposable named entity set.
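The multitask decoding arrangement can be sketched in Python as one decoding unit per entity type, all reading the same character-level representations. The sketch assumes a BIO tag set, a linear projection from representation to per-tag emission scores, and Viterbi search over emission plus transition scores, which is the usual way to decode a linear-chain CRF; the class name, tag set, and all numeric parameters are hypothetical and hand-set rather than trained.

```python
from typing import Dict, List

TAGS = ["O", "B", "I"]  # assumed tagging scheme

def viterbi(emissions: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring tag index path for a linear-chain CRF."""
    if not emissions:
        return []
    n_tags = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):  # walk back from the last position
        best = pointers[best]
        path.append(best)
    return path[::-1]

class CRFDecodingUnit:
    """One decoder per entity type, with its own projection and transition scores."""

    def __init__(self, weights: List[List[float]], transitions: List[List[float]]):
        self.weights = weights          # one row of weights per tag
        self.transitions = transitions  # transitions[i][j]: score of tag i -> tag j

    def decode(self, representations: List[List[float]]) -> List[str]:
        emissions = [[sum(w * x for w, x in zip(row, rep)) for row in self.weights]
                     for rep in representations]
        return [TAGS[t] for t in viterbi(emissions, self.transitions)]

def decode_all(representations: List[List[float]],
               units: Dict[str, CRFDecodingUnit]) -> Dict[str, List[str]]:
    """Run every unit on the shared representations: N types give N label sequences."""
    return {etype: unit.decode(representations) for etype, unit in units.items()}

# Hand-set illustrative parameters: O -> I is penalised, B -> I is encouraged.
transitions = [[0.0, 0.0, -100.0],
               [0.0, 0.0, 1.0],
               [0.0, 0.0, 0.0]]
units = {
    "PER": CRFDecodingUnit([[0.0, 1.0], [1.0, 0.0], [0.5, 0.0]], transitions),
    "LOC": CRFDecodingUnit([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], transitions),
}
shared = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # stand-in encoder output, 3 characters
labels = decode_all(shared, units)
```

Keeping a separate unit per type means that one character can be labelled as part of an entity by several units at once, which is what allows the later integration step to produce overlapping entities.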
7. The system of claim 6, wherein the types of the preprocessing result comprise: a character set corresponding to the text to be processed, a word set obtained by performing word segmentation on the text to be processed, a sentence set obtained by performing sentence segmentation on the text to be processed, and a part-of-speech set corresponding to the word set.
8. The system of claim 7, wherein the encoding module comprises:
a feature extraction module configured to construct feature information corresponding to each type of the preprocessing result;
and a context expression construction module configured to process the feature information to obtain character-level expression information sensitive to the context information corresponding to the text to be processed.
9. The system of claim 8, wherein the feature information comprises: character coding information corresponding to the character set, word segmentation boundary information corresponding to the word set, sentence boundary distance information corresponding to the sentence set and part of speech characteristic information corresponding to the part of speech set.
10. The system of claim 9, wherein the context expression construction module is specifically configured to:
scan the feature information in both the forward and reverse directions using a bidirectional long short-term memory (BiLSTM) recurrent neural network to construct character-level expression information sensitive to the context information of the text to be processed.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910202512.9A CN109977402B (en) 2019-03-11 2019-03-11 Named entity identification method and system

Publications (2)

Publication Number Publication Date
CN109977402A (en) 2019-07-05
CN109977402B (en) 2022-11-11

Family

ID=67079236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910202512.9A Active CN109977402B (en) 2019-03-11 2019-03-11 Named entity identification method and system

Country Status (1)

Country Link
CN (1) CN109977402B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598212A (en) * 2019-09-05 2019-12-20 清华大学 Rapid named entity identification method
CN110705258A (en) * 2019-09-18 2020-01-17 北京明略软件系统有限公司 Text entity identification method and device
CN111191275A (en) * 2019-11-28 2020-05-22 深圳云安宝科技有限公司 Sensitive data identification method, system and device
CN114240506A (en) * 2021-12-21 2022-03-25 北京有竹居网络技术有限公司 Modeling method of multi-task model, promotion content processing method and related device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107644014A (en) * 2017-09-25 2018-01-30 南京安链数据科技有限公司 Named entity recognition method based on bidirectional LSTM and CRF
CN108536679A (en) * 2018-04-13 2018-09-14 腾讯科技(成都)有限公司 Named entity recognition method, device, equipment and computer readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104750687B (en) * 2013-12-25 2018-03-20 株式会社东芝 Method and device for improving a bilingual corpus, and machine translation method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
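Research on Topic Modeling Methods Based on Deep Learning; Zhu Jiahui; China Master's Theses Full-text Database, Information Science and Technology Series; 20170815; I138-587 *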

Also Published As

Publication number Publication date
CN109977402A (en) 2019-07-05

Similar Documents

Publication Publication Date Title
CN109977402B (en) Named entity identification method and system
CN109299273B (en) Multi-source multi-label text classification method and system based on improved seq2seq model
US11860684B2 (en) Few-shot named-entity recognition
Yan et al. ConvMath: a convolutional sequence network for mathematical expression recognition
CN116775872A (en) Text processing method and device, electronic equipment and storage medium
CN112188311B (en) Method and apparatus for determining video material of news
CN115796182A (en) Multi-modal named entity recognition method based on entity-level cross-modal interaction
CN109933773A (en) A kind of multiple semantic sentence analysis system and method
CN111428012A (en) Intelligent question-answering method, device, equipment and storage medium based on attention mechanism
CN111694936B (en) Method, device, computer equipment and storage medium for identification of AI intelligent interview
CN116595979A (en) Named entity recognition method, device and medium based on label prompt
CN116341519A (en) Event causal relation extraction method, device and storage medium based on background knowledge
CN115759102A (en) Chinese poetry wine culture named entity recognition method
CN116483314A (en) Automatic intelligent activity diagram generation method
CN112800186B (en) Reading understanding model training method and device and reading understanding method and device
CN115033683A (en) Abstract generation method, device, equipment and storage medium
CN115203388A (en) Machine reading understanding method and device, computer equipment and storage medium
CN114637852A (en) Method, device and equipment for extracting entity relationship of medical text and storage medium
CN114648005A (en) Multi-fragment machine reading understanding method and device for multitask joint learning
CN114417891A (en) Reply sentence determination method and device based on rough semantics and electronic equipment
CN114298052A (en) Entity joint labeling relation extraction method and system based on probability graph
CN113657092A (en) Method, apparatus, device and medium for identifying label
CN112364131A (en) Corpus processing method and related device thereof
CN110705268A (en) Article subject extraction method and device based on artificial intelligence and computer-readable storage medium
CN111783471B (en) Semantic recognition method, device, equipment and storage medium for natural language

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant