CN111737951A - Text language incidence relation labeling method and device - Google Patents


Info

Publication number
CN111737951A
Authority
CN
China
Prior art keywords
training
information extraction
text language
entity
composite
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910212664.7A
Other languages
Chinese (zh)
Other versions
CN111737951B (en)
Inventor
韩英
刘迪
王腾蛟
邱镇
陈薇
孟洪民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
State Grid Corp of China SGCC
State Grid Information and Telecommunication Co Ltd
State Grid Zhejiang Electric Power Co Ltd
Original Assignee
Peking University
State Grid Corp of China SGCC
State Grid Information and Telecommunication Co Ltd
State Grid Zhejiang Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University, State Grid Corp of China SGCC, State Grid Information and Telecommunication Co Ltd, State Grid Zhejiang Electric Power Co Ltd
Priority to CN201910212664.7A
Publication of CN111737951A
Application granted
Publication of CN111737951B
Legal status: Active
Anticipated expiration

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a method and a device for labeling association relations in text language. Exploiting the close relevance among the information extraction subtasks of text language, it designs a composite labeling method that is independent of any specific model. The method naturally fuses multiple text language information extraction tasks and enables joint learning and integrated training of multiple associated tasks, for example joint learning of named entity recognition and named entity standardization, of named entity recognition and entity relation extraction, or of named entity recognition and entity disambiguation. By making full use of the close association among the subtasks, the composite labeling method achieves complete joint learning, lets the associated tasks share information and reinforce one another, and improves the overall accuracy and recall of text language information extraction.

Description

Text language incidence relation labeling method and device
Technical Field
The invention belongs to the field of information technology and relates to a method that uses computer intelligence techniques to assist information extraction from text language. Specifically, it relates to a composite labeling method that exploits the close relevance among the information extraction subtasks of text language to naturally fuse multiple information extraction tasks. This enables joint learning and integrated training of multiple associated tasks, so that the tasks share information and reinforce one another, which improves the accuracy and recall of text language information extraction.
Background
Text is the main form of expression of natural language and an important carrier of information. In the current era of information explosion, the key to data intelligence is how to extract useful structured information from massive amounts of unstructured text. Information extraction from text language comprises several subtasks, such as named entity recognition, named entity standardization, and entity relation extraction. These subtasks are closely associated, but traditional methods treat them as independent tasks and perform them separately (Peng Z, Sun L, Han X. SIR-NERD: A Chinese named entity recognition and disambiguation system using a two-stage method [C]// Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing. 2012: 115-120), so the tasks cannot share information or complement one another.
Currently, a small number of researchers have begun to pay attention to the relevance between text language information extraction subtasks. Liu X et al. (Liu X, Zhou M, Wei F, et al. Joint inference of named entity recognition and normalization for tweets [C]// Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1. Association for Computational Linguistics, 2012: 526-.) proposed a joint learning method for named entity recognition and normalization based on a probabilistic graphical model. This method is not a neural network architecture; it depends on feature engineering, which is tedious, time-consuming, and hard to adapt to different corpora. Zheng S et al. (Zheng S, Hao Y, Lu D, et al. Joint entity and relation extraction based on a hybrid neural network [J]. Neurocomputing, 2017, 257: 59-66.) proposed a hybrid framework for named entity recognition and entity relation extraction. This joint learning approach is based on neural networks, but the joint learning is not thorough: in the training stage, the parameters related to named entity recognition are optimized first, and only then is entity relation extraction trained. Such two-stage training does not achieve global optimization. How to realize joint learning that does not depend on a specific machine learning or deep learning method and that supports integrated training remains a very challenging problem.
Disclosure of Invention
In view of the above problems, the present invention aims to provide a model-independent general joint learning strategy supporting integrated training, which does not depend on a specific model and simultaneously supports multi-task integrated training.
In order to achieve the purpose, the invention adopts the following technical scheme:
A method for labeling association relations in text language includes the following steps:
1) determining at least two associated information extraction subtasks of the text language according to the requirements of the text language task;
2) analyzing the text corpus and defining a tag set for each information extraction subtask;
3) combining the tag sets of all the information extraction subtasks to form a composite labeling system;
4) labeling the text corpus according to the composite labeling system.
Further, the information extraction subtask in step 1) may include, but is not limited to, a named entity identification subtask, a named entity normalization subtask, and a named entity relationship extraction subtask.
Further, step 2) defines, for each text language information extraction subtask, a separate labeling system on the corpus; each information extraction subtask corresponds to one label set, which includes the position of a character within an entity and the entity type.
Further, in step 3), for information extraction subtasks that are associated with one another, the label sets of the subtasks are combined and the common part of their labels is optimized, forming a composite labeling system and realizing the natural fusion of multiple tasks.
A text language association labeling apparatus, comprising:
the subtask determining module is responsible for determining at least two related information extraction subtasks of the text language according to the requirements of the text language related tasks;
the tag set definition module is responsible for analyzing the text corpus and defining the tag set of each information extraction subtask;
the label combination module is responsible for combining label sets of all the information extraction subtasks to form a composite labeling system;
and the marking module is responsible for marking the text corpus according to the composite marking system.
A machine learning model integrated training method supporting multiple tasks comprises the following steps:
(1) labeling the text corpus according to the composite labeling system by adopting the method to obtain a training data set and a test data set;
(2) selecting a specific machine learning (including deep learning) model;
(3) in the prediction stage, decoding the label sequence that the machine learning model predicts from an input sequence according to the composite labeling system to obtain the final label prediction result;
(4) in the training iterations of the machine learning model, optimizing on the training data set while testing on the test data set, and stopping training when performance on the test data set declines.
Furthermore, a plurality of tasks are completely fused together through the composite labeling system, so that integrated training is realized, and separate training of each task in multiple stages is not required.
Further, the machine learning model is either a traditional machine learning model, such as a conditional random field, a hidden Markov model, or another model based on probabilistic graphical models, or a deep learning model based on a deep neural network.
Further, the decoding extracts entity relationships according to a proximity principle.
A multitasking enabled machine learning model integrated training device, comprising:
the data preparation module is responsible for labeling the text corpus according to the composite labeling system by adopting the method to obtain a training data set and a test data set;
the model selection module is responsible for selecting a specific machine learning model;
the decoding module is responsible for decoding, in the prediction stage, the label sequence that the machine learning model predicts from an input sequence according to the composite labeling system, to obtain the final label prediction result;
and the training module is responsible for optimizing on the training data set and testing on the test data set during the training iterations of the machine learning model, and stops training when performance on the test data set declines.
The invention provides a universal joint learning strategy with independent models and supporting integrated training, which does not depend on specific models, supports the traditional machine learning based on statistics, also supports the deep learning based on a deep neural network, and simultaneously supports the multi-task integrated training, such as joint learning supporting named entity identification and named entity standardization, joint learning supporting named entity identification and entity relation extraction, joint learning supporting named entity identification and entity disambiguation and the like. The invention can naturally integrate a plurality of text language information extraction tasks, realize the joint learning and integrated training of the plurality of text language associated tasks, ensure that the information sharing among the associated tasks can be mutually promoted, and improve the accuracy and the recall rate of the text language information extraction.
The invention is an innovation on a labeling method, does not relate to a specific model, and is suitable for both traditional machine learning and deep learning based on a neural network; a composite labeling system is designed by combining label sets of a plurality of subtasks, so that the natural fusion of multiple tasks is realized; the multiple tasks are completely fused together by the composite labeling system, integrated training can be realized, and separate training of each task in multiple stages is not required.
Drawings
FIG. 1 is a schematic diagram of a text language association relation labeling method according to an embodiment of the present invention, wherein diagram (a) shows composite labeling for named entity recognition and standardization, and diagram (b) shows composite labeling for named entity recognition and relation extraction.
FIG. 2 is a flow chart of steps for an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, the present invention shall be described in further detail with reference to the following detailed description and accompanying drawings.
Fig. 1 is a schematic diagram of a text language association relation labeling method according to an embodiment of the present invention, where (a) is a composite labeling system of a joint learning framework in which the labeling method is applied to named entity recognition and entity standardization, and (b) is a composite labeling system of a joint learning framework in which the labeling method is applied to named entity recognition and relation extraction. The method for labeling the association relation based on the text language is suitable for the joint learning of subtasks associated with various text languages, and the method is only described by two examples of the joint learning of named entity identification and entity standardization and the joint learning of named entity identification and relation extraction.
The composite label in diagram (a) of FIG. 1 consists of a named entity recognition tag and a named entity standardization tag, in the format [position-entity type-entity standardization symbol]. B, I, E, S, and O denote the position of a character within an entity: B stands for begin, the first position of the entity name (here a "character" means one character in Chinese or one word in English); I stands for inter, a middle position of the entity name; E stands for end, the last position of the entity name; S stands for single, meaning the entity consists of only one character; O stands for out, meaning the character is not part of any entity name. "ORG" indicates that the entity type is an organization; entity types can be defined freely according to the task requirements, and other common types are "PER" (person name) and "LOC" (place name). For the standardization symbol, S marks the name of an entity that has only one form of expression in a document, F marks the standard name of an entity that has several forms of expression, and A marks a non-standard name of such an entity, such as an abbreviation or alias; by convention F is longer than A. In diagram (a) of FIG. 1, "Bank of Communications" (交通银行) is a standard name, and its abbreviation (交行) is a non-standard name of the same entity. "Bank of Communications" is an organization, so its first character (交) is labeled "B-ORG-F", denoting the first character of the standard name of an organization entity.
The composite label in diagram (b) of FIG. 1 consists of a named entity recognition tag and a named entity relation extraction tag, in the format [position-entity type-entity relation]. B, I, E, S, and O denote the position of a character within an entity. The sets of entity types and entity relations must be defined in advance according to the requirements of the task. Here "ORG" denotes an organization entity, "PER" a person name entity, "LOC" a place name entity, and "CF" the Company-Founder relation. In diagram (b) of FIG. 1, "Li Ming" is the founder of "Sun Company" (both the person name and the company name are fictitious and used only as an example), so the two are in a "CF" relation. The first character of "Sun Company" is labeled "B-ORG-CF", marking it as the first character of an organization entity in a Company-Founder relation; similarly, the "Ming" of "Li Ming" is labeled "E-PER-CF", marking it as the last character of a person name entity in a Company-Founder relation. "Beijing" is a place name entity that has no defined relation to other entities in the example text, so its first character is labeled "B-LOC-S", where the final "S" denotes a single entity with no entity relation.
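The two label formats described for FIG. 1 can be generated mechanically from entity spans. The following is a minimal sketch, not the patent's implementation: the function names, the span representation (start, exclusive end, entity type, suffix), and the six-token example are assumptions made for illustration.

```python
def bies_positions(length):
    """Return the position letters for an entity spanning `length` tokens."""
    if length == 1:
        return ["S"]
    return ["B"] + ["I"] * (length - 2) + ["E"]


def composite_tags(num_tokens, spans):
    """Build [position-type-suffix] tags; tokens outside every span get "O".

    `spans` holds (start, end_exclusive, entity_type, suffix) tuples, where
    the suffix is a standardization symbol (S/F/A) or a relation such as CF.
    """
    tags = ["O"] * num_tokens
    for start, end, entity_type, suffix in spans:
        for offset, pos in enumerate(bies_positions(end - start)):
            tags[start + offset] = f"{pos}-{entity_type}-{suffix}"
    return tags


# Hypothetical 6-token sentence: a 4-token standard name (F) followed by
# a 2-token abbreviation (A) of the same organization.
tags = composite_tags(6, [(0, 4, "ORG", "F"), (4, 6, "ORG", "A")])
print(tags)
```

Because the suffix slot is generic, the same helper covers both the standardization scheme of diagram (a) and the relation scheme of diagram (b).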
FIG. 2 is a flow chart of steps of an embodiment of the present invention, including the steps of:
Step 1. Clarify the requirements of the text language task, and determine at least two associated information extraction subtasks according to the specific data set and application scenario, for example the named entity recognition subtask and the named entity relation extraction subtask shown in diagram (b) of FIG. 1.
Step 2. Analyze the specific text corpus and, for each text language information extraction subtask, define a separate labeling system on the corpus; each subtask corresponds to one label set, which includes the position of a character within an entity, the entity type, and so on.
For example, for the named entity recognition subtask, with organization and person name as the entity types, the defined tag set is { B-ORG, I-ORG, E-ORG, S-ORG, B-PER, I-PER, E-PER, S-PER, O }. For the named entity relation extraction subtask, with the entity relations Country-President (CP), Company-Founder (CF), and Part-Whole (PW), the defined tag set is { e1-CP, e2-CP, e1-CF, e2-CF, e1-PW, e2-PW }, where e1 and e2 denote the role positions within a pair of related entities; for instance, e1-CP denotes the country role in the CP relation.
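The two per-subtask tag sets listed above can be enumerated programmatically, which makes it easy to extend them with new entity types or relations. This is an illustrative sketch only; the variable names are assumptions.

```python
from itertools import product

# Named entity recognition: every position letter for every entity type, plus "O".
positions = ["B", "I", "E", "S"]
entity_types = ["ORG", "PER"]
ner_tags = [f"{p}-{t}" for t, p in product(entity_types, positions)] + ["O"]

# Relation extraction: both role positions (e1, e2) for every relation.
relations = ["CP", "CF", "PW"]
roles = ["e1", "e2"]
relation_tags = [f"{role}-{rel}" for rel, role in product(relations, roles)]

print(ner_tags)
print(relation_tags)
```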
Step 3. For subtasks that are associated with one another, combine their labels and optimize the common part of the labels to form a composite labeling system.
For example, the tags of the named entity recognition and named entity standardization subtasks both contain the position of a character within an entity, so this common part can be merged when the tags are combined, and the two subtasks share the entity position information. In addition, when combining the labels of the subtasks, the application scenario of the specific problem should be taken into account for further optimization, keeping the number of labels in the composite label set as small as possible. For example, the composite labeling system formed for named entity recognition and named entity relation extraction is shown in diagram (b) of FIG. 1; it contains both the named entity recognition tags and the entity relation extraction tags.
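One way to picture the label combination of step 3 is to share the position letters across subtasks and to keep only the type and relation combinations that can occur in the application. The sketch below assumes a hypothetical `allowed` mapping from entity type to the relations it may take part in; that mapping, and the rule used to prune labels, are illustrative assumptions rather than the patent's procedure.

```python
POSITIONS = ["B", "I", "E", "S"]
NO_RELATION = "S"  # suffix used for an entity that takes part in no relation


def composite_tag_set(allowed_relations):
    """Enumerate [position-type-relation] tags, sharing the position part.

    `allowed_relations` maps each entity type to the relations it can
    take part in; combinations outside this mapping are never generated.
    """
    tags = ["O"]
    for entity_type, relations in allowed_relations.items():
        for relation in list(relations) + [NO_RELATION]:
            for pos in POSITIONS:
                tags.append(f"{pos}-{entity_type}-{relation}")
    return tags


# Assumed constraint: only ORG and PER entities can be in a CF relation.
allowed = {"ORG": ["CF"], "PER": ["CF"], "LOC": []}
tags = composite_tag_set(allowed)
print(len(tags))
```

With these assumptions the set has 21 labels, whereas the unconstrained product of 4 positions, 3 types, and 2 suffixes plus "O" would have 25.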
Step 4.1. Label the corpus using the composite labeling system defined in step 3. The labeled result is shown in diagram (b) of FIG. 1.
Step 4.2. Split the labeled corpus into a training data set and a test data set.
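The split in step 4.2 can be as simple as a seeded shuffle followed by a cut. The sketch below is one possible approach; the 80/20 ratio, the seed, and the function name are assumptions, since the description does not specify how the corpus is divided.

```python
import random


def split_corpus(sentences, test_ratio=0.2, seed=0):
    """Shuffle labeled sentences reproducibly and return (train, test)."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical corpus of ten labeled sentences.
corpus = [f"sentence {i}" for i in range(10)]
train_set, test_set = split_corpus(corpus)
print(len(train_set), len(test_set))
```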
Step 5.1. Select a specific machine learning (including deep learning) model, which can be a traditional machine learning model, such as a conditional random field, a hidden Markov model, or another model based on probabilistic graphical models, or a deep learning model based on a deep neural network.
Step 5.2. Define a cost function for the machine learning model. A cost function commonly used in sequence labeling problems is the cross-entropy loss:
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} y^{(i)} \log h_\theta\left(x^{(i)}\right)$$
where $J(\theta)$ is the cross-entropy loss function, $\theta$ the parameters of the model, $m$ the number of training samples, $y^{(i)}$ the true probability value of the $i$-th sample, $x^{(i)}$ the input of the $i$-th sample, $h_\theta$ the mapping function of the model, and $h_\theta(x^{(i)})$ the predicted output probability value for the input of the $i$-th sample under the model's mapping.
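As a minimal sketch of how a cross-entropy loss averaged over m samples could be computed, the function below assumes that each true label and each prediction is given as a probability distribution over the tag set. That representation, the function name, and the small `eps` guard against log(0) are assumptions made for illustration.

```python
import math


def cross_entropy(true_dists, pred_dists, eps=1e-12):
    """Average cross-entropy between true and predicted tag distributions."""
    m = len(true_dists)
    total = 0.0
    for y, h in zip(true_dists, pred_dists):
        # Sum over tags of y_k * log(h_k); eps avoids taking log(0).
        total -= sum(y_k * math.log(h_k + eps) for y_k, h_k in zip(y, h))
    return total / m


# Two hypothetical samples over a 3-tag set, with one-hot true labels.
y_true = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
y_pred = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]
loss = cross_entropy(y_true, y_pred)
print(loss)
```

With one-hot true labels only the probability assigned to the correct tag contributes, so in this example both samples contribute log 2.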
Step 6. Decode the label sequence that the machine learning model predicts from the input sequence according to the composite labeling system, combining the labels predicted for adjacent characters and translating them into a readable entity extraction result. That is, the decoding stage of the composite labeling system extracts entity relations according to the proximity principle.
For example, if the model labels "Li" as "B-PER-CF" and "Ming" as "E-PER-CF", then positions "B" through "E" delimit one entity name and "PER" denotes a person name, so the person name entity "Li Ming" is extracted; "CF" indicates that this entity is the person name entity in a Company-Founder relation. Similarly, "Sun Company" is the organization entity in a "CF" relation, which yields the relation triple (Sun Company, Company-Founder, Li Ming). The other label sequences are decoded in the same way to obtain the final entity labeling result output by the model for the input text sequence.
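The decoding of step 6 can be sketched in two passes: first group B...E and S tags into entities, then pair each related entity with the nearest unpaired entity that carries the same relation. The code below is an illustrative reading of the proximity principle, not the patent's exact algorithm; the English tokenization of the example, the `RELATION_ROLES` table that orders the triple, and the requirement that paired entities differ in type are all assumptions.

```python
NO_RELATION = "S"
# Assumed (head type, tail type) for each relation, used to order a triple.
RELATION_ROLES = {"CF": ("ORG", "PER")}


def extract_entities(tokens, tags, sep=" "):
    """Group B...E and S tags into entities with type, relation, and start."""
    entities, start = [], None
    for i, tag in enumerate(tags):
        if tag == "O":
            start = None
            continue
        pos, etype, rel = tag.split("-")
        if pos in ("B", "S"):
            start = i
        if pos in ("E", "S") and start is not None:
            text = sep.join(tokens[start:i + 1])
            entities.append({"text": text, "type": etype, "rel": rel, "start": start})
            start = None
    return entities


def pair_by_proximity(entities):
    """Pair each related entity with the nearest unpaired partner."""
    triples, used = [], set()
    for i, a in enumerate(entities):
        if i in used or a["rel"] == NO_RELATION:
            continue
        candidates = [j for j, b in enumerate(entities)
                      if j != i and j not in used
                      and b["rel"] == a["rel"] and b["type"] != a["type"]]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(entities[k]["start"] - a["start"]))
        used.update({i, j})
        b = entities[j]
        head_type, _ = RELATION_ROLES[a["rel"]]
        head, tail = (a, b) if a["type"] == head_type else (b, a)
        triples.append((head["text"], a["rel"], tail["text"]))
    return triples


# Hypothetical English rendering of the FIG. 1(b) example sentence.
tokens = ["Li", "Ming", "founded", "Sun", "Company", "in", "Beijing"]
tags = ["B-PER-CF", "E-PER-CF", "O", "B-ORG-CF", "E-ORG-CF", "O", "S-LOC-S"]
entities = extract_entities(tokens, tags)
triples = pair_by_proximity(entities)
print(triples)
```

For Chinese text, where each token is a single character, passing `sep=""` joins the characters back into the entity name.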
Step 7. During the training iterations, optimize on the training data set, typically with a gradient descent algorithm that uses an adaptive learning rate, such as the Adam algorithm (Kingma D P, Ba J. Adam: A method for stochastic optimization [J]. arXiv preprint arXiv:1412.6980, 2014.), while testing on the test data set; stop training when performance on the test data set declines. This ensures both the fitting ability and the generalization ability of the model.
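The stopping rule of step 7 can be sketched independently of any particular model or optimizer. In the code below, `train_one_epoch` and `evaluate` are hypothetical callbacks standing in for one optimization pass (for example with Adam) and for scoring on the test data set; stopping at the first drop, with no patience window, is a literal reading of the description and an assumption about its intent.

```python
def train_with_early_stopping(train_one_epoch, evaluate, max_epochs=100):
    """Train until the test score first drops; return best score and history."""
    best_score, history = float("-inf"), []
    for _ in range(max_epochs):
        train_one_epoch()        # one optimization pass over the training set
        score = evaluate()       # e.g. F1 on the test data set
        history.append(score)
        if score < best_score:   # performance declined: stop training
            break
        best_score = score
    return best_score, history


# Simulated test scores for successive epochs (made-up numbers).
simulated = iter([0.60, 0.70, 0.75, 0.74, 0.80])
best, history = train_with_early_stopping(lambda: None, lambda: next(simulated))
print(best, history)
```

In this simulation training stops after the fourth epoch, when the score falls from 0.75 to 0.74, so the later value 0.80 is never reached; a patience window is a common variation when scores are noisy.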
Based on the same inventive concept, another embodiment of the present invention provides a device for labeling a text language association relationship, including:
the subtask determining module is responsible for determining at least two related information extraction subtasks of the text language according to the requirements of the text language related tasks;
the tag set definition module is responsible for analyzing the text corpus and defining the tag set of each information extraction subtask;
the label combination module is responsible for combining label sets of all the information extraction subtasks to form a composite labeling system;
and the marking module is responsible for marking the text corpus according to the composite marking system.
Based on the same inventive concept, another embodiment of the present invention provides a machine learning model integrated training device supporting multiple tasks, comprising:
the data preparation module is responsible for labeling the text corpus according to the composite labeling system by adopting the method to obtain a training data set and a test data set;
the model selection module is responsible for selecting a specific machine learning model;
the decoding module is responsible for decoding, in the prediction stage, the label sequence that the machine learning model predicts from an input sequence according to the composite labeling system, to obtain the final label prediction result;
and the training module is responsible for optimizing on the training data set and testing on the test data set during the training iterations of the machine learning model, and stops training when performance on the test data set declines.
The specific implementation of the modules is described in the foregoing description of the method of the present invention.
The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit the same, and a person skilled in the art can modify the technical solution of the present invention or substitute the same without departing from the principle and scope of the present invention, and the scope of the present invention should be determined by the claims.

Claims (10)

1. A method for labeling association relations in text language, characterized by comprising the following steps:
1) determining at least two associated information extraction subtasks of the text language according to the requirements of the text language task;
2) analyzing the text corpus and defining a tag set for each information extraction subtask;
3) combining the tag sets of all the information extraction subtasks to form a composite labeling system;
4) labeling the text corpus according to the composite labeling system.
2. The method of claim 1, wherein the information extraction subtask of step 1) includes: a named entity identification subtask, a named entity standardization subtask and a named entity relationship extraction subtask.
3. The method according to claim 1, wherein step 2) defines a separate labeling system corresponding to each text language information extraction subtask on the corpus, and each information extraction subtask corresponds to a label set and comprises a position of a character in an entity and an entity type.
4. The method according to claim 1, wherein in step 3), for information extraction subtasks that are associated with one another, the label sets of the subtasks are combined and the common part of their labels is optimized, forming a composite labeling system and realizing the natural fusion of multiple tasks.
5. A text language incidence relation labeling device is characterized by comprising:
the subtask determining module is responsible for determining at least two related information extraction subtasks of the text language according to the requirements of the text language related tasks;
the tag set definition module is responsible for analyzing the text corpus and defining the tag set of each information extraction subtask;
the label combination module is responsible for combining label sets of all the information extraction subtasks to form a composite labeling system;
and the marking module is responsible for marking the text corpus according to the composite marking system.
6. A machine learning model integrated training method supporting multiple tasks is characterized by comprising the following steps:
(1) labeling the text corpus according to a composite labeling system by adopting the method of any one of claims 1 to 4 to obtain a training data set and a test data set;
(2) selecting a specific machine learning model;
(3) in the prediction stage, decoding the label sequence that the machine learning model predicts from an input sequence according to the composite labeling system to obtain the final label prediction result;
(4) in the training iterations of the machine learning model, optimizing on the training data set while testing on the test data set, and stopping training when performance on the test data set declines.
7. The method of claim 6, wherein multiple tasks are fully fused together by the composite annotation architecture, enabling integrated training without separate training of multiple stages of tasks.
8. The method of claim 6, wherein the machine learning model is a traditional machine learning model, comprising a conditional random field, a hidden Markov model, or another model based on probabilistic graphical models, or is a deep learning model based on a deep neural network.
9. The method of claim 6, wherein the decoding extracts entity relationships on a proximity basis.
10. A machine learning model integrated training device supporting multiple tasks, comprising:
the data preparation module is used for labeling the text corpus according to a composite labeling system by adopting the method of any one of claims 1 to 4 to obtain a training data set and a test data set;
the model selection module is responsible for selecting a specific machine learning model;
the decoding module is responsible for decoding, in the prediction stage, the label sequence that the machine learning model predicts from an input sequence according to the composite labeling system, to obtain the final label prediction result;
and the training module is responsible for optimizing on the training data set and testing on the test data set during the training iterations of the machine learning model, and stops training when performance on the test data set declines.
CN201910212664.7A 2019-03-20 2019-03-20 Text language incidence relation labeling method and device Active CN111737951B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910212664.7A CN111737951B (en) 2019-03-20 2019-03-20 Text language incidence relation labeling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910212664.7A CN111737951B (en) 2019-03-20 2019-03-20 Text language incidence relation labeling method and device

Publications (2)

Publication Number Publication Date
CN111737951A (en) 2020-10-02
CN111737951B (en) 2022-10-14

Family

ID=72645595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910212664.7A Active CN111737951B (en) 2019-03-20 2019-03-20 Text language incidence relation labeling method and device

Country Status (1)

Country Link
CN (1) CN111737951B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103853706A (en) * 2012-12-06 2014-06-11 富士通株式会社 Method and equipment for converting simplified Chinese sentence into traditional Chinese sentence
US20140163951A1 (en) * 2012-12-07 2014-06-12 Xerox Corporation Hybrid adaptation of named entity recognition
CN104603773A (en) * 2012-06-14 2015-05-06 诺基亚公司 Method and apparatus for associating interest tags with media items based on social diffusions among users
CN105955123A (en) * 2011-11-14 2016-09-21 洛克威尔自动控制技术股份有限公司 Generation and publication of shared tagsets

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
计峰: "《自然语言处理中序列标注模型的研究》", 《中国优秀博硕士学位论文全文数据库(博士)信息科技辑(月刊)2013年第03期》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149423A (en) * 2020-10-16 2020-12-29 中国农业科学院农业信息研究所 Corpus labeling method and system for domain-oriented entity relationship joint extraction
CN112149423B (en) * 2020-10-16 2024-01-26 中国农业科学院农业信息研究所 Corpus labeling method and system for domain entity relation joint extraction
CN114003690A (en) * 2021-10-25 2022-02-01 南京中兴新软件有限责任公司 Information labeling method, model training method, electronic device and storage medium
CN115081453A (en) * 2022-08-23 2022-09-20 北京睿企信息科技有限公司 Named entity identification method and system
CN115081453B (en) * 2022-08-23 2022-11-04 北京睿企信息科技有限公司 Named entity identification method and system

Also Published As

Publication number Publication date
CN111737951B (en) 2022-10-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant