CN112487206A - Entity relationship extraction method for automatically constructing data set - Google Patents

Entity relationship extraction method for automatically constructing data set

Info

Publication number
CN112487206A
CN112487206A (application CN202011428961.4A)
Authority
CN
China
Prior art keywords
data set
entity
relationship
model
relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011428961.4A
Other languages
Chinese (zh)
Other versions
CN112487206B (en)
Inventor
房冬丽
魏超
李俊
衡宇峰
黄元稳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 30 Research Institute
Original Assignee
CETC 30 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 30 Research Institute filed Critical CETC 30 Research Institute
Priority to CN202011428961.4A
Publication of CN112487206A
Application granted
Publication of CN112487206B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The invention provides an entity relationship extraction method that automatically constructs its own data set, comprising the following steps: step 1, collecting and preprocessing a corpus; step 2, defining a triple dictionary table and constructing a synonym table; step 3, generating training and test data sets with the LTP tool; step 4, training a network model on the training data set; step 5, predicting entities and relations on the test data set with the trained network model; and step 6, optimizing the prediction results to obtain a triple data set. The scheme analyzes text content automatically and thereby effectively solves the difficulty of generating training and test data sets. By optimizing and adjusting the BERT model, it also removes the dependence on large computational resources: entity relations in text can be extracted efficiently with only fine-tuning of BERT, so that the essential connections among multi-source heterogeneous data are presented intuitively.

Description

Entity relationship extraction method for automatically constructing data set
Technical Field
The invention relates to the field of natural language processing, in particular to an entity relationship extraction method for automatically constructing a data set.
Background
Entities and relations summarize the main content of a text, can display the connections among data intuitively, and provide basic data for downstream tasks such as intelligent question answering and retrieval systems. At present, apart from formally structured documents such as academic papers that supply a few keywords, most documents do not provide an intuitive data structure reflecting their content. The traditional approach of extracting document entities and relations by reading texts manually cannot meet the requirements of practical applications now that document data is multi-source and massive. How to extract entities and relations efficiently and accurately is therefore an urgent problem. Existing extraction methods are, in summary, mainly divided into two categories: rule-based methods and machine-learning methods.
1) Rule-based methods require domain linguists to construct, model, and assign matching patterns to a certain number of rule template sets. Building on linguistic knowledge such as character and word features, lexical analysis, and syntactic dependency analysis, a high-quality grammar-pattern matching template is obtained and the relation patterns present in the text are mined. Manually written rules were applied effectively in professional fields early on, but because language rules are complex and diverse, writing the rules consumes a great deal of manpower.
2) Among machine-learning methods, most entity-relation extraction approaches start from language features, such as part-of-speech tags and syntactic parse trees: a certain amount of sample data must be constructed in advance and then fed to model training. The constructed features depend on the maturity of the natural language processing tools, and errors produced by the tools propagate and accumulate through the extraction process, strongly affecting subsequent steps. Meanwhile, generating the sample data also requires much manpower.
Disclosure of Invention
Data is a vital resource in every field, and scarce data limits the progress of research work, while the traditional manual annotation approach consumes large amounts of human resources. Addressing these problems, the invention provides a technical solution for extracting entity relations from text based on the HIT LTP tool (the Language Technology Platform of the Harbin Institute of Technology) and the BERT model. The scheme mainly solves two problems: on the one hand, it analyzes text content automatically, effectively solving the difficulty of generating training and test data sets; on the other hand, by optimizing and adjusting the BERT model it removes the dependence on large computational resources, so that entity relations in text can be extracted efficiently with only fine-tuning of BERT, presenting the essential connections among multi-source heterogeneous data intuitively.
The technical scheme adopted by the invention is as follows. An entity relationship extraction method for automatically constructing a data set comprises the following steps:
step 1, collecting and preprocessing corpora;
step 2, defining a triple dictionary table and constructing a synonym table;
step 3, generating a training data set and a testing data set by utilizing an LTP tool;
step 4, training a network model according to the training data set;
step 5, carrying out entity and relation prediction on the test data set through the trained network model;
and step 6, optimizing the prediction result to obtain a triple data set.
Further, the preprocessing comprises: cleaning the collected document data and splitting the text content into sentences, where different types of text are handled with different segmentation modes.
Further, the specific process of step 2 is: forming a triple dictionary table from the preprocessed corpus, the sorted-out entity categories, and the relationships between categories.
Further, the LTP tool is used to annotate parts of speech; nouns are taken as entities and verbs as relations or attributes. Data whose entity-relation or attribute type is uncertain is corrected, and unreasonable data is cleaned, forming the training data set and the test data set.
Further, step 4 comprises: building a relation classification model based on the BERT model, and building an entity extraction model that works from the relations predicted by the relation classification model; the input of both the relation classification model and the entity extraction model is a vector of dimension (1, n, 768), the BERT output is taken at sentence level, and the feature information output by BERT is passed through a fully connected layer and a sigmoid activation function; the training data is fed into the two models separately to complete training.
Further, the activation function is:
φ(l) = 1 / (1 + e^(-l))
Further, the loss function of the model is:
L = -Σ_k [ y_k log φ(l_k) + (1 - y_k) log(1 - φ(l_k)) ]
where y_k denotes the one-hot encoded label, the output layer contains k neurons for the k classes, and φ(l_k) denotes the sigmoid activation of the corresponding output-layer neuron.
Further, the specific process of step 5 is: based on the network model trained in step 4, entity prediction and relation prediction are performed on the test data set; the BERT-based entity and relation network models split the input text into two parameters, the text text-a and the relation text-b, so as to predict multiple relations and label entity positions.
Further, the prediction-result optimization method of step 6 is: judge whether the predicted entities and relations contain synonyms or near-synonyms that refer to the same entity or relation. Cosine similarity is used for the judgment: if the cosine similarity is above a threshold, the words are unified into a single term; in this way entity relations with identical semantics are removed by computing cosine similarity, forming the final triple data set. The cosine similarity is computed as:
cos(x, y) = (Σ_i x_i · y_i) / ( √(Σ_i x_i²) · √(Σ_i y_i²) )
where x and y denote the vectors of the two words, and x_i and y_i denote the i-th elements of x and y respectively.
Compared with the prior art, the beneficial effects of this technical scheme are as follows: the scheme analyzes text content automatically, effectively solving the difficulty of generating training and test data sets; through optimization and adjustment of the BERT model it removes the dependence on large computational resources, so that entity relations in text can be extracted efficiently with only fine-tuning of BERT, presenting the essential connections among multi-source heterogeneous data intuitively.
Drawings
FIG. 1 is a flow chart of the method for extracting entity relationships for automatically building data sets according to the present invention.
FIG. 2 is a schematic diagram of the network model constructed in an embodiment of the invention.
FIG. 3 is a flow chart of constructing a training and testing data set in an embodiment of the present invention.
FIG. 4 is a diagram of network model input vectors in an embodiment of the invention.
FIG. 5 is a diagram of entities and relationships in an embodiment of the invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, the present invention provides an entity relationship extraction method for automatically constructing a data set, which includes the following processes:
step 1, collecting and preprocessing corpora;
step 2, defining a triple dictionary table and constructing a synonym table;
step 3, generating a training data set and a testing data set by utilizing an LTP tool;
step 4, training a network model according to the training data set;
step 5, carrying out entity and relation prediction on the test data set through the trained network model;
and step 6, optimizing the prediction result to obtain a triple data set.
In particular:
in the step 1, crawler data and non-public document data are mainly collected, the collected multi-source heterogeneous data are cleaned, finally, text content is accurately divided by sentences, and different segmentation modes can be adopted for different types of texts, such as segmentation by using a regular expression, segmentation by using punctuation marks and the like.
The specific process of step 2 is: from the preprocessed corpus and the sorted-out entity classes and class relations, a triple dictionary table is formed, mainly containing the defined entities, relation classes, data types, and similar information.
In step 3, LTP integrates segmentation algorithms based on dictionary matching and on statistical machine learning, and can conveniently annotate the data with part-of-speech tags, category labels, boundaries, and similar information. The LTP tool is therefore used to annotate parts of speech; nouns are taken as entities and verbs as relations or attributes, preliminarily forming the training data set and the test data set.
The data set constructed by the LTP tool contains irregular and unreasonable data, which must be cleaned at this stage: data with uncertain entity-relation or attribute types is corrected, unreasonable data is removed, and the entity-category and relation-category dictionary tables are continuously refined, finally yielding a high-quality data set. A rough sketch of the candidate-generation step is given below.
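For instance, a loose Python sketch of turning LTP part-of-speech tags into candidate triples (assuming the classic pyltp interface and local model paths; the invention's actual pairing and cleaning rules are richer than this):

    from pyltp import Segmentor, Postagger  # HIT LTP Python bindings

    # Model paths are assumptions; point them at a local LTP data release.
    segmentor = Segmentor()
    segmentor.load("ltp_data/cws.model")
    postagger = Postagger()
    postagger.load("ltp_data/pos.model")

    def candidate_triples(sentence: str):
        """Nouns become candidate entities and verbs candidate relations,
        following the labeling rule described above; implausible triples
        are cleaned up in the later manual pass."""
        words = list(segmentor.segment(sentence))
        tags = list(postagger.postag(words))
        entities = [w for w, t in zip(words, tags) if t.startswith("n")]
        relations = [w for w, t in zip(words, tags) if t == "v"]
        return [(h, r, t) for r in relations
                for h in entities for t in entities if h != t]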
In step 4, a sentence may contain several relations or entities, so entity-relation extraction is a multi-label classification task. To extract entity-relation information from text more effectively, the invention separates the entity model from the relation network model: the relation network model is built first, then the entity network model. The main goal is for the relation network model to extract as many relation classes as possible, while the entity network model extracts the related entities more accurately. In recent years the BERT model has been widely applied in the NLP field on its own merits, so the relation and entity network models built here are both based on BERT. Constructing them is very convenient; only the following parameters need to be defined: the number of transformer layers, self-attention heads, hidden units, and training epochs. Here 12 transformer layers, 12 self-attention heads, and 768 hidden units are used, with 3 training epochs; the concrete model is shown in FIG. 2. The input of both the relation classification model and the entity extraction model is a vector of dimension (1, n, 768); the BERT output is taken at sentence level, and the feature information output by BERT is passed through a fully connected layer and a sigmoid activation function.
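A minimal PyTorch sketch of the relation head described above, using the Hugging Face transformers BERT as encoder (the layer counts and the 768 dimension follow the text; the model name and everything else are assumptions, and the entity extraction model would add a per-token tagging head in the same style):

    import torch
    import torch.nn as nn
    from transformers import BertModel

    class RelationClassifier(nn.Module):
        """Sentence-level multi-label relation head: BERT pooled output
        -> fully connected layer -> sigmoid, as described above."""
        def __init__(self, num_relations: int,
                     bert_name: str = "bert-base-chinese"):
            super().__init__()
            # bert-base: 12 transformer layers, 12 heads, 768 hidden units
            self.bert = BertModel.from_pretrained(bert_name)
            self.fc = nn.Linear(768, num_relations)

        def forward(self, input_ids, attention_mask=None, token_type_ids=None):
            out = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask,
                            token_type_ids=token_type_ids)
            # Sentence-level output: one (768,) pooled vector per sentence.
            return torch.sigmoid(self.fc(out.pooler_output))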
The training stage mainly processes the training data set to produce the network models' weights, biases, and other parameters, which are saved for later verification on the test data set. The training procedure for the two network models is: 1) insert [CLS] and [SEP] at the beginning and end of each sentence respectively; 2) convert each token into a 768-dimensional vector; 3) generate segment ids; 4) generate position vectors. All three vectors have shape (1, n, 768) and can be added element-wise to obtain a composite (1, n, 768) representation, i.e. the encoded input of BERT. The encoded information is fed into the defined network models, and training of the entity and relation models, saving of the network models together with their weight parameters and bias information, and so on are executed in parallel.
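With the Hugging Face tokenizer, for example, the four steps above collapse into one call (a sketch; the tokenizer inserts [CLS]/[SEP] and produces segment and position ids internally, and BERT sums the three embeddings into the (1, n, 768) input):

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

    # text-a is the sentence; text-b, one candidate relation (see step 5).
    enc = tokenizer("兵役分为现役和预备役", "分为", return_tensors="pt")
    # enc.input_ids      -> [CLS] text-a [SEP] text-b [SEP] token ids
    # enc.token_type_ids -> segment ids: 0 for text-a, 1 for text-b
    # Position embeddings are added inside BertModel, matching step 4.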
Wherein the activation function is:
φ(l) = 1 / (1 + e^(-l))
The model alone cannot extract useful feature information effectively: although the BERT model is pre-trained on open-source corpora, the texts of each field are complex and diverse, so complete domain data can hardly be covered in BERT's pre-training stage. The invention therefore keeps optimizing the proposed model through training and fine-tuning, and adopts a loss function matched to the multi-label classification task:
L = -Σ_k [ y_k log φ(l_k) + (1 - y_k) log(1 - φ(l_k)) ]
where y_k denotes the one-hot encoded label, the output layer contains k neurons for the k classes, and φ(l_k) denotes the sigmoid activation of the corresponding output-layer neuron.
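In PyTorch terms, this per-class sigmoid cross-entropy is available, for instance, as BCEWithLogitsLoss (a sketch; the class fuses the sigmoid above with the loss for numerical stability, so it is fed the raw logits l_k):

    import torch
    import torch.nn as nn

    criterion = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy per class

    logits = torch.randn(1, 5)                     # k = 5 output neurons l_k
    labels = torch.tensor([[1., 0., 1., 0., 0.]])  # multi-label targets y_k
    loss = criterion(logits, labels)               # the L above, averaged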
The specific process of step 5 is: perform entity prediction and relation prediction on the test data set with the network models trained in step 4. The BERT-based entity and relation network models automatically split the input text into two parameters, text-a and text-b, where text-a is the sentence content and text-b is one specific relation; this enables the prediction of multiple relations and the labeling of entity positions.
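Sketched in Python, the two-stage prediction could look as follows (an assumed harness: relation_model and tokenizer are as in the training sketches, and the entity model is assumed to return one subject span and one object span per (text-a, text-b) pair):

    def predict_triples(sentence, relation_model, entity_model,
                        tokenizer, relation_names, threshold=0.5):
        """Stage 1: multi-label relation scores for the sentence alone.
        Stage 2: for each predicted relation, re-encode the sentence as
        text-a with the relation as text-b and tag the entity spans."""
        enc = tokenizer(sentence, return_tensors="pt")
        scores = relation_model(**enc)[0]
        predicted = [r for r, s in zip(relation_names, scores)
                     if s.item() >= threshold]
        triples = []
        for rel in predicted:
            pair = tokenizer(sentence, rel, return_tensors="pt")
            subj, obj = entity_model(**pair)  # assumed span outputs
            triples.append((subj, rel, obj))
        return triples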
The prediction-result optimization method of step 6 is as follows: judge whether the predicted entities and relations contain synonyms or near-synonyms that refer to the same entity or relation. Cosine similarity is used for the judgment: if the cosine similarity is above a threshold, the words are unified into a single term; in this way entity relations with identical semantics are removed by computing cosine similarity, forming the final triple data set. The cosine similarity is computed as:
cos(x, y) = (Σ_i x_i · y_i) / ( √(Σ_i x_i²) · √(Σ_i y_i²) )
where x and y denote the vectors of the two words, and x_i and y_i denote the i-th elements of x and y respectively.
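A small NumPy sketch of this de-duplication step (the 0.85 threshold and the embed function are illustrative assumptions, not values stated in the text):

    import numpy as np

    def cosine_similarity(x: np.ndarray, y: np.ndarray) -> float:
        """cos(x, y) = sum_i x_i*y_i / (||x|| * ||y||), as defined above."""
        return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

    def merge_synonyms(words, embed, threshold=0.85):
        """Keep one representative of each group of semantically
        identical entity/relation strings."""
        kept = []
        for w in words:
            if all(cosine_similarity(embed(w), embed(k)) < threshold
                   for k in kept):
                kept.append(w)
        return kept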
The scheme also includes optimizations around the LTP and BERT models. For the situation where text data sets are scarce, the LTP segmentation tool completes the construction of the data set automatically, effectively removing the past reliance on large-scale manual labeling.
BERT's self-supervised objectives include the MLM task: during network training, part of the tokens in the input sequence are randomly masked, and the masked tokens are predicted from the context fed into BERT. However, MLM was originally an NLP training method for English; in the Chinese setting, token-level MLM splits Chinese words apart and fragments contextual semantic information. Google released an updated BERT in 2019 that raises the masking granularity from single characters to whole words, so that sentence semantics are split as little as possible; unfortunately, that release did not cover Chinese. As pointed out in "Pre-Training with Whole Word Masking for Chinese BERT", an LTP segmentation tool can be used to segment sentences into words first, and masking can then be applied at word granularity for self-supervised training. Therefore, to better improve Chinese semantic understanding, this work combines LTP word segmentation with the word-granularity masking capability and open-source code of the 2019 BERT release and re-runs the BERT pre-training process, producing a BERT model that supports Chinese word granularity; the entity and relation network models used here are based on this model.
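The word-granularity masking step might be sketched as follows (an assumed illustration: ltp_segment is any word-segmentation callable, and the 15% mask ratio mirrors the usual MLM setting rather than a figure stated in the text):

    import random

    def whole_word_mask(sentence, ltp_segment, mask_ratio=0.15,
                        mask_token="[MASK]"):
        """Segment the sentence into words with LTP, choose whole words
        to mask, then mask every character of each chosen word so that
        word semantics are never split, as described above."""
        words = ltp_segment(sentence)       # e.g. ["兵役", "分为", "现役", ...]
        n_mask = max(1, int(len(words) * mask_ratio))
        masked = set(random.sample(range(len(words)), n_mask))
        chars, targets = [], []
        for i, w in enumerate(words):
            for ch in w:
                targets.append(ch if i in masked else None)  # MLM label
                chars.append(mask_token if i in masked else ch)
        return chars, targets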
The invention also provides an embodiment. The raw text data is mainly text from the field of policies and regulations; the collected web-crawler data and non-public text data amount to 15,982 documents in total. The details are as follows:
1) Training and test data sets are automatically constructed, as shown in FIG. 3.
First, all documents are accurately segmented with a regular expression to form individual legal articles.
The regular expression is as follows:
第([一二三四五六七八九十百千万零1234567890]+)[章条]([\s\S]*?)(?=第([一二三四五六七八九十百千万零1234567890]+)[章条]|$)
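In Python this splitting step might look like the sketch below (assuming the pattern above is a faithful reconstruction; the lookahead is made non-capturing here so that findall returns clean pairs):

    import re

    NUM = "一二三四五六七八九十百千万零1234567890"
    ARTICLE = re.compile(
        rf"第([{NUM}]+)[章条]([\s\S]*?)(?=第[{NUM}]+[章条]|$)"
    )

    def split_articles(document: str):
        """Return (article number, article body) pairs, one per 章/条."""
        return [(num, body.strip()) for num, body in ARTICLE.findall(document)]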
Next, entity classes (Table 1) and class relationships (Table 2) are defined.
And finally, marking the corpus information according to the LTP tool, and cleaning the data to form a data set.
Example: active serviceman/n and/c reservist/n ,/wp must/d abide by/v constitution/n and/c law/n ,/wp fulfill/v citizen/n 's/u obligations/n ,/wp and at the same time/d enjoy/v citizen/n 's/u rights/n ;/wp the rights/n and/c obligations/n arising/v from/c military service/v are specified/v by/c the military service law/n and/c other/r relevant/r laws and regulations/n. The resulting data set entries are: {active serviceman, abide by, constitution}, {active serviceman, abide by, law}, … .
TABLE 1 entity classes
TABLE 2 relationship classes
2) The network model is trained on the training set; the legal article and a single relation are used as the BERT text-a and text-b respectively, so that all relations present in the article are predicted as completely as possible.
The details are as follows. Example: "Military service is divided into active service and reserve service. {military service, divided into, active service}, {military service, divided into, reserve service}" is converted into the following two sentences: "Military service is divided into active service and reserve service. {military service, divided into, active service}" and "Military service is divided into active service and reserve service. {military service, divided into, reserve service}". Sequence labeling is then performed within each article according to the article and its corresponding triple. Example: "Military service is divided into active service and reserve service. {military service, divided into, active service}" is marked as "CLS B-SUB I-SUB 0 0 B-OBJ I-OBJ 0 0 0 0 SEP active service". Each article in the training data set is tokenized (split between a BasicTokenizer and a WordpieceTokenizer) and then vectorized, covering the mask vectors, the segment sequences (segment ids), and the position encoding, as shown in FIG. 4. The main steps are: 1) insert [CLS] and [SEP] at the beginning and end of each sentence respectively; 2) convert each token into a 768-dimensional vector; 3) generate segment ids; 4) generate position vectors. All three vectors have shape (1, n, 768) and can be added element-wise to obtain a composite (1, n, 768) representation, i.e. the encoded input of BERT. The encoded information is fed into the defined network models; entity and relation network model training, saving of the network models together with their weight parameters and bias information, and so on are executed in parallel. A sketch of the sequence-labeling step follows.
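For illustration, a minimal sketch of converting one article and one triple into the B-SUB/I-SUB/B-OBJ/I-OBJ character tags shown above (an assumption: spans are located by simple string search, and [CLS]/[SEP] are added later by the tokenizer):

    def bio_tags(sentence: str, subject: str, obj: str):
        """Tag each character: B-SUB/I-SUB over the subject span,
        B-OBJ/I-OBJ over the object span, '0' elsewhere."""
        tags = ["0"] * len(sentence)
        for span, prefix in ((subject, "SUB"), (obj, "OBJ")):
            start = sentence.find(span)
            if start >= 0:
                tags[start] = f"B-{prefix}"
                for i in range(start + 1, start + len(span)):
                    tags[i] = f"I-{prefix}"
        return tags

    # bio_tags("兵役分为现役和预备役", "兵役", "现役")
    # -> ['B-SUB', 'I-SUB', '0', '0', 'B-OBJ', 'I-OBJ', '0', '0', '0', '0']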
3) Cosine similarity is computed to form the triple data set of entity relations, presenting the connections among the data intuitively; the results are shown in Table 3 and FIG. 5:
table 3 entity relation case
In this embodiment, the performance of the model is evaluated with precision and recall. The samples in the test set are divided into: correctly predicted positive samples TP, incorrectly predicted positive samples FP, correctly predicted negative samples TN, and incorrectly predicted negative samples FN. Precision P and recall R are:
P = TP / (TP + FP)
R = TP / (TP + FN)
In the field of entity-relation extraction, models are generally compared by their F1 score, computed as:
F1 = 2PR / (P + R)
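These three metrics translate directly into code, for example:

    def precision_recall_f1(tp: int, fp: int, fn: int):
        """P = TP/(TP+FP), R = TP/(TP+FN), F1 = 2PR/(P+R)."""
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f1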
the invention compares the values of accuracy P, recall R and F1 of different network models, as shown in Table 4, and it can be seen from Table 4 that the improved bert model achieves better effect.
TABLE 4 Comparison of different models
Network model    P       R       F1
RNN              0.57    0.52    0.54
LSTM             0.63    0.58    0.60
CNN              0.67    0.62    0.64
BERT             0.72    0.68    0.70
The invention is not limited to the foregoing embodiments. The invention extends to any novel feature or any novel combination of features disclosed in this specification and any novel method or process steps or any novel combination of features disclosed. Those skilled in the art to which the invention pertains will appreciate that insubstantial changes or modifications can be made without departing from the spirit of the invention as defined by the appended claims.
All of the features disclosed in this specification, or all of the steps in any method or process so disclosed, may be combined in any combination, except combinations of features and/or steps that are mutually exclusive.
Any feature disclosed in this specification may be replaced by alternative features serving equivalent or similar purposes, unless expressly stated otherwise. That is, unless expressly stated otherwise, each feature is only an example of a generic series of equivalent or similar features.

Claims (9)

1. An entity relationship extraction method for automatically constructing a data set is characterized by comprising the following processes:
step 1, collecting and preprocessing corpora;
step 2, defining a triple dictionary table and constructing a synonym table;
step 3, generating a training data set and a testing data set by utilizing an LTP tool;
step 4, training a network model according to the training data set;
step 5, carrying out entity and relation prediction on the test data set through the trained network model;
and step 6, optimizing the prediction result to obtain a triple data set.
2. The entity relationship extraction method for automatically constructing a data set according to claim 1, wherein the preprocessing comprises: cleaning the collected document data and splitting the text content into sentences, where different types of text are handled with different segmentation modes.
3. The entity relationship extraction method for automatically constructing a data set according to claim 2, wherein the specific process of step 2 is: forming a triple dictionary table from the preprocessed corpus, the sorted-out entity categories, and the relationships between categories.
4. The entity relationship extraction method for automatically constructing a data set according to claim 3, wherein in step 3 the LTP tool is used to annotate parts of speech; nouns are taken as entities and verbs as relations or attributes; data whose entity-relation or attribute type is uncertain is corrected, and unreasonable data is cleaned, forming the training data set and the test data set.
5. The entity relationship extraction method for automatically constructing a data set according to claim 4, wherein step 4 comprises: building a relation classification model based on the BERT model, and building an entity extraction model that works from the relations predicted by the relation classification model; the input of both the relation classification model and the entity extraction model is a vector of dimension (1, n, 768), the BERT output is taken at sentence level, and the feature information output by BERT is passed through a fully connected layer and a sigmoid activation function; the training data is fed into the two models separately to complete training.
6. The method of entity relationship extraction for automatically building a data set according to claim 5, wherein the activation function is:
φ(l) = 1 / (1 + e^(-l))
7. The entity relationship extraction method for automatically constructing a data set according to claim 5, wherein the loss function of the model is:
L = -Σ_k [ y_k log φ(l_k) + (1 - y_k) log(1 - φ(l_k)) ]
where y_k denotes the one-hot encoded label, the output layer contains k neurons for the k classes, and φ(l_k) denotes the sigmoid activation of the corresponding output-layer neuron.
8. The entity relationship extraction method for automatically constructing a data set according to claim 1, wherein the specific process of step 5 is: based on the network model trained in step 4, entity prediction and relation prediction are performed on the test data set; the BERT-based entity and relation network models split the input text into two parameters, the text text-a and the relation text-b, so as to predict multiple relations and label entity positions.
9. The entity relationship extraction method for automatically constructing a data set according to claim 1, wherein the prediction-result optimization method of step 6 is: judge whether the predicted entities and relations contain synonyms or near-synonyms that refer to the same entity or relation; cosine similarity is used for the judgment: if the cosine similarity is above a threshold, the words are unified into a single term; in this way entity relations with identical semantics are removed by computing cosine similarity, forming the final triple data set; the cosine similarity is computed as:
cos(x, y) = (Σ_i x_i · y_i) / ( √(Σ_i x_i²) · √(Σ_i y_i²) )
where x and y denote the vectors of the two words, and x_i and y_i denote the i-th elements of x and y respectively.
CN202011428961.4A 2020-12-09 2020-12-09 Entity relationship extraction method for automatically constructing data set Active CN112487206B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011428961.4A CN112487206B (en) 2020-12-09 2020-12-09 Entity relationship extraction method for automatically constructing data set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011428961.4A CN112487206B (en) 2020-12-09 2020-12-09 Entity relationship extraction method for automatically constructing data set

Publications (2)

Publication Number Publication Date
CN112487206A true CN112487206A (en) 2021-03-12
CN112487206B CN112487206B (en) 2022-09-20

Family

ID=74940894

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011428961.4A Active CN112487206B (en) 2020-12-09 2020-12-09 Entity relationship extraction method for automatically constructing data set

Country Status (1)

Country Link
CN (1) CN112487206B (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200334416A1 (en) * 2019-04-16 2020-10-22 Covera Health Computer-implemented natural language understanding of medical reports
CN110134772A (en) * 2019-04-18 2019-08-16 五邑大学 Medical text Relation extraction method based on pre-training model and fine tuning technology
CN110297913A (en) * 2019-06-12 2019-10-01 中电科大数据研究院有限公司 A kind of electronic government documents entity abstracting method
CN110597998A (en) * 2019-07-19 2019-12-20 中国人民解放军国防科技大学 Military scenario entity relationship extraction method and device combined with syntactic analysis
CN111061882A (en) * 2019-08-19 2020-04-24 广州利科科技有限公司 Knowledge graph construction method
CN110597760A (en) * 2019-09-18 2019-12-20 苏州派维斯信息科技有限公司 Intelligent method for judging compliance of electronic document
CN110619053A (en) * 2019-09-18 2019-12-27 北京百度网讯科技有限公司 Training method of entity relation extraction model and method for extracting entity relation
CN110765774A (en) * 2019-10-08 2020-02-07 北京三快在线科技有限公司 Training method and device of information extraction model and information extraction method and device
CN111143536A (en) * 2019-12-30 2020-05-12 腾讯科技(深圳)有限公司 Information extraction method based on artificial intelligence, storage medium and related device
CN111581395A (en) * 2020-05-06 2020-08-25 西安交通大学 Model fusion triple representation learning system and method based on deep learning
CN111931506A (en) * 2020-05-22 2020-11-13 北京理工大学 Entity relationship extraction method based on graph information enhancement

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
HOU R et al.: "An entity relation extraction algorithm based on BERT (wwm-ext)-BiGRU-Attention", Proceedings of the 2020 International Conference on Cyberspace Innovation of Advanced Technologies
HUI B et al.: "Few-shot relation classification by context attention-based prototypical networks with BERT", EURASIP Journal on Wireless Communications and Networking
LIU C et al.: "Chinese named entity recognition based on BERT with whole word masking", Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence
刘运璇: "Research and implementation of joint entity-relation extraction based on deep learning", China Master's Theses Full-text Database, Information Science and Technology series
姜猛: "Research on Chinese information extraction based on deep learning", China Master's Theses Full-text Database, Information Science and Technology series
房冬丽 et al.: "An entity relation extraction method with automatic data set construction", Communications Technology
李冬梅 et al.: "A survey of entity relation extraction methods", Journal of Computer Research and Development
李颖: "Chinese open-domain multi-ary entity relation extraction", China Master's Theses Full-text Database, Information Science and Technology series
王海宁: "Research on the construction and application of an ethnic festival knowledge graph", China Master's Theses Full-text Database, Philosophy and Humanities series

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111180A (en) * 2021-03-22 2021-07-13 杭州祺鲸科技有限公司 Chinese medical synonym clustering method based on deep pre-training neural network
CN113111180B (en) * 2021-03-22 2022-01-25 杭州祺鲸科技有限公司 Chinese medical synonym clustering method based on deep pre-training neural network
CN112966774A (en) * 2021-03-24 2021-06-15 黑龙江机智通智能科技有限公司 Histopathology image classification method based on image Bert
WO2022198868A1 (en) * 2021-03-26 2022-09-29 深圳壹账通智能科技有限公司 Open entity relationship extraction method, apparatus and device, and storage medium
CN113160917A (en) * 2021-05-18 2021-07-23 山东健康医疗大数据有限公司 Electronic medical record entity relation extraction method
CN113160917B (en) * 2021-05-18 2022-11-01 山东浪潮智慧医疗科技有限公司 Electronic medical record entity relation extraction method
CN113051897A (en) * 2021-05-25 2021-06-29 中国电子科技集团公司第三十研究所 GPT2 text automatic generation method based on Performer structure
CN113051897B (en) * 2021-05-25 2021-09-10 中国电子科技集团公司第三十研究所 GPT2 text automatic generation method based on Performer structure
CN113268577A (en) * 2021-06-04 2021-08-17 厦门快商通科技股份有限公司 Training data processing method and device based on dialogue relation and readable medium
CN113836281A (en) * 2021-09-13 2021-12-24 中国人民解放军国防科技大学 Entity relation joint extraction method based on automatic question answering

Also Published As

Publication number Publication date
CN112487206B (en) 2022-09-20

Similar Documents

Publication Publication Date Title
CN112487206B (en) Entity relationship extraction method for automatically constructing data set
CN113011533B (en) Text classification method, apparatus, computer device and storage medium
CN112214610B (en) Entity relationship joint extraction method based on span and knowledge enhancement
CN106599032B (en) Text event extraction method combining sparse coding and structure sensing machine
CN112541337B (en) Document template automatic generation method and system based on recurrent neural network language model
CN109614620B (en) HowNet-based graph model word sense disambiguation method and system
CN117151220B (en) Entity link and relationship based extraction industry knowledge base system and method
CN115357719B (en) Power audit text classification method and device based on improved BERT model
Sarwadnya et al. Marathi extractive text summarizer using graph based model
US20200311345A1 (en) System and method for language-independent contextual embedding
CN112417823B (en) Chinese text word order adjustment and word completion method and system
CN112183059A (en) Chinese structured event extraction method
CN114881043A (en) Deep learning model-based legal document semantic similarity evaluation method and system
CN112818698B (en) Fine-grained user comment sentiment analysis method based on dual-channel model
CN116342167B (en) Intelligent cost measurement method and device based on sequence labeling named entity recognition
CN110377753B (en) Relation extraction method and device based on relation trigger word and GRU model
Seo et al. Plain Template Insertion: Korean-Prompt-Based Engineering for Few-Shot Learners
CN110807096A (en) Information pair matching method and system on small sample set
Ramesh et al. Interpretable natural language segmentation based on link grammar
Lu et al. Attributed rhetorical structure grammar for domain text summarization
Maheswari et al. Rule based morphological variation removable stemming algorithm
CN114528459A (en) Semantic-based webpage information extraction method and system
Novák et al. Resolving noun phrase coreference in czech
Tolegen et al. Voted-perceptron approach for Kazakh morphological disambiguation
Wei Research on Internet Text Sentiment Classification Based on BERT and CNN-BiGRU

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant