CN108287911B - Relation extraction method based on constrained remote supervision - Google Patents

Relation extraction method based on constrained remote supervision

Info

Publication number
CN108287911B
Authority
CN
China
Prior art keywords
sentence
sentences
data
training
confidence
Prior art date
Legal status
Active
Application number
CN201810103633.3A
Other languages
Chinese (zh)
Other versions
CN108287911A (en)
Inventor
汤斯亮
张金剑
袁愈锦
吴飞
庄越挺
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201810103633.3A priority Critical patent/CN108287911B/en
Publication of CN108287911A publication Critical patent/CN108287911A/en
Application granted granted Critical
Publication of CN108287911B publication Critical patent/CN108287911B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G06F16/36 - Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 - Ontology
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a relation extraction method based on constrained remote supervision, which comprises the following steps: (1) constructing an external knowledge base; (2) acquiring text data; (3) obtaining sentences containing the attribute by using a remote supervision method; (4) obtaining confidence information for the sentences by using a pre-trained model; (5) regularizing the network with the confidence information and computing the normalized posterior probability to obtain the relation label. The method uses regularized posterior probabilities to extract features from text sentences automatically, which saves manual effort while yielding more abstract and expressive features. Its performance surpasses that of traditional relation extraction algorithms and of several mainstream algorithms of recent years.

Description

Relation extraction method based on constrained remote supervision
Technical Field
The invention relates to text feature extraction and relationship extraction, in particular to a relationship extraction method based on constrained remote supervision.
Background
The world is in an era of information explosion, and the popularity and rapid development of the Internet generate massive information resources. These resources are of great significance to the development of science and technology: the scientific community needs them as basic material for research, and industry needs them to mine potential business opportunities. How to make use of these Internet information resources has therefore become one of the mainstream research directions in recent years.
Although the amount of information resources on the Internet is enormous, these resources tend to lack structure. Structured data refers to row data, i.e., data that can be expressed in a two-dimensional table structure, whereas unstructured data has fields of variable length and is inconvenient to express in a two-dimensional logical table. Since many of these resources are unstructured or semi-structured, finding and understanding them quickly and effectively is greatly limited.
Text data is an important part of the information resources on the Internet, and most text data on the Internet is also unstructured, such as news, blogs, emails, government documents, chat records and system logs. To make efficient use of this unstructured text data, information extraction techniques have emerged: they automatically convert unstructured or semi-structured text in an input page into structured data. An information extraction task is defined by its input and its extraction target: the input can be an unstructured document written in natural language or a semi-structured document on a web page, and the extraction target is a k-tuple relation (where k is the number of attributes of a record) or a more complex hierarchical data object.
Traditional relation extraction techniques have many shortcomings. First, whether rule-based or classification-based, they require considerable manual intervention, such as rule design in rule-based methods and data annotation and feature design in classification-based methods. This manual intervention is costly, since authoritative labeled data can only be obtained through annotation by professionals; at the same time, manual work introduces errors that accumulate in the subsequent algorithms and finally cause excessive deviation of the results. Second, the training data set is limited to a particular domain, so the algorithms lack generality; for example, a relation extraction classifier trained on sports news cannot be used well on other kinds of news. In general the results are not ideal, because the manually designed rules of rule-based methods are limited, while classification-based methods suffer from limited labeled data and depend on the quality of manually designed features.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a relation extraction method based on constrained remote supervision. The invention adopts the following specific technical scheme:
the relation extraction method based on constrained remote supervision comprises the following steps
S1: acquiring information frame data of Wikipedia, converting the information frame data into entity pairs, and constructing a structured external knowledge base;
s2: obtaining forum and news data, and constructing an unstructured text corpus;
s3: searching sentences containing entity pairs in a text corpus by using a remote supervision method to form an original sentence set;
s4: marking partial sentences, training a model by using marked data to obtain a pre-training model, and inputting the unprocessed sentences into the pre-training model to obtain the posterior probability of model output;
s5: and inputting the original sentence set data set and the posterior probability thereof into a network, and training a model to obtain the relational label.
In the above scheme, each step can be specifically realized by adopting the following mode:
the S1 specifically includes the following steps:
s11: downloading the public data of Wikipedia;
s12: extracting the information frame (infobox) of an entry in Wikipedia, mapping the name of the information frame to a relation name, and storing the attribute value and the entity name to form an entity pair;
s13: for all entries, attribute values and entity names that have the same relationship are saved together.
The S2 specifically includes the following steps:
s21: downloading news data;
s22: preprocessing the text data, removing tags such as HTML (hypertext markup language) or XML (extensible markup language) and the like, converting the character coding format into utf-8, and converting the format into pure text data;
s23: and using a natural language processing tool to perform word segmentation on the plain text data and extracting named entity information.
The S3 specifically includes the following steps:
s31: constructing a positive sample: under the same relation type, if the entity name and the attribute value appear in a certain sentence at the same time, marking the sentence as a positive sample;
s32: constructing a negative sample: under the same relationship type, if an entity name appears in a sentence, an attribute value does not appear in the sentence, but the sentence contains named entity information of the attribute value, the sentence is marked as a negative sample;
s33: balancing the number of samples: randomly sampling the negative samples so that the number of negative samples equals the number of positive samples.
The S4 specifically includes the following steps:
s41: selecting part of the sentences found in S3 and storing them in the following sequence form:
[sentence_1, sentence_2, sentence_3, ..., sentence_N]
wherein N is the number of selected sentences and sentence_N is the Nth sentence;
s42, manually labeling the selected sentences, and judging whether the sentences contain the relationship:
{sentence_1: label_1, sentence_2: label_2, ..., sentence_N: label_N}
wherein label_N represents the label of the Nth sentence;
s43, inputting the marked sentences into a classification algorithm, and training a network to obtain a pre-training model theta;
s44, selecting the sentences which are not manually marked and storing the sentences in the following sequence form:
[sentence_1, sentence_2, sentence_3, ..., sentence_M]
wherein M represents the number of unlabeled sentences;
s45: inputting the unlabeled sentences into a pre-training model theta to obtain the confidence coefficient:
{sentence_1: confidence_1, sentence_2: confidence_2, ..., sentence_M: confidence_M}
wherein confidence_M represents the confidence of the Mth sentence.
The S5 specifically includes the following steps:
s51: acquiring a training data set:
x = [x_1, x_2, ..., x_l]
wherein x_i represents the ith sentence in the training set and l represents the number of sentences in the training set;
s52: inputting the sentences of the training data set into the pre-training model θ to obtain the posterior probability output p(x'_j) of the pre-training model on the jth class:
p(x'_j) = exp(p_j) / ∑_{k=1..K} exp(p_k)
wherein K represents the number of classes and p_j is the output result of the pre-training model on the jth class;
s53: inputting the confidence of the sample into the network to calculate the constraint value:
con=exp(η(λ-confidence))
wherein η represents a penalty factor and λ represents a set threshold;
s54: the posterior probabilities of all the relations are normalized as follows:
p(x_j) = p(x'_j) · exp(η(λ-confidence)) / Z
Z=∑exp(η(λ-confidence))
wherein p(x_j) is the posterior probability after normalization processing;
s56: and selecting the relation with the maximum posterior probability as a prediction label of relation classification.
In S53, the confidence of the input samples consists of two parts: for manually labeled sentences the confidence is 1, and for unlabeled sentences the confidence is calculated according to step S45.
In order to overcome the defects of the traditional methods, the invention provides a relation extraction method based on constrained remote supervision. The invention uses regularized posterior probabilities to extract features from text sentences automatically, which saves manual effort while yielding more abstract and expressive features. The method outperforms traditional relation extraction algorithms as well as several mainstream algorithms of recent years.
Drawings
FIG. 1 is a schematic illustration of the core constrained (regularized) remote supervision model used by the present invention. The left side of the figure shows two positive samples and two negative samples, which are extracted by the remote supervision method; the prediction label is obtained after the posterior probability calculation and the regularization.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
A relation extraction method based on constrained remote supervision comprises the following steps:
s1: acquiring information frame data of Wikipedia, converting the information frame data into entity pairs, and constructing a structured external knowledge base; the specific implementation manner of the step is as follows:
s11: downloading the public data of Wikipedia;
s12: extracting the information frame (infobox) of an entry in Wikipedia, mapping the name of the information frame to a relation name, and storing the attribute value and the entity name to form an entity pair;
s13: for all entries, attribute values and entity names that have the same relationship are saved together.
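The patent gives no code for S11-S13; the following Python sketch shows one way the infobox-to-entity-pair conversion could look, assuming the Wikipedia dump has already been parsed into (entity name, infobox dictionary) pairs and reading each infobox field name as the relation name. All function and variable names are illustrative, not part of the original disclosure.

```python
from collections import defaultdict

def build_knowledge_base(parsed_infoboxes):
    """S12-S13: group (entity name, attribute value) pairs by relation name.

    `parsed_infoboxes` is assumed to be an iterable of
    (entity_name, {field_name: attribute_value, ...}) tuples already
    extracted from the Wikipedia dump (S11).
    """
    knowledge_base = defaultdict(set)            # relation -> {(entity, value), ...}
    for entity_name, infobox in parsed_infoboxes:
        for field_name, attribute_value in infobox.items():
            relation = field_name                # one reading of S12: field name as relation name
            knowledge_base[relation].add((entity_name, attribute_value))
    return knowledge_base

# Toy input, purely for illustration:
infoboxes = [
    ("Zhejiang University", {"location": "Hangzhou", "established": "1897"}),
    ("Peking University", {"location": "Beijing", "established": "1898"}),
]
kb = build_knowledge_base(infoboxes)
print(kb["location"])
```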
S2: obtaining forum and news data, and constructing an unstructured text corpus; the specific implementation manner of the step is as follows:
s21: downloading news data, such as the public data of the People's Daily;
s22: preprocessing the text data, removing tags such as HTML (hypertext markup language) or XML (extensible markup language) and the like, converting the character coding format into utf-8, and converting the format into pure text data;
s23: tokenizing the plain text data using a natural language processing tool, such as the jieba word segmenter, and extracting named entity information.
In this invention, the structured data and the unstructured data directly adopt the TAC-KBP 2016 dataset (a minimal preprocessing sketch is given below).
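As a rough illustration of S21-S23, the sketch below strips HTML/XML tags, normalizes the encoding to UTF-8, and tokenizes the text; jieba is used only as an example word segmenter, and the named-entity step is left as a placeholder because the patent does not name a specific NER tool.

```python
import re
import jieba  # example word segmenter; the patent only says "a natural language processing tool"

def to_plain_text(raw_bytes: bytes) -> str:
    """S22: decode to UTF-8 and strip HTML/XML tags to obtain plain text."""
    text = raw_bytes.decode("utf-8", errors="ignore")
    text = re.sub(r"<[^>]+>", " ", text)      # remove HTML/XML tags
    return re.sub(r"\s+", " ", text).strip()

def tokenize(text: str):
    """S23: word segmentation (jieba here is purely illustrative)."""
    return jieba.lcut(text)

def extract_named_entities(tokens):
    """Placeholder for S23's named-entity extraction; plug in any NER system."""
    raise NotImplementedError("the patent does not specify an NER toolkit")

doc = "<p>浙江大学位于杭州。</p>".encode("utf-8")
print(tokenize(to_plain_text(doc)))
```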
S3: as shown in fig. 1, using a remote supervision method, a text corpus is searched for sentences containing entity pairs to form an original sentence set; the specific implementation manner of the step is as follows:
s31: constructing a positive sample: under the same relation type, if the entity name and the attribute value appear in a certain sentence at the same time, marking the sentence as a positive sample;
s32: constructing a negative sample: under the same relationship type, if an entity name appears in a sentence, an attribute value does not appear in the sentence, but the sentence contains named entity information of the attribute value, the sentence is marked as a negative sample;
s33: balancing the number of samples: randomly sampling the negative samples so that the number of negative samples equals the number of positive samples (a code sketch of S31-S33 follows).
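A minimal sketch of the distant-supervision labelling in S31-S33. It assumes each sentence is stored as a dictionary carrying its text and the named-entity types found in S23, and that the knowledge base has the {relation: {(entity, value), ...}} form sketched above; these data layouts and names are assumptions for illustration only.

```python
import random

def build_samples(sentences, knowledge_base, relation, value_entity_type, seed=0):
    """S31-S33: build positive and negative samples for one relation type.

    sentences         : list of {"text": str, "entity_types": set of NER types}
    knowledge_base    : {relation: {(entity_name, attribute_value), ...}}
    value_entity_type : NER type that the attribute value of this relation has
                        (e.g. "LOCATION"), used for the negative test in S32.
    """
    positives, negatives = [], []
    for entity_name, attribute_value in knowledge_base[relation]:
        for sentence in sentences:
            text = sentence["text"]
            if entity_name not in text:
                continue
            if attribute_value in text:
                # S31: entity name and attribute value co-occur -> positive sample
                positives.append((text, entity_name, attribute_value, 1))
            elif value_entity_type in sentence["entity_types"]:
                # S32: entity present, value absent, but an entity of the value's
                # NER type is present -> negative sample
                negatives.append((text, entity_name, attribute_value, 0))
    # S33: randomly down-sample the negatives to the number of positives
    random.seed(seed)
    negatives = random.sample(negatives, min(len(negatives), len(positives)))
    return positives + negatives
```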
S4: marking partial sentences, training a model by using marked data to obtain a pre-training model, and inputting the unprocessed sentences into the pre-training model to obtain the posterior probability of model output; the specific implementation manner of the step is as follows:
s41: selecting part of the sentences found in S3 and storing them in the following sequence form:
[sentence_1, sentence_2, sentence_3, ..., sentence_N]
wherein N is the number of selected sentences and sentence_N is the Nth sentence;
s42, manually labeling the selected sentences, and judging whether the sentences contain the relationship:
{sentence_1: label_1, sentence_2: label_2, ..., sentence_N: label_N}
wherein label_N represents the label of the Nth sentence;
s43, inputting the marked sentences into a classification algorithm, and training a network to obtain a pre-training model theta;
s44, selecting the sentences which are not manually marked and storing the sentences in the following sequence form:
[sentence_1, sentence_2, sentence_3, ..., sentence_M]
wherein M represents the number of unlabeled sentences;
s45: inputting the unlabeled sentences into a pre-training model theta to obtain the confidence coefficient:
{sentence_1: confidence_1, sentence_2: confidence_2, ..., sentence_M: confidence_M}
wherein confidence_M represents the confidence of the Mth sentence.
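The patent leaves the choice of classifier for the pre-training model θ open ("a classification algorithm"); the sketch below uses a TF-IDF plus logistic-regression pipeline from scikit-learn purely as a stand-in, so that the S45 confidence can be read off as the probability of the most likely class. The library choice, function names, and this reading of "confidence" are assumptions, not part of the disclosure.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def pretrain(labeled_sentences, labels):
    """S42-S43: fit the pre-training model theta on the manually labeled sentences."""
    theta = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    theta.fit(labeled_sentences, labels)
    return theta

def score_confidences(theta, unlabeled_sentences):
    """S44-S45: confidence of each unlabeled sentence, taken here as the
    probability of its most likely class under the pre-training model."""
    probabilities = theta.predict_proba(unlabeled_sentences)
    return {s: float(p.max()) for s, p in zip(unlabeled_sentences, probabilities)}
```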
S5: and inputting the original sentence set data set and the posterior probability thereof into a network, and training a model to obtain the relational label. The specific implementation manner of the step is as follows:
s51: acquiring a training data set:
x = [x_1, x_2, ..., x_l]
wherein x_i represents the ith sentence in the training set and l represents the number of sentences in the training set;
s52: inputting the sentences of the training data set into the pre-training model θ to obtain the posterior probability output p(x'_j) of the pre-training model on the jth class:
p(x'_j) = exp(p_j) / ∑_{k=1..K} exp(p_k)
wherein K represents the number of classes and p_j is the output result of the pre-training model on the jth class;
s53: inputting the confidence of each sample into the network to calculate its constraint value. The confidence of the input samples consists of two parts: for manually labeled sentences the confidence is 1, and for unlabeled sentences the confidence is calculated according to step S45. The constraint value is calculated as follows:
con=exp(η(λ-confidence))
wherein η represents a penalty factor and λ represents a set threshold;
s54: the posterior probabilities of all the relations are normalized as follows:
p(x_j) = p(x'_j) · exp(η(λ-confidence)) / Z
Z=∑exp(η(λ-confidence))
wherein p(x_j) is the posterior probability after normalization processing;
s56: and selecting the relation with the maximum posterior probability as a prediction label of relation classification.
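The published text defines con and Z but leaves their exact use in the normalization to the unreproduced equation images, so the sketch below is a best-effort reading of S52-S56: the softmax posterior of each sentence is scaled by con/Z, where Z sums the constraint values over the batch, and the prediction is the class with the largest resulting posterior. Treat the batch-level interpretation of Z as an assumption.

```python
import numpy as np

def constrained_posteriors(batch_scores, confidences, eta, lam):
    """S52-S56 for a batch of sentences.

    batch_scores : (N, K) array of pre-training-model outputs p_j per sentence.
    confidences  : length-N sequence; 1.0 for labeled sentences, S45 value otherwise.
    eta, lam     : penalty factor and threshold of S53.
    """
    # S52: softmax posterior of the pre-training model, per sentence
    shifted = batch_scores - batch_scores.max(axis=1, keepdims=True)
    posterior = np.exp(shifted)
    posterior /= posterior.sum(axis=1, keepdims=True)

    # S53: per-sentence constraint value con = exp(eta * (lambda - confidence))
    con = np.exp(eta * (lam - np.asarray(confidences)))

    # S54: normalization constant Z over the batch; each sentence's posterior is
    # scaled by con / Z (a per-sentence weight; the per-sentence argmax is unchanged)
    Z = con.sum()
    regularized = posterior * (con / Z)[:, None]

    # S56: predicted relation = class with the maximum posterior for each sentence
    return regularized.argmax(axis=1), regularized

predictions, _ = constrained_posteriors(
    np.array([[1.2, 0.3, -0.5], [0.1, 0.9, 0.2]]), confidences=[1.0, 0.7],
    eta=2.0, lam=0.9)
```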
The method is applied to the following examples in order that those skilled in the art will better understand the specific implementation of the present invention.
Examples
In this embodiment, taking a section of news text submitted by a user as an example, the relationship extraction is performed by using the above method, and specific parameters and methods in each implementation step are as follows:
1. The information frame (infobox) data are converted into entity pairs, which are stored in the following sequence form and correspond to the order of the sentences in the candidate set:
{(entity_1, slotfiller_1), (entity_2, slotfiller_2), ..., (entity_N1, slotfiller_N1)}
2. Search whether each input sentence contains an entity, and form the sentences containing entities into the original sentence set:
{sentence_1, sentence_2, ..., sentence_N}
3. Search whether the sentences of the original sentence set contain the attribute value, and form the sentences containing the attribute value into a candidate sentence set; the sentences in the candidate set contain both the entity and the attribute value:
{candidate_1, candidate_2, ..., candidate_N1}
4. Manually label part of the data to obtain accurate manually labeled data:
{sentence_1: label_1, sentence_2: label_2, ..., sentence_N: label_N}
5. Train the model to obtain the parameter θ:
{word_1: vector_1, word_2: vector_2, ..., word_N: vector_N}
6. Acquire the candidate data that have not been manually labeled:
[sentence_1, sentence_2, sentence_3, ..., sentence_M]
7. Input the candidate data that have not been manually labeled into the trained network and obtain the corresponding confidences:
{sentence_1: confidence_1, sentence_2: confidence_2, ..., sentence_M: confidence_M}
8. Input the candidate data into the network to obtain the posterior probability:
p(x'_j) = exp(p_j) / ∑_{k=1..K} exp(p_k)
9. Calculate the constraint value from the confidence:
con=exp(η(λ-confidence))
10. Calculate the normalization constant:
Z=∑exp(η(λ-confidence))
11. Calculate the normalized posterior probability:
p(x_j) = p(x'_j) · exp(η(λ-confidence)) / Z
12. The class with the maximum posterior probability is taken as the prediction label (a small numeric illustration of steps 9-11 follows).
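Purely as a numeric illustration of steps 9-11 (the confidences, η and λ below are made-up values, not taken from the embodiment):

```python
import math

confidences = [1.0, 0.8, 0.4]   # e.g. one labeled sentence and two unlabeled ones (toy values)
eta, lam = 2.0, 0.9             # penalty factor and threshold (toy values)

con = [math.exp(eta * (lam - c)) for c in confidences]   # step 9: constraint values
Z = sum(con)                                             # step 10: normalization constant
weights = [c / Z for c in con]                           # step 11: per-sentence factor con / Z

# con grows as the confidence falls below the threshold lambda
print([round(w, 3) for w in weights])
```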
As shown in Table 1, a comparison of the method of the present invention with existing mainstream methods on the TAC-KBP 2016 dataset shows that the invention has clear advantages on the Precision, Recall and F1-Score evaluation criteria.
TABLE 1
Model                        Precision    Recall    F1-Score
PCNN                         -            -         0.52
CNN                          0.499        0.483     0.453
Text model (the proposed)    0.547        0.559     0.553
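For reference, Precision, Recall and F1-Score in Table 1 are the standard classification metrics; a minimal computation with toy predictions (not the TAC-KBP results) looks as follows.

```python
def precision_recall_f1(gold, predicted, positive=1):
    """Standard precision / recall / F1 for one relation label."""
    tp = sum(1 for g, p in zip(gold, predicted) if p == positive and g == positive)
    fp = sum(1 for g, p in zip(gold, predicted) if p == positive and g != positive)
    fn = sum(1 for g, p in zip(gold, predicted) if p != positive and g == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(precision_recall_f1([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))  # toy example
```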

Claims (6)

1. A relation extraction method based on constrained remote supervision is characterized by comprising the following steps:
s1: acquiring information frame data of Wikipedia, converting the information frame data into entity pairs, and constructing a structured external knowledge base;
s2: obtaining forum and news data, and constructing an unstructured text corpus;
s3: searching sentences containing entity pairs in a text corpus by using a remote supervision method to form an original sentence set;
s4: marking partial sentences, training a model by using marked data to obtain a pre-training model, and inputting the unprocessed sentences into the pre-training model to obtain the posterior probability of model output;
s5: inputting the original sentence set data set and the posterior probability thereof into a network, training a model and obtaining a relational tag;
the S5 specifically includes the following steps:
s51: acquiring a training data set:
x = [x_1, x_2, ..., x_l]
wherein x_i represents the ith sentence in the training set and l represents the number of sentences in the training set;
s52: inputting the sentences of the training data set into the pre-training model θ to obtain the posterior probability output p(x'_j) of the pre-training model on the jth class:
p(x'_j) = exp(p_j) / ∑_{k=1..K} exp(p_k)
wherein K represents the number of classes and p_j is the output result of the pre-training model on the jth class;
s53: inputting the confidence of the sample into the network to calculate the constraint value:
con=exp(η(λ-confidence))
wherein η represents a penalty factor and λ represents a set threshold;
s54: the posterior probabilities of all the relations are normalized as follows:
p(x_j) = p(x'_j) · exp(η(λ-confidence)) / Z
Z=∑exp(η(λ-confidence))
wherein p(x_j) is the posterior probability after normalization processing;
s56: and selecting the relation with the maximum posterior probability as a prediction label of relation classification.
2. The method for extracting relationship based on constrained remote supervision as recited in claim 1, wherein the step S1 specifically comprises the following steps:
s11: downloading the public data of Wikipedia;
s12: extracting the information frame (infobox) of an entry in Wikipedia, mapping the name of the information frame to a relation name, and storing the attribute value and the entity name to form an entity pair;
s13: for all entries, attribute values and entity names that have the same relationship are saved together.
3. The method for extracting relationship based on constrained remote supervision as recited in claim 1, wherein the step S2 specifically comprises the following steps:
s21: downloading news data;
s22: preprocessing the text data, removing HTML or XML labels, converting the character coding format into utf-8, and converting the format into pure text data;
s23: and using a natural language processing tool to perform word segmentation on the plain text data and extracting named entity information.
4. The method for extracting relationship based on constrained remote supervision as recited in claim 1, wherein the step S3 specifically comprises the following steps:
s31: constructing a positive sample: under the same relation type, if the entity name and the attribute value appear in a certain sentence at the same time, marking the sentence as a positive sample;
s32: constructing a negative sample: under the same relation type, if an entity name appears in a certain sentence, an attribute value does not appear in the sentence, but the sentence contains named entity information of the attribute value, the sentence is marked as a negative sample;
s33: balancing the number of samples: randomly sampling the negative samples so that the number of negative samples equals the number of positive samples.
5. The method for extracting relationship based on constrained remote supervision as recited in claim 1, wherein the step S4 specifically comprises the following steps:
s41: selecting part of the sentences found in S3 and storing them in the following sequence form:
[sentence_1, sentence_2, sentence_3, ..., sentence_N]
wherein N is the number of selected sentences and sentence_N is the Nth sentence;
s42, manually labeling the selected sentences, and judging whether the sentences contain the relationship:
{sentence_1: label_1, sentence_2: label_2, ..., sentence_N: label_N}
wherein label_N represents the label of the Nth sentence;
s43, inputting the marked sentences into a classification algorithm, and training a network to obtain a pre-training model theta;
s44, selecting the sentences which are not manually marked and storing the sentences in the following sequence form:
[sentence_1, sentence_2, sentence_3, ..., sentence_M]
wherein M represents the number of unlabeled sentences;
s45: inputting the unlabeled sentences into a pre-training model theta to obtain the confidence coefficient:
{sentence_1: confidence_1, sentence_2: confidence_2, ..., sentence_M: confidence_M}
wherein confidence_M represents the confidence of the Mth sentence.
6. The method for extracting relationship based on constrained remote supervision as claimed in claim 1, wherein in S53 the confidence of the input samples consists of two parts: for manually labeled sentences the confidence is 1, and for unlabeled sentences the confidence is calculated according to step S45.
CN201810103633.3A 2018-02-01 2018-02-01 Relation extraction method based on constrained remote supervision Active CN108287911B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810103633.3A CN108287911B (en) 2018-02-01 2018-02-01 Relation extraction method based on constrained remote supervision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810103633.3A CN108287911B (en) 2018-02-01 2018-02-01 Relation extraction method based on constrained remote supervision

Publications (2)

Publication Number Publication Date
CN108287911A CN108287911A (en) 2018-07-17
CN108287911B true CN108287911B (en) 2020-04-24

Family

ID=62836441

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810103633.3A Active CN108287911B (en) 2018-02-01 2018-02-01 Relation extraction method based on constrained remote supervision

Country Status (1)

Country Link
CN (1) CN108287911B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109472033B (en) * 2018-11-19 2022-12-06 华南师范大学 Method and system for extracting entity relationship in text, storage medium and electronic equipment
CN111914555B (en) * 2019-05-09 2022-08-23 中国人民大学 Automatic relation extraction system based on Transformer structure
CN110276081B (en) * 2019-06-06 2023-04-25 百度在线网络技术(北京)有限公司 Text generation method, device and storage medium
CN113282758A (en) * 2020-02-19 2021-08-20 复旦大学 Depth relation extraction method for theme knowledge fusion under government control field
CN111783463B (en) * 2020-06-30 2024-08-13 北京百度网讯科技有限公司 Knowledge extraction method and device
CN111859238B (en) * 2020-07-27 2024-07-16 平安科技(深圳)有限公司 Model-based method, device and computer equipment for predicting data change frequency
CN112307130B (en) * 2020-10-21 2022-07-05 清华大学 Document-level remote supervision relation extraction method and system
CN112860903B (en) * 2021-04-06 2022-02-22 哈尔滨工业大学 Remote supervision relation extraction method integrated with constraint information
CN113807518B (en) * 2021-08-16 2024-04-05 中央财经大学 Relation extraction system based on remote supervision

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049454A (en) * 2011-10-16 2013-04-17 同济大学 Chinese and English search result visualization system based on multi-label classification
CN105389354A (en) * 2015-11-02 2016-03-09 东南大学 Social media text oriented unsupervised method for extracting and sorting events
CN106570148A (en) * 2016-10-27 2017-04-19 浙江大学 Convolutional neutral network-based attribute extraction method
CN106886569A (en) * 2017-01-13 2017-06-23 重庆邮电大学 A kind of ML KNN multi-tag Chinese Text Categorizations based on MPI
CN107644101A (en) * 2017-09-30 2018-01-30 百度在线网络技术(北京)有限公司 Information classification approach and device, information classification equipment and computer-readable medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10133728B2 (en) * 2015-03-20 2018-11-20 Microsoft Technology Licensing, Llc Semantic parsing for complex knowledge extraction

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049454A (en) * 2011-10-16 2013-04-17 同济大学 Chinese and English search result visualization system based on multi-label classification
CN105389354A (en) * 2015-11-02 2016-03-09 东南大学 Social media text oriented unsupervised method for extracting and sorting events
CN106570148A (en) * 2016-10-27 2017-04-19 浙江大学 Convolutional neutral network-based attribute extraction method
CN106886569A (en) * 2017-01-13 2017-06-23 重庆邮电大学 A kind of ML KNN multi-tag Chinese Text Categorizations based on MPI
CN107644101A (en) * 2017-09-30 2018-01-30 百度在线网络技术(北京)有限公司 Information classification approach and device, information classification equipment and computer-readable medium

Also Published As

Publication number Publication date
CN108287911A (en) 2018-07-17

Similar Documents

Publication Publication Date Title
CN108287911B (en) Relation extraction method based on constrained remote supervision
CN109033374B (en) Knowledge graph retrieval method based on Bayesian classifier
CN110222160A (en) Intelligent semantic document recommendation method, device and computer readable storage medium
CN103218444B (en) Based on semantic method of Tibetan language webpage text classification
CN111767725B (en) Data processing method and device based on emotion polarity analysis model
CN101127042A (en) Sensibility classification method based on language model
CN112417891B (en) Text relation automatic labeling method based on open type information extraction
CN113255320A (en) Entity relation extraction method and device based on syntax tree and graph attention machine mechanism
CN107180045A (en) A kind of internet text contains the abstracting method of geographical entity relation
CN113033183B (en) Network new word discovery method and system based on statistics and similarity
CN116628173B (en) Intelligent customer service information generation system and method based on keyword extraction
CN115796181A (en) Text relation extraction method for chemical field
CN111444704B (en) Network safety keyword extraction method based on deep neural network
CN112507109A (en) Retrieval method and device based on semantic analysis and keyword recognition
CN113360582B (en) Relation classification method and system based on BERT model fusion multi-entity information
CN111428501A (en) Named entity recognition method, recognition system and computer readable storage medium
CN113627190A (en) Visualized data conversion method and device, computer equipment and storage medium
CN111814477A (en) Dispute focus discovery method and device based on dispute focus entity and terminal
CN112860898A (en) Short text box clustering method, system, equipment and storage medium
CN115759119A (en) Financial text emotion analysis method, system, medium and equipment
CN111325036A (en) Emerging technology prediction-oriented evidence fact extraction method and system
CN117573869A (en) Network connection resource key element extraction method
CN117574858A (en) Automatic generation method of class case retrieval report based on large language model
CN114238735B (en) Intelligent internet data acquisition method
CN114996455A (en) News title short text classification method based on double knowledge maps

Legal Events

Code  Title
PB01  Publication
SE01  Entry into force of request for substantive examination
GR01  Patent grant