CN108287911B - Relation extraction method based on constrained remote supervision
- Publication number
- CN108287911B (application number CN201810103633.3A)
- Authority
- CN
- China
- Prior art keywords
- sentence
- sentences
- data
- training
- confidence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a relation extraction method based on constrained remote supervision (also known as distant supervision), which comprises the following steps: (1) constructing an external knowledge base; (2) acquiring text data; (3) obtaining sentences containing the attribute by using a remote supervision method; (4) obtaining confidence information for each sentence by using a pre-trained model; and (5) regularizing the network by using the confidence information, and calculating the normalized posterior probability to obtain a relation label. By regularizing the posterior probability, the method extracts features of the text sentences automatically, which saves manual effort and yields more abstract and expressive features. The method outperforms traditional relation extraction algorithms and some mainstream algorithms of recent years.
Description
Technical Field
The invention relates to text feature extraction and relationship extraction, in particular to a relationship extraction method based on constrained remote supervision.
Background
The world is in an era of information explosion, and the popularity and rapid development of the internet have produced massive information resources. These resources matter greatly to the development of science and technology: the scientific community needs to extract basic research material from them, and industry needs to mine them for potential business opportunities. How to make use of these internet information resources has therefore been one of the mainstream research directions in recent years.
Although the amount of information on the internet is enormous, these resources tend to lack structure. Structured data refers to row data, that is, data that can be expressed in a two-dimensional table, whereas unstructured data has variable field lengths and is inconvenient to express in a two-dimensional logical table. Since many of these resources are unstructured or semi-structured, the ability to find and understand the data quickly and efficiently is greatly limited.
Text data is an important part of the internet's information resources, and most text on the internet is also unstructured, such as news, blogs, emails, government documents, chat records, and system logs. Information extraction techniques emerged to make efficient use of this unstructured text: they automatically convert unstructured or semi-structured text in an input page into structured data. An information extraction task is defined by its input and its extraction target. The input can be an unstructured document written in natural language or a semi-structured document on a web page; the extraction target is a k-tuple relation (where k is the number of attributes of a record) or a complex hierarchical data object.
Traditional relation extraction techniques have several disadvantages. First, whether they are based on rules or on classification algorithms, they need considerable manual intervention: rule design for rule-based methods, and data labeling and feature design for classification-based methods. This manual intervention is costly, because authoritative labeled data can only be obtained from professional annotators. Manual work also introduces errors, which accumulate in subsequent algorithms and can ultimately cause large deviations in the results. Second, the training data set of such an algorithm is limited to a certain field, so the algorithm does not generalize; for example, a relation extraction classifier trained on sports news does not work well on other news. Overall, the results of these algorithms are not satisfactory: the manually designed rules of rule-based methods are limited, while classification-based methods have limited labeled data and depend on the quality of manually designed features.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a relation extraction method based on constrained remote supervision. The invention adopts the following technical scheme:
The relation extraction method based on constrained remote supervision comprises the following steps:
S1: acquiring information frame (infobox) data from Wikipedia, converting the information frame data into entity pairs, and constructing a structured external knowledge base;
S2: obtaining forum and news data, and constructing an unstructured text corpus;
S3: using a remote supervision method, searching the text corpus for sentences containing entity pairs to form an original sentence set;
S4: manually labeling part of the sentences, training a model on the labeled data to obtain a pre-trained model, and inputting the unlabeled sentences into the pre-trained model to obtain the posterior probability output by the model;
S5: inputting the original sentence set and its posterior probabilities into a network, and training a model to obtain the relation labels.
In the above scheme, each step can be implemented as follows.
S1 specifically includes the following steps:
S11: downloading the public data of Wikipedia;
S12: extracting the information frame (infobox) of each Wikipedia entry, mapping the name in the information frame to a relation name, and storing the attribute value together with the entity name to form an entity pair;
S13: for all entries, saving together the attribute values and entity names that share the same relation.
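Steps S12 and S13 can be illustrated with a minimal Python sketch. It assumes the infobox of each entry has already been parsed into a dictionary of attribute names and values; the `ATTRIBUTE_TO_RELATION` table, the function name, and the sample entries are hypothetical and are not taken from the patent.

```python
from collections import defaultdict

# Hypothetical mapping from infobox attribute names to relation names (S12).
ATTRIBUTE_TO_RELATION = {
    "birth_place": "place_of_birth",
    "spouse": "spouse",
}

def build_knowledge_base(entries):
    """Group (entity name, attribute value) pairs by relation name (S12, S13).

    entries: dict mapping an entity name to its parsed infobox (a dict).
    """
    knowledge_base = defaultdict(list)
    for entity, infobox in entries.items():
        for attribute, value in infobox.items():
            relation = ATTRIBUTE_TO_RELATION.get(attribute)
            # Skip attributes with no mapped relation and empty values.
            if relation is not None and value:
                knowledge_base[relation].append((entity, value))
    return dict(knowledge_base)

# Illustrative entries only; real input would come from the Wikipedia dump.
entries = {
    "Alice": {"birth_place": "Paris", "spouse": "Bob", "height": "170 cm"},
    "Carol": {"birth_place": "Oslo"},
}
kb = build_knowledge_base(entries)
print(kb)
```

Keying the knowledge base by relation name keeps all entity pairs of one relation together, which is what S13 asks for and what the sentence search in S3 iterates over.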
S2 specifically includes the following steps:
S21: downloading news data;
S22: preprocessing the text data by removing markup tags such as HTML or XML tags, converting the character encoding to UTF-8, and converting the result into plain text;
S23: using a natural language processing tool to segment the plain text into words and to extract named entity information.
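The preprocessing in S22 can be sketched with the Python standard library alone. The source encoding (`gbk`), the function name, and the regular-expression tag stripper are assumptions made for illustration; a production pipeline would more likely use a real HTML parser. Word segmentation and named entity recognition (S23) need an external NLP tool and are not shown.

```python
import html
import re

# Naive pattern for HTML/XML tags; an assumption, adequate only for simple markup.
TAG_RE = re.compile(r"<[^>]+>")

def to_plain_text(raw: bytes, source_encoding: str = "gbk") -> str:
    """Decode raw bytes, strip markup tags, and normalize whitespace (S22)."""
    text = raw.decode(source_encoding, errors="replace")
    text = TAG_RE.sub(" ", text)          # remove tags
    text = html.unescape(text)            # turn entities such as &amp; into characters
    return " ".join(text.split())         # collapse runs of whitespace

raw_page = "<p>浙江大学 &amp; 杭州</p>".encode("gbk")
plain = to_plain_text(raw_page)
utf8_bytes = plain.encode("utf-8")        # store the corpus as UTF-8
print(plain)
```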
S3 specifically includes the following steps:
S31: constructing positive samples: for a given relation type, if the entity name and the attribute value both appear in a sentence, the sentence is marked as a positive sample;
S32: constructing negative samples: for a given relation type, if the entity name appears in a sentence and the attribute value does not, but the sentence contains named entity information corresponding to the attribute value, the sentence is marked as a negative sample;
S33: balancing the sample counts: randomly sampling the negative samples so that the number of negative samples equals the number of positive samples.
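The labeling rules of S31 to S33 can be sketched as follows. The sketch assumes that each sentence comes with the set of named entity types found in it, and it reads "named entity information of the attribute value" as "a named entity of the same type as the attribute value"; that reading, the substring matching, and all names are assumptions.

```python
import random

def label_sentences(sentences, entity, value, value_type, seed=0):
    """Split sentences into positive and negative samples for one entity pair.

    sentences: list of (text, set of named-entity types found in the text).
    value_type: named-entity type of the attribute value, e.g. "LOC" (assumed).
    """
    positives, negatives = [], []
    for text, entity_types in sentences:
        if entity not in text:
            continue                       # the entity name must appear (S31, S32)
        if value in text:
            positives.append(text)         # S31: entity and attribute value co-occur
        elif value_type in entity_types:
            negatives.append(text)         # S32: same entity type, different value
    # S33: down-sample negatives so both classes have the same size.
    if len(negatives) > len(positives):
        negatives = random.Random(seed).sample(negatives, len(positives))
    return positives, negatives

sentences = [
    ("Alice was born in Paris.", {"PER", "LOC"}),
    ("Alice moved to Rome.", {"PER", "LOC"}),
    ("Alice visited Oslo last year.", {"PER", "LOC"}),
    ("Alice likes tea.", {"PER"}),
    ("Bob lives in Paris.", {"PER", "LOC"}),
]
positives, negatives = label_sentences(sentences, "Alice", "Paris", "LOC")
print(positives, negatives)
```

Plain substring matching is used here for brevity; matching on the tokens and entity spans produced in S23 would avoid accidental partial matches.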
S4 specifically includes the following steps:
S41: selecting part of the sentences found in S3 and storing them as the following sequence:
[sentence_1, sentence_2, sentence_3, ..., sentence_N]
where N is the number of selected sentences and sentence_N is the Nth sentence;
S42: manually labeling the selected sentences, judging whether each sentence contains the relation:
{sentence_1: label_1, sentence_2: label_2, ..., sentence_N: label_N}
where label_N is the label of the Nth sentence;
S43: inputting the labeled sentences into a classification algorithm and training a network to obtain a pre-trained model θ;
S44: selecting the sentences that were not manually labeled and storing them as the following sequence:
[sentence_1, sentence_2, sentence_3, ..., sentence_M]
where M is the number of unlabeled sentences;
S45: inputting the unlabeled sentences into the pre-trained model θ to obtain their confidences:
{sentence_1: confidence_1, sentence_2: confidence_2, ..., sentence_M: confidence_M}
where confidence_M is the confidence of the Mth sentence.
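The data flow of S42 to S45 can be sketched with a small stand-in classifier. The patent does not name the classification algorithm, so a bag-of-words naive Bayes model with add-one smoothing is swapped in purely for illustration, and "confidence" is assumed to mean the predicted probability of the positive class. All names and sentences are hypothetical.

```python
import math
from collections import Counter

def train(labeled):
    """Fit the stand-in model theta from {sentence: label}, label in {0, 1} (S43)."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for sentence, label in labeled.items():
        class_counts[label] += 1
        word_counts[label].update(sentence.split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return {"words": word_counts, "classes": class_counts, "vocab": vocab}

def posterior(theta, sentence):
    """Return {label: P(label | sentence)} using add-one smoothing."""
    total = sum(theta["classes"].values())
    vocab_size = len(theta["vocab"])
    log_scores = {}
    for label in (0, 1):
        counts = theta["words"][label]
        n_words = sum(counts.values())
        score = math.log((theta["classes"][label] + 1) / (total + 2))
        for word in sentence.split():
            score += math.log((counts[word] + 1) / (n_words + vocab_size))
        log_scores[label] = score
    top = max(log_scores.values())         # subtract the max for numerical stability
    exps = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}

def confidences(theta, unlabeled):
    """Map each unlabeled sentence to its confidence (S45)."""
    return {s: posterior(theta, s)[1] for s in unlabeled}

labeled = {
    "alice was born in paris": 1,
    "bob was born in rome": 1,
    "alice visited rome": 0,
    "bob visited paris": 0,
}
theta = train(labeled)
conf = confidences(theta, ["carol was born in oslo", "carol visited oslo"])
print(conf)
```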
S5 specifically includes the following steps:
S51: acquiring a training data set:
x = [x_1, x_2, ..., x_l]
where x_i is the ith sentence in the training set and l is the number of sentences in the training set;
S52: inputting the sentences of the training data set into the pre-trained model θ to obtain the posterior probability output(x'_j) of the pre-trained model on the jth class:
where K is the number of classes and p_j is the output of the pre-trained model on the jth class;
S53: inputting the confidence of the sample into the network to calculate the constraint value:
con = exp(η(λ - confidence))
where η is a penalty factor and λ is a set threshold;
S54: normalizing the posterior probabilities of all relations as follows:
Z = ∑ exp(η(λ - confidence))
where p(x'_j) is the posterior probability after normalization;
S56: selecting the relation with the maximum posterior probability as the predicted label of the relation classification.
In S53, the confidences of the input samples fall into two parts: for a labeled sentence the confidence is 1; for an unlabeled sentence the confidence is the value calculated in step S45.
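The two formulas that survive in the text, the constraint value con and the normalizer Z, can be evaluated directly. The sketch below also uses a softmax over the K class outputs for S52 and an argmax for S56; the softmax is an assumption, since the S52 formula is not reproduced in the text, and the sketch deliberately does not guess how con and Z are combined into p(x'_j). The values of η and λ are hypothetical.

```python
import math

def softmax(scores):
    """Assumed form of the S52 posterior over K class outputs p_1..p_K."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def constraint(confidence, eta, lam):
    """S53: con = exp(eta * (lam - confidence))."""
    return math.exp(eta * (lam - confidence))

def normalizer(confidences, eta, lam):
    """S54: Z = sum of exp(eta * (lam - confidence)) over the given samples."""
    return sum(constraint(c, eta, lam) for c in confidences)

def predict(scores):
    """S56: index of the class with the largest posterior probability."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)

eta, lam = 1.0, 0.5                         # hypothetical penalty factor and threshold
sample_confidences = [1.0, 0.9, 0.2]        # 1.0 stands for a manually labeled sentence
cons = [constraint(c, eta, lam) for c in sample_confidences]
Z = normalizer(sample_confidences, eta, lam)
print(cons, Z, predict([0.1, 2.0, -1.0]))
```

With a positive η, the constraint value is below 1 when the confidence exceeds λ and above 1 when it falls below λ, so the threshold separates samples the pre-trained model trusts from those it does not.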
To overcome the defects of traditional methods, the invention provides a relation extraction method based on constrained remote supervision. By regularizing the posterior probability, the method extracts features of the text sentences automatically, which saves manual effort and yields more abstract and expressive features. The method outperforms traditional relation extraction algorithms and some mainstream algorithms of recent years.
Drawings
FIG. 1 is a schematic diagram of the core regularized remote supervision model used by the invention. The left side of the figure shows two positive samples and two negative samples, all extracted with the remote supervision method. The prediction label is obtained after the posterior probability and regularization calculations.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
A relation extraction method based on constrained remote supervision comprises the following steps:
S1: acquiring information frame (infobox) data from Wikipedia, converting the information frame data into entity pairs, and constructing a structured external knowledge base. This step is implemented as follows:
S11: downloading the public data of Wikipedia;
S12: extracting the information frame (infobox) of each Wikipedia entry, mapping the name in the information frame to a relation name, and storing the attribute value together with the entity name to form an entity pair;
S13: for all entries, saving together the attribute values and entity names that share the same relation.
S2: obtaining forum and news data, and constructing an unstructured text corpus. This step is implemented as follows:
S21: downloading news data, such as the public data of the People's Daily;
S22: preprocessing the text data by removing markup tags such as HTML or XML tags, converting the character encoding to UTF-8, and converting the result into plain text;
S23: segmenting the plain text into words using a natural language processing tool, such as the jieba tokenizer, and extracting named entity information.
In the invention, both the structured data and the unstructured data are taken directly from the TAC-KBP 2016 data set.
S3: as shown in FIG. 1, using a remote supervision method, searching the text corpus for sentences containing entity pairs to form an original sentence set. This step is implemented as follows:
S31: constructing positive samples: for a given relation type, if the entity name and the attribute value both appear in a sentence, the sentence is marked as a positive sample;
S32: constructing negative samples: for a given relation type, if the entity name appears in a sentence and the attribute value does not, but the sentence contains named entity information corresponding to the attribute value, the sentence is marked as a negative sample;
S33: balancing the sample counts: randomly sampling the negative samples so that the number of negative samples equals the number of positive samples.
S4: manually labeling part of the sentences, training a model on the labeled data to obtain a pre-trained model, and inputting the unlabeled sentences into the pre-trained model to obtain the posterior probability output by the model. This step is implemented as follows:
S41: selecting part of the sentences found in S3 and storing them as the following sequence:
[sentence_1, sentence_2, sentence_3, ..., sentence_N]
where N is the number of selected sentences and sentence_N is the Nth sentence;
S42: manually labeling the selected sentences, judging whether each sentence contains the relation:
{sentence_1: label_1, sentence_2: label_2, ..., sentence_N: label_N}
where label_N is the label of the Nth sentence;
S43: inputting the labeled sentences into a classification algorithm and training a network to obtain a pre-trained model θ;
S44: selecting the sentences that were not manually labeled and storing them as the following sequence:
[sentence_1, sentence_2, sentence_3, ..., sentence_M]
where M is the number of unlabeled sentences;
S45: inputting the unlabeled sentences into the pre-trained model θ to obtain their confidences:
{sentence_1: confidence_1, sentence_2: confidence_2, ..., sentence_M: confidence_M}
where confidence_M is the confidence of the Mth sentence.
S5: inputting the original sentence set and its posterior probabilities into a network, and training a model to obtain the relation labels. This step is implemented as follows:
S51: acquiring a training data set:
x = [x_1, x_2, ..., x_l]
where x_i is the ith sentence in the training set and l is the number of sentences in the training set;
S52: inputting the sentences of the training data set into the pre-trained model θ to obtain the posterior probability output(x'_j) of the pre-trained model on the jth class:
where K is the number of classes and p_j is the output of the pre-trained model on the jth class;
S53: inputting the confidence of the sample into the network to calculate its constraint value. The confidences of the input samples fall into two parts: for a labeled sentence the confidence is 1; for an unlabeled sentence the confidence is the value calculated in step S45. The constraint value is calculated as follows:
con = exp(η(λ - confidence))
where η is a penalty factor and λ is a set threshold;
S54: normalizing the posterior probabilities of all relations as follows:
Z = ∑ exp(η(λ - confidence))
where p(x'_j) is the posterior probability after normalization;
S56: selecting the relation with the maximum posterior probability as the predicted label of the relation classification.
The method is applied in the following example so that those skilled in the art can better understand the specific implementation of the invention.
Examples
In this embodiment, a piece of news text submitted by a user is taken as an example, and relation extraction is performed with the above method. The specific parameters and methods of each implementation step are as follows:
1. The information frame data is converted into entity pairs, which are stored as a sequence whose order corresponds to the order of the sentences in the candidate set.
{(entity_1, slotfiller_1), (entity_2, slotfiller_2), ..., (entity_N1, slotfiller_N1)}
2. The input sentences are searched for the entity, and the sentences containing the entity form the original sentence set.
{sentence_1, sentence_2, ..., sentence_N}
3. The original sentence set is searched for the attribute value, and the sentences containing the attribute value form the candidate sentence set, so that every sentence in the candidate set contains both the entity and the attribute value.
{candidate_1, candidate_2, ..., candidate_N1}
4. Part of the data is labeled manually to obtain accurate hand-labeled data.
{sentence_1: label_1, sentence_2: label_2, ..., sentence_N: label_N}
5. The model is trained to obtain the parameters θ.
{word_1: vector_1, word_2: vector_2, ..., word_N: vector_N}
6. The candidate data that was not manually labeled is collected.
[sentence_1, sentence_2, sentence_3, ..., sentence_M]
7. The candidate data that was not manually labeled is input into the trained network to obtain the corresponding confidences.
{sentence_1: confidence_1, sentence_2: confidence_2, ..., sentence_M: confidence_M}
8. The candidate data is input into the network to obtain the posterior probability.
9. The constraint value is calculated from the confidence.
con = exp(η(λ - confidence))
10. The normalization parameter is calculated.
Z = ∑ exp(η(λ - confidence))
11. The normalized posterior probability is calculated.
12. The class with the maximum posterior probability is the prediction label.
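Steps 2 and 3 of this example amount to a two-stage filter, which can be sketched in a few lines of Python. The function name, the substring matching, and the sample sentences are assumptions made for illustration.

```python
def build_candidate_set(sentences, entity, slot_filler):
    """Filter by entity first (step 2), then by attribute value (step 3)."""
    original_set = [s for s in sentences if entity in s]
    candidate_set = [s for s in original_set if slot_filler in s]
    return original_set, candidate_set

# Illustrative input; a real run would use the user-submitted news text.
news_sentences = [
    "Alice was born in Paris in 1980.",
    "Alice later moved to Rome.",
    "Paris hosted the conference.",
]
original_set, candidate_set = build_candidate_set(news_sentences, "Alice", "Paris")
print(len(original_set), len(candidate_set))
```

Filtering in two stages keeps the original sentence set available for later steps, while the candidate set holds only sentences that mention both members of the entity pair.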
Table 1 compares the method of the invention with existing mainstream methods on the TAC-KBP 2016 data set. The method of the invention scores higher than the compared methods on the reported Precision, Recall and F1-Score evaluation criteria.
TABLE 1
Model | Precision | Recall | F1-Score |
---|---|---|---|
PCNN | - | - | 0.52 |
CNN | 0.499 | 0.483 | 0.453 |
Model of the present invention | 0.547 | 0.559 | 0.553 |
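As a quick consistency check on Table 1, the F1-Score can be recomputed from the reported Precision and Recall, assuming F1 is their harmonic mean. The text does not state how the scores were averaged, so this formula is an assumption and the function name is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall values as reported in Table 1.
proposed_f1 = round(f1_score(0.547, 0.559), 3)
cnn_f1 = round(f1_score(0.499, 0.483), 3)
print(proposed_f1, cnn_f1)
```

Under this assumption the last row of the table is reproduced (0.553), while the CNN row yields about 0.491 rather than the reported 0.453, which may indicate that the CNN scores were averaged differently, for example per relation.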
Claims (6)
1. A relation extraction method based on constrained remote supervision, characterized by comprising the following steps:
S1: acquiring information frame data of Wikipedia, converting the information frame data into entity pairs, and constructing a structured external knowledge base;
S2: obtaining forum and news data, and constructing an unstructured text corpus;
S3: searching a text corpus for sentences containing entity pairs by using a remote supervision method to form an original sentence set;
S4: labeling part of the sentences, training a model with the labeled data to obtain a pre-trained model, and inputting the unlabeled sentences into the pre-trained model to obtain the posterior probability output by the model;
S5: inputting the original sentence set and its posterior probabilities into a network, and training a model to obtain a relation label;
wherein S5 specifically includes the following steps:
S51: acquiring a training data set:
x = [x_1, x_2, ..., x_l]
wherein x_i represents the ith sentence in the training set, and l represents the number of sentences in the training set;
S52: inputting the sentences in the training data set into a pre-trained model θ to obtain the posterior probability output(x_j) of the pre-trained model on the jth class:
wherein K represents the number of classes, and p_j is the output of the pre-trained model on the jth class;
S53: inputting the confidence of the sample into the network to calculate the constraint value:
con = exp(η(λ - confidence))
wherein η represents a penalty factor and λ represents a set threshold;
S54: normalizing the posterior probabilities of all the relations as follows:
Z = ∑ exp(η(λ - confidence))
wherein p(x_j) is the posterior probability after normalization;
S56: selecting the relation with the maximum posterior probability as the prediction label of the relation classification.
2. The relation extraction method based on constrained remote supervision as recited in claim 1, wherein S1 specifically comprises the following steps:
S11: downloading the public data of Wikipedia;
S12: extracting the information frame of an entry in Wikipedia, mapping the name in the information frame to a relation name, and storing the attribute value and the entity name to form an entity pair;
S13: for all entries, saving together the attribute values and entity names that have the same relation.
3. The relation extraction method based on constrained remote supervision as recited in claim 1, wherein S2 specifically comprises the following steps:
S21: downloading news data;
S22: preprocessing the text data by removing HTML or XML tags, converting the character encoding to UTF-8, and converting the result into plain text data;
S23: using a natural language processing tool to segment the plain text data into words and to extract named entity information.
4. The relation extraction method based on constrained remote supervision as recited in claim 1, wherein S3 specifically comprises the following steps:
S31: constructing positive samples: under the same relation type, if the entity name and the attribute value both appear in a sentence, marking the sentence as a positive sample;
S32: constructing negative samples: under the same relation type, if the entity name appears in a sentence and the attribute value does not, but the sentence contains named entity information corresponding to the attribute value, marking the sentence as a negative sample;
S33: balancing the sample counts: randomly sampling the negative samples so that the number of negative samples equals the number of positive samples.
5. The relation extraction method based on constrained remote supervision as recited in claim 1, wherein S4 specifically comprises the following steps:
S41: selecting part of the sentences found in S3 and storing them as the following sequence:
[sentence_1, sentence_2, sentence_3, ..., sentence_N]
wherein N is the number of selected sentences, and sentence_N is the Nth sentence;
S42: manually labeling the selected sentences, and judging whether each sentence contains the relation:
{sentence_1: label_1, sentence_2: label_2, ..., sentence_N: label_N}
wherein label_N represents the label of the Nth sentence;
S43: inputting the labeled sentences into a classification algorithm, and training a network to obtain a pre-trained model θ;
S44: selecting the sentences that were not manually labeled and storing them as the following sequence:
[sentence_1, sentence_2, sentence_3, ..., sentence_M]
wherein M represents the number of unlabeled sentences;
S45: inputting the unlabeled sentences into the pre-trained model θ to obtain their confidences:
{sentence_1: confidence_1, sentence_2: confidence_2, ..., sentence_M: confidence_M}
wherein confidence_M represents the confidence of the Mth sentence.
6. The relation extraction method based on constrained remote supervision as claimed in claim 1, wherein in S53 the confidences of the input samples comprise two parts: for a labeled sentence, the confidence is 1; for an unlabeled sentence, the confidence is calculated according to step S45.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810103633.3A CN108287911B (en) | 2018-02-01 | 2018-02-01 | Relation extraction method based on constrained remote supervision |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810103633.3A CN108287911B (en) | 2018-02-01 | 2018-02-01 | Relation extraction method based on constrained remote supervision |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108287911A | 2018-07-17 |
CN108287911B | 2020-04-24 |
Family
ID=62836441
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810103633.3A Active CN108287911B (en) | 2018-02-01 | 2018-02-01 | Relation extraction method based on constrained remote supervision |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108287911B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109472033B (en) * | 2018-11-19 | 2022-12-06 | 华南师范大学 | Method and system for extracting entity relationship in text, storage medium and electronic equipment |
CN111914555B (en) * | 2019-05-09 | 2022-08-23 | 中国人民大学 | Automatic relation extraction system based on Transformer structure |
CN110276081B (en) * | 2019-06-06 | 2023-04-25 | 百度在线网络技术(北京)有限公司 | Text generation method, device and storage medium |
CN113282758A (en) * | 2020-02-19 | 2021-08-20 | 复旦大学 | Depth relation extraction method for theme knowledge fusion under government control field |
CN111783463B (en) * | 2020-06-30 | 2024-08-13 | 北京百度网讯科技有限公司 | Knowledge extraction method and device |
CN111859238B (en) * | 2020-07-27 | 2024-07-16 | 平安科技(深圳)有限公司 | Model-based method, device and computer equipment for predicting data change frequency |
CN112307130B (en) * | 2020-10-21 | 2022-07-05 | 清华大学 | Document-level remote supervision relation extraction method and system |
CN112860903B (en) * | 2021-04-06 | 2022-02-22 | 哈尔滨工业大学 | Remote supervision relation extraction method integrated with constraint information |
CN113807518B (en) * | 2021-08-16 | 2024-04-05 | 中央财经大学 | Relation extraction system based on remote supervision |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103049454A (en) * | 2011-10-16 | 2013-04-17 | 同济大学 | Chinese and English search result visualization system based on multi-label classification |
CN105389354A (en) * | 2015-11-02 | 2016-03-09 | 东南大学 | Social media text oriented unsupervised method for extracting and sorting events |
CN106570148A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Convolutional neural network-based attribute extraction method |
CN106886569A (en) * | 2017-01-13 | 2017-06-23 | 重庆邮电大学 | A kind of ML KNN multi-tag Chinese Text Categorizations based on MPI |
CN107644101A (en) * | 2017-09-30 | 2018-01-30 | 百度在线网络技术(北京)有限公司 | Information classification approach and device, information classification equipment and computer-readable medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10133728B2 (en) * | 2015-03-20 | 2018-11-20 | Microsoft Technology Licensing, Llc | Semantic parsing for complex knowledge extraction |
Also Published As
Publication number | Publication date |
---|---|
CN108287911A (en) | 2018-07-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108287911B (en) | Relation extraction method based on constrained remote supervision | |
CN109033374B (en) | Knowledge graph retrieval method based on Bayesian classifier | |
CN110222160A (en) | Intelligent semantic document recommendation method, device and computer readable storage medium | |
CN103218444B (en) | Based on semantic method of Tibetan language webpage text classification | |
CN111767725B (en) | Data processing method and device based on emotion polarity analysis model | |
CN101127042A (en) | Sensibility classification method based on language model | |
CN112417891B (en) | Text relation automatic labeling method based on open type information extraction | |
CN113255320A (en) | Entity relation extraction method and device based on syntax tree and graph attention machine mechanism | |
CN107180045A (en) | A kind of internet text contains the abstracting method of geographical entity relation | |
CN113033183B (en) | Network new word discovery method and system based on statistics and similarity | |
CN116628173B (en) | Intelligent customer service information generation system and method based on keyword extraction | |
CN115796181A (en) | Text relation extraction method for chemical field | |
CN111444704B (en) | Network safety keyword extraction method based on deep neural network | |
CN112507109A (en) | Retrieval method and device based on semantic analysis and keyword recognition | |
CN113360582B (en) | Relation classification method and system based on BERT model fusion multi-entity information | |
CN111428501A (en) | Named entity recognition method, recognition system and computer readable storage medium | |
CN113627190A (en) | Visualized data conversion method and device, computer equipment and storage medium | |
CN111814477A (en) | Dispute focus discovery method and device based on dispute focus entity and terminal | |
CN112860898A (en) | Short text box clustering method, system, equipment and storage medium | |
CN115759119A (en) | Financial text emotion analysis method, system, medium and equipment | |
CN111325036A (en) | Emerging technology prediction-oriented evidence fact extraction method and system | |
CN117573869A (en) | Network connection resource key element extraction method | |
CN117574858A (en) | Automatic generation method of class case retrieval report based on large language model | |
CN114238735B (en) | Intelligent internet data acquisition method | |
CN114996455A (en) | News title short text classification method based on double knowledge maps |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||