CN108363816A - Open entity relation extraction method based on sentence semantic structure model - Google Patents
Open entity relation extraction method based on sentence semantic structure model
- Publication number
- CN108363816A (application CN201810234056.1A)
- Authority
- CN
- China
- Prior art keywords
- sentence
- entity relationship
- similarity
- entity
- group
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to an open entity relation extraction method based on a sentence semantic structure model and belongs to the technical field of computer and information science. The method first extracts the body text of microblog data, performs sentence splitting, word segmentation, stop-word removal and part-of-speech tagging, and then applies a dependency analysis tool to obtain a dependency parse tree. Second, candidate arguments are determined by base noun phrase recognition rules; relation-word extraction rules and argument extraction rules are combined to obtain entity relation triples, which are filtered by a confidence calculation rule to yield candidate entity relation pairs. Third, sentence similarity Sim1 is computed based on CSM and sentence similarity Sim2 is computed based on PV, and the two are fused by weighting to obtain the sentence similarity and hence a sentence similarity matrix. Finally, the sentences are divided into similar-sentence groups according to the sentence similarity matrix and a similarity threshold, and the entity relation pairs within each group are merged according to the confidence of the entity relation pairs contained in the group's sentences. Experiments on the NLP&CC microblog evaluation corpus show that computing the confidence of entity relation pairs, dividing the sentences into similar-sentence groups and merging the pairs within each group improves precision and recall and removes redundancy.
Description
Technical field
The present invention relates to an open entity relation extraction method based on a sentence semantic structure model and belongs to the technical field of computer and information science.
Background
Open entity relation extraction can extract entities and entity relations of unrestricted categories from messy, unordered web data and output them as structured information. The messy and redundant nature of microblog data leads to low precision and redundant extraction results, so open entity relation extraction needs to be studied with these characteristics of microblogs in mind. The present invention therefore provides an open entity relation extraction method based on a sentence semantic structure model to improve a system's ability to extract entity relations from messy, redundant microblog data.
The basic problem that the open entity relation extraction method based on a sentence semantic structure model must solve is to extract entities and entity relations of unrestricted categories from messy, unordered web data and output them as structured information. Existing open entity relation extraction systems and methods fall into the following groups:
1. The TextRunner and WOE systems
TextRunner was the first open information extraction system. It trains a naive Bayes model on features such as part of speech and base noun phrases to extract relations between entities. Later work showed that classifiers that model sequential features of the text, such as linear-chain conditional random fields and Markov logic networks, achieve better results. The WOE system uses Wikipedia data as its training set, and experiments showed that exploiting the dependency relations in the data effectively improves on TextRunner. Both TextRunner and WOE first identify named entities and then extract the relations between them.
2. The rule-based methods of ReVerb and Gamallo et al.
ReVerb first determines a verb-centered relation phrase, constrained by semantic and syntactic rules, and then extracts entity relation triples using positional constraint rules. The method extracts entity relation pairs through part-of-speech tagging, named entity recognition and matching against hand-written rules. For multilingual open information extraction, Gamallo et al. use rule-based dependency analysis to extract entity relations in English, Portuguese, Galician and Spanish.
3. Open entity relation extraction systems for Chinese
For Chinese open entity relation extraction there are three main systems: ZORE, UnCORE and CORE. ZORE performs dependency analysis on a sentence to obtain a dependency parse tree and then iteratively extracts the sentence's entity triples according to the dependency relations between entities and relation words. UnCORE extracts candidate relation triples using positional constraint rules between entities and relation indicator words in a sentence, then uses information gain to screen the relation indicator words and a type ranking method to obtain the relation indicator words of each entity relation type, and finally filters the candidate triples with relation-word and sentence-pattern rules. CORE first analyzes the syntactic structure with the CKIP parser, then identifies the central relation indicator word in the sentence by the "head-driven" principle, and finally finds the central entity words using dependency relations.
In conclusion existing open entity relation extraction method is difficult to handle the mixed and disorderly and redundancy property of microblog data,
So the present invention proposes the open entity relation extraction method based on sentence justice structural model.
Summary of the invention
The purpose of the present invention is to alleviate the low precision and redundant results of existing methods when extracting entity relation pairs from microblog data, and to improve the overall performance of open entity relation extraction, by proposing an open entity relation extraction method based on a sentence semantic structure model.
The design principle of the present invention is as follows. First, extract the body text of the microblog data, perform sentence splitting, word segmentation, stop-word removal and part-of-speech tagging on it, and then apply a dependency analysis tool to obtain a dependency parse tree. Second, determine candidate arguments by base noun phrase recognition rules, combine relation-word extraction rules and argument extraction rules to obtain entity relation triples, and filter the triples by a confidence calculation rule to obtain candidate entity relation pairs. Third, compute sentence similarity based on CSM to obtain Sim1 and based on PV to obtain Sim2, fuse the two by weighting to obtain the sentence similarity, and from it build the sentence similarity matrix. Finally, divide the sentences into similar-sentence groups according to the sentence similarity matrix and a similarity threshold, and merge the entity relation pairs within each group according to the confidence of the entity relation pairs contained in the group's sentences.
The technical solution of the present invention is realized through the following steps:
Step 1, preprocess the microblog data.
Step 1.1, extract the body text of the microblog data.
Step 1.2, perform sentence splitting, word segmentation, stop-word removal and part-of-speech tagging on the text of the microblog data.
Step 1.3, apply a dependency analysis tool to obtain a dependency parse tree.
Step 2, extract candidate entity relation pairs.
Step 2.1, extract entity relation triples by combining base noun phrase rules, relation-word extraction rules and argument extraction rules.
Step 2.2, filter the entity relation triples by the confidence calculation rule to generate a candidate set of entity relation pairs.
Step 3, compute sentence similarity.
Step 3.1, compute sentence similarity based on CSM to obtain Sim1.
Step 3.2, compute sentence similarity based on PV to obtain Sim2.
Step 3.3, fuse the two similarities by weighting to obtain the sentence similarity, and from it build the sentence similarity matrix.
Step 4, merge entity relation pairs.
Step 4.1, divide the sentences into similar-sentence groups according to the sentence similarity matrix and a similarity threshold.
Step 4.2, merge the entity relation pairs within each group according to the confidence of the entity relation pairs contained in the group's sentences to obtain the final result.
Advantageous effects
Compared with existing open entity relation extraction systems and methods, the present invention effectively alleviates the low precision and redundant results that arise when extracting entity relation pairs from microblog data.
Description of the drawings
Fig. 1 is a schematic diagram of the open entity relation extraction method based on a sentence semantic structure model according to the present invention.
Fig. 2 is a schematic diagram of the preprocessing stage of the open entity relation extraction method based on a sentence semantic structure model.
Fig. 3 is an example dependency parse.
Fig. 4 is a schematic diagram of the PV-CSM sentence similarity calculation method.
Fig. 5 is a schematic diagram of the sentence similarity calculation method based on the CSM model.
Fig. 6 is the Paragraph Vector framework.
Fig. 7 is a schematic diagram of entity relation pair merging.
Detailed description
To better illustrate the objects and advantages of the present invention, embodiments of the method are described in further detail below with reference to examples.
The detailed procedure is:
Step 1, preprocess the microblog data.
Step 1.1, use regular expressions to filter out the HTML tags and noise symbols in the microblog data set, extract the body text, and convert between traditional and simplified Chinese characters.
Step 1.2, split the text into sentences, use the Harbin Institute of Technology language cloud (LTP) to perform word segmentation, part-of-speech tagging and dependency analysis on each sentence, and remove text containing fewer than 4 effective words (nouns, verbs, adjectives, numerals, time words, etc.).
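Steps 1.1 and 1.2 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patent's implementation: the regular expressions, the tag set `CONTENT_POS` and all function names are hypothetical, traditional/simplified conversion is omitted, and the `(word, tag)` pairs are assumed to come from an external tagger such as LTP, which is not called here.

```python
import re

# Hypothetical patterns: the source only states that regular expressions
# remove HTML tags and noise symbols.
HTML_TAG = re.compile(r"<[^>]+>")
# Assumes URLs and @mentions are followed by whitespace.
NOISE = re.compile(r"https?://\S+|@\S+|#")
# A sentence is a run of text up to and including an end-of-sentence mark.
SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]?")

# Assumed tags for "effective" words: noun, verb, adjective, numeral, time word.
CONTENT_POS = {"n", "v", "a", "m", "nt"}


def clean_text(raw):
    """Strip HTML tags and noise symbols, then collapse whitespace."""
    text = HTML_TAG.sub("", raw)
    text = NOISE.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Split cleaned text into sentences at Chinese or Latin end marks."""
    return [s.strip() for s in SENTENCE.findall(text) if s.strip()]


def has_enough_content(tagged, min_words=4):
    """Keep a sentence only if it has at least min_words effective words.

    tagged is a list of (word, pos) pairs from an external POS tagger.
    """
    return sum(1 for _, pos in tagged if pos in CONTENT_POS) >= min_words
```

A sentence whose tagged words include fewer than four nouns, verbs, adjectives, numerals or time words is dropped before dependency analysis.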
Step 1.3, dependency parsing reveals the syntactic structure of a sentence by analyzing the dependency relations between its constituents. The LTP dependency analysis tool provided by Harbin Institute of Technology is used to analyze the dependency relations between the constituents of the sentence "The Democrats on the White House budget committee released a report on Monday"; the result is shown in Fig. 3. The dependency labels and their meanings are listed in Table 1.
Table 1. Dependency syntax labels
Step 2, extract candidate entity relation pairs.
Step 2.1, first obtain base noun phrases from the part-of-speech tagging results and the noun phrase extraction rules. Then treat every verb in the sentence that lies on a VOB (verb-object) or FOB (fronted object) dependency path as a candidate relation word. Finally, take the base noun phrase constituents that stand in an SBV (subject-verb), VOB or FOB relation to a candidate relation word as the arguments of that verb, yielding entity relation pairs along two kinds of dependency paths: "SBV-relation word-VOB" and "SBV-FOB-relation word".
Sentences with a negation structure need special handling. For example, for "Some university students do not participate in the party", the extraction rules above yield the entity relation pair "e1: some university students, e2: party, r: participate in", which is incorrect. The negation word must be taken into account, and the correct result is "e1: some university students, e2: party, r: do not participate in".
Negation words are identified using a negation word set. Each identified negation word is added to the relation word with which it has an ADV (adverbial) dependency path. The negation words (in English gloss) include: non-, do not have, none, not, prevent, have not, hard to, forbid, difficult to, forget, ignore, abandon, stop, refuse, almost none, almost, and unclear.
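The extraction rule of Step 2.1, together with the negation handling, can be sketched as follows. This is a simplified illustration under stated assumptions: arguments are single head words rather than full base noun phrases, the `Token` structure and the 0-based head indices are hypothetical, the tokens are English glosses, and `NEGATION_WORDS` is a small illustrative subset rather than the patent's word set.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    head: int  # index of the governing token, -1 for the root (assumed convention)
    rel: str   # dependency label such as SBV, VOB, FOB, ADV


# Illustrative subset only; the patent uses a larger negation word set.
NEGATION_WORDS = {"not", "never", "no"}


def extract_triples(tokens):
    """Return (e1, relation, e2) triples for every word that governs both
    an SBV subject and a VOB or FOB object."""
    triples = []
    for i, tok in enumerate(tokens):
        deps = [t for t in tokens if t.head == i]
        subjects = [t.word for t in deps if t.rel == "SBV"]
        objects = [t.word for t in deps if t.rel in ("VOB", "FOB")]
        if not subjects or not objects:
            continue
        # Prepend negation words attached to the relation word through ADV.
        negations = [t.word for t in deps
                     if t.rel == "ADV" and t.word in NEGATION_WORDS]
        relation = " ".join(negations + [tok.word])
        for e1 in subjects:
            for e2 in objects:
                triples.append((e1, relation, e2))
    return triples


sentence = [
    Token("students", 2, "SBV"),
    Token("not", 2, "ADV"),
    Token("participate", -1, "HED"),
    Token("party", 2, "VOB"),
]
print(extract_triples(sentence))
```

Without the ADV check the relation would be extracted as "participate", which reverses the meaning of the sentence.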
Step 2.2, compute the confidence of each entity relation pair. When the confidence exceeds a threshold, the entity relation pair becomes a candidate entity relation pair; otherwise it is rejected.
The selected features and their weights, obtained by training on 200 annotated corpus items, are shown in Table 2:
Table 2. Features and their weights
In the table, each of {x1…x10} takes the value 1 when the condition described by the feature holds and 0 otherwise. Distances and lengths in the table are measured in number of words. The weight Dis corresponding to x11 is computed as shown in Formula 1.
Here e1 and e2 are the two arguments of the entity relation pair, r is its relation word, dis(e1, e2) is the distance between the two arguments in the sentence, i.e. the number of words between them, dis(e1, r) is the distance between argument e1 and relation word r in the sentence, and dis(r, e2) is the distance between relation word r and argument e2 in the sentence. Combining the feature weights with the sigmoid function gives the confidence (Confidence) of an entity relation pair, computed as shown in Formula 2.
Here x is a feature from the table, taking the value 0 or 1, and w is its corresponding weight.
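Formula 2 is not reproduced in this text. A plausible reading, assumed here, is the logistic sigmoid applied to the weighted sum of the features, with no bias term. The sketch below illustrates that reading; the weights, the threshold value and the function names are hypothetical, since Table 2 and the threshold are not given.

```python
import math


def confidence(features, weights):
    """Sigmoid of the weighted feature sum (assumed form of Formula 2)."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def keep_candidates(pairs, weights, threshold=0.5):
    """Keep the pairs whose confidence exceeds the threshold.

    pairs is a list of (pair, features); the threshold is a placeholder.
    Returns a list of (pair, confidence).
    """
    kept = []
    for pair, features in pairs:
        score = confidence(features, weights)
        if score > threshold:
            kept.append((pair, score))
    return kept
```

The confidence values kept here are reused later, when the entity relation pairs of similar sentences are merged.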
Step 3, compute sentence similarity.
Step 3.1, the principle of the CSM-based sentence similarity calculation is shown in Fig. 5. To address the sparsity of semantic features in short texts, the method builds on sentence semantic structure analysis and uses an LDA topic model to mine latent topical knowledge, expands the semantic features of each single sentence, then represents the sentence as a vector, and finally computes sentence similarity.
CSM divides the meaning of a sentence into a topic and a comment (the part that states something about the topic): the topic is the main object described in the sentence, and the comment is the description of that object. According to the semantic roles that words play in the sentence meaning, cases are divided into basic cases and general cases. For example, in "Xiao Ming broke the window of the classroom", "Xiao Ming" is the performer of the action and belongs to the agent case among the basic cases, while "classroom" restricts "window" and belongs to the scope case among the general cases. Based on the result of sentence semantic structure analysis, the words of a sentence are divided by semantic role into four classes: basic items under the topic, general items under the topic, basic items under the comment, and general items under the comment. The semantic features of the sentence are then expanded using the results of LDA analysis.
The input of the LDA analysis module is the set of the four classes of words in the text collection, and the output is the distribution of these four classes of words under multiple themes. The word distributions are stored in a knowledge base, which later modules use to expand the semantic features of sentences. The LDA topic model assumes that a text contains multiple themes and that each theme corresponds to a multinomial distribution over the words of the vocabulary, so words that belong to the same theme are latently semantically related. Therefore, adding the top N words of the theme most relevant to a sentence to that sentence expands its semantic features without introducing excessive noise.
The expansion of a sentence's semantic features is divided into expansion based on the topic constituents and expansion based on the comment constituents.
The semantic feature expansion based on the topic constituents proceeds as follows. First, compute the sum of the probability values under theme Pi of all basic items under the topic. Then weight the sum of the probability values under theme Pi of all general items and add it to the previous result, as shown in Formula 3,
where Tmi is the probability value under theme Pi of the m-th basic item under the sentence topic, and Gni is the probability value under theme Pi of the n-th general item under the sentence topic. Next, select the theme with the highest value (i.e. the theme most relevant to the sentence) and add its top N words to the short text. Finally, build the topic-based sentence vector using the VSM: the weight of each original word of the sentence is its TF*IDF value, and the weight of each expansion word is 1.
The semantic feature expansion based on the comment constituents proceeds in the same way and yields the comment-based sentence vector.
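The theme selection and expansion just described can be sketched as follows. Formula 3 is not reproduced in this text, so the score below follows the prose: the sum of the basic-item probabilities plus a weighted sum of the general-item probabilities. The weight `general_weight`, the function names and the rule that an expansion word already present in the sentence keeps its TF*IDF weight are assumptions.

```python
def select_theme(basic_probs, general_probs, general_weight=0.5):
    """Score every theme and return (best_theme_index, scores).

    basic_probs[m][i] is the probability of the m-th basic item under
    theme i; general_probs[n][i] is the same for the n-th general item.
    general_weight is a placeholder for the unspecified weight.
    """
    n_themes = len(basic_probs[0]) if basic_probs else len(general_probs[0])
    scores = []
    for i in range(n_themes):
        basic_sum = sum(row[i] for row in basic_probs)
        general_sum = sum(row[i] for row in general_probs)
        scores.append(basic_sum + general_weight * general_sum)
    best = max(range(n_themes), key=scores.__getitem__)
    return best, scores


def expand_vector(tfidf, theme_top_words, n=3):
    """Build the sentence vector: original words keep their TF*IDF weight,
    each of the top-n theme words not already present gets weight 1."""
    vector = dict(tfidf)
    for word in theme_top_words[:n]:
        vector.setdefault(word, 1.0)
    return vector
```

The same two functions would be applied once to the topic constituents and once to the comment constituents, giving two sparse vectors per sentence.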
To compute the similarity between sentences, the cosine similarity between sentences is computed separately on each of the two kinds of feature vectors obtained after semantic feature expansion, and the two values are then combined by weighted addition to give the CSM-LDA-based sentence similarity Sim1, as shown in Formula 4.
Here SA and SB denote any two sentences, each sentence has a topic vector obtained after sentence semantic structure analysis and a comment vector, and the weighting coefficient ω of the topic and the comment is usually set to 0.5.
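A minimal sketch of this weighted combination is given below. Formula 4 is not reproduced in this text, so the exact form is an assumption: here the topic similarity is weighted by ω and the comment similarity by 1 − ω, which coincides with equal weights at the stated value of 0.5. Sentence vectors are represented as sparse dictionaries, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {word: weight}."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sim_csm(topic_a, topic_b, comment_a, comment_b, omega=0.5):
    """Assumed form of Sim1: weighted sum of the topic-vector cosine
    and the comment-vector cosine."""
    return (omega * cosine(topic_a, topic_b)
            + (1.0 - omega) * cosine(comment_a, comment_b))
```

Two sentences that share a topic but say different things about it score around ω, which keeps them below a high grouping threshold unless the second similarity also agrees.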
Step 3.2, Paragraph Vector (PV) is an unsupervised distributed vector representation method that handles text of arbitrary length, typically at the sentence or paragraph level, and produces good vector representations of sentences and paragraphs. Just as Word2vec comprises the CBOW and Skip-gram models, PV comprises two models, PV-DM and PV-DBOW. The PV model adds a new Paragraph id token for each sentence or paragraph.
The PV-DM model is a three-layer neural network consisting of an input layer, a projection layer and an output layer. When PV-DM trains sentence vectors, the Paragraph id is treated as an ordinary word, and a randomly generated vector for it is added to matrix D. For the words in the sentence, randomly generated word vectors are added to matrix W. The sentence vector has the same dimension as the word vectors, but the two do not belong to the same space. The PV-DM model averages or concatenates the sentence vector and the word vectors of the sentence to obtain the input vector and then trains the model by maximizing the probability of the target word. Sentence vector training differs from word vector training in that the hidden-layer input of PV-DM is determined jointly by matrices W and D, and the semantic information of the whole sentence is taken into account during training.
The algorithm consists of two stages, training and inference; see Fig. 6.
(1) Training stage: training yields the word vector matrix W, the softmax weights U and b, and the vectors D of the sentences seen in training. Each Paragraph id initialized during training is unique and is not shared, whereas the word vectors are shared across the whole training corpus. A fixed-length sliding window traverses all words in the data set, and as the window slides the word vector matrix W and the Paragraph id vector matrix D are updated until training ends.
(2) Inference stage: the target sentence is first assigned a new Paragraph id. Using the PV model parameters W, U and b obtained in the training stage, the vector of the target sentence is optimized by gradient descent and backpropagation (BP) so that the probability of the target sentence under the current conditions is maximized; after convergence this gives the vector representation of the sentence to be predicted.
From the resulting sentence vectors, the cosine similarity Sim2 between sentences is computed.
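For reference, the PV-DM training objective can be written out explicitly. The notation below follows the standard Paragraph Vector formulation rather than the patent text: T is the number of words in the corpus, k the half-width of the sliding window, d the Paragraph id of the sentence containing w_t, and h the averaging or concatenation of the sentence vector from D with the context word vectors from W. The symbols v_A and v_B for the inferred sentence vectors are introduced here for illustration only.

```latex
% PV-DM: maximize the average log probability of each target word
\max_{W,\,D,\,U,\,b}\;
\frac{1}{T}\sum_{t=k}^{T-k}
\log p\left(w_t \mid w_{t-k},\ldots,w_{t+k},\, d\right)

% Softmax prediction over the vocabulary
p\left(w_t \mid w_{t-k},\ldots,w_{t+k},\, d\right)
  = \frac{e^{y_{w_t}}}{\sum_{i} e^{y_i}},
\qquad
y = b + U\,h\left(w_{t-k},\ldots,w_{t+k},\, d;\; W, D\right)

% Similarity of two sentences from their inferred vectors
\mathrm{Sim}_2(S_A, S_B)
  = \frac{v_A \cdot v_B}{\lVert v_A \rVert\,\lVert v_B \rVert}
```

In the inference stage the same objective is optimized with respect to the new sentence vector only.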
Step 3.3, combine Sim1 and Sim2 by weighted summation according to Formula 5 to output the similarity Sim(S1, S2) between two sentences. Repeating this sentence similarity calculation for every pair of sentences in the sentence set yields all pairwise similarities and produces the sentence similarity matrix.
Sim(S1, S2) = α * Sim1 + β * Sim2 (5)
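Formula 5 and the construction of the similarity matrix can be sketched as follows. The values of α and β are not stated in the text, so the defaults of 0.5 are placeholders. The two similarity functions are passed in as parameters, and the assumptions that similarity is symmetric and that a sentence has similarity 1 with itself are made here for illustration.

```python
def fuse(sim1, sim2, alpha=0.5, beta=0.5):
    """Formula 5: weighted sum of the CSM-based and PV-based similarities."""
    return alpha * sim1 + beta * sim2


def similarity_matrix(sentences, sim1_fn, sim2_fn, alpha=0.5, beta=0.5):
    """Build the symmetric sentence similarity matrix.

    sim1_fn and sim2_fn each take two sentences and return a similarity.
    """
    n = len(sentences)
    matrix = [[1.0] * n for _ in range(n)]  # assumed self-similarity of 1
    for i in range(n):
        for j in range(i + 1, n):
            value = fuse(sim1_fn(sentences[i], sentences[j]),
                         sim2_fn(sentences[i], sentences[j]),
                         alpha, beta)
            matrix[i][j] = matrix[j][i] = value
    return matrix
```

Only the upper triangle is computed, which halves the number of similarity evaluations for a sentence set of size n.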
Step 4, merge entity relation pairs.
Step 4.1, entity relation pair merging is illustrated in Fig. 7. The sentence similarity module produces the sentence similarity matrix, and sentences whose similarity exceeds a threshold are placed in the same group. The sentence similarity matrix is divided into similar-sentence groups as follows:
(1) Select a sentence S from the sentence set, add it to similar-sentence group 1, and delete S from the sentence set;
(2) Locate the row i of S in the sentence similarity matrix, add every sentence whose similarity in row i exceeds 0.75 to sentence group 1, and delete those sentences from the sentence set;
(3) Randomly select a sentence S2 from the remaining sentences. If the similarity between S2 and any sentence in sentence group 1 exceeds 0.75, add S2 to sentence group 1; otherwise create a new similar-sentence group, add S2 to it, and repeat (2);
(4) Iterate (3) until the sentence set is empty, yielding n similar-sentence groups.
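The grouping procedure above can be sketched as follows. Two interpretive choices are assumptions rather than statements from the text: sentences are taken in index order instead of at random, so that the result is reproducible, and a new sentence is compared against every existing group rather than only group 1.

```python
def group_sentences(matrix, threshold=0.75):
    """Partition sentence indices into similar-sentence groups."""
    remaining = list(range(len(matrix)))
    groups = []
    while remaining:
        s = remaining.pop(0)
        # Step (3): join an existing group if s is similar to any member.
        for group in groups:
            if any(matrix[s][member] > threshold for member in group):
                group.append(s)
                break
        else:
            # Steps (1)-(2): start a new group with s and pull in every
            # remaining sentence whose similarity in row s exceeds the threshold.
            neighbors = [j for j in remaining if matrix[s][j] > threshold]
            groups.append([s] + neighbors)
            remaining = [j for j in remaining if j not in neighbors]
    return groups


example = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.1, 0.78],
    [0.1, 0.1, 1.0, 0.3],
    [0.2, 0.78, 0.3, 1.0],
]
print(group_sentences(example))
```

In the example, sentence 3 is not similar to the seed sentence 0 but joins its group through sentence 1, so group membership is transitive in this reading of the procedure.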
Step 4.2, compare the confidence of the candidate entity relation pairs of the sentences in the same group, rank all candidate entity relation pairs in the group, and let the top-ranked pair replace the candidate entity relation pairs of all sentences in the group as the optimal entity relation pair of that sentence group.
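The merge step then reduces each group to its highest-confidence pair. The sketch below assumes a hypothetical layout in which `candidates` maps a sentence index to a list of `(pair, confidence)` tuples; groups without any candidate pair are skipped, and ties go to the first pair encountered.

```python
def merge_groups(groups, candidates):
    """Return the highest-confidence entity relation pair of each group.

    groups is a list of lists of sentence indices; candidates maps a
    sentence index to a list of (pair, confidence) tuples.
    """
    merged = []
    for group in groups:
        pool = [item for s in group for item in candidates.get(s, [])]
        if pool:
            best_pair, _ = max(pool, key=lambda item: item[1])
            merged.append(best_pair)
    return merged
```

Because every group contributes at most one pair, near-duplicate sentences no longer produce near-duplicate triples in the output.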
Test results: the open entity relation extraction method based on a sentence semantic structure model was compared with other open entity relation extraction methods on social text (the public corpus of the opinion element extraction evaluation task for Chinese microblogs released by the 2013 NLP&CC conference). The baseline methods are ZORE (2014) and CORE (2014). The present invention outperforms ZORE and CORE, improving precision and removing redundancy, as shown in Table 3, and thus effectively realizes open entity relation extraction.
Table 3. Comparative experiment results
The specific description above further details the purpose, technical solution and advantageous effects of the invention. It should be understood that the above is only a specific embodiment of the present invention and is not intended to limit its scope of protection; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (4)
1. An open entity relation extraction method based on a sentence semantic structure model, characterized in that the method comprises the following steps:
Step 1, preprocess the microblog data, including: extract the body text of the microblog data; perform sentence splitting, word segmentation, stop-word removal and part-of-speech tagging on the text; then apply a dependency analysis tool to obtain a dependency parse tree;
Step 2, extract entity relation triples by combining base noun phrase rules, relation-word extraction rules and argument extraction rules, then filter the entity relation triples by the confidence calculation rule to generate a candidate set of entity relation pairs;
Step 3, compute sentence similarity based on CSM to obtain Sim1, compute sentence similarity based on PV to obtain Sim2, then fuse the two similarities by weighting to obtain the sentence similarity and hence the sentence similarity matrix;
Step 4, divide the sentences into similar-sentence groups according to the sentence similarity matrix and a similarity threshold, then merge the entity relation pairs within each group according to the confidence of the entity relation pairs contained in the group's sentences to obtain the final result.
2. The open entity relation extraction method based on a sentence semantic structure model according to claim 1, characterized in that: when the confidence of an entity relation pair is computed in step 2, the selected features include: the relation word lies between the two arguments, both arguments lie on one side of the relation word, the entity relation (ER) pair has a VOB path, the ER pair has an FOB path, and the distance between the arguments and the relation word.
3. The open entity relation extraction method based on a sentence semantic structure model according to claim 1, characterized in that: when the confidence of an entity relation pair is computed in step 2, the weight Dis corresponding to the feature "distance between the arguments and the relation word" is computed as shown in Formula 1:
where e1 and e2 are the two arguments of the entity relation pair, r is its relation word, dis(e1, e2) is the distance between the two arguments in the sentence, i.e. the number of words between them, dis(e1, r) is the distance between argument e1 and relation word r in the sentence, and dis(r, e2) is the distance between relation word r and argument e2 in the sentence.
4. The open entity relation extraction method based on a sentence semantic structure model according to claim 1, characterized in that: in step 3 and step 4, sentence similarity is computed based on CSM to obtain Sim1 and based on PV to obtain Sim2; the two are fused by weighting to obtain the sentence similarity and hence the sentence similarity matrix; the sentences are divided into similar-sentence groups according to the sentence similarity matrix and a similarity threshold; and the entity relation pairs within each group are merged according to the confidence of the entity relation pairs contained in the group's sentences, thereby reducing redundancy in the entity relation results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810234056.1A CN108363816A (en) | 2018-03-21 | 2018-03-21 | Open entity relation extraction method based on sentence semantic structure model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810234056.1A CN108363816A (en) | 2018-03-21 | 2018-03-21 | Open entity relation extraction method based on sentence semantic structure model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108363816A true CN108363816A (en) | 2018-08-03 |
Family
ID=63000741
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810234056.1A Pending CN108363816A (en) | 2018-03-21 | 2018-03-21 | Open entity relation extraction method based on sentence semantic structure model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108363816A (en) |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109165385A (en) * | 2018-08-29 | 2019-01-08 | 中国人民解放军国防科技大学 | Multi-triple extraction method based on entity relationship joint extraction model |
CN109271626A (en) * | 2018-08-31 | 2019-01-25 | 北京工业大学 | Text semantic analysis method |
CN109325201A (en) * | 2018-08-15 | 2019-02-12 | 北京百度网讯科技有限公司 | Generation method, device, equipment and the storage medium of entity relationship data |
CN109359302A (en) * | 2018-10-26 | 2019-02-19 | 重庆大学 | A kind of optimization method of field term vector and fusion sort method based on it |
CN109376202A (en) * | 2018-10-30 | 2019-02-22 | 青岛理工大学 | NLP-based enterprise supply relationship automatic extraction and analysis method |
CN109408643A (en) * | 2018-09-03 | 2019-03-01 | 平安科技(深圳)有限公司 | Fund similarity calculation method, system, computer equipment and storage medium |
CN109460547A (en) * | 2018-09-19 | 2019-03-12 | 中国电子科技集团公司第二十八研究所 | Structured control instruction extraction method based on natural language processing |
CN109472032A (en) * | 2018-11-14 | 2019-03-15 | 北京锐安科技有限公司 | Method, apparatus, server and storage medium for determining an entity relationship diagram |
CN109558584A (en) * | 2018-10-26 | 2019-04-02 | 平安科技(深圳)有限公司 | Business connection prediction method, device, computer equipment and storage medium |
CN109582949A (en) * | 2018-09-14 | 2019-04-05 | 阿里巴巴集团控股有限公司 | Event element extraction method, device, computing equipment and storage medium |
CN109710932A (en) * | 2018-12-22 | 2019-05-03 | 北京工业大学 | Medical entity relation extraction method based on feature fusion |
CN109710759A (en) * | 2018-12-17 | 2019-05-03 | 北京百度网讯科技有限公司 | Text segmentation method, device, computer equipment and readable storage medium |
CN109815497A (en) * | 2019-01-23 | 2019-05-28 | 四川易诚智讯科技有限公司 | Character attribute extraction method based on syntactic dependency |
CN110188175A (en) * | 2019-04-29 | 2019-08-30 | 厦门快商通信息咨询有限公司 | Question-answer pair extraction method, system and storage medium based on BiLSTM-CRF model |
CN110287497A (en) * | 2019-07-03 | 2019-09-27 | 桂林电子科技大学 | Semantic structure coherent analysis method for English text |
CN110837731A (en) * | 2019-10-12 | 2020-02-25 | 创新工场(广州)人工智能研究有限公司 | Word vector training method and device |
CN111160030A (en) * | 2019-12-11 | 2020-05-15 | 北京明略软件系统有限公司 | Information extraction method, device and storage medium |
CN111597812A (en) * | 2020-05-09 | 2020-08-28 | 北京合众鼎成科技有限公司 | Financial field multiple relation extraction method based on mask language model |
CN111639499A (en) * | 2020-06-01 | 2020-09-08 | 北京中科汇联科技股份有限公司 | Composite entity extraction method and system |
CN111651528A (en) * | 2020-05-11 | 2020-09-11 | 北京理工大学 | Open entity relation extraction method based on generative adversarial network |
CN111914083A (en) * | 2019-05-10 | 2020-11-10 | 腾讯科技(深圳)有限公司 | Statement processing method, device and storage medium |
CN112084389A (en) * | 2020-08-17 | 2020-12-15 | 上海交通大学 | Network crawler-based academic institution geographical position information extraction method |
CN112269884A (en) * | 2020-11-13 | 2021-01-26 | 北京百度网讯科技有限公司 | Information extraction method, device, equipment and storage medium |
CN112417891A (en) * | 2020-11-29 | 2021-02-26 | 中国科学院电子学研究所苏州研究院 | Text relation automatic labeling method based on open type information extraction |
US11308283B2 (en) | 2020-01-30 | 2022-04-19 | International Business Machines Corporation | Lightweight tagging for disjoint entities |
CN114548103A (en) * | 2020-11-25 | 2022-05-27 | 马上消费金融股份有限公司 | Training method of named entity recognition model and recognition method of named entity |
CN115391569A (en) * | 2022-10-27 | 2022-11-25 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Method for automatically constructing industry chain map from research report and related equipment |
CN116127079A (en) * | 2023-04-20 | 2023-05-16 | 中电科大数据研究院有限公司 | Text classification method |
CN116467430A (en) * | 2023-05-08 | 2023-07-21 | 北京科技大学 | Material preparation processing technology information text mining method and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104933027A (en) * | 2015-06-12 | 2015-09-23 | 华东师范大学 | Open Chinese entity relation extraction method using dependency analysis |
CN105138507A (en) * | 2015-08-06 | 2015-12-09 | 电子科技大学 | Pattern self-learning based Chinese open relationship extraction method |
CN106445920A (en) * | 2016-09-29 | 2017-02-22 | 北京理工大学 | Sentence similarity calculation method based on sentence meaning structure characteristics |
CN106484675A (en) * | 2016-09-29 | 2017-03-08 | 北京理工大学 | Person relation extraction method fusing distributed semantics and sentence meaning features |
- 2018-03-21 CN CN201810234056.1A patent/CN108363816A/en active Pending
Non-Patent Citations (1)
Title |
---|
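Lin Meng et al.: "Microblog topic summarization algorithm incorporating a sentence meaning structure model", Journal of Zhejiang University (Engineering Science) * |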
Cited By (46)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109325201A (en) * | 2018-08-15 | 2019-02-12 | 北京百度网讯科技有限公司 | Method, apparatus and device for generating entity relationship data, and storage medium |
US11321421B2 (en) | 2018-08-15 | 2022-05-03 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method, apparatus and device for generating entity relationship data, and storage medium |
CN109165385B (en) * | 2018-08-29 | 2022-08-09 | 中国人民解放军国防科技大学 | Multi-triple extraction method based on entity relationship joint extraction model |
CN109165385A (en) * | 2018-08-29 | 2019-01-08 | 中国人民解放军国防科技大学 | Multi-triple extraction method based on entity relationship joint extraction model |
CN109271626B (en) * | 2018-08-31 | 2023-09-26 | 北京工业大学 | Text semantic analysis method |
CN109271626A (en) * | 2018-08-31 | 2019-01-25 | 北京工业大学 | Text semantic analysis method |
CN109408643A (en) * | 2018-09-03 | 2019-03-01 | 平安科技(深圳)有限公司 | Fund similarity calculation method, system, computer equipment and storage medium |
CN109408643B (en) * | 2018-09-03 | 2023-05-30 | 平安科技(深圳)有限公司 | Fund similarity calculation method, system, computer equipment and storage medium |
CN109582949A (en) * | 2018-09-14 | 2019-04-05 | 阿里巴巴集团控股有限公司 | Event element extraction method, device, computing equipment and storage medium |
CN109460547A (en) * | 2018-09-19 | 2019-03-12 | 中国电子科技集团公司第二十八研究所 | Structured control instruction extraction method based on natural language processing |
CN109460547B (en) * | 2018-09-19 | 2023-03-28 | 中国电子科技集团公司第二十八研究所 | Structured control instruction extraction method based on natural language processing |
CN109558584A (en) * | 2018-10-26 | 2019-04-02 | 平安科技(深圳)有限公司 | Business connection prediction method, device, computer equipment and storage medium |
CN109359302A (en) * | 2018-10-26 | 2019-02-19 | 重庆大学 | Optimization method for domain word vectors and fusion ranking method based on it |
CN109376202B (en) * | 2018-10-30 | 2021-08-03 | 青岛理工大学 | NLP-based enterprise supply relationship automatic extraction and analysis method |
CN109376202A (en) * | 2018-10-30 | 2019-02-22 | 青岛理工大学 | NLP-based enterprise supply relationship automatic extraction and analysis method |
CN109472032A (en) * | 2018-11-14 | 2019-03-15 | 北京锐安科技有限公司 | Method, apparatus, server and storage medium for determining an entity relationship diagram |
CN109710759A (en) * | 2018-12-17 | 2019-05-03 | 北京百度网讯科技有限公司 | Text segmentation method, device, computer equipment and readable storage medium |
CN109710932A (en) * | 2018-12-22 | 2019-05-03 | 北京工业大学 | Medical entity relation extraction method based on feature fusion |
CN109815497B (en) * | 2019-01-23 | 2023-04-18 | 四川易诚智讯科技有限公司 | Character attribute extraction method based on syntactic dependency |
CN109815497A (en) * | 2019-01-23 | 2019-05-28 | 四川易诚智讯科技有限公司 | Character attribute extraction method based on syntactic dependency |
CN110188175A (en) * | 2019-04-29 | 2019-08-30 | 厦门快商通信息咨询有限公司 | Question-answer pair extraction method, system and storage medium based on BiLSTM-CRF model |
CN111914083A (en) * | 2019-05-10 | 2020-11-10 | 腾讯科技(深圳)有限公司 | Statement processing method, device and storage medium |
CN111914083B (en) * | 2019-05-10 | 2024-07-09 | 腾讯科技(深圳)有限公司 | Statement processing method, device and storage medium |
CN110287497A (en) * | 2019-07-03 | 2019-09-27 | 桂林电子科技大学 | Semantic structure coherent analysis method for English text |
CN110287497B (en) * | 2019-07-03 | 2023-03-31 | 桂林电子科技大学 | Semantic structure coherent analysis method for English text |
CN110837731A (en) * | 2019-10-12 | 2020-02-25 | 创新工场(广州)人工智能研究有限公司 | Word vector training method and device |
CN111160030A (en) * | 2019-12-11 | 2020-05-15 | 北京明略软件系统有限公司 | Information extraction method, device and storage medium |
CN111160030B (en) * | 2019-12-11 | 2023-09-19 | 北京明略软件系统有限公司 | Information extraction method, device and storage medium |
US11308283B2 (en) | 2020-01-30 | 2022-04-19 | International Business Machines Corporation | Lightweight tagging for disjoint entities |
CN111597812A (en) * | 2020-05-09 | 2020-08-28 | 北京合众鼎成科技有限公司 | Financial field multiple relation extraction method based on mask language model |
CN111651528A (en) * | 2020-05-11 | 2020-09-11 | 北京理工大学 | Open entity relation extraction method based on generative adversarial network |
CN111639499A (en) * | 2020-06-01 | 2020-09-08 | 北京中科汇联科技股份有限公司 | Composite entity extraction method and system |
CN111639499B (en) * | 2020-06-01 | 2023-06-16 | 北京中科汇联科技股份有限公司 | Composite entity extraction method and system |
CN112084389A (en) * | 2020-08-17 | 2020-12-15 | 上海交通大学 | Network crawler-based academic institution geographical position information extraction method |
CN112269884A (en) * | 2020-11-13 | 2021-01-26 | 北京百度网讯科技有限公司 | Information extraction method, device, equipment and storage medium |
CN112269884B (en) * | 2020-11-13 | 2024-03-05 | 北京百度网讯科技有限公司 | Information extraction method, device, equipment and storage medium |
CN114548103A (en) * | 2020-11-25 | 2022-05-27 | 马上消费金融股份有限公司 | Training method of named entity recognition model and recognition method of named entity |
CN114548103B (en) * | 2020-11-25 | 2024-03-29 | 马上消费金融股份有限公司 | Named entity recognition model training method and named entity recognition method |
CN112417891A (en) * | 2020-11-29 | 2021-02-26 | 中国科学院电子学研究所苏州研究院 | Text relation automatic labeling method based on open type information extraction |
CN112417891B (en) * | 2020-11-29 | 2023-08-22 | 中国科学院电子学研究所苏州研究院 | Text relation automatic labeling method based on open type information extraction |
CN115391569A (en) * | 2022-10-27 | 2022-11-25 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Method for automatically constructing industry chain map from research report and related equipment |
CN115391569B (en) * | 2022-10-27 | 2023-03-24 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Method for automatically constructing industry chain map from research report and related equipment |
CN116127079B (en) * | 2023-04-20 | 2023-06-20 | 中电科大数据研究院有限公司 | Text classification method |
CN116127079A (en) * | 2023-04-20 | 2023-05-16 | 中电科大数据研究院有限公司 | Text classification method |
CN116467430B (en) * | 2023-05-08 | 2023-09-19 | 北京科技大学 | Material preparation processing technology information text mining method and system |
CN116467430A (en) * | 2023-05-08 | 2023-07-21 | 北京科技大学 | Material preparation processing technology information text mining method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108363816A (en) | Open entity relation extraction method based on sentence meaning structure model | |
CN110866117B (en) | Short text classification method based on semantic enhancement and multi-level label embedding | |
CN108595632B (en) | Hybrid neural network text classification method fusing abstract and main body characteristics | |
CN109783818B (en) | Enterprise industry classification method | |
CN111027595B (en) | Double-stage semantic word vector generation method | |
CN110532328B (en) | Text concept graph construction method | |
CN108874878A (en) | System and method for constructing a knowledge graph | |
CN107273913B (en) | Short text similarity calculation method based on multi-feature fusion | |
CN106776562A (en) | Keyword extraction method and extraction system | |
CN108920482B (en) | Microblog short text classification method based on lexical chain feature extension and LDA (latent Dirichlet Allocation) model | |
Huang et al. | A topic BiLSTM model for sentiment classification | |
Jayawardana et al. | Semi-supervised instance population of an ontology using word vector embedding | |
CN111753058A (en) | Text viewpoint mining method and system | |
CN111695358A (en) | Method and device for generating word vector, computer storage medium and electronic equipment | |
CN114997288A (en) | Design resource association method | |
El-Alami et al. | An efficient method based on deep learning approach for Arabic text categorization | |
Wankhede et al. | Data preprocessing for efficient sentimental analysis | |
CN114254645A (en) | Artificial intelligence auxiliary writing system | |
El Moubtahij et al. | AraBERT transformer model for Arabic comments and reviews analysis | |
Andrews et al. | Robust entity clustering via phylogenetic inference | |
Li et al. | Recursive graphical neural networks for text classification | |
CN114036938B (en) | News classification method for extracting text features by combining topic information and word vectors | |
Fahrni et al. | HITS'Monolingual and Cross-lingual Entity Linking System at TAC 2013. | |
Kuila et al. | A Neural Network based Event Extraction System for Indian Languages. | |
Walia et al. | Case based interpretation model for word sense disambiguation in Gurmukhi |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180803 |