CN108363816A - Open entity relation extraction method based on sentential semantic structure model

Info

Publication number
CN108363816A
Authority
CN
China
Prior art keywords
sentence
entity relationship
similarity
entity
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810234056.1A
Other languages
Chinese (zh)
Inventor
罗森林
尹继泽
潘丽敏
郭佳
吴舟婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-03-21
Filing date
2018-03-21
Publication date
2018-08-03
Application filed by Beijing Institute of Technology BIT
Priority to CN201810234056.1A
Publication of CN108363816A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Abstract

The present invention relates to an open entity relation extraction method based on a sentential semantic structure model and belongs to the technical field of computer and information science. The method first extracts the body text of microblog data, performs sentence splitting, word segmentation, stop-word removal and part-of-speech tagging, and then uses a dependency analysis tool to obtain a dependency parse tree. Second, candidate arguments are determined by base noun phrase recognition rules, entity relation triples are obtained by combining relation word extraction rules with argument extraction rules, and the triples are screened by confidence calculation rules to obtain candidate entity relation pairs. Third, a sentence similarity Sim1 is computed based on CSM and a sentence similarity Sim2 is computed based on PV; the two are fused by weighting to obtain the sentence similarity and, from it, the sentence similarity matrix. Finally, similar sentence groups are divided according to the sentence similarity matrix and a similarity threshold, and the entity relation pairs within each group are merged using the confidence of the entity relation pairs contained in the group's sentences. Experiments on the NLP&CC microblog evaluation corpus show that computing entity relation confidence, dividing similar sentence groups and merging the entity relation pairs within each group improves precision and recall and removes redundancy.

Description

Open entity relation extraction method based on sentential semantic structure model
Technical field
The present invention relates to an open entity relation extraction method based on a sentential semantic structure model and belongs to the technical field of computer and information science.
Background art
Open entity relation extraction can extract entities and entity relations of unrestricted categories from disordered network data and output them as structured information. Microblog data is noisy and redundant, which makes extraction results inaccurate and redundant, so open entity relation extraction needs to be studied with these characteristics of microblogs in mind. The present invention therefore provides an open entity relation extraction method based on a sentential semantic structure model, improving a system's ability to extract entity relations from noisy, redundant microblog data.
The basic problem this method must solve is: from disordered network data, extract entities and entity relations of unrestricted categories and output them as structured information. Existing open entity relation extraction systems and methods fall into the following groups:
1. The TextRunner and WOE systems
TextRunner is the first open information extraction system. It trains a naive Bayes model on features such as part of speech and base noun phrases to extract relations between entities. Later work showed that classifiers built on text sequence features, such as linear-chain conditional random fields and Markov logic networks, achieve better results. The WOE system uses Wikipedia data as its training set, and experiments confirmed that exploiting the dependency relations in the data effectively improves on TextRunner. Both TextRunner and WOE first identify named entities and then extract entity relations.
2. The rule-based methods of ReVerb and Gamallo et al.
ReVerb first determines a verb-centered relation phrase, extracts entity relation triples under semantic and syntactic rule constraints, and then applies position constraint rules to the extracted triples. The method extracts entity relation pairs through part-of-speech tagging, named entity recognition and matching against manually written rules. For multilingual open information extraction, Gamallo et al. use rule-based dependency analysis to extract entity relations in English, Portuguese, Galician and Spanish.
3. Open entity relation extraction systems for Chinese
There are three main open entity relation extraction systems for Chinese: ZORE, UnCORE and CORE. ZORE performs dependency analysis on a sentence to obtain a dependency parse tree and then iteratively extracts the sentence's entity triples according to the dependency relations between entities and relation words. UnCORE formulates position constraint rules between the entities and the relation indicator words in a sentence to extract candidate relation triples, screens relation indicator words by information gain combined with a type ranking method to obtain the indicator words for each entity relation type, and finally filters the candidate triples using the relation words and sentence pattern rules. CORE first analyzes syntactic structure with the CKIP parser, then identifies the central relation indicator word in the sentence by the "head-driven" principle, and finally uses dependency relations to find the central entity words.
In conclusion existing open entity relation extraction method is difficult to handle the mixed and disorderly and redundancy property of microblog data, So the present invention proposes the open entity relation extraction method based on sentence justice structural model.
Invention content
The purpose of the present invention is to alleviate existing method, to extract microblog data entity relationship to accuracy rate low, as a result redundancy Problem proposes the open entity relationship based on sentence justice structural model to improve the comprehensive performance of open entity relation extraction Abstracting method.
The present invention design principle be:The text for extracting microblog data first carries out subordinate sentence to text, segments, goes to deactivate Word and part-of-speech tagging recycle dependency analysis tool, obtain interdependent syntax analytic tree;Followed by basic noun recognition rule Determine candidate's argument, marriage relation word decimation rule and argument decimation rule obtain entity relationship triple, utilize confidence level meter Rules Filtering entity relationship triple is calculated, candidate entity relationship pair is obtained;CSM calculating sentence similarities are then based on to obtain Sim1, sentence similarity is calculated based on PV and obtains Sim2, carry out Similarity-Weighted and merge to obtain sentence similarity, and then obtain sentence Sub- similarity matrix;Finally according to sentence similarity matrix and similarity threshold, similar sentence group is divided, in conjunction with sentence packet in group The entity relationship contained is to corresponding confidence level, entity relationship pair in merging group.
The technical solution of the present invention is realized by the following steps:
Step 1: preprocess the microblog data.
Step 1.1: extract the body text of the microblog data.
Step 1.2: perform sentence splitting, word segmentation, stop-word removal and part-of-speech tagging on the text.
Step 1.3: use a dependency analysis tool to obtain the dependency parse tree.
Step 2: extract candidate entity relation pairs.
Step 2.1: extract entity relation triples by combining base noun phrase rules, relation word extraction rules and argument extraction rules.
Step 2.2: screen the entity relation triples with confidence calculation rules to generate the candidate set of entity relation pairs.
Step 3: compute sentence similarity.
Step 3.1: compute sentence similarity based on CSM to obtain Sim1.
Step 3.2: compute sentence similarity based on PV to obtain Sim2.
Step 3.3: fuse the two similarities by weighting to obtain the sentence similarity, and from it the sentence similarity matrix.
Step 4: merge entity relation pairs.
Step 4.1: divide similar sentence groups according to the sentence similarity matrix and a similarity threshold.
Step 4.2: merge the entity relation pairs within each group, using the confidence of the entity relation pairs contained in the group's sentences, to obtain the final result.
Beneficial effects
Compared with existing open entity relation extraction systems and methods, the present invention effectively alleviates the low precision and redundancy of results when extracting entity relation pairs from microblog data.
Description of the drawings
Fig. 1 is a schematic diagram of the open entity relation extraction method based on the sentential semantic structure model of the present invention.
Fig. 2 is a schematic diagram of the preprocessing stage of the open entity relation extraction method based on the sentential semantic structure model.
Fig. 3 is an example dependency parse.
Fig. 4 is a schematic diagram of the PV-CSM sentence similarity computation method.
Fig. 5 is a schematic diagram of the sentence similarity computation method based on the CSM model.
Fig. 6 is the Paragraph Vector framework.
Fig. 7 is a schematic diagram of entity relation pair merging.
Detailed description of embodiments
To better illustrate the objects and advantages of the present invention, embodiments of the method are described in further detail below with reference to examples.
The detailed process is:
Step 1: preprocess the microblog data.
Step 1.1: use regular expressions to filter the HTML tags and noise symbols in the microblog data set, extract the body text, and perform traditional/simplified Chinese character conversion.
Step 1.2: split the text data into sentences; use the Harbin Institute of Technology language cloud LTP to perform word segmentation, part-of-speech tagging and dependency analysis on each sentence; and remove texts containing fewer than 4 effective words (nouns, verbs, adjectives, numerals, time words, etc.).
Step 1.3: dependency parsing reveals the syntactic structure of a sentence by analyzing the dependency relations between its components. Fig. 3 shows the dependency relations that the LTP dependency analysis tool provided by Harbin Institute of Technology produces for the sentence "Democrats on the White House Budget Committee released a report on Monday". The dependency labels and their meanings are listed in Table 1.
Table 1. Dependency relation labels
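The cleaning and filtering parts of Step 1 can be sketched as below. This is a minimal illustration, not the patent's implementation: segmentation and tagging are assumed to come from an external tool such as LTP, the function names are hypothetical, and the tag set `EFFECTIVE_POS` is an assumed stand-in for "noun, verb, adjective, numeral, time word".

```python
import re
from typing import List, Tuple

# Assumed POS tags for "effective" words: noun, verb, adjective, numeral, time word.
EFFECTIVE_POS = {"n", "v", "a", "m", "nt"}

def clean_microblog(raw: str) -> str:
    """Strip HTML tags, ASCII URLs and redundant whitespace from one post."""
    text = re.sub(r"<[^>]+>", "", raw)                          # HTML tags
    text = re.sub(r"https?://[A-Za-z0-9./?=&_%#-]+", "", text)  # URLs, treated as noise
    return re.sub(r"\s+", " ", text).strip()

def split_sentences(text: str) -> List[str]:
    """Split after Chinese or Western sentence-final punctuation."""
    parts = re.split(r"(?<=[。！？!?])", text)
    return [p.strip() for p in parts if p.strip()]

def keep_sentence(tagged: List[Tuple[str, str]], min_effective: int = 4) -> bool:
    """Keep a (word, pos) sequence only if it has at least `min_effective` effective words."""
    return sum(1 for _, pos in tagged if pos in EFFECTIVE_POS) >= min_effective
```

The threshold of 4 effective words follows Step 1.2; the other details are illustrative choices.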
Step 2: extract candidate entity relation pairs.
Step 2.1: first, obtain base noun phrases from the part-of-speech tagging results and the noun phrase extraction rules. Then treat every verb in the sentence that lies on a VOB (verb-object) or FOB (fronted object) dependency path as a candidate relation word. Next, take the components of the base noun phrases that stand in an SBV (subject-verb), VOB or FOB relation with a candidate relation word as the arguments of that verb. This yields entity relation pairs on two kinds of dependency paths: "SBV, relation word, VOB" and "SBV, FOB, relation word".
Sentences with a negative structure need special handling. For example, for "Some university students do not attend the party", the extraction rules above yield the entity relation pair "e1: some university students, e2: party, r: attend", which is incorrect. Negation words must therefore be taken into account; the correct result is "e1: some university students, e2: party, r: do not attend".
Negation words are identified with a negation word set. An identified negation word is added to the relation word with which it shares an ADV (adverbial) dependency path. The negation word set covers terms such as: non-, not have, nothing, do not, prevent, difficult to, forbid, forget, ignore, abandon, refuse, almost none, almost, unclear.
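The path-based extraction with negation handling can be sketched as follows. This is a simplified illustration under stated assumptions: the parse is given as a list of hypothetical `Token` records, only the head word of each argument is returned rather than the full base noun phrase, and `NEGATION_WORDS` is a small illustrative subset, not the patent's set.

```python
from typing import List, NamedTuple, Tuple

class Token(NamedTuple):
    idx: int      # 1-based position in the sentence
    word: str
    head: int     # index of the governing token, 0 for the root
    deprel: str   # dependency label, e.g. SBV, VOB, FOB, ADV, ATT

NEGATION_WORDS = {"不", "没", "没有", "未"}  # illustrative subset (assumption)

def extract_triples(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (e1, relation, e2) for verbs that govern both a subject and an object."""
    triples = []
    for verb in tokens:
        deps = [t for t in tokens if t.head == verb.idx]
        subjects = [t.word for t in deps if t.deprel == "SBV"]
        objects = [t.word for t in deps if t.deprel in ("VOB", "FOB")]
        if not subjects or not objects:
            continue
        # Prepend negation words attached to the verb through an ADV arc.
        neg = "".join(t.word for t in deps
                      if t.deprel == "ADV" and t.word in NEGATION_WORDS)
        triples.extend((s, neg + verb.word, o) for s in subjects for o in objects)
    return triples
```

For the negation example above, a parse in which "不" is an ADV dependent of "参加" gives the relation word "不参加" instead of "参加".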
Step 2.2: compute the confidence of each entity relation pair. When the confidence exceeds the threshold, the entity relation pair becomes a candidate entity relation pair; otherwise it is rejected.
The selected features and their weights, trained on an annotated corpus of 200 items, are shown in Table 2:
Table 2. Features and their weights
In the table, each of x1…x10 takes the value 1 when the condition described by the feature holds and 0 otherwise. Distances and lengths in the table are counted in words. The weight Dis corresponding to x11 is computed as shown in Formula 1.
Here e1 and e2 are the two arguments of the entity relation pair, r is its relation word, dis(e1, e2) is the distance between the two arguments in the sentence, i.e. the number of words between them, dis(e1, r) is the distance between argument e1 and relation word r in the sentence, and dis(r, e2) is the distance between relation word r and argument e2 in the sentence. Combining the feature weights with the sigmoid function gives the confidence Confidence of an entity relation pair, as shown in Formula 2.
Here x is a feature value from the table (0 or 1) and w is its corresponding weight.
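A minimal sketch of this screening step is given below. Formula 2 is not reproduced in this text, so the form used here, a sigmoid applied to the weighted sum of the feature values, is an assumption based on the description; the function names and any weights or thresholds passed in are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def confidence(features: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed form: sigmoid of the weighted feature sum."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def screen(pairs: List[Tuple[str, Sequence[float]]],
           weights: Sequence[float],
           threshold: float) -> List[Tuple[str, float]]:
    """Keep the pairs whose confidence exceeds the threshold, with their scores."""
    scored = [(name, confidence(x, weights)) for name, x in pairs]
    return [(name, c) for name, c in scored if c > threshold]
```

With all features equal to 0 the weighted sum is 0 and the confidence is 0.5, so a threshold above 0.5 rejects pairs that match none of the features.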
Step 3: compute sentence similarity.
Step 3.1: the principle of the sentence similarity computation based on the CSM model is shown in Fig. 5. To address the sparse semantic features of short texts, the method builds on sentential semantic structure analysis and uses the LDA topic model to mine latent topic knowledge, expands the semantic features of each single sentence, then represents the sentence as a vector, and finally computes sentence similarity.
CSM divides the meaning of a sentence into a topic and a comment: the topic is the main object described in the sentence, and the comment is the description of that object. According to the semantic roles that words play in the sentence meaning, CSM distinguishes basic cases and general cases. For example, in "Xiao Ming broke the window of the classroom", "Xiao Ming" is the performer of the action and belongs to the "agent case" among the basic cases, while "classroom" restricts "window" and belongs to the "scope case" among the general cases. Based on the result of sentential semantic structure analysis, the words of a sentence are divided by semantic role into four classes: basic items under the topic, general items under the topic, basic items under the comment, and general items under the comment. The semantic features of the sentence are then expanded using the result of the LDA analysis.
The input of the LDA analysis module is the set of the four classes of words in the text collection, and the output is the distribution of these four classes of words over multiple LDA topics. The word distributions are stored in a knowledge base, and later modules expand sentence semantic features from this knowledge base. The LDA topic model assumes that a text contains multiple topics, that each topic corresponds to multiple words in the text and follows a multinomial distribution over the vocabulary, and that words belonging to the same topic are latently semantically related. Extending a sentence with the top N words of the LDA topic most relevant to it therefore expands its semantic features without introducing excessive noise.
The semantic feature expansion of a sentence is divided into expansion based on the topic component and expansion based on the comment component.
The semantic feature expansion based on the topic component proceeds as follows. First, compute the sum of the probabilities of all basic items under the sentence topic for LDA topic Pi. Then weight the sum of the probabilities of all general items for LDA topic Pi and add it to the previous result, as shown in Formula 3.
Here Tmi is the probability of the m-th basic item under the sentence topic for LDA topic Pi, and Gni is the probability of the n-th general item under the sentence topic for LDA topic Pi. The LDA topic with the highest resulting value (that is, the one most relevant to the sentence) is selected, and its top N words are added to the short text. Finally, the topic-based sentence vector is built with the VSM: the weight of an original word of the sentence is its TF*IDF value, and the weight of an expanded word is 1.
The semantic feature expansion based on the comment component proceeds in the same way and yields the comment-based sentence vector.
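The topic selection and vector expansion can be sketched as below. Formula 3 is not reproduced in this text; the score used here, the sum of basic-item probabilities plus a weight `lam` times the sum of general-item probabilities, follows the prose, but the symbol `lam`, its default value and the function names are assumptions.

```python
from typing import Dict, List

def select_topic(basic: List[List[float]], general: List[List[float]],
                 lam: float = 0.5) -> int:
    """basic[m][i] / general[n][i]: probability of an item under LDA topic i.
    Returns the index of the topic with the highest combined score."""
    n_topics = len(basic[0]) if basic else len(general[0])
    scores = [sum(row[i] for row in basic) + lam * sum(row[i] for row in general)
              for i in range(n_topics)]
    return max(range(n_topics), key=scores.__getitem__)

def expand_vector(tfidf: Dict[str, float], topic_words: List[str],
                  n: int) -> Dict[str, float]:
    """Original words keep their TF*IDF weight; new top-n topic words get weight 1."""
    vec = dict(tfidf)
    for word in topic_words[:n]:
        vec.setdefault(word, 1.0)
    return vec
```

A word that already occurs in the sentence keeps its TF*IDF weight even if it is also among the top N topic words; the source does not say how that overlap is handled, so this is one possible choice.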
To compute the similarity between two sentences, the cosine similarity is computed separately for each of the two kinds of feature vector obtained after semantic feature expansion, and the two values are added with weights to obtain the CSM-LDA sentence similarity Sim1, as shown in Formula 4.
Here SA and SB denote any two sentences. The two topic vectors are the topic-based sentence vectors of SA and SB obtained after sentential semantic structure analysis, and the two comment vectors are their comment-based sentence vectors. The weighting coefficient ω between topic and comment is usually set to 0.5.
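A sketch of the weighted cosine combination follows. Formula 4 is not reproduced in this text; the form used here, ω times the topic similarity plus (1 - ω) times the comment similarity, is an assumption consistent with a single coefficient ω = 0.5. Sentence vectors are represented as sparse word-to-weight dictionaries, and the function names are hypothetical.

```python
import math
from typing import Dict

def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def sim_csm(topic_a: Dict[str, float], topic_b: Dict[str, float],
            comment_a: Dict[str, float], comment_b: Dict[str, float],
            omega: float = 0.5) -> float:
    """Assumed form of Sim1: weighted sum of topic and comment cosine similarity."""
    return omega * cosine(topic_a, topic_b) + (1.0 - omega) * cosine(comment_a, comment_b)
```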
Step 3.2: Paragraph Vector (PV) is an unsupervised distributed vector representation method. It can handle text of arbitrary length, typically at sentence or paragraph level, and produces good vector representations of sentences and paragraphs. Just as Word2vec includes the CBOW and Skip-gram models, PV includes two models, PV-DM and PV-DBOW. The PV model adds a Paragraph id marker to each sentence or paragraph.
The PV-DM model is a three-layer neural network with an input layer, a projection layer and an output layer. When PV-DM trains sentence vectors, the Paragraph id is treated as an ordinary word, and a randomly generated vector for it is added to matrix D. For the words in the sentence, randomly generated word vectors are added to matrix W. The sentence vector has the same dimension as the word vectors, but the two do not belong to the same space. PV-DM averages or concatenates the sentence vector and the word vectors of the sentence to obtain the input vector, and then trains the model by maximizing the probability of the target word. Compared with word vector training, the difference is that the hidden-layer input of PV-DM is determined jointly by matrices W and D, so the semantic information of the whole sentence is taken into account during training.
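For reference, the PV-DM training objective is usually written as follows in the Paragraph Vector literature. The patent text does not give this formula, so the notation is an assumption: d stands for the paragraph (sentence) vector taken from D, k for the context window size, T for the number of words, and h for the averaging or concatenation of the input vectors.

```latex
\max \; \frac{1}{T} \sum_{t=k}^{T-k} \log p\left(w_t \mid w_{t-k}, \dots, w_{t+k}, d\right),
\qquad
p\left(w_t \mid \cdot\right) = \frac{e^{y_{w_t}}}{\sum_i e^{y_i}},
\qquad
y = b + U\, h\left(w_{t-k}, \dots, w_{t+k}, d;\; W, D\right)
```

Here U and b are the softmax parameters mentioned in the training stage below.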
The algorithm has two stages, training and inference; see Fig. 6.
(1) Training stage: training yields the word vector matrix W, the softmax weights U and b, and the vectors D of the sentences already seen. Each Paragraph id initialized during training is unique and not shared, while word vectors are shared across the whole training corpus. A sliding window of fixed length traverses all words in the data set; as the window slides, the word vector matrix W and the Paragraph id vector matrix D are updated until training ends.
(2) Inference stage: first assign a new Paragraph id to the target sentence. Then, with the PV model parameters W, U and b obtained in the training stage held fixed, optimize the vector of the target sentence with gradient descent and backpropagation so that the probability of the target sentence under the current conditions is maximized. After convergence, the result is the vector representation of the sentence to be predicted.
From the resulting sentence vectors, compute the cosine similarity Sim2 between sentences.
Step 3.3: compute the weighted sum of Sim1 and Sim2 according to Formula 5 and output the similarity Sim(S1, S2) between the two sentences. Repeating this computation for every pair of sentences in the sentence set gives their mutual similarities and produces the sentence similarity matrix.
Sim(S1, S2) = α * Sim1 + β * Sim2    (5)
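Formula 5 applied to whole matrices can be sketched as below. It assumes the two pairwise matrices Sim1 and Sim2 have already been computed; the text does not state values for α and β, so the defaults of 0.5 here are placeholders, and the function name is hypothetical.

```python
from typing import List

def fuse_similarity(sim1: List[List[float]], sim2: List[List[float]],
                    alpha: float = 0.5, beta: float = 0.5) -> List[List[float]]:
    """Element-wise Sim = alpha * Sim1 + beta * Sim2 for two same-sized matrices."""
    if len(sim1) != len(sim2):
        raise ValueError("Sim1 and Sim2 must have the same number of rows")
    return [[alpha * a + beta * b for a, b in zip(row1, row2)]
            for row1, row2 in zip(sim1, sim2)]
```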
Step 4: merge entity relation pairs.
Step 4.1: Fig. 7 is a schematic diagram of entity relation pair merging. The sentence similarity module produces the sentence similarity matrix, and sentences whose similarity exceeds the threshold are placed in the same group. The sentence similarity matrix is divided into similar sentence groups as follows:
(1) Select a sentence S from the sentence set, add it to similar sentence group 1, and delete S from the sentence set;
(2) Locate the row index i of S in the sentence similarity matrix, add all sentences whose similarity in row i of the matrix exceeds 0.75 to sentence group 1, and delete them from the sentence set;
(3) Randomly select a sentence S2 from the remaining sentences. If the similarity between S2 and any sentence in sentence group 1 exceeds 0.75, add S2 to sentence group 1; otherwise create a new similar sentence group and add S2 to it. Repeat (2);
(4) Iterate (3) until the sentence set is empty, which gives n similar sentence groups.
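The grouping steps above can be sketched as a simple greedy procedure. This sketch departs from the text in two labeled ways: sentences are taken in index order rather than at random so that the result is reproducible, and a new sentence is compared against every existing group rather than only group 1. The function name is hypothetical; the threshold 0.75 is the value given in the steps.

```python
from typing import List

def group_sentences(sim: List[List[float]], threshold: float = 0.75) -> List[List[int]]:
    """Greedily partition sentence indices into similar sentence groups."""
    remaining = list(range(len(sim)))
    groups: List[List[int]] = []
    while remaining:
        seed = remaining.pop(0)
        # Join the first existing group that has a member similar to the seed,
        # otherwise start a new group.
        group = next((g for g in groups
                      if any(sim[seed][j] > threshold for j in g)), None)
        if group is None:
            group = []
            groups.append(group)
        group.append(seed)
        # Pull in every remaining sentence whose similarity to the seed exceeds the threshold.
        similar = [j for j in remaining if sim[seed][j] > threshold]
        group.extend(similar)
        remaining = [j for j in remaining if j not in similar]
    return groups
```

Every sentence ends up in exactly one group, so the output is a partition of the sentence set.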
Step 4.2: compare the confidence of the candidate entity relation pairs of the sentences in the same group and rank all candidate entity relation pairs in the group. The top-ranked pair replaces the candidate entity relation pairs of all sentences in the group and serves as the optimal entity relation pair of that sentence group.
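The merge in Step 4.2 amounts to keeping the highest-confidence candidate per group, which can be sketched as follows. The data layout (a mapping from sentence index to a list of (triple, confidence) candidates) and the function name are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

def merge_groups(groups: List[List[int]],
                 candidates: Dict[int, List[Tuple[Triple, float]]]) -> List[Triple]:
    """For each sentence group, keep the candidate pair with the highest confidence."""
    merged = []
    for group in groups:
        pool = [c for idx in group for c in candidates.get(idx, [])]
        if pool:  # groups without any candidate contribute nothing
            merged.append(max(pool, key=lambda c: c[1])[0])
    return merged
```

Because only one pair survives per group, near-duplicate sentences contribute a single result, which is where the redundancy reduction comes from.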
Test results: the open entity relation extraction method based on the sentential semantic structure model was compared with other open entity relation extraction methods on social text, namely the public corpus of the Chinese microblog opinion element extraction evaluation task released by the 2013 NLP&CC conference. The methods compared were ZORE (2014) and CORE (2014). The present invention outperforms ZORE and CORE, improving precision and removing redundancy, and thus effectively achieves open entity relation extraction. The results are shown in Table 3.
Table 3. Comparative test results
The specific description above further details the purpose, technical solution and beneficial effects of the invention. It should be understood that the above is only a specific embodiment of the present invention and does not limit its scope of protection; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (4)

1. An open entity relation extraction method based on a sentential semantic structure model, characterized in that the method comprises the following steps:
Step 1: preprocess the microblog data, including: extracting the body text of the microblog data; performing sentence splitting, word segmentation, stop-word removal and part-of-speech tagging on the text; and then using a dependency analysis tool to obtain a dependency parse tree;
Step 2: extract entity relation triples by combining base noun phrase rules, relation word extraction rules and argument extraction rules, then screen the entity relation triples with confidence calculation rules to generate the candidate set of entity relation pairs;
Step 3: compute sentence similarity based on CSM to obtain Sim1 and based on PV to obtain Sim2, then fuse the two similarities by weighting to obtain the sentence similarity, and from it the sentence similarity matrix;
Step 4: divide similar sentence groups according to the sentence similarity matrix and a similarity threshold, then merge the entity relation pairs within each group, using the confidence of the entity relation pairs contained in the group's sentences, to obtain the final result.
2. The open entity relation extraction method based on a sentential semantic structure model according to claim 1, characterized in that: when the confidence of an entity relation pair is computed in Step 2, the selected features include: the relation word lies between the two arguments; the two arguments lie on the same side of the relation word; the entity relation pair has a VOB path; the entity relation pair has a FOB path; and the distance between the arguments and the relation word.
3. The open entity relation extraction method based on a sentential semantic structure model according to claim 1, characterized in that: when the confidence of an entity relation pair is computed in Step 2, the weight Dis corresponding to the feature "distance between the arguments and the relation word" is computed as shown in Formula 1:
where e1 and e2 are the two arguments of the entity relation pair, r is its relation word, dis(e1, e2) is the distance between the two arguments in the sentence, i.e. the number of words between them, dis(e1, r) is the distance between argument e1 and relation word r in the sentence, and dis(r, e2) is the distance between relation word r and argument e2 in the sentence.
4. The open entity relation extraction method based on a sentential semantic structure model according to claim 1, characterized in that: in Step 3 and Step 4, sentence similarity is computed based on CSM to obtain Sim1 and based on PV to obtain Sim2; the two similarities are fused by weighting to obtain the sentence similarity and, from it, the sentence similarity matrix; similar sentence groups are divided according to the sentence similarity matrix and a similarity threshold; and the entity relation pairs within each group are merged, using the confidence of the entity relation pairs contained in the group's sentences, so that redundancy in the entity relation results is reduced.
CN201810234056.1A, priority date 2018-03-21, filing date 2018-03-21: Open entity relation extraction method based on sentential semantic structure model. Status: Pending. Published as CN108363816A (en).

Priority Applications (1)

Application Number; Priority Date; Filing Date; Title
CN201810234056.1A (published as CN108363816A (en)); 2018-03-21; 2018-03-21; Open entity relation extraction method based on sentential semantic structure model

Applications Claiming Priority (1)

Application Number; Priority Date; Filing Date; Title
CN201810234056.1A (published as CN108363816A (en)); 2018-03-21; 2018-03-21; Open entity relation extraction method based on sentential semantic structure model

Publications (1)

Publication Number; Publication Date
CN108363816A (en); 2018-08-03

Family

ID=63000741

Family Applications (1)

Application Number; Title; Priority Date; Filing Date
CN201810234056.1A (Pending, published as CN108363816A (en)); Open entity relation extraction method based on sentential semantic structure model; 2018-03-21; 2018-03-21

Country Status (1)

Country; Link
CN (1); CN108363816A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104933027A (en) * 2015-06-12 2015-09-23 华东师范大学 Open Chinese entity relation extraction method using dependency analysis
CN105138507A (en) * 2015-08-06 2015-12-09 电子科技大学 Pattern self-learning based Chinese open relationship extraction method
CN106445920A (en) * 2016-09-29 2017-02-22 北京理工大学 Sentence similarity calculation method based on sentence meaning structure characteristics
CN106484675A (en) * 2016-09-29 2017-03-08 北京理工大学 Person relation extraction method fusing distributed semantics and sentence meaning features

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
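Lin Meng et al.: "Microblog topic summarization algorithm fusing the sentence meaning structure model", Journal of Zhejiang University (Engineering Science) *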

Cited By (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325201A (en) * 2018-08-15 2019-02-12 北京百度网讯科技有限公司 Method, apparatus and device for generating entity relationship data, and storage medium
US11321421B2 (en) 2018-08-15 2022-05-03 Beijing Baidu Netcom Science And Technology Co., Ltd. Method, apparatus and device for generating entity relationship data, and storage medium
CN109165385B (en) * 2018-08-29 2022-08-09 中国人民解放军国防科技大学 Multi-triple extraction method based on entity relationship joint extraction model
CN109165385A (en) * 2018-08-29 2019-01-08 中国人民解放军国防科技大学 Multi-triple extraction method based on entity relationship joint extraction model
CN109271626B (en) * 2018-08-31 2023-09-26 北京工业大学 Text semantic analysis method
CN109271626A (en) * 2018-08-31 2019-01-25 北京工业大学 Text semantic analysis method
CN109408643A (en) * 2018-09-03 2019-03-01 平安科技(深圳)有限公司 Fund similarity calculating method, system, computer equipment and storage medium
CN109408643B (en) * 2018-09-03 2023-05-30 平安科技(深圳)有限公司 Fund similarity calculation method, system, computer equipment and storage medium
CN109582949A (en) * 2018-09-14 2019-04-05 阿里巴巴集团控股有限公司 Event element extraction method, apparatus, computing device and storage medium
CN109460547A (en) * 2018-09-19 2019-03-12 中国电子科技集团公司第二十八研究所 Structured control instruction extraction method based on natural language processing
CN109460547B (en) * 2018-09-19 2023-03-28 中国电子科技集团公司第二十八研究所 Structured control instruction extraction method based on natural language processing
CN109558584A (en) * 2018-10-26 2019-04-02 平安科技(深圳)有限公司 Business relationship prediction method, device, computer equipment and storage medium
CN109359302A (en) * 2018-10-26 2019-02-19 重庆大学 Optimization method for domain word vectors and fusion ranking method based on it
CN109376202B (en) * 2018-10-30 2021-08-03 青岛理工大学 NLP-based enterprise supply relationship automatic extraction and analysis method
CN109376202A (en) * 2018-10-30 2019-02-22 青岛理工大学 NLP-based enterprise supply relationship automatic extraction and analysis method
CN109472032A (en) * 2018-11-14 2019-03-15 北京锐安科技有限公司 Method, apparatus, server and storage medium for determining an entity relationship diagram
CN109710759A (en) * 2018-12-17 2019-05-03 北京百度网讯科技有限公司 Text segmentation method, device, computer equipment and readable storage medium
CN109710932A (en) * 2018-12-22 2019-05-03 北京工业大学 Medical entity relation extraction method based on feature fusion
CN109815497B (en) * 2019-01-23 2023-04-18 四川易诚智讯科技有限公司 Character attribute extraction method based on syntactic dependency
CN109815497A (en) * 2019-01-23 2019-05-28 四川易诚智讯科技有限公司 Character attribute extraction method based on syntactic dependency
CN110188175A (en) * 2019-04-29 2019-08-30 厦门快商通信息咨询有限公司 Question-answer pair extraction method, system and storage medium based on BiLSTM-CRF model
CN111914083A (en) * 2019-05-10 2020-11-10 腾讯科技(深圳)有限公司 Statement processing method, device and storage medium
CN110287497A (en) * 2019-07-03 2019-09-27 桂林电子科技大学 Semantic structure coherent analysis method for English text
CN110287497B (en) * 2019-07-03 2023-03-31 桂林电子科技大学 Semantic structure coherent analysis method for English text
CN110837731A (en) * 2019-10-12 2020-02-25 创新工场(广州)人工智能研究有限公司 Word vector training method and device
CN111160030A (en) * 2019-12-11 2020-05-15 北京明略软件系统有限公司 Information extraction method, device and storage medium
CN111160030B (en) * 2019-12-11 2023-09-19 北京明略软件系统有限公司 Information extraction method, device and storage medium
US11308283B2 (en) 2020-01-30 2022-04-19 International Business Machines Corporation Lightweight tagging for disjoint entities
CN111597812A (en) * 2020-05-09 2020-08-28 北京合众鼎成科技有限公司 Financial field multiple relation extraction method based on mask language model
CN111651528A (en) * 2020-05-11 2020-09-11 北京理工大学 Open entity relation extraction method based on generative adversarial network
CN111639499B (en) * 2020-06-01 2023-06-16 北京中科汇联科技股份有限公司 Composite entity extraction method and system
CN111639499A (en) * 2020-06-01 2020-09-08 北京中科汇联科技股份有限公司 Composite entity extraction method and system
CN112084389A (en) * 2020-08-17 2020-12-15 上海交通大学 Network crawler-based academic institution geographical position information extraction method
CN112269884A (en) * 2020-11-13 2021-01-26 北京百度网讯科技有限公司 Information extraction method, device, equipment and storage medium
CN112269884B (en) * 2020-11-13 2024-03-05 北京百度网讯科技有限公司 Information extraction method, device, equipment and storage medium
CN114548103B (en) * 2020-11-25 2024-03-29 马上消费金融股份有限公司 Named entity recognition model training method and named entity recognition method
CN114548103A (en) * 2020-11-25 2022-05-27 马上消费金融股份有限公司 Training method of named entity recognition model and recognition method of named entity
CN112417891A (en) * 2020-11-29 2021-02-26 中国科学院电子学研究所苏州研究院 Text relation automatic labeling method based on open type information extraction
CN112417891B (en) * 2020-11-29 2023-08-22 中国科学院电子学研究所苏州研究院 Text relation automatic labeling method based on open type information extraction
CN115391569A (en) * 2022-10-27 2022-11-25 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Method for automatically constructing industry chain map from research report and related equipment
CN115391569B (en) * 2022-10-27 2023-03-24 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Method for automatically constructing industry chain map from research report and related equipment
CN116127079B (en) * 2023-04-20 2023-06-20 中电科大数据研究院有限公司 Text classification method
CN116127079A (en) * 2023-04-20 2023-05-16 中电科大数据研究院有限公司 Text classification method
CN116467430B (en) * 2023-05-08 2023-09-19 北京科技大学 Material preparation processing technology information text mining method and system
CN116467430A (en) * 2023-05-08 2023-07-21 北京科技大学 Material preparation processing technology information text mining method and system

Similar Documents

Publication Publication Date Title
CN108363816A (en) Open entity relation extraction method based on sentence meaning structure model
CN110866117B (en) Short text classification method based on semantic enhancement and multi-level label embedding
CN107992597B (en) Text structuring method for power grid fault case
CN107273913B (en) Short text similarity calculation method based on multi-feature fusion
CN110532328B (en) Text concept graph construction method
CN106776562A (en) Keyword extraction method and extraction system
Suleiman et al. The use of hidden Markov model in natural ARABIC language processing: a survey
CN111027595A (en) Double-stage semantic word vector generation method
Huang et al. A topic BiLSTM model for sentiment classification
CN108920482B (en) Microblog short text classification method based on lexical chain feature extension and LDA (latent Dirichlet Allocation) model
Jayawardana et al. Semi-supervised instance population of an ontology using word vector embedding
CN109815400A (en) Person interest extraction method based on long text
CN111753058A (en) Text viewpoint mining method and system
CN110232127A (en) File classification method and device
CN111695358A (en) Method and device for generating word vector, computer storage medium and electronic equipment
Sun et al. Multi-channel CNN based inner-attention for compound sentence relation classification
CN114997288A (en) Design resource association method
CN114254645A (en) Artificial intelligence auxiliary writing system
Andrews et al. Robust entity clustering via phylogenetic inference
El Moubtahij et al. AraBERT transformer model for Arabic comments and reviews analysis
Wankhede et al. Data preprocessing for efficient sentimental analysis
Fahrni et al. HITS'Monolingual and Cross-lingual Entity Linking System at TAC 2013.
Li et al. Recursive graphical neural networks for text classification
Walia et al. Case based interpretation model for word sense disambiguation in Gurmukhi
Kuila et al. A Neural Network based Event Extraction System for Indian Languages.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20180803)