CN104199972B - A named entity relation extraction and construction method based on deep learning - Google Patents

Info

Publication number
CN104199972B
Authority
CN
China
Prior art keywords
word
entity
news data
relationship
news
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410488047.7A
Other languages
Chinese (zh)
Other versions
CN104199972A (en)
Inventor
袁伟
邓攀
闫碧莹
赵鑫
李玉成
余雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhong kjia speed (Beijing) Information Technology Co., Ltd.
Original Assignee
Zhong Kjia Speed (beijing) Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhong Kjia Speed (beijing) Information Technology Co Ltd
Priority to CN201410488047.7A
Publication of CN104199972A
Application granted
Publication of CN104199972B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a named entity relation extraction and construction method based on deep learning, for use in the technical field of Internet information. For a specific domain, the method crawls news data from vertical websites in that domain and preprocesses the acquired news data; segments the news data into words, extracts keywords, generates an industry dictionary, and re-segments the news data using the industry dictionary; extracts a seed dictionary; builds an entity relation network without supervision, by extracting from the news data the sentences that contain two or more entities, extracting the verbs in those sentences and the corresponding documents, establishing a deep-learning-based word clustering model on the extracted documents, and building the entity relation network according to the relations between words described by the verbs; and defines entity relation categories and classifies the relation of each entity pair in the entity relation network. The invention does not require large-scale manual labeling of sample data, depends little on an annotated corpus, and extracts entity relations with high performance.

Description

A named entity relation extraction and construction method based on deep learning
Technical Field
The present invention relates to the technical field of Internet information, and in particular to a method for extracting named entity relations.
Background
In the field of information research, information extraction is an essential key technology. Faced with such a massive information space, extracting the content of interest to users faster and more accurately is an urgent problem and an important research direction in information mining. Unlike information processing technologies such as information retrieval, information extraction must recognize the named entities in a text and extract the relations between them. In Chinese text, words are flexible and variable, word formation is complex, and there are no obvious markers, which makes recognizing Chinese named entities and extracting their relations all the more difficult.
At present there are two main approaches to information extraction. One is the knowledge-base algorithm, which requires establishing rules. Although the accuracy of this approach is high, the rules are difficult to determine, place high demands on the rule author, and are hard to port. The other is the statistical machine learning algorithm, which uses various models trained on manually annotated training sets, then uses the model to compute the relevant probabilities for a new data set and derives the final result from them. This approach costs less, performs better, and is easy to port, so it is the current research hotspot.
Machine-learning-based entity relation extraction mainly comprises supervised and weakly supervised entity relation extraction methods. The typical supervised workflow is: preprocess the training text, manually annotate related word pairs and their relations, extract and vectorize features, train a model with a classification algorithm, and label relation categories with the model. The main difference between the weakly supervised and supervised methods lies in their degree of dependence on the annotated corpus. The weakly supervised method uses a small annotated corpus and performs entity relation extraction with a bootstrapping (self-learning) framework combined with various classification algorithms.
The weakly supervised entity relation extraction method performs poorly because it uses a small-scale annotated corpus. The supervised method relies on a large-scale annotated corpus, which must be annotated manually according to the task. This consumes enormous manpower and material resources, and because the performance of models trained on this basis with various algorithms cannot be estimated accurately, it carries considerable risk.
Summary of the Invention
To solve the problems in existing entity relation extraction technology of acquiring domain-specific index data sets, acquiring patterns, and coreference resolution, the present invention provides a named entity relation extraction and construction method based on deep learning.
The named entity relation extraction and construction method based on deep learning provided by the present invention, for a specific domain, comprises the following steps:
Step 1: Build a crawler program and crawl news data from vertical websites in the domain;
Step 2: Preprocess the acquired news data to remove junk information, including duplicate information, abnormally displayed information, garbled-encoding information, and the like;
Step 3: Segment the news data into words, extract keywords, and add the extracted keywords to the dictionary to generate an industry dictionary;
Step 4: Perform Chinese word segmentation on the news data again using the industry dictionary to obtain the corresponding word set;
Step 5: Extract a seed dictionary, where each seed is a preset entity pair;
Step 6: Build the entity relation network without supervision, specifically: extract from the news data the sentences that contain two or more entities, and extract the verbs in those sentences and the corresponding documents; establish a deep-learning-based entity word clustering model on the extracted documents to obtain the probability distribution of each entity word over other words; build the entity relation network according to the relations between words described by the verbs;
Step 7: Define entity relation categories, specifically: extract from the news data the verbs in sentences that contain two entities and cluster the verbs, placing identical verbs in the same class;
Step 8: Classify entity relations, specifically: for each entity pair in the entity relation network, perform relation classification based on the clustering result of step 7.
Compared with the prior art, the named entity relation extraction and construction method of the present invention has the following advantages and positive effects:
1. It uses unsupervised entity relation extraction, so no large-scale manpower is needed to label sample data;
2. Its dependence on an annotated corpus is low; using ordinary news in the domain as the text improves the performance of entity relation extraction;
3. The present invention extracts named entities by domain, which reduces interference from irrelevant information across different domains and yields highly accurate extraction results.
Brief Description of the Drawings
Fig. 1 is the overall flow chart of the domain-specific named entity extraction and construction method according to an embodiment of the present invention;
Fig. 2 is the overall flow chart of the domain-specific industry keyword extraction method according to an embodiment of the present invention;
Fig. 3 is the flow chart of domain-specific named entity extraction according to an embodiment of the present invention;
Fig. 4 is the flow chart of domain-specific relation template extraction according to an embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the drawings and embodiments.
In the embodiment of the present invention, the deep-learning-based named entity relation extraction and construction method is illustrated using the automobile domain. It includes: segmenting the text collection of automobile news into words; extracting entity pairs (automobile brand, automobile model) from the segmented units with the self-learning bootstrap method and selecting a small number of examples as the initial seed set; extracting relation templates from the entities with the bootstrap method; and, through deep learning, building the relations between entities and clustering/classifying the relation templates to obtain relation categories.
As shown in Fig. 1, the deep-learning-based named entity extraction and construction method of the present invention, applied to a specific domain according to the embodiment, comprises the following steps:
Step 1: Build a crawler program and crawl news data from vertical websites. The implementation of the present invention mainly uses data from Autohome and Pacific Auto (PCauto). Specifically, step 1 is divided into steps 101 and 102.
Step 101: Build a distributed crawler program and crawl pages from the vertical websites.
Step 102: Generate the DOM tree of each page from the crawled HTML, and extract the text contained in the crawled page according to its tags.
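Step 102 can be illustrated with a minimal sketch using Python's standard `html.parser`. The class name `TextExtractor`, the function `extract_text`, and the choice of which tags to keep are assumptions made for illustration; the patent does not specify the parser or the tag rules.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text found inside selected tags, skipping script/style content."""

    SKIP = {"script", "style"}

    def __init__(self, keep_tags=("title", "h1", "h2", "p")):
        super().__init__()
        self.keep_tags = set(keep_tags)  # assumed set of content-bearing tags
        self.stack = []                  # currently open tags (a flattened DOM path)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags such as <br>.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text or any(t in self.SKIP for t in self.stack):
            return
        if any(t in self.keep_tags for t in self.stack):
            self.chunks.append(text)


def extract_text(html):
    """Return the kept text of a page, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

A real crawler would likely tune `keep_tags` per site, since each vertical website places article text in different containers.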
Step 2: Preprocess the acquired news data. Specifically, step 2 is divided into steps 201 and 202.
Step 201: Clean the news by length, and remove junk news using regular expressions and a formulated rule set.
Step 202: Filter the news data with a Bloom filter to remove duplicate news. First, the news data is mapped into a bit array using N hash functions; then N hash values are computed for each subsequent comment to judge whether that news data already exists. If the hash values computed for the subsequent comment are all present in the bit array, the comment data already exists and is filtered out.
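The duplicate filtering of step 202 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class `BloomFilter`, the helper `deduplicate`, the bit-array size, and the use of salted SHA-256 to simulate N hash functions are all assumptions.

```python
import hashlib


class BloomFilter:
    """Bit array plus N hash functions; may give false positives, never false negatives."""

    def __init__(self, num_bits=1 << 20, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Simulate N independent hash functions by salting SHA-256 with an index.
        data = item.encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(seed.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # An item counts as seen only if all N positions are already set.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def deduplicate(articles):
    """Keep the first occurrence of each article text, in order."""
    seen = BloomFilter()
    unique = []
    for text in articles:
        if text in seen:
            continue
        seen.add(text)
        unique.append(text)
    return unique
```

Because a Bloom filter can report false positives, a small fraction of distinct articles may be dropped; the bit-array size and N trade memory against that error rate.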
Step 3: Extract keywords to form a new industry dictionary. The present invention extracts keywords with an N-gram model and adds the extracted keywords to the existing basic dictionary to generate a new industry dictionary.
Unlike Latin-script languages such as English, Chinese text has no obvious delimiters such as spaces, so the first step in processing Chinese text is word segmentation. For the purposes of information extraction, the segmented text must also be annotated. The present invention performs Chinese word segmentation with ICTCLAS and mines an automobile industry dictionary with keyword mining technology to improve segmentation precision. Keywords in the present invention are judged not only by discriminative power but also by information content, with more weight placed on information content.
The cohesion (solidification degree) PMI is defined in the embodiment of the present invention as follows:
PMI(a, b) = p(ab) / (p(a) · p(b))
The PMI value is the cohesion with which word a and word b form the keyword ab, and it is used to extract keywords, where p(a) is the frequency with which word a occurs, p(b) is the frequency with which word b occurs, and p(ab) is the frequency with which ab occurs. PMI has a typical drawback: it tends to extract low-frequency words. In practice, the present invention therefore selects only words whose frequency exceeds a certain threshold as candidates and discards lower-frequency words. Experiments show that extracting keywords with the cohesion defined here removes more noise than other existing methods. The PMI value is also called the pointwise mutual information value.
Specifically, step 3 is divided into steps 301 to 305, as shown in Fig. 2.
Step 301: Call the Chinese word segmentation program to perform a preliminary segmentation of the news data.
Step 302: Using 1-grams, compute the PMI value of each word and choose the words whose PMI value exceeds threshold A as keywords.
Step 303: Using 2-grams, compute the PMI value of each word and choose the words whose PMI value exceeds threshold B as keywords.
Step 304: Using 3-grams, compute the PMI value of each word and choose the words whose PMI value exceeds threshold C as keywords.
Step 305: Merge the keywords obtained in steps 302 to 304 with the original dictionary to form the dictionary used for re-segmentation.
Thresholds A, B, and C can be determined experimentally.
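The keyword extraction above can be sketched for the 2-gram case as follows. The function name `pmi_keywords`, the `min_count` cutoff, and the way candidate keywords are formed by concatenating adjacent tokens are assumptions; the cohesion score follows the ratio defined in the text, without a logarithm.

```python
from collections import Counter


def pmi_keywords(tokens, threshold, min_count=2):
    """Return {keyword: cohesion} for adjacent token pairs scoring above threshold.

    Cohesion is p(ab) / (p(a) * p(b)). Pairs seen fewer than min_count times
    are skipped, because the score favours low-frequency pairs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    keywords = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        score = p_ab / (p_a * p_b)
        if score > threshold:
            keywords[a + b] = score  # Chinese keywords are joined without a space
    return keywords
```

The 3-gram case could reuse the same scoring by treating an already-merged pair as a single token, although the text does not say how the patent handles it.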
Step 4: Using the industry dictionary obtained in step 3, perform Chinese word segmentation on the news data again to obtain the corresponding word set. This step segments all comment data, removes stop words, and obtains the segmentation result.
Step 4 includes steps 401 and 402.
Step 401: First segment the text by calling the Chinese word segmentation program; then remove stop words according to a stop-word list, and apply morphological normalization to any English words so that they share a unified form.
After the text has been segmented and annotated, it is represented as a sequence of annotated words. Many of these words are stop words, which carry no meaning for information extraction. The present invention removes them using a stop-word list. This reduces the computation required by the system and also improves the accuracy of the subsequent information extraction. When removing stop words, the words are simply ranked by term frequency and document frequency, and the words with the highest term frequency are removed.
Step 402: Count the document frequency df and the term frequency tf of each word, compute the inverse document frequency idf of the word, and compute the weight of the word with the formula log(tf * (idf + 1) + 1). Screen the word set against a weight threshold D, keeping the words whose weight exceeds D. This yields a word set that reflects the characteristics of the news, and the threshold comparison also reduces the dimensionality of the word set corresponding to the news data.
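A minimal sketch of the weighting in step 402 follows. The function name `select_terms` is hypothetical, and the idf definition `log(N / df)` is an assumption, since the text gives only the final weight formula.

```python
import math
from collections import Counter


def select_terms(docs, stop_words, threshold):
    """docs is a list of token lists; returns {word: weight} for weights above threshold."""
    filtered = [[w for w in doc if w not in stop_words] for doc in docs]
    tf = Counter(w for doc in filtered for w in doc)       # corpus-wide term frequency
    df = Counter(w for doc in filtered for w in set(doc))  # document frequency
    n_docs = len(filtered)
    kept = {}
    for word, freq in tf.items():
        idf = math.log(n_docs / df[word])                  # assumed idf definition
        weight = math.log(freq * (idf + 1) + 1)            # formula from step 402
        if weight > threshold:
            kept[word] = weight
    return kept
```

The `+ 1` terms keep the weight positive even for a word that appears in every document, where the assumed idf is zero.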
Step 5: Manually create a seed dictionary of automobile brands and automobile models, then mine a dictionary of automobile brands and models by bootstrapping.
After the text collection has been segmented, annotated, and filtered for stop words, the scope of extraction is limited to a suitable range in order to improve the accuracy of information extraction. The sentences in which two named entities occur together must be found, and the named entity pairs within a set context window identified. Below, a named entity is called an entity and a named entity pair is called an entity pair. An entity pair in the present invention is <automobile brand, automobile model>. In the embodiment of the present invention, an automobile brand is one entity and an automobile model is another, and the entities mentioned below refer to these two.
To extract the relations between entities automatically, a relation seed set must first be provided. A small relation seed set can be provided manually. Because only a small set is provided manually, it is not sufficient for information extraction, so the relation seeds are expanded through bootstrapping, an automatic training method.
The relation between the two entities of a pair can be judged from the context between them: two entity pairs with the same or similar context have the same or similar relation. The similarity between the context vectors of an entity pair and a relation seed can therefore be computed and used as the similarity between them.
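The context comparison described above can be sketched with bag-of-words vectors and cosine similarity. Both choices are assumptions, as are the function names `context_vector` and `context_similarity`; the text says only that context vectors are compared.

```python
import math
from collections import Counter


def context_vector(tokens, a, b):
    """Bag of the words lying between entities a and b in a tokenized sentence."""
    i, j = sorted((tokens.index(a), tokens.index(b)))
    return Counter(tokens[i + 1:j])


def context_similarity(c1, c2):
    """Cosine similarity between two bag-of-words context vectors."""
    dot = sum(c1[w] * c2[w] for w in c1)
    norm = (math.sqrt(sum(v * v for v in c1.values()))
            * math.sqrt(sum(v * v for v in c2.values())))
    return dot / norm if norm else 0.0
```

A candidate entity pair whose context scores close to that of a seed would be treated as holding the same relation as the seed.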
This step includes steps 501 and 502, as shown in Fig. 3.
Step 501: Manually choose automobile brands and their corresponding automobile models. Provide a number of seeds and seed extraction templates, where each seed is an entity pair. The number can be set as needed. An example seed extraction template is: (an automobile brand) releases (an automobile model).
Step 502: Mine entity pairs with the bootstrap method. By mining the relations between entities automatically with the bootstrap method, new seed extraction templates are obtained continually, and further seeds are extracted iteratively according to those templates.
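The iteration in step 502 can be illustrated with a simplified sketch. It is not the patent's algorithm: the function `bootstrap`, the rule that a template is the short run of tokens between a known brand and model, and the rule that the tokens adjacent to a template match form a new pair are all assumptions made to keep the example small.

```python
def bootstrap(sentences, seeds, rounds=2):
    """sentences: list of token lists; seeds: set of (brand, model) pairs.

    Alternates between learning templates from known pairs and applying
    the templates to find new pairs, until nothing new is found.
    """
    pairs = set(seeds)
    patterns = set()
    for _ in range(rounds):
        # 1. Learn templates: the tokens between a known brand and its model.
        for tokens in sentences:
            for brand, model in pairs:
                if brand in tokens and model in tokens:
                    i, j = tokens.index(brand), tokens.index(model)
                    if 0 < j - i - 1 <= 3:  # short, non-empty middle context
                        patterns.add(tuple(tokens[i + 1:j]))
        # 2. Apply templates: tokens on either side of a match form a candidate pair.
        new_pairs = set()
        for tokens in sentences:
            for pat in patterns:
                k = len(pat)
                for s in range(1, len(tokens) - k):
                    if tuple(tokens[s:s + k]) == pat:
                        new_pairs.add((tokens[s - 1], tokens[s + k]))
        if new_pairs <= pairs:
            break  # converged: no new entity pairs
        pairs |= new_pairs
    return pairs, patterns
```

In practice candidates would need filtering, for example with the context similarity described earlier, because every token adjacent to a matching template is accepted here.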
The pseudocode for extracting automobile brands and models in the embodiment of the present invention is as follows:
Step 6: Build the entity relation network without supervision, comprising steps 601 to 604. First identify the entities in each sentence, using the result of annotating the sentence. Then build entity pairs from the identified entities, and then classify their relations.
Step 601: Extract all sentences that contain two or more entities, and extract the verbs in them and the corresponding documents.
Step 602: Normalize and denoise the verbs extracted in step 601, mapping the verbs to the discrete values 0 and 1 and removing repeated or meaningless verbs.
Step 603: Establish a deep-learning-based entity word clustering model on the documents extracted in step 601 to obtain the probability distribution of each entity word over other words.
Step 604: Build the entity relation network according to the relations between words, such as subject-predicate and verb-object relations. The network contains the relations between entities described by all the extracted verbs. The steps for building the entity relation network are shown in Fig. 4.
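Steps 601 and 604 can be sketched as building a map from entity pairs to the verbs found between them. The function `build_relation_network`, the input format of (word, part-of-speech) tuples, and the tag `"v"` for verbs are assumptions; the sketch also ignores the syntactic relations (subject-predicate, verb-object) mentioned in step 604 and uses position only.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_relation_network(tagged_sentences, entities):
    """tagged_sentences: list of [(word, pos), ...]; pos "v" marks a verb (assumed tag).

    Returns {(entity_a, entity_b): Counter of verbs between them}.
    """
    network = defaultdict(Counter)
    for sent in tagged_sentences:
        positions = [i for i, (word, _) in enumerate(sent) if word in entities]
        if len(positions) < 2:
            continue  # keep only sentences with two or more entities
        for i, j in combinations(positions, 2):
            for word, pos in sent[i + 1:j]:
                if pos == "v":
                    network[(sent[i][0], sent[j][0])][word] += 1
    return network
```

Each key is an edge of the network and each counter records how often a verb describes that edge.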
The pseudocode for building the entity relation network in the embodiment of the present invention is as follows:
The embodiment of the present invention uses deep learning to build a word2vec model, obtains the distribution of each word from the word2vec model, and computes the similarity between words from their distributions in order to cluster the words.
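The clustering step can be sketched under the assumption that word vectors have already been produced by a word2vec model. The greedy single-pass clustering, the cosine measure, and the names `cosine` and `cluster_words` are illustrative choices, not details given in the text.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_words(vectors, min_sim=0.8):
    """Greedy clustering: a word joins the first cluster whose seed word is similar enough.

    vectors: {word: list of floats}, assumed to come from a trained word2vec model.
    """
    clusters = []  # each cluster is a list of words; its first word is the seed
    for word, vec in vectors.items():
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= min_sim:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters
```

The result depends on insertion order and on `min_sim`; a standard algorithm such as k-means over the same vectors would be a reasonable alternative.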
Step 7: Define entity relation categories. Extract the verbs in the articles, such as "purchase", "cooperate", and "release", as relation categories. Step 7 includes steps 701 and 702.
Step 701: From the news data preprocessed in step 2, extract the verbs in all sentences that contain two entities.
Step 702: Cluster the verbs again to obtain the relation categories, placing identical verbs in the same class.
The pseudocode for entity relation classification in the embodiment of the present invention is as follows:
Extract the articles that contain more than one entity;
Get all verbs between two entities;
Cluster the verbs using the LDA topic clustering model;
Get the relation types as the clustering result.
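The pseudocode above can be rendered as a short sketch in which exact-match grouping stands in for the LDA topic clustering, so that only identical verbs share a relation type. The function `define_relation_types`, the (word, part-of-speech) input format, and the verb tag `"v"` are assumptions.

```python
def define_relation_types(tagged_sentences, entities):
    """Assign one relation type id per distinct verb found between two entities.

    tagged_sentences: list of [(word, pos), ...]; pos "v" marks a verb (assumed tag).
    Returns {verb: type_id}, with ids assigned in order of first appearance.
    """
    types = {}
    for sent in tagged_sentences:
        idx = [i for i, (word, _) in enumerate(sent) if word in entities]
        if len(idx) < 2:
            continue  # the sentence must contain at least two entities
        for word, pos in sent[idx[0] + 1:idx[-1]]:
            if pos == "v":
                types.setdefault(word, len(types))
    return types
```

Replacing the exact-match step with a topic model would allow near-synonymous verbs to share one relation type.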
Step 8: Classify entity relations. For each entity pair in the entity relation network, perform relation classification based on the clustering result of step 7. Each entity pair relation in the entity relation network corresponds to a feature; features are extracted, and relations are classified according to the rules formed by the clustering in step 7.
From the entity relation network obtained in step 6, the entity sets it contains can be obtained. In the embodiment of the present invention these are the automobile brand set N and the automobile model set O; for any n ∈ N and o ∈ O, the entity pair (n, o) is built. Because only the relation between automobile brand and automobile model is considered, the automobile brand is always placed first and the automobile model second when building an entity pair. The order in which they appear in the sentence is taken into account as a feature during model learning and classification. For example, in the sentence "Toyota releases the new rav4", if the automobile brand "Toyota" and the automobile model "rav4" are identified, then N = {Toyota}, O = {rav4}, and the entity pair set {(Toyota, rav4)} is obtained.
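The pair construction and classification of step 8 can be sketched as follows. The functions `build_pairs` and `classify_pair`, the boolean order feature, and the majority-verb rule for choosing a relation type are assumptions; the text states only that the brand comes first in the pair and that sentence order is used as a feature.

```python
from itertools import product


def build_pairs(tokens, brands, models):
    """Return (brand, model, brand_first) for each brand/model combination in a sentence.

    The brand is always the first element of the pair; brand_first records
    whether it also appears first in the sentence.
    """
    found_brands = [t for t in tokens if t in brands]
    found_models = [t for t in tokens if t in models]
    pairs = []
    for brand, model in product(found_brands, found_models):
        brand_first = tokens.index(brand) < tokens.index(model)
        pairs.append((brand, model, brand_first))
    return pairs


def classify_pair(verb_counts, relation_type_of):
    """Pick the relation type of the most frequent known verb for an entity pair."""
    known = {v: c for v, c in verb_counts.items() if v in relation_type_of}
    if not known:
        return None
    return relation_type_of[max(known, key=known.get)]
```

Here `verb_counts` is assumed to be the per-pair verb tally from the entity relation network, and `relation_type_of` the verb-to-category map from step 7.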

Claims (1)

1. A named entity relation extraction and construction method based on deep learning, for a specific domain, characterized by comprising the following steps:
Step 1: Build a crawler program and crawl news data from vertical websites in the domain;
Step 2: Preprocess the acquired news data to remove junk information, including duplicate information, abnormally displayed information, and garbled-encoding information; the preprocessed news data is used in the following steps;
Step 201: Clean the news by length, and remove junk news using regular expressions and a formulated rule set;
Step 202: Filter the news data with a Bloom filter to remove duplicate news; first, the news data is mapped into a bit array using N hash functions, then N hash values are computed for each subsequent comment to judge whether that news data already exists; if the hash values computed for the subsequent comment are present in the bit array, the comment data already exists and is filtered out;
Step 3: Segment the news data into words, extract keywords, and add the extracted keywords to the dictionary to generate an industry dictionary;
When extracting keywords, segment with N-gram models, N = 1, 2, 3, compute the pointwise mutual information (PMI) value of each word, compare it with the set threshold, and take the words above the threshold as keywords;
The PMI value of word a and word b is PMI(a, b) = p(ab) / (p(a) · p(b)), where p(a) is the frequency with which word a occurs, p(b) is the frequency with which word b occurs, and p(ab) is the frequency with which ab occurs;
Step 4: Perform Chinese word segmentation on the news data using the industry dictionary to obtain the corresponding word set;
Step 401: First segment the text by calling the Chinese word segmentation program; then remove stop words according to a stop-word list, and apply morphological normalization to any English words so that they share a unified form;
Step 402: Count the document frequency df and the term frequency tf of each word, compute the inverse document frequency idf of the word, compute the weight of the word with the formula log(tf * (idf + 1) + 1), and screen the word set by comparing the weights with a threshold D, extracting the words whose weight exceeds D to obtain the corresponding word set, the threshold comparison also reducing the dimensionality of the word set corresponding to the news data;
Step 5: Extract a seed dictionary, where each seed is a preset entity pair; first create a number of seeds manually, then mine entity pairs from the news data with the bootstrap method;
Step 6: Build the entity relation network without supervision, specifically: extract from the news data the sentences that contain two or more entities, and extract the verbs in those sentences and the corresponding documents; establish a deep-learning-based entity word clustering model on the extracted documents to obtain the probability distribution of each entity word over other words; build the entity relation network according to the relations between words described by the verbs;
Step 7: Define entity relation categories, specifically: extract from the news data the verbs in sentences that contain two entities and cluster the verbs, placing identical verbs in the same class;
Step 8: For each entity pair in the entity relation network, perform relation classification based on the clustering result of step 7.
CN201410488047.7A 2013-09-22 2014-09-22 A kind of name entity relation extraction and construction method based on deep learning Active CN104199972B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410488047.7A CN104199972B (en) 2013-09-22 2014-09-22 A kind of name entity relation extraction and construction method based on deep learning

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN2013104319134 2013-09-22
CN201310431913.4 2013-09-22
CN201310431913 2013-09-22
CN201410488047.7A CN104199972B (en) 2013-09-22 2014-09-22 A kind of name entity relation extraction and construction method based on deep learning

Publications (2)

Publication Number Publication Date
CN104199972A CN104199972A (en) 2014-12-10
CN104199972B true CN104199972B (en) 2018-08-03

Family

ID=52085265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410488047.7A Active CN104199972B (en) 2013-09-22 2014-09-22 A kind of name entity relation extraction and construction method based on deep learning

Country Status (1)

Country Link
CN (1) CN104199972B (en)

Families Citing this family (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104933164B (en) * 2015-06-26 2018-10-09 华南理工大学 In internet mass data name entity between relationship extracting method and its system
CN105260457B (en) * 2015-10-14 2018-07-13 南京大学 A kind of multi-semantic meaning network entity contrast table automatic generation method towards coreference resolution
CN105389470A (en) * 2015-11-18 2016-03-09 福建工程学院 Method for automatically extracting Traditional Chinese Medicine acupuncture entity relationship
CN105468583A (en) * 2015-12-09 2016-04-06 百度在线网络技术(北京)有限公司 Entity relationship obtaining method and device
CN105894088B (en) * 2016-03-25 2018-06-29 苏州赫博特医疗信息科技有限公司 Based on deep learning and distributed semantic feature medical information extraction system and method
CN105938495A (en) * 2016-04-29 2016-09-14 乐视控股(北京)有限公司 Entity relationship recognition method and apparatus
US11288573B2 (en) * 2016-05-05 2022-03-29 Baidu Usa Llc Method and system for training and neural network models for large number of discrete features for information rertieval
CN106021223B (en) * 2016-05-09 2020-06-23 Tcl科技集团股份有限公司 Sentence similarity calculation method and system
CN106372122B (en) * 2016-08-23 2018-04-10 温州大学瓯江学院 A kind of Document Classification Method and system based on Wiki semantic matches
CN108205524B (en) * 2016-12-20 2022-01-07 北京京东尚科信息技术有限公司 Text data processing method and device
CN108268431B (en) * 2016-12-30 2019-12-03 北京国双科技有限公司 The method and apparatus of paragraph vectorization
CN106897545B (en) * 2017-01-05 2019-04-30 浙江大学 A kind of tumor prognosis forecasting system based on depth confidence network
CN108334520A (en) * 2017-01-19 2018-07-27 北京京东尚科信息技术有限公司 social network data processing method, device, storage medium and electronic equipment
US10922606B2 (en) 2017-06-13 2021-02-16 International Business Machines Corporation Multi-directional reduction in large scale deep-learning
CN107402915A (en) * 2017-07-17 2017-11-28 广州特道信息科技有限公司 The generation method and device of the semantic network lexicon of multilayer
CN108037837A (en) * 2017-11-07 2018-05-15 朗坤智慧科技股份有限公司 A kind of intelligent prompt method of search term
CN107798136B (en) 2017-11-23 2020-12-01 北京百度网讯科技有限公司 Entity relation extraction method and device based on deep learning and server
CN108038106B (en) * 2017-12-22 2021-07-02 北京工业大学 Fine-grained domain term self-learning method based on context semantics
CN108446355B (en) * 2018-03-12 2022-05-20 深圳证券信息有限公司 Investment and financing event element extraction method, device and equipment
CN108363701B (en) * 2018-04-13 2022-06-28 达而观信息科技(上海)有限公司 Named entity identification method and system
CN108549640A (en) * 2018-04-24 2018-09-18 易联众信息技术股份有限公司 Statistics-based enterprise name similarity calculation method
CN108920448B (en) * 2018-05-17 2021-09-14 南京大学 Comparison relation extraction method based on long-term and short-term memory network
CN108737423B (en) * 2018-05-24 2020-07-14 国家计算机网络与信息安全管理中心 Phishing website discovery method and system based on webpage key content similarity analysis
CN110633409B (en) * 2018-06-20 2023-06-09 上海财经大学 Automobile news event extraction method integrating rules and deep learning
CN109190110B (en) * 2018-08-02 2023-08-22 厦门快商通信息技术有限公司 Named entity recognition model training method and system and electronic equipment
US11080300B2 (en) 2018-08-21 2021-08-03 International Business Machines Corporation Using relation suggestions to build a relational database
CN109408642B (en) * 2018-08-30 2021-07-16 昆明理工大学 Domain entity attribute relation extraction method based on distant supervision
CN109359299A (en) * 2018-09-28 2019-02-19 中国电子科技集团公司信息科学研究院 Self-construction method for an Internet of Things device capability ontology based on commodity data
CN109388806B (en) * 2018-10-26 2023-06-27 北京布本智能科技有限公司 Chinese word segmentation method based on deep learning and forgetting algorithm
CN109543046A (en) * 2018-11-16 2019-03-29 重庆邮电大学 A kind of robot data interoperability Methodologies for Building Domain Ontology based on deep learning
CN109710918B (en) * 2018-11-26 2024-10-18 平安科技(深圳)有限公司 Public opinion identification method, public opinion identification device, computer equipment and storage medium
CN109959109A (en) * 2019-03-18 2019-07-02 四川长虹电器股份有限公司 Air-conditioning control system and its control method based on abnormal speech identification
CN110134761A (en) * 2019-04-16 2019-08-16 深圳壹账通智能科技有限公司 Adjudicate document information retrieval method, device, computer equipment and storage medium
CN110298043B (en) * 2019-07-03 2023-04-07 吉林大学 Vehicle named entity identification method and system
CN110458397A (en) * 2019-07-05 2019-11-15 苏州热工研究院有限公司 A kind of nuclear material military service performance information extracting method
CN110413725A (en) * 2019-07-23 2019-11-05 福建奇点时空数字科技有限公司 Industry data information extraction method based on deep learning technology
CN110737845A (en) * 2019-10-15 2020-01-31 精硕科技(北京)股份有限公司 method, computer storage medium and system for realizing information analysis
CN111178076B (en) * 2019-12-19 2023-08-08 成都欧珀通信科技有限公司 Named entity recognition and linking method, device, equipment and readable storage medium
CN111126067B (en) * 2019-12-23 2022-02-18 北大方正集团有限公司 Entity relationship extraction method and device
CN111274361A (en) * 2020-01-21 2020-06-12 北京明略软件系统有限公司 Industry new word discovery method and device, storage medium and electronic equipment
CN111582497A (en) * 2020-04-27 2020-08-25 平安医疗健康管理股份有限公司 Training file generation and evaluation method, device, computer system and storage medium
CN111881256B (en) * 2020-07-17 2022-11-08 中国人民解放军战略支援部队信息工程大学 Text entity relation extraction method and device and computer readable storage medium equipment
CN111859887A (en) * 2020-07-21 2020-10-30 北京北斗天巡科技有限公司 Scientific and technological news automatic writing system based on deep learning
CN112035621A (en) * 2020-09-03 2020-12-04 江苏经贸职业技术学院 Enterprise name similarity detection method based on statistics
CN112487190B (en) * 2020-12-13 2022-04-19 天津大学 Method for extracting relationships between entities from text based on self-supervision and clustering technology
CN112507060A (en) * 2020-12-14 2021-03-16 福建正孚软件有限公司 Domain corpus construction method and system
CN113157866B (en) * 2021-04-27 2024-05-14 平安科技(深圳)有限公司 Data analysis method, device, computer equipment and storage medium
CN113609844B (en) * 2021-07-30 2024-03-08 国网山西省电力公司晋城供电公司 Electric power professional word stock construction method based on hybrid model and clustering algorithm
CN117114739B (en) * 2023-09-27 2024-05-03 数据空间研究院 Enterprise supply chain information mining method, mining system and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4333229B2 (en) * 2003-06-23 2009-09-16 沖電気工業株式会社 Named character string evaluation device and evaluation method
CN102054029A (en) * 2010-12-17 2011-05-11 哈尔滨工业大学 Person information disambiguation method based on social network and name context

Also Published As

Publication number Publication date
CN104199972A (en) 2014-12-10

Similar Documents

Publication Publication Date Title
CN104199972B (en) A kind of name entity relation extraction and construction method based on deep learning
CN108052593A (en) Subject keyword extraction method based on topic word vectors and network structure
CN102289522B (en) Method of intelligently classifying texts
CN106055675B (en) Relation extraction method based on convolutional neural networks and distant supervision
CN107861939A (en) A kind of domain entities disambiguation method for merging term vector and topic model
CN110362678A (en) A kind of method and apparatus automatically extracting Chinese text keyword
CN108920482B (en) Microblog short text classification method based on lexical chain feature extension and LDA (latent Dirichlet Allocation) model
CN104462053A (en) Inner-text personal pronoun anaphora resolution method based on semantic features
CN109145180B (en) Enterprise hot event mining method based on incremental clustering
CN102054029A (en) Person information disambiguation method based on social network and name context
CN108763348A (en) A kind of classification improved method of extension short text word feature vector
CN105389354A (en) Social media text oriented unsupervised method for extracting and sorting events
CN106126502A (en) A kind of emotional semantic classification system and method based on support vector machine
CN106682123A (en) Hot event acquiring method and device
CN110188359B (en) Text entity extraction method
CN106599072B (en) Text clustering method and device
CN105512333A (en) Product comment theme searching method based on emotional tendency
Bolaj et al. Text classification for Marathi documents using supervised learning methods
CN109885675A (en) Text sub-topic discovery method based on improved LDA
CN106547875A (en) A kind of online incident detection method of the microblogging based on sentiment analysis and label
CN108763192B (en) Entity relation extraction method and device for text processing
CN108038099A (en) Low frequency keyword recognition method based on term clustering
CN110377690A (en) A kind of information acquisition method and system based on long-range Relation extraction
CN110705285B (en) Government affair text subject word library construction method, device, server and readable storage medium
CN106503153A (en) Computer text classification system and text classification method thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20180523

Address after: 100190 Room 502, 5 Building 4 South four street, Haidian District, Beijing, Zhongguancun.

Applicant after: Zhong kjia speed (Beijing) Information Technology Co., Ltd.

Address before: 100190 South four street, Zhongguancun, Haidian District, Beijing, 4

Applicant before: SINOPARADOFT (BEIJING) PARALLEL SOFTWARE CO., LTD.

GR01 Patent grant