CN104199972B - A named entity relation extraction and construction method based on deep learning - Google Patents
A named entity relation extraction and construction method based on deep learning
- Publication number
- CN104199972B
- Authority
- CN
- China
- Prior art keywords
- word
- entity
- news data
- relationship
- news
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/211—Schema design and management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/288—Entity relationship models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a deep-learning-based named entity relation extraction and construction method for the field of Internet information technology. The method targets a specific domain. It crawls news data in that domain from vertical (industry-specific) websites and preprocesses it; segments the news data, extracts keywords to generate an industry dictionary, and re-segments the news data with that dictionary; extracts a seed dictionary; builds an entity relationship network without supervision by extracting sentences containing two or more entities from the news data, extracting the verbs in those sentences and the corresponding documents, building a deep-learning-based word clustering model on the extracted documents, and constructing the network from the relations between words described by the verbs; and finally defines entity relation categories and assigns a relation category to each entity pair in the network. The invention requires no large-scale manual annotation of sample data, depends little on the corpus, and extracts entity relations with high performance.
Description
Technical field
The present invention relates to the field of Internet information technology, and in particular to a method for extracting named entity relations.
Background
Information extraction is an essential technology in information research. Faced with such a massive information space, extracting the content users care about faster and more accurately is an urgent problem and an important research direction in information mining. Unlike information-processing techniques such as information retrieval, information extraction must recognize the named entities in a text and extract the relations between them. In Chinese text, word usage is flexible, word formation is complex, and word boundaries are not explicitly marked, which makes recognizing Chinese named entities and extracting their relations particularly difficult.
There are currently two main approaches to information extraction. The first is knowledge-base (rule-based) algorithms, which require a set of hand-written rules. Although accurate, such rules are hard to determine, demand considerable expertise from their authors, and port poorly to new domains. The second is statistical machine learning: a model (one of various kinds) is trained on a manually annotated training set, then used to compute the relevant probabilities for new data, from which the final result is derived. This approach costs less, performs better, and is easy to port, which is why it is the current research focus.
Machine-learning-based entity relation extraction mainly comprises supervised and weakly supervised methods. A supervised method typically proceeds as follows: preprocess the training text, manually annotate entity pairs and their relations, extract and vectorize features, train a model with a classification algorithm, and use the model to label relation categories. The main difference between weakly supervised and supervised methods is how much they depend on annotated corpora: a weakly supervised method uses a small annotated corpus and combines a bootstrapping (self-learning) framework with various classification algorithms to extract entity relations.
Because weakly supervised methods use only a small annotated corpus, their performance is poor. Supervised methods, in contrast, rely on large annotated corpora that must be labeled manually for each task, which consumes enormous manpower and resources. Moreover, the performance of a model trained on such data with various algorithms cannot be estimated accurately in advance, which carries considerable risk.
Summary of the invention
To solve the problems found in existing entity relation extraction technology, namely obtaining indexed domain-specific datasets, acquiring patterns, and resolving coreference, the present invention provides a named entity relation extraction and construction method based on deep learning.
The deep-learning-based named entity relation extraction and construction method provided by the invention targets a specific domain and comprises the following steps:
Step 1: Build a web crawler to crawl news data in the domain from vertical websites;
Step 2: Preprocess the acquired news data to remove junk information, including duplicate content, improperly displayed content, and garbled (mis-encoded) text;
Step 3: Segment the news data, extract keywords, and add the extracted keywords to the dictionary to generate an industry dictionary;
Step 4: Re-segment the news data (Chinese word segmentation) with the industry dictionary to obtain the corresponding word sets;
Step 5: Extract a seed dictionary, where each seed is a specified entity pair;
Step 6: Build the entity relationship network without supervision, specifically: extract sentences containing two or more entities from the news data, and extract the verbs in these sentences and the corresponding documents; build a deep-learning-based entity word clustering model on the extracted documents to obtain the probability distribution of entity words over other words; and build the entity relationship network from the relations between words described by the verbs;
Step 7: Define entity relation categories, specifically: extract from the news data the verbs in sentences containing two entities and cluster them, placing identical verbs in the same class;
Step 8: Classify entity relations, specifically: for each entity pair in the entity relationship network, assign a relation category based on the clustering result of Step 7.
Compared with the prior art, the named entity relation extraction and construction method of the invention has the following advantages and beneficial effects:
1. Because entity relations are extracted without supervision, no large-scale manual effort is needed to annotate sample data;
2. Dependence on the corpus is low: news from the field is used as the text source, which improves entity relation extraction performance;
3. Named entities are extracted per domain, which reduces interference from irrelevant information across domains and yields high extraction accuracy.
Brief description of the drawings
Fig. 1 is an overall flowchart of the domain-specific named entity extraction and construction method according to an embodiment of the present invention;
Fig. 2 is an overall flowchart of the domain-specific industry keyword extraction method according to an embodiment of the present invention;
Fig. 3 is a flowchart of domain-specific named entity extraction according to an embodiment of the present invention;
Fig. 4 is a flowchart of domain-specific relation template extraction according to an embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and embodiments.
In this embodiment, the deep-learning-based named entity relation extraction and construction method is illustrated using the automotive domain. It comprises: segmenting a collection of automotive news texts; extracting entity pairs (car brand, car model) from the segmented units with a self-learning bootstrap method, and selecting a small number of examples from them as the initial seed set; extracting relation templates from the entity pairs with the bootstrap method; and, using deep learning, building the relations between entities and clustering/classifying the relation templates to obtain relation categories.
As shown in Fig. 1, the deep-learning-based named entity extraction and construction method for a specific domain according to this embodiment comprises the following steps:
Step 1: Build a crawler to capture news data from vertical websites; this embodiment mainly uses data from Autohome and PCauto (Pacific Automotive). Step 1 is divided into Steps 101 and 102.
Step 101: Build a distributed crawler to crawl pages from the vertical websites.
Step 102: Build the DOM tree of each crawled HTML page, and extract the text contained in the page according to its tags.
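Step 102 can be sketched with Python's standard-library `html.parser`, which visits a page's tags in document order (a streaming stand-in for walking a full DOM tree). The sketch assumes that article text lives in `<p>` elements and that `<script>`/`<style>` content should be skipped; the class and function names are hypothetical, and real sites need site-specific tag rules that the text does not give.

```python
from html.parser import HTMLParser


class ParagraphTextExtractor(HTMLParser):
    """Collect text inside <p> tags, skipping <script> and <style> content.

    Assumption: the article body is marked up with <p> elements.
    """

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: list[str] = []
        self._in_p = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p" and self._in_p:
            text = "".join(self._buffer).strip()
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        # Inline tags such as <b> inside a paragraph just contribute their text.
        if self._in_p and not self._skip_depth:
            self._buffer.append(data)


def extract_text(html: str) -> str:
    parser = ParagraphTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)


page = ("<html><body><p>Toyota releases the new RAV4.</p>"
        "<script>var x=1;</script><p>Price: <b>200k</b></p></body></html>")
print(extract_text(page))
```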
Step 2:The news data of acquisition is pre-processed.Specific steps 2 are divided for step 201~202.
Step 201: Clean the data by news length, and remove junk news with regular expressions and a formulated rule set.
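A minimal sketch of Step 201 follows. The minimum length (`MIN_CHARS`), the junk patterns, and the tolerated share of U+FFFD replacement characters (one common sign of the garbled encoding mentioned in Step 2) are all hypothetical values; the patent does not publish its thresholds or regular expressions.

```python
import re

MIN_CHARS = 20  # hypothetical minimum article length
MAX_REPLACEMENT_RATIO = 0.05  # hypothetical tolerated share of U+FFFD characters
# Hypothetical junk rules: link-bait text and items flagged as ads.
JUNK_PATTERNS = [
    re.compile(r"点击(?:查看|进入)"),  # "click to view / click to enter"
    re.compile(r"^(?:广告|AD)[:：]", re.IGNORECASE),  # "Ad:" prefix
]


def is_junk(text: str) -> bool:
    stripped = text.strip()
    if len(stripped) < MIN_CHARS:
        return True  # too short to be a news article
    if stripped.count("\ufffd") / len(stripped) > MAX_REPLACEMENT_RATIO:
        return True  # likely mis-decoded text
    return any(p.search(stripped) for p in JUNK_PATTERNS)


def clean(items):
    return [t for t in items if not is_junk(t)]
```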
Step 202: Filter the news data with a Bloom filter to remove duplicate news. Each news item is first mapped into a bit array with N hash functions; for each subsequent item, N hash values are computed to determine whether that item already exists. If all of the subsequent item's hash positions are already set in the bit array, the item is considered to exist already and is filtered out.
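The deduplication in Step 202 can be sketched as follows. The patent fixes neither the bit-array size nor N; the values below (2^20 bits, N = 7) and the use of SHA-256 with double hashing to derive the N positions are assumptions. A Bloom filter can report false positives, so an occasional genuinely new item may be discarded, but an exact duplicate is never let through. Near-duplicates (e.g. the same story with a different headline) are not caught by exact hashing.

```python
import hashlib


class BloomFilter:
    """Bit-array Bloom filter with N hash functions derived by double hashing."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 7) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # Present only if every one of the N positions is set.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def deduplicate(news_items, bloom=None):
    bloom = bloom if bloom is not None else BloomFilter()
    unique = []
    for text in news_items:
        if text in bloom:
            continue  # all N bits already set: treat as a duplicate
        bloom.add(text)
        unique.append(text)
    return unique
```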
Step 3: Extract keywords to form a new industry dictionary. The invention extracts keywords with an N-gram model and adds them to the existing base dictionary to generate a new industry dictionary.
Unlike Latin-script languages such as English, Chinese text has no explicit delimiters such as spaces, so the first step in processing Chinese text is word segmentation. Because information extraction requires it, the segmented text must also be tagged. The invention performs Chinese word segmentation with ICTCLAS and mines an automotive industry dictionary with keyword mining to improve segmentation precision. Keywords in the invention must be not only distinctive but, above all, informative.
The cohesion (solidification degree) PMI used in this embodiment is defined as:
PMI(a, b) = p(ab) / (p(a) · p(b))
The PMI value measures how tightly word a and word b cohere to form the keyword ab and is used to extract keywords, where p(a) is the frequency of word a, p(b) is the frequency of word b, and p(ab) is the frequency of ab. PMI has a typical drawback: it tends to favor low-frequency words. In a concrete implementation, therefore, only words whose frequency exceeds a certain threshold are selected as candidates, and low-frequency words are discarded. Experiments show that extracting keywords with the cohesion defined here removes more noise than other existing methods. The PMI value is also called the pointwise mutual information value.
Step 3 is divided into Steps 301 to 305, as shown in Fig. 2.
Step 301: Call the Chinese word segmentation program to perform an initial segmentation of the news data.
Step 302: Using 1-grams, compute the PMI values of words and select words whose PMI value exceeds threshold A as keywords.
Step 303: Using 2-grams, compute the PMI values of words and select words whose PMI value exceeds threshold B as keywords.
Step 304: Using 3-grams, compute the PMI values of words and select words whose PMI value exceeds threshold C as keywords.
Step 305: Merge the keywords obtained in Steps 302 to 304 with the original dictionary to form the dictionary used for re-segmentation.
Thresholds A, B, and C can be determined experimentally.
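Steps 303 to 305 can be sketched with the PMI ratio defined above (the standard pointwise mutual information is the logarithm of this ratio; because the logarithm is monotonic, both rank candidates identically). Assumptions not fixed by the text: the input is already segmented into tokens; probabilities are relative n-gram frequencies; candidates seen fewer than `min_count` times are dropped to counter PMI's bias toward rare words; a trigram abc is scored by the weaker of its splits (a, bc) and (ab, c); and the `thresholds` values stand in for the experimentally tuned B and C. The 1-gram pass of Step 302 is omitted because the two-part PMI is not defined for a single token. N-grams are joined without spaces, as Chinese words are.

```python
from collections import Counter


def ngram_counts(docs, n):
    counts = Counter()
    for tokens in docs:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def extract_keywords(docs, thresholds, min_count=2):
    """Return joined n-grams (n = 2, 3) whose PMI exceeds thresholds[n]."""
    counts = {n: ngram_counts(docs, n) for n in (1, 2, 3)}
    totals = {n: sum(c.values()) for n, c in counts.items()}

    def p(gram):
        n = len(gram)
        return counts[n][gram] / totals[n] if totals[n] else 0.0

    keywords = set()
    for n in (2, 3):
        for gram, c in counts[n].items():
            if c < min_count:
                continue  # frequency filter against PMI's low-frequency bias
            # PMI(a, b) = p(ab) / (p(a) * p(b)); for n = 3 take the weaker split.
            score = min(p(gram) / (p(gram[:k]) * p(gram[k:])) for k in range(1, n))
            if score > thresholds[n]:
                keywords.add("".join(gram))
    return keywords


def build_industry_dictionary(base_dictionary, docs, thresholds, min_count=2):
    """Step 305: merge mined keywords into the base dictionary."""
    return set(base_dictionary) | extract_keywords(docs, thresholds, min_count)


docs = [
    ["power", "steering", "is", "standard"],
    ["power", "steering", "is", "optional"],
    ["the", "car", "is", "new"],
    ["the", "car", "is", "red"],
]
print(sorted(extract_keywords(docs, {2: 8.0, 3: 7.0})))
```

Here "power steering" scores about 10.7, while "steering is" scores about 5.3 because "is" is frequent on its own, so only the former passes a threshold of 8.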
Step 4: Using the industry dictionary obtained in Step 3, perform Chinese word segmentation on the news data again to obtain the corresponding word sets. This step segments all of the text data and removes stop words to obtain the segmentation result.
Step 4 comprises Steps 401 and 402.
Step 401: First call the Chinese word segmentation program to segment the text; then remove stop words using a stop-word list, and apply morphological normalization to any English words so that they take a uniform form.
After segmentation and tagging, the text is represented as a sequence of tagged words. Many of these are stop words, which are meaningless for information extraction, and the invention removes them with a stop-word list. This both reduces the system's computational load and improves the accuracy of subsequent information extraction. To build the list, words are simply ranked by term frequency and document frequency, and the highest-frequency words are removed.
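The patent segments with ICTCLAS. As a self-contained stand-in, the sketch below re-segments text by forward maximum matching against the industry dictionary (a swapped-in technique, not ICTCLAS's algorithm), keeps runs of ASCII letters and digits as single lower-cased tokens (a simple stand-in for normalizing English words), and then removes stop words. The stop-word list is built as described above, by ranking words on document frequency and term frequency; `top_k` is a hypothetical parameter.

```python
from collections import Counter


def fmm_segment(text, dictionary):
    """Forward maximum matching: at each position take the longest dictionary word."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
            continue
        if ch.isascii() and ch.isalnum():
            j = i
            while j < len(text) and text[j].isascii() and text[j].isalnum():
                j += 1
            tokens.append(text[i:j].lower())  # e.g. "RAV4" -> "rav4"
            i = j
            continue
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:  # fall back to one character
                tokens.append(piece)
                i += size
                break
    return tokens


def frequent_words(docs_tokens, top_k):
    """Rank words by (document frequency, term frequency) and return the top_k."""
    tf = Counter(t for doc in docs_tokens for t in doc)
    df = Counter(t for doc in docs_tokens for t in set(doc))
    ranked = sorted(tf, key=lambda w: (df[w], tf[w]), reverse=True)
    return set(ranked[:top_k])


def remove_stop_words(tokens, stop_words):
    return [t for t in tokens if t not in stop_words]
```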
Step 402: Count the document frequency df and term frequency tf of each word, compute the inverse document frequency idf, and compute each word's weight with the formula log(tf*(idf+1)+1). The word set is then screened against a weight threshold D: words whose weight exceeds D are kept, yielding a word set that reflects the characteristics of the news, while the threshold also reduces the dimensionality of the word set for the news data.
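The weighting and screening of Step 402 can be sketched as below. The text does not define idf precisely; the sketch assumes the common form idf = log(N / df), with N the number of documents, computes tf per document, uses the natural logarithm, and keeps a word if its weight exceeds D in at least one document. All of these choices are assumptions.

```python
import math
from collections import Counter


def word_weights(docs_tokens):
    """Per-document weights log(tf * (idf + 1) + 1), assuming idf = log(N / df)."""
    n_docs = len(docs_tokens)
    df = Counter(w for doc in docs_tokens for w in set(doc))
    weights = []
    for doc in docs_tokens:
        tf = Counter(doc)
        weights.append({
            w: math.log(count * (math.log(n_docs / df[w]) + 1) + 1)
            for w, count in tf.items()
        })
    return weights


def screen_words(docs_tokens, threshold_d):
    """Keep words whose weight exceeds D in at least one document."""
    kept = set()
    for doc_weights in word_weights(docs_tokens):
        kept.update(w for w, wt in doc_weights.items() if wt > threshold_d)
    return kept
```

A word that occurs in every document gets idf = 0 and therefore weight log(tf + 1), so frequent but undistinctive words are pushed below the threshold.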
Step 5: Manually create seed dictionaries of car brands and car models, then mine car brand and car model dictionaries with bootstrapping.
After the text collection has been segmented, tagged, and filtered of stop words, the extraction scope is limited to a suitable range to improve extraction accuracy: only sentences in which two named entities co-occur are used, and named entity pairs are sought within a set context window. Below, named entities are called entities and named entity pairs are called entity pairs. In the invention, an entity pair is <car brand, car model>. In this embodiment, a car brand is one entity and a car model is another, and "entity" below refers to either.
To extract relations between entities automatically, a seed set of relations must be provided in advance. A small relation seed set can be provided manually, but such a small set alone is insufficient for information extraction, so the seeds are expanded by automatic training with the bootstrap method.
The relation between the two entities of a pair can be judged from the context between them: two entity pairs with the same or similar contexts have the same or similar relations. The similarity between an entity pair and a relation seed can therefore be measured by the similarity of their context vectors.
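A minimal sketch of this comparison, assuming bag-of-words context vectors built from the tokens strictly between the two entities and cosine similarity as the measure (the text says only "similarity"). The function names and example sentences are illustrative.

```python
import math
from collections import Counter


def context_vector(tokens, i, j):
    """Bag-of-words vector of the tokens strictly between positions i and j."""
    lo, hi = sorted((i, j))
    return Counter(tokens[lo + 1:hi])


def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


# Seed (丰田, rav4) vs. candidate pair (本田, crv).
seed = context_vector(["丰田", "发布", "新款", "rav4"], 0, 3)
cand = context_vector(["本田", "发布", "crv"], 0, 2)
print(round(cosine(seed, cand), 3))
```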
This step comprises Steps 501 and 502, as shown in Fig. 3.
Step 501: Manually select car brands and their corresponding car models, providing a number of seeds and seed extraction templates; each seed is an entity pair, and the number of seeds can be set as needed. An example seed extraction template is "(some car brand) releases (some car model)".
Step 502: Mine entity pairs with the bootstrap method. By automatically mining the relations between entities, bootstrapping continually yields new seed extraction templates, which are in turn used to extract new seeds iteratively.
In this embodiment, the pseudocode for extracting car brands and car models is as follows:
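The following is not the embodiment's pseudocode but a minimal sketch of a generic bootstrap loop of the kind Step 502 describes. It assumes that a template is simply the token sequence between a brand and a model in a tokenized sentence, that candidate entities are recognized by hypothetical `is_brand`/`is_model` predicates (for instance, the seed dictionaries plus simple patterns), and that the loop stops after a fixed number of iterations or when no new pairs appear.

```python
def bootstrap(sentences, seeds, is_brand, is_model, iterations=3):
    """Alternate between learning templates from known pairs and extracting new pairs.

    sentences: list of token lists; seeds: set of (brand, model) pairs.
    A template is the tuple of tokens between the brand and the model.
    """
    pairs, templates = set(seeds), set()
    for _ in range(iterations):
        # 1. Learn templates from sentences that contain a known pair.
        for toks in sentences:
            for i, brand in enumerate(toks):
                for j in range(i + 1, len(toks)):
                    if (brand, toks[j]) in pairs:
                        templates.add(tuple(toks[i + 1:j]))
        # 2. Apply the templates to extract new candidate pairs.
        new_pairs = set()
        for toks in sentences:
            for i, brand in enumerate(toks):
                if not is_brand(brand):
                    continue
                for tpl in templates:
                    j = i + 1 + len(tpl)
                    if j < len(toks) and tuple(toks[i + 1:j]) == tpl and is_model(toks[j]):
                        new_pairs.add((brand, toks[j]))
        if new_pairs <= pairs:
            break  # converged: no new pairs found
        pairs |= new_pairs
    return pairs, templates


sentences = [["丰田", "发布", "rav4"], ["本田", "发布", "crv"],
             ["本田", "推出", "crv"], ["大众", "推出", "polo"]]
brands = {"丰田", "本田", "大众"}
pairs, templates = bootstrap(
    sentences, {("丰田", "rav4")}, brands.__contains__,
    lambda w: w not in brands and w not in {"发布", "推出"})
```

Starting from one seed, the first round learns the template "发布" (release) and finds (本田, crv); the second round learns "推出" (launch) from that new pair and finds (大众, polo).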
Step 6: Build the entity relationship network without supervision; this comprises Steps 601 to 604. First, the entities in each sentence are identified using the sentence's tagging results. Entity pairs are then built from the identified entities, and relation classification is performed.
Step 601: Extract all sentences containing two or more entities, together with their verbs and the corresponding documents.
Step 602: Normalize and denoise the verbs extracted in Step 601, representing each verb by the discrete values 0 and 1, and remove duplicate or meaningless verbs.
Step 603: Build a deep-learning-based entity word clustering model on the documents extracted in Step 601 to obtain the probability distribution of entity words over other words.
Step 604: Build the entity relationship network from the grammatical relations between words, such as subject-predicate and verb-object relations. The network contains the relations between entities described by all of the extracted verbs. The steps for building the network are shown in Fig. 4.
In this embodiment, the pseudocode for building the entity relationship network is as follows:
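The following is not the embodiment's pseudocode but a minimal sketch of Steps 601, 602 and 604. It assumes each sentence arrives as (token, tag) pairs in which entities are tagged `ENT` and verbs `V` (hypothetical tag names), and it approximates the subject-predicate/verb-object structure by taking every verb located between two entities. Verbs are de-duplicated per edge, and a hypothetical `stop_verbs` set stands in for removing meaningless verbs.

```python
from collections import defaultdict
from itertools import combinations


def build_relation_network(tagged_sentences, stop_verbs=frozenset()):
    """Map each (entity, entity) pair to the set of verbs linking it across sentences."""
    network = defaultdict(set)
    for sent in tagged_sentences:
        entity_positions = [i for i, (_, tag) in enumerate(sent) if tag == "ENT"]
        if len(entity_positions) < 2:
            continue  # Step 601: keep only sentences with two or more entities
        for i, j in combinations(entity_positions, 2):  # i < j
            verbs = {w for w, tag in sent[i + 1:j]
                     if tag == "V" and w not in stop_verbs}
            if verbs:
                network[(sent[i][0], sent[j][0])].update(verbs)
    return dict(network)


sents = [
    [("丰田", "ENT"), ("发布", "V"), ("新款", "A"), ("rav4", "ENT")],
    [("丰田", "ENT"), ("是", "V"), ("品牌", "N")],
    [("本田", "ENT"), ("和", "C"), ("丰田", "ENT"), ("合作", "V")],
]
network = build_relation_network(sents)
```

The third sentence yields no edge because its verb follows both entities; capturing such cases would need the dependency relations mentioned in Step 604.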
This embodiment builds a word2vec model with deep learning, uses the model to obtain the distributed representation of each word, and computes the similarity between words from these representations in order to cluster the words.
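Training word2vec itself is out of scope here. The sketch below assumes the word vectors are already available (for example, from a trained word2vec model) and shows one simple way to turn vector similarity into clusters: greedy single-pass clustering in which each word joins the first cluster whose centroid has cosine similarity of at least `threshold` with it. The toy vectors and the threshold are made up for illustration; the patent does not specify its clustering procedure.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_words(vectors, threshold=0.8):
    """Greedy single-pass clustering of words by cosine similarity to cluster centroids."""
    clusters = []  # each entry: [centroid, member words]
    for word, vec in vectors.items():
        for cluster in clusters:
            centroid, members = cluster
            if cosine(vec, centroid) >= threshold:
                members.append(word)
                k = len(members)
                # Update the centroid as the running mean of member vectors.
                cluster[0] = [(c * (k - 1) + x) / k for c, x in zip(centroid, vec)]
                break
        else:
            clusters.append([list(vec), [word]])
    return [members for _, members in clusters]


toy_vectors = {"发布": [1.0, 0.1], "推出": [0.9, 0.2], "合作": [0.1, 1.0], "联手": [0.2, 0.9]}
print(cluster_words(toy_vectors))
```

A single pass is order-dependent; k-means or hierarchical clustering over the same vectors would be a more robust alternative.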
Step 7: Define entity relation categories. Verbs in the articles, such as "purchase", "cooperate", and "release", are extracted as relation categories. Step 7 comprises Steps 701 and 702.
Step 701: From the news data preprocessed in Step 2, extract the verbs in all sentences containing two entities.
Step 702: Cluster the verbs to obtain the relation categories, placing identical verbs in the same class.
In this embodiment, the pseudocode for entity relation classification is as follows:
Extract articles that contain more than one entity;
Get all verbs located between two entities;
Cluster the verbs obtained above with the LDA topic clustering model;
Take the resulting verb clusters as the relation types.
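The pseudocode above names LDA topic clustering, which is not reproduced here. The sketch below implements only the rule stated in Step 702, that identical verbs fall into the same class, plus an optional, hypothetical `verb_to_cluster` mapping (for instance, the output of a word-vector clustering) that merges different verbs into one relation type.

```python
from collections import defaultdict


def relation_types(entity_pair_verbs, verb_to_cluster=None):
    """Group entity pairs by relation type.

    entity_pair_verbs: iterable of ((entity1, entity2), verb) observations.
    verb_to_cluster: optional dict mapping a verb to a cluster label;
    verbs missing from it form their own class (identical verbs -> same class).
    """
    verb_to_cluster = verb_to_cluster or {}
    types = defaultdict(set)
    for pair, verb in entity_pair_verbs:
        types[verb_to_cluster.get(verb, verb)].add(pair)
    return dict(types)


obs = [(("丰田", "rav4"), "发布"), (("本田", "crv"), "发布"), (("大众", "polo"), "推出")]
print(relation_types(obs))
```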
Step 8: Classify entity relations. For each entity pair in the entity relationship network, assign a relation category based on the clustering result of Step 7. Each entity pair relation in the network corresponds to a feature; the features are extracted, and relations are classified using the rules formed by the clustering in Step 7.
From the entity relationship network obtained in Step 6, the set of entities it contains can be obtained; in this embodiment these are the car brand set N and the car model set O. For any n ∈ N and o ∈ O, the entity pair (n, o) is built. Because only relations between car brands and car models are considered, the car brand is always placed first and the car model second when a pair is built; the order in which they actually appear in the sentence is instead used as a feature in model learning and classification. For example, in the sentence "Toyota releases the new RAV4", if the car brand "Toyota" and the car model "rav4" are identified, then N = {Toyota}, O = {rav4}, and the resulting entity pair set is {(Toyota, rav4)}.
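The pair construction and order feature described above can be sketched as follows, assuming entity mentions are given as (text, type, position) triples with hypothetical type labels `BRAND` and `MODEL`. As the text specifies, the brand is always placed first in the pair, while whether the brand actually precedes the model in the sentence is kept as a separate boolean feature.

```python
from itertools import product


def build_entity_pairs(mentions):
    """Return {(brand, model): brand_precedes_model} for every brand x model combination."""
    brands = [(text, pos) for text, kind, pos in mentions if kind == "BRAND"]
    models = [(text, pos) for text, kind, pos in mentions if kind == "MODEL"]
    return {(b, m): b_pos < m_pos
            for (b, b_pos), (m, m_pos) in product(brands, models)}


# "Toyota releases the new RAV4": brand at token 0, model at token 4.
print(build_entity_pairs([("Toyota", "BRAND", 0), ("rav4", "MODEL", 4)]))
```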
Claims (1)
1. A deep-learning-based named entity relation extraction and construction method for a specific domain, characterized in that it comprises the following steps:
Step 1: Build a web crawler to crawl news data in the domain from vertical websites;
Step 2: Preprocess the acquired news data to remove junk information, including duplicate content, improperly displayed content, and garbled (mis-encoded) text; the preprocessed news data is used in the subsequent steps;
Step 201: Clean the data by news length, and remove junk news with regular expressions and a formulated rule set;
Step 202: Filter the news data with a Bloom filter to remove duplicate news: each news item is first mapped into a bit array with N hash functions, and N hash values are then computed for each subsequent item to determine whether it already exists; if all of the subsequent item's hash positions are set in the bit array, the item already exists and is filtered out;
Step 3: Segment the news data, extract keywords, and add the extracted keywords to the dictionary to generate an industry dictionary; keywords are extracted with N-gram models, N = 1, 2, 3, by computing the pointwise mutual information (PMI) value of each word, comparing it with a set threshold, and taking words above the threshold as keywords;
the PMI value of word a and word b is PMI(a, b) = p(ab) / (p(a) · p(b)), where p(a) is the frequency of word a, p(b) is the frequency of word b, and p(ab) is the frequency of ab;
Step 4: Perform Chinese word segmentation on the news data using the industry dictionary to obtain the corresponding word sets;
Step 401: First call the Chinese word segmentation program to segment the text; then remove stop words according to a stop-word list, and apply morphological normalization to any English words so that they take a uniform form;
Step 402: Count the document frequency df and term frequency tf of each word, compute the inverse document frequency idf, and compute each word's weight with the formula log(tf*(idf+1)+1); screen the word set by comparing the weights with a threshold D, keeping words whose weight exceeds D, to obtain the corresponding word set, while the threshold comparison also reduces the dimensionality of the word set for the news data;
Step 5: Extract a seed dictionary, where each seed is a specified entity pair: first manually create a number of seeds, then mine entity pairs from the news data with the bootstrap method;
Step 6: Build the entity relationship network without supervision, specifically: extract sentences containing two or more entities from the news data, and extract the verbs in these sentences and the corresponding documents; build a deep-learning-based entity word clustering model on the extracted documents to obtain the probability distribution of entity words over other words; and build the entity relationship network from the relations between words described by the verbs;
Step 7: Define entity relation categories, specifically: extract from the news data the verbs in sentences containing two entities and cluster them, placing identical verbs in the same class;
Step 8: For each entity pair in the entity relationship network, assign a relation category based on the clustering result of Step 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410488047.7A CN104199972B (en) | 2013-09-22 | 2014-09-22 | A kind of name entity relation extraction and construction method based on deep learning |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2013104319134 | 2013-09-22 | ||
CN201310431913.4 | 2013-09-22 | ||
CN201310431913 | 2013-09-22 | ||
CN201410488047.7A CN104199972B (en) | 2013-09-22 | 2014-09-22 | A kind of name entity relation extraction and construction method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104199972A CN104199972A (en) | 2014-12-10 |
CN104199972B true CN104199972B (en) | 2018-08-03 |
Family
ID=52085265
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410488047.7A Active CN104199972B (en) | 2013-09-22 | 2014-09-22 | A kind of name entity relation extraction and construction method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104199972B (en) |
Families Citing this family (49)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104933164B (en) * | 2015-06-26 | 2018-10-09 | 华南理工大学 | In internet mass data name entity between relationship extracting method and its system |
CN105260457B (en) * | 2015-10-14 | 2018-07-13 | 南京大学 | A kind of multi-semantic meaning network entity contrast table automatic generation method towards coreference resolution |
CN105389470A (en) * | 2015-11-18 | 2016-03-09 | 福建工程学院 | Method for automatically extracting Traditional Chinese Medicine acupuncture entity relationship |
CN105468583A (en) * | 2015-12-09 | 2016-04-06 | 百度在线网络技术(北京)有限公司 | Entity relationship obtaining method and device |
CN105894088B (en) * | 2016-03-25 | 2018-06-29 | 苏州赫博特医疗信息科技有限公司 | Based on deep learning and distributed semantic feature medical information extraction system and method |
CN105938495A (en) * | 2016-04-29 | 2016-09-14 | 乐视控股(北京)有限公司 | Entity relationship recognition method and apparatus |
US11288573B2 (en) * | 2016-05-05 | 2022-03-29 | Baidu Usa Llc | Method and system for training and neural network models for large number of discrete features for information rertieval |
CN106021223B (en) * | 2016-05-09 | 2020-06-23 | Tcl科技集团股份有限公司 | Sentence similarity calculation method and system |
CN106372122B (en) * | 2016-08-23 | 2018-04-10 | 温州大学瓯江学院 | A kind of Document Classification Method and system based on Wiki semantic matches |
CN108205524B (en) * | 2016-12-20 | 2022-01-07 | 北京京东尚科信息技术有限公司 | Text data processing method and device |
CN108268431B (en) * | 2016-12-30 | 2019-12-03 | 北京国双科技有限公司 | The method and apparatus of paragraph vectorization |
CN106897545B (en) * | 2017-01-05 | 2019-04-30 | 浙江大学 | A kind of tumor prognosis forecasting system based on depth confidence network |
CN108334520A (en) * | 2017-01-19 | 2018-07-27 | 北京京东尚科信息技术有限公司 | social network data processing method, device, storage medium and electronic equipment |
US10922606B2 (en) | 2017-06-13 | 2021-02-16 | International Business Machines Corporation | Multi-directional reduction in large scale deep-learning |
CN107402915A (en) * | 2017-07-17 | 2017-11-28 | 广州特道信息科技有限公司 | The generation method and device of the semantic network lexicon of multilayer |
CN108037837A (en) * | 2017-11-07 | 2018-05-15 | 朗坤智慧科技股份有限公司 | A kind of intelligent prompt method of search term |
CN107798136B (en) | 2017-11-23 | 2020-12-01 | 北京百度网讯科技有限公司 | Entity relation extraction method and device based on deep learning and server |
CN108038106B (en) * | 2017-12-22 | 2021-07-02 | 北京工业大学 | Fine-grained domain term self-learning method based on context semantics |
CN108446355B (en) * | 2018-03-12 | 2022-05-20 | 深圳证券信息有限公司 | Investment and financing event element extraction method, device and equipment |
CN108363701B (en) * | 2018-04-13 | 2022-06-28 | 达而观信息科技(上海)有限公司 | Named entity identification method and system |
CN108549640A (en) * | 2018-04-24 | 2018-09-18 | 易联众信息技术股份有限公司 | One kind being based on statistical enterprise name similarity calculating method |
CN108920448B (en) * | 2018-05-17 | 2021-09-14 | 南京大学 | Comparison relation extraction method based on long-term and short-term memory network |
CN108737423B (en) * | 2018-05-24 | 2020-07-14 | 国家计算机网络与信息安全管理中心 | Phishing website discovery method and system based on webpage key content similarity analysis |
CN110633409B (en) * | 2018-06-20 | 2023-06-09 | 上海财经大学 | Automobile news event extraction method integrating rules and deep learning |
CN109190110B (en) * | 2018-08-02 | 2023-08-22 | 厦门快商通信息技术有限公司 | Named entity recognition model training method and system and electronic equipment |
US11080300B2 (en) | 2018-08-21 | 2021-08-03 | International Business Machines Corporation | Using relation suggestions to build a relational database |
CN109408642B (en) * | 2018-08-30 | 2021-07-16 | 昆明理工大学 | Domain entity attribute relation extraction method based on distance supervision |
CN109359299A (en) * | 2018-09-28 | 2019-02-19 | 中国电子科技集团公司信息科学研究院 | A kind of internet of things equipment ability ontology based on commodity data is from construction method |
CN109388806B (en) * | 2018-10-26 | 2023-06-27 | 北京布本智能科技有限公司 | Chinese word segmentation method based on deep learning and forgetting algorithm |
CN109543046A (en) * | 2018-11-16 | 2019-03-29 | 重庆邮电大学 | A kind of robot data interoperability Methodologies for Building Domain Ontology based on deep learning |
CN109710918B (en) * | 2018-11-26 | 2024-10-18 | 平安科技(深圳)有限公司 | Public opinion identification method, public opinion identification device, computer equipment and storage medium |
CN109959109A (en) * | 2019-03-18 | 2019-07-02 | 四川长虹电器股份有限公司 | Air-conditioning control system and its control method based on abnormal speech identification |
CN110134761A (en) * | 2019-04-16 | 2019-08-16 | 深圳壹账通智能科技有限公司 | Adjudicate document information retrieval method, device, computer equipment and storage medium |
CN110298043B (en) * | 2019-07-03 | 2023-04-07 | 吉林大学 | Vehicle named entity identification method and system |
CN110458397A (en) * | 2019-07-05 | 2019-11-15 | 苏州热工研究院有限公司 | A kind of nuclear material military service performance information extracting method |
CN110413725A (en) * | 2019-07-23 | 2019-11-05 | 福建奇点时空数字科技有限公司 | A kind of industry data information extraction method based on depth learning technology |
CN110737845A (en) * | 2019-10-15 | 2020-01-31 | 精硕科技(北京)股份有限公司 | method, computer storage medium and system for realizing information analysis |
CN111178076B (en) * | 2019-12-19 | 2023-08-08 | 成都欧珀通信科技有限公司 | Named entity recognition and linking method, device, equipment and readable storage medium |
CN111126067B (en) * | 2019-12-23 | 2022-02-18 | 北大方正集团有限公司 | Entity relationship extraction method and device |
CN111274361A (en) * | 2020-01-21 | 2020-06-12 | 北京明略软件系统有限公司 | Industry new word discovery method and device, storage medium and electronic equipment |
CN111582497A (en) * | 2020-04-27 | 2020-08-25 | 平安医疗健康管理股份有限公司 | Training file generation and evaluation method, device, computer system and storage medium |
CN111881256B (en) * | 2020-07-17 | 2022-11-08 | 中国人民解放军战略支援部队信息工程大学 | Text entity relation extraction method and device and computer readable storage medium equipment |
CN111859887A (en) * | 2020-07-21 | 2020-10-30 | 北京北斗天巡科技有限公司 | Scientific and technological news automatic writing system based on deep learning |
CN112035621A (en) * | 2020-09-03 | 2020-12-04 | 江苏经贸职业技术学院 | Enterprise name similarity detection method based on statistics |
CN112487190B (en) * | 2020-12-13 | 2022-04-19 | 天津大学 | Method for extracting relationships between entities from text based on self-supervision and clustering technology |
CN112507060A (en) * | 2020-12-14 | 2021-03-16 | 福建正孚软件有限公司 | Domain corpus construction method and system |
CN113157866B (en) * | 2021-04-27 | 2024-05-14 | 平安科技(深圳)有限公司 | Data analysis method, device, computer equipment and storage medium |
CN113609844B (en) * | 2021-07-30 | 2024-03-08 | 国网山西省电力公司晋城供电公司 | Electric power professional word stock construction method based on hybrid model and clustering algorithm |
CN117114739B (en) * | 2023-09-27 | 2024-05-03 | 数据空间研究院 | Enterprise supply chain information mining method, mining system and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4333229B2 (en) * | 2003-06-23 | 2009-09-16 | 沖電気工業株式会社 | Named character string evaluation device and evaluation method |
CN102054029A (en) * | 2010-12-17 | 2011-05-11 | 哈尔滨工业大学 | Figure information disambiguation treatment method based on social network and name context |
- 2014-09-22 CN CN201410488047.7A (CN104199972B), Active
Also Published As
Publication number | Publication date |
---|---|
CN104199972A (en) | 2014-12-10 |
Similar Documents
Publication | Title |
---|---|
CN104199972B (en) | A kind of name entity relation extraction and construction method based on deep learning | |
CN108052593A (en) | A kind of subject key words extracting method based on descriptor vector sum network structure | |
CN102289522B (en) | Method of intelligently classifying texts | |
CN106055675B (en) | A kind of Relation extraction method based on convolutional neural networks and apart from supervision | |
CN107861939A (en) | A kind of domain entities disambiguation method for merging term vector and topic model | |
CN110362678A (en) | A kind of method and apparatus automatically extracting Chinese text keyword | |
CN108920482B (en) | Microblog short text classification method based on lexical chain feature extension and LDA (latent Dirichlet Allocation) model | |
CN104462053A (en) | Inner-text personal pronoun anaphora resolution method based on semantic features | |
CN109145180B (en) | Enterprise hot event mining method based on incremental clustering | |
CN102054029A (en) | Figure information disambiguation treatment method based on social network and name context | |
CN108763348A (en) | A kind of classification improved method of extension short text word feature vector | |
CN105389354A (en) | Social media text oriented unsupervised method for extracting and sorting events | |
CN106126502A (en) | A kind of emotional semantic classification system and method based on support vector machine | |
CN106682123A (en) | Hot event acquiring method and device | |
CN110188359B (en) | Text entity extraction method | |
CN106599072B (en) | Text clustering method and device | |
CN105512333A (en) | Product comment theme searching method based on emotional tendency | |
Bolaj et al. | Text classification for Marathi documents using supervised learning methods | |
CN109885675A (en) | Method is found based on the text sub-topic for improving LDA | |
CN106547875A (en) | A kind of online incident detection method of the microblogging based on sentiment analysis and label | |
CN108763192B (en) | Entity relation extraction method and device for text processing | |
CN108038099A (en) | Low frequency keyword recognition method based on term clustering | |
CN110377690A (en) | A kind of information acquisition method and system based on long-range Relation extraction | |
CN110705285B (en) | Government affair text subject word library construction method, device, server and readable storage medium | |
CN106503153A (en) | Computer text classification system, system and text classification method thereof |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 2018-05-23. Address after: Room 502, Building 5, No. 4 Zhongguancun South 4th Street, Haidian District, Beijing 100190. Applicant after: Zhong kjia speed (Beijing) Information Technology Co., Ltd. Address before: No. 4 Zhongguancun South 4th Street, Haidian District, Beijing 100190. Applicant before: SINOPARADOFT (BEIJING) PARALLEL SOFTWARE CO., LTD. |
GR01 | Patent grant |