CN107145503A - Remote supervision non-categorical relation extracting method and system based on word2vec - Google Patents

Remote supervision non-categorical relation extracting method and system based on word2vec

Publication number
CN107145503A
Authority
CN
China
Prior art keywords
sentence
training corpus
categorical
word2vec
relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710166727.0A
Other languages
Chinese (zh)
Inventor
赵明
杜会芳
董翠翠
陈瑛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Agricultural University
Original Assignee
China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a remote supervision non-categorical relation extracting method and system based on word2vec, which can extract non-categorical relations in the vegetable domain more accurately. The method comprises: crawling unstructured vegetable-domain text data from online encyclopedias and large vegetable websites as a corpus, and preprocessing the corpus to obtain a preliminary training corpus; training a word2vec model with the preliminary training corpus, and using the word2vec model to obtain the space vector of each sentence; aggregating the preliminary training corpus by non-categorical relation type and, for the aggregated data of each relation, extracting a common sentence pattern and an uncommon sentence pattern; selecting two sentence space vectors, one matching each of the two patterns, as the initial centers of the k-means clustering method, clustering all sentence space vectors, and selecting the class that matches the common sentence pattern to obtain a higher-quality training corpus; and training a convolutional neural network model with the higher-quality training corpus and extracting the non-categorical relations through a fully connected softmax layer.

Description

Remote supervision non-categorical relation extracting method and system based on word2vec
Technical field
The present invention relates to the field of weakly supervised classification, and in particular to a remote supervision non-categorical relation extracting method and system based on word2vec.
Background art
At present, research on ontology-style knowledge graphs in the agricultural domain is still at an early stage, and few publications address non-categorical relations (relations other than the hyponymy, that is, classification, relation). Some work on ancient agronomy and on tea science has touched on learning non-categorical relations, for example He Lin's 《Research on Semi-automatic Construction and Retrieval of Domain Ontology》 and Xu Jicheng's 《Research on Ontology Learning Modeling for the Vegetable Domain》, but all of it uses the most basic association-rule method to find concept pairs between which a relation exists. The kinds of relations extracted are not rich enough, and the corpora come mainly from books and the literature rather than exploiting the vast data resources on the Web. Moreover, the accuracy of the extracted non-categorical relations is far below the accuracy generally achieved when extracting classification relations.
Extracting non-categorical relations with remote supervision (also known as distant supervision) tends to produce many noisy labels. Zeng, D. et al., in 《Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks》, remove noise with multi-instance learning; Takamatsu, S. et al., in 《Reducing wrong labels in distant supervision for relation extraction》, remove label noise with high-quality templates.
However, the clustering algorithms that most remote supervision relation recognition methods use to remove label noise do not take full account of the syntactic and semantic information among the word vectors in the vector space. In the entries that online encyclopedias and vegetable websites devote to vegetable varieties, contextual information is important and strongly affects relation extraction. How to provide a more accurate non-categorical relation extracting method suited to the vegetable domain is therefore a technical problem that urgently needs to be solved.
Summary of the invention
In view of the defects in the prior art, embodiments of the present invention provide a remote supervision non-categorical relation extracting method and system based on word2vec.
In one aspect, an embodiment of the present invention proposes a remote supervision non-categorical relation extracting method based on word2vec, comprising:
S1, crawling unstructured vegetable-domain text data from online encyclopedias and large vegetable websites as a corpus, and successively preprocessing the corpus and aligning the data to obtain a preliminary training corpus;
In this embodiment, successively preprocessing the corpus and aligning the data specifically means successively applying processing such as word segmentation and part-of-speech tagging to the corpus, and aligning the result with the data in a knowledge base.
S2, training a word2vec model with the preliminary training corpus, and using the word2vec model to convert the words in the sentences of the preliminary training corpus into space vectors; for each sentence, the space vectors of the words in the sentence are summed and averaged to obtain the space vector of the sentence;
S3, aggregating the preliminary training corpus by non-categorical relation type and, for the aggregated data of each relation obtained by the aggregation, extracting a common sentence pattern and an uncommon sentence pattern;
S4, setting k to 2, heuristically selecting two sentence space vectors that respectively match the two different patterns as the initial centers of the k-means clustering method, clustering all sentence space vectors, and selecting the class that matches the common sentence pattern to obtain a higher-quality training corpus;
S5, training a convolutional neural network model with the higher-quality training corpus, and extracting non-categorical relations from the space vectors of the sentences through the one convolutional layer, one pooling layer and one fully connected softmax layer that make up the convolutional neural network model.
In another aspect, an embodiment of the present invention proposes a remote supervision non-categorical relation extraction system based on word2vec, comprising:
an acquiring unit, configured to crawl unstructured vegetable-domain text data from online encyclopedias and large vegetable websites as a corpus, and to successively preprocess the corpus and align the data to obtain a preliminary training corpus;
a training unit, configured to train a word2vec model with the preliminary training corpus, to use the word2vec model to convert the words in the sentences of the preliminary training corpus into space vectors and, for each sentence, to sum and average the space vectors of the words in the sentence to obtain the space vector of the sentence;
an aggregation unit, configured to aggregate the preliminary training corpus by non-categorical relation type and, for the aggregated data of each relation obtained by the aggregation, to extract a common sentence pattern and an uncommon sentence pattern;
a clustering unit, configured to set k to 2, to heuristically select two sentence space vectors that respectively match the two different patterns as the initial centers of the k-means clustering method, to cluster all sentence space vectors, and to select the class that matches the common sentence pattern to obtain a higher-quality training corpus;
an extraction unit, configured to train a convolutional neural network model with the higher-quality training corpus, and to extract non-categorical relations from the space vectors of the sentences through the one convolutional layer, one pooling layer and one fully connected softmax layer that make up the convolutional neural network model.
The remote supervision non-categorical relation extracting method and system based on word2vec proposed by the embodiments of the present invention take unstructured vegetable-domain text from the Web as the corpus, train on the corpus with the word2vec tool, reduce label noise with a clustering algorithm, and finally extract non-categorical relations with a convolutional neural network model. The word2vec tool not only trains word vectors efficiently but also yields word vectors that capture syntactic and semantic information, so the sentences obtained by clustering carry syntactic and semantic information as well, which effectively safeguards the removal of label noise under remote supervision. In addition, extracting non-categorical relations with a convolutional neural network model effectively avoids the accumulation of errors across the multi-stage processing of natural language processing tools. Compared with prior art that does not take full account of the syntactic and semantic information among the word vectors in the vector space, the present invention is therefore better suited to the vegetable domain and extracts non-categorical relations more accurately.
Brief description of the drawings
Fig. 1 is a schematic flowchart of an embodiment of the remote supervision non-categorical relation extracting method based on word2vec according to the present invention;
Fig. 2 is a schematic structural diagram of an embodiment of the remote supervision non-categorical relation extraction system based on word2vec according to the present invention.
Detailed description of the embodiments
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, this embodiment discloses a remote supervision non-categorical relation extracting method based on word2vec, comprising:
S1, crawling unstructured vegetable-domain text data from online encyclopedias and large vegetable websites as a corpus, and successively preprocessing the corpus and aligning the data to obtain a preliminary training corpus;
S1 may include:
S10, using a purpose-written corpus collection script to capture unstructured text data from online vegetable encyclopedias and large vegetable websites as a corpus, and preprocessing the corpus by low-frequency word filtering, word segmentation, part-of-speech tagging and the like;
S11, aligning the corpus obtained in step S10 with the relation instances in a preset knowledge base to obtain a preliminary training corpus.
This step is based on the following assumption: if a certain semantic relation holds between two concepts, then every sentence containing these two entity concepts also expresses that relation.
For example, for the non-categorical relation <health-care effect, tomato, stomach>, all sentences containing "tomato" and "stomach", such as sentences I and II below, are found in the text collection according to the above assumption:
I. "Tomato has the effect of strengthening the stomach and promoting digestion";
II. "Eating tomato on an empty stomach often causes stomach discomfort and distending pain in the stomach".
The non-categorical relation instance together with these sentences constitutes one item of aligned data. As can be seen from the above, however, sentence II does not express the "health-care effect" relation and is therefore noisy data. The following steps remove this label noise and extract the non-categorical relations of the vegetable domain.
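The alignment assumption above can be illustrated with a minimal Python sketch. The `RelationInstance` type, the `align` function and the naive substring test are assumptions made for illustration only; the patent does not specify how entity mentions are matched, and a real system would presumably match against segmented tokens.

```python
from typing import NamedTuple


class RelationInstance(NamedTuple):
    """One knowledge-base entry: <relation, head entity, tail entity>."""
    relation: str
    head: str
    tail: str


def align(sentences, instances):
    """Pair each instance with every sentence that mentions both entities."""
    aligned = []
    for inst in instances:
        for sent in sentences:
            # Remote-supervision assumption: co-occurrence of the two
            # entities is taken to express the relation (this is what
            # introduces label noise).
            if inst.head in sent and inst.tail in sent:
                aligned.append((inst, sent))
    return aligned


kb = [RelationInstance("health-care effect", "tomato", "stomach")]
corpus = [
    "tomato has the effect of strengthening the stomach and promoting digestion",
    "eating tomato on an empty stomach often causes stomach discomfort",
    "eggplant growth requires higher temperature",
]
aligned = align(corpus, kb)
for inst, sent in aligned:
    print(inst.relation, "|", sent)
```

Both tomato sentences are labelled "health-care effect" here, including the noisy second one, which is exactly the situation the later clustering step is meant to clean up.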
S2, training a word2vec model with the preliminary training corpus, and using the word2vec model to convert the words in the sentences of the preliminary training corpus into space vectors; for each sentence, the space vectors of the words in the sentence are summed and averaged to obtain the space vector of the sentence;
The word2vec used in step S2 is a software tool for training word vectors that was open-sourced by Google. Given a corpus, it uses an optimized training model to map each word in a sentence quickly and effectively to a real-valued vector in a k-dimensional space, and these vectors capture syntactic and semantic features. Its core architectures are CBOW and Skip-gram.
The CBOW model can be understood simply as using the context to determine the probability that the current word occurs. The present invention uses the Skip-gram model, which uses the current word to predict the probability that the context occurs. When a corpus is processed, the limited size of the processing window often means that the relation between the current word and words outside the window cannot be correctly reflected in the model, while simply enlarging the window increases the training complexity. The Skip-gram model solves this problem well by "skipping some characters". For example, the two four-word tuples of the online encyclopedia entry "eggplant growth requires higher temperature", namely "eggplant growth requires higher" and "growth requires higher temperature", both fail to express the meaning of the sentence. The Skip-gram model, however, allows some words to be skipped: if two words may be skipped, the four-word tuples "eggplant requires higher temperature" and "eggplant growth higher temperature" are obtained, and these express the original meaning. The word2vec tool is used in the following specific steps:
(1) training the word2vec model with the preliminary training corpus;
(2) obtaining from the word2vec model the space vector of each word in the corpus sentences; these word vectors contain syntactic and semantic information. The space vectors of all words in each sentence are summed and averaged to obtain the vector of that sentence. For example, for the sentence "kidney beans are rich in protein and carotene and have high nutritional value", the trained word2vec model yields the space vectors of "kidney beans", "rich in", "protein", "carotene", "nutrition", "value" and "high"; summing these word vectors and taking the average gives the space vector of the whole sentence.
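The two ideas in this step, skipping words when forming four-word tuples and averaging word vectors into a sentence vector, can be sketched in plain Python. The functions `skip_ngrams` and `sentence_vector` and the toy two-dimensional vectors are hypothetical stand-ins; in practice the vectors would come from a trained word2vec model. Note also that the tuple-skipping shown here follows the patent's illustration, whereas the Skip-gram training objective itself predicts context words from the current word.

```python
from itertools import combinations


def skip_ngrams(tokens, n, max_skip):
    """Return n-word tuples in sentence order that skip at most max_skip words."""
    grams = []
    for idx in combinations(range(len(tokens)), n):
        # Words skipped = span covered by the tuple minus the n words kept.
        if idx[-1] - idx[0] + 1 - n <= max_skip:
            grams.append(tuple(tokens[i] for i in idx))
    return grams


def sentence_vector(tokens, word_vectors):
    """Sum the vectors of the known tokens and divide by their count."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        raise ValueError("no token of the sentence has a word vector")
    dim = len(known[0])
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]


tokens = ["eggplant", "growth", "requires", "higher", "temperature"]
grams = skip_ngrams(tokens, n=4, max_skip=2)

# Toy 2-dimensional vectors standing in for trained word2vec output.
toy_vectors = {
    "kidney_beans": [1.0, 0.0],
    "rich_in": [0.0, 1.0],
    "protein": [1.0, 1.0],
    "carotene": [2.0, 2.0],
}
vec = sentence_vector(["kidney_beans", "rich_in", "protein", "carotene"], toy_vectors)
print(len(grams), vec)
```

With `max_skip=0` only the two contiguous four-word tuples from the text remain; allowing skips adds tuples such as "eggplant requires higher temperature".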
S3, aggregating the preliminary training corpus by non-categorical relation type and, for the aggregated data of each relation obtained by the aggregation, extracting a common sentence pattern and an uncommon sentence pattern;
S3 may include:
S30, aggregating the preliminary training corpus according to the non-categorical relation type contained in each sentence and, for the aggregated data of each relation, finding sentence patterns with the DL-CoTrain algorithm and extracting one common sentence pattern and one uncommon sentence pattern, that is, selecting the patterns with a high value of the score h(x) = (count(x) + a) / (N + k·a), where k is the number of classes (here 2), a is the smoothing parameter (typically 0.1), count(x) is the number of times feature x occurs, and N is the number of aligned data items for one non-categorical relation;
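The smoothed score h(x) can be computed as in the following sketch. It only evaluates the formula; it does not implement DL-CoTrain. Treating each aligned sentence as carrying exactly one pattern label, and the pattern strings themselves, are assumptions made for illustration.

```python
from collections import Counter


def pattern_scores(pattern_per_sentence, k=2, a=0.1):
    """Score each pattern x with h(x) = (count(x) + a) / (N + k * a)."""
    counts = Counter(pattern_per_sentence)
    n = len(pattern_per_sentence)  # N: aligned items for one relation
    return {x: (c + a) / (n + k * a) for x, c in counts.items()}


# Hypothetical pattern labels for ten sentences aligned with one relation.
patterns = ["X has the effect of Y"] * 8 + ["X on an empty Y"] * 2
scores = pattern_scores(patterns)
common = max(scores, key=scores.get)
uncommon = min(scores, key=scores.get)
print(common, "|", uncommon)
```

The highest-scoring pattern plays the role of the common sentence pattern and the lowest-scoring one the uncommon sentence pattern in this toy setting.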
S4, setting k to 2, heuristically selecting two sentence space vectors that respectively match the two different patterns as the initial centers of the k-means clustering method, clustering all sentence space vectors, and selecting the class that matches the common sentence pattern to obtain a higher-quality training corpus;
S4 may include:
S40, selecting two sentences that match the two different patterns as the initial centers of the two classes;
S41, setting k to 2, clustering all sentences that match the two sentence patterns with the k-means clustering algorithm, and selecting the class that matches the common sentence pattern. Because this process is based on text-space word vectors that carry syntactic and semantic information, the sentences finally obtained also carry syntactic and semantic information, so label noise can be removed effectively and a higher-quality training corpus obtained;
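A minimal sketch of the seeded two-class clustering follows, written in plain Python rather than with any particular library. The function name `seeded_two_means`, the toy vectors and the choice of seeds are assumptions; the point is only that cluster 0 starts at a sentence matching the common pattern, so the sentences that end up in cluster 0 are the ones kept.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def seeded_two_means(vectors, seed_common, seed_uncommon, max_iter=100):
    """k-means with k = 2; cluster 0 starts at the common-pattern seed."""
    centers = [list(seed_common), list(seed_uncommon)]
    labels = []
    for _ in range(max_iter):
        new_labels = [
            min((0, 1), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = mean(members)
    return labels


# Toy sentence vectors: the first three resemble the common pattern.
vectors = [[0.9, 0.1], [1.0, 0.0], [0.8, 0.2], [0.1, 0.9], [0.0, 1.0]]
labels = seeded_two_means(vectors, seed_common=vectors[1], seed_uncommon=vectors[4])
kept = [v for v, lab in zip(vectors, labels) if lab == 0]
print(labels, len(kept))
```

Fixing the initial centers instead of choosing them at random makes it unambiguous which of the two resulting clusters corresponds to the common sentence pattern.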
S5, training a convolutional neural network model with the higher-quality training corpus, and extracting non-categorical relations from the space vectors of the sentences through the one convolutional layer, one pooling layer and one fully connected softmax layer that make up the convolutional neural network model.
S5 may include:
S50, training the convolutional neural network model with the higher-quality training corpus: the space vectors of the sentences are input into the convolutional neural network, whose convolutional layer automatically extracts text features, whose pooling layer performs down-sampling, and whose fully connected layer outputs the predicted probabilities of the non-categorical relations, wherein the convolutional neural network model comprises one convolutional layer, one pooling layer and one fully connected softmax layer.
It can be understood that the convolutional neural network structure comprises one convolutional layer, one pooling layer and one fully connected softmax layer. Each convolution filter of the convolutional layer automatically extracts multiple sentence feature values, and a max-pooling operation selects the most important sentence features, which have a fixed length. Finally, the sentence feature vectors generated by all convolution filters are concatenated into a new sentence feature vector, so that all features are integrated into a single feature vector that is passed into the fully connected softmax layer, which finally outputs the probability distribution over the non-categorical relations.
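The forward pass through one convolutional layer, one max-pooling layer and one fully connected softmax layer can be sketched as follows. This is an untrained, hand-weighted illustration in plain Python, not the patent's implementation: representing the input as a sequence of word vectors, the filter width of 2, the ReLU activation and all weight values are assumptions.

```python
import math


def conv1d(sentence, kernel, bias=0.0):
    """Slide one filter over consecutive word vectors and apply ReLU."""
    width = len(kernel)
    out = []
    for start in range(len(sentence) - width + 1):
        total = bias
        for offset in range(width):
            word_vec = sentence[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], word_vec))
        out.append(max(0.0, total))
    return out


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(sentence, kernels, dense_weights, dense_bias):
    """Convolution -> max pooling -> fully connected softmax."""
    # Max pooling keeps one value per filter, so the pooled vector has a
    # fixed length regardless of sentence length (the sentence must be at
    # least as long as the filter width).
    pooled = [max(conv1d(sentence, k)) for k in kernels]
    logits = [
        sum(w * p for w, p in zip(row, pooled)) + b
        for row, b in zip(dense_weights, dense_bias)
    ]
    return softmax(logits)


# Four words with 2-dimensional vectors, two filters, three relation classes.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, 0.0], [0.0, 1.0]],
]
dense_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
dense_bias = [0.0, 0.0, 0.0]
probs = forward(sentence, kernels, dense_weights, dense_bias)
print(probs)
```

The output is a probability distribution with one entry per relation class; training the weights (for example by gradient descent on a cross-entropy loss) is outside the scope of this sketch.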
The remote supervision non-categorical relation extracting method based on word2vec proposed by this embodiment takes unstructured vegetable-domain text from the Web as the corpus, trains on the corpus with the word2vec tool, reduces label noise with a clustering algorithm, and finally extracts non-categorical relations with a convolutional neural network model. The word2vec tool not only trains word vectors efficiently but also yields word vectors that capture syntactic and semantic information, so the sentences obtained by clustering carry syntactic and semantic information as well, which effectively safeguards the removal of label noise under remote supervision. In addition, extracting non-categorical relations with a convolutional neural network model effectively avoids the accumulation of errors across the multi-stage processing of natural language processing tools. Compared with prior art that does not take full account of the syntactic and semantic information among the word vectors in the vector space, the present invention is therefore better suited to the vegetable domain and extracts non-categorical relations more accurately.
Referring to Fig. 2, this embodiment discloses a remote supervision non-categorical relation extraction system based on word2vec, comprising:
an acquiring unit 1, configured to crawl unstructured vegetable-domain text data from online encyclopedias and large vegetable websites as a corpus, and to successively preprocess the corpus and align the data to obtain a preliminary training corpus;
In this embodiment, the acquiring unit may include:
a capture subunit, configured to use a purpose-written corpus collection script to capture unstructured text data from online vegetable encyclopedias and large vegetable websites as a corpus, and to preprocess the corpus by low-frequency word filtering, word segmentation, part-of-speech tagging and the like;
an alignment subunit, configured to align the corpus obtained by the capture subunit with the relation instances in a preset knowledge base to obtain a preliminary training corpus.
a training unit 2, configured to train a word2vec model with the preliminary training corpus, to use the word2vec model to convert the words in the sentences of the preliminary training corpus into space vectors and, for each sentence, to sum and average the space vectors of the words in the sentence to obtain the space vector of the sentence;
an aggregation unit 3, configured to aggregate the preliminary training corpus by non-categorical relation type and, for the aggregated data of each relation obtained by the aggregation, to extract a common sentence pattern and an uncommon sentence pattern;
The aggregation unit may specifically be configured to:
aggregate the preliminary training corpus according to the non-categorical relation type contained in each sentence and, for the aggregated data of each relation, find sentence patterns with the DL-CoTrain algorithm and extract one common sentence pattern and one uncommon sentence pattern.
a clustering unit 4, configured to set k to 2, to heuristically select two sentence space vectors that respectively match the two different patterns as the initial centers of the k-means clustering method, to cluster all sentence space vectors, and to select the class that matches the common sentence pattern to obtain a higher-quality training corpus;
an extraction unit 5, configured to train a convolutional neural network model with the higher-quality training corpus, and to extract non-categorical relations from the space vectors of the sentences through the one convolutional layer, one pooling layer and one fully connected softmax layer that make up the convolutional neural network model.
The extraction unit may specifically be configured to:
train the convolutional neural network model with the higher-quality training corpus, wherein the space vectors of the sentences are input into the convolutional neural network, whose convolutional layer automatically extracts text features, whose pooling layer performs down-sampling, and whose fully connected layer outputs the predicted probabilities of the non-categorical relations, and wherein the convolutional neural network model comprises one convolutional layer, one pooling layer and one fully connected softmax layer.
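How the five units hand data to one another can be sketched as a small Python class. The class name `ExtractionSystem`, the callable signatures and the stub units are all hypothetical; they only mirror the order acquiring, training, aggregation, clustering, extraction described above and contain no real crawling or learning.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ExtractionSystem:
    acquire: Callable[[], Any]           # acquiring unit 1
    train_vectors: Callable[[Any], Any]  # training unit 2
    aggregate: Callable[[Any], Any]      # aggregation unit 3
    cluster: Callable[[Any, Any], Any]   # clustering unit 4
    extract: Callable[[Any], Any]        # extraction unit 5

    def run(self):
        corpus = self.acquire()
        vectors = self.train_vectors(corpus)
        patterns = self.aggregate(corpus)
        clean_corpus = self.cluster(vectors, patterns)
        return self.extract(clean_corpus)


# Stub units that only show the data flow between the stages.
system = ExtractionSystem(
    acquire=lambda: ["sentence a", "sentence b"],
    train_vectors=lambda corpus: {s: [float(len(s))] for s in corpus},
    aggregate=lambda corpus: {"common": corpus[0], "uncommon": corpus[-1]},
    cluster=lambda vectors, patterns: [patterns["common"]],
    extract=lambda clean: {s: "relation" for s in clean},
)
result = system.run()
print(result)
```

Passing the units in as callables keeps each stage replaceable, which matches the way the description treats each unit as a separately configurable component.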
The remote supervision non-categorical relation extraction system based on word2vec proposed by this embodiment takes unstructured vegetable-domain text from the Web as the corpus, trains on the corpus with the word2vec tool, reduces label noise with a clustering algorithm, and finally extracts non-categorical relations with a convolutional neural network model. The word2vec tool not only trains word vectors efficiently but also yields word vectors that capture syntactic and semantic information, so the sentences obtained by clustering carry syntactic and semantic information as well, which effectively safeguards the removal of label noise under remote supervision. In addition, extracting non-categorical relations with a convolutional neural network model effectively avoids the accumulation of errors across the multi-stage processing of natural language processing tools. Compared with prior art that does not take full account of the syntactic and semantic information among the word vectors in the vector space, the present invention is therefore better suited to the vegetable domain and extracts non-categorical relations more accurately.
The invention has the following advantages:
In terms of application, the present invention is devoted to extracting non-categorical relations in the vegetable domain. Non-categorical relations can to a large extent improve the precision and recall of information queries over the massive information of the vegetable domain and increase the completeness of knowledge representation. This makes possible intelligent semantic information services through which people can obtain the vegetable information they need quickly and accurately, and raises the level of vegetable information services.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the present invention, and such modifications and variations all fall within the scope defined by the appended claims.

Claims (8)

1. A remote supervision non-categorical relation extracting method based on word2vec, characterized by comprising:
S1, crawling unstructured vegetable-domain text data from online encyclopedias and large vegetable websites as a corpus, and successively preprocessing the corpus and aligning the data to obtain a preliminary training corpus;
S2, training a word2vec model with the preliminary training corpus, and using the word2vec model to convert the words in the sentences of the preliminary training corpus into space vectors, wherein for each sentence the space vectors of the words in the sentence are summed and averaged to obtain the space vector of the sentence;
S3, aggregating the preliminary training corpus by non-categorical relation type and, for the aggregated data of each relation obtained by the aggregation, extracting a common sentence pattern and an uncommon sentence pattern;
S4, setting k to 2, heuristically selecting two sentence space vectors that respectively match the two different patterns as the initial centers of the k-means clustering method, clustering all sentence space vectors, and selecting the class that matches the common sentence pattern to obtain a higher-quality training corpus;
S5, training a convolutional neural network model with the higher-quality training corpus, and extracting non-categorical relations from the space vectors of the sentences through the one convolutional layer, one pooling layer and one fully connected softmax layer that make up the convolutional neural network model.
2. The remote supervision non-categorical relation extracting method based on word2vec according to claim 1, characterized in that S1 comprises:
S10, using a purpose-written corpus collection script to capture unstructured text data from online vegetable encyclopedias and large vegetable websites as a corpus, and preprocessing the corpus by low-frequency word filtering, word segmentation, part-of-speech tagging and the like;
S11, aligning the corpus obtained in step S10 with the relation instances in a preset knowledge base to obtain a preliminary training corpus.
3. The remote supervision non-categorical relation extracting method based on word2vec according to claim 2, characterized in that S3 comprises:
S30, aggregating the preliminary training corpus according to the non-categorical relation type contained in each sentence and, for the aggregated data of each relation, finding sentence patterns with the DL-CoTrain algorithm and extracting one common sentence pattern and one uncommon sentence pattern.
4. The remote supervision non-categorical relation extracting method based on word2vec according to claim 3, characterized in that S5 comprises:
S50, training the convolutional neural network model with the higher-quality training corpus, wherein the space vectors of the sentences are input into the convolutional neural network, whose convolutional layer automatically extracts text features, whose pooling layer performs down-sampling, and whose fully connected layer outputs the predicted probabilities of the non-categorical relations, and wherein the convolutional neural network model comprises one convolutional layer, one pooling layer and one fully connected softmax layer.
5. A remote supervision non-categorical relation extraction system based on word2vec, characterized by comprising:
an acquiring unit, configured to crawl unstructured text data in the vegetable domain from online encyclopedias and large vegetable websites as a corpus, and to perform preprocessing and data alignment on the corpus in sequence to obtain a preliminary training corpus;
a training unit, configured to train a word2vec model on the preliminary training corpus, to convert the words of each sentence in the preliminary training corpus into space vectors with the word2vec model and, for each sentence, to sum the space vectors of its words and average them to obtain the space vector of the sentence;
a polymerization unit, configured to aggregate the preliminary training corpus by non-categorical relation type and, for the aggregated data of each relation obtained by the aggregation, to extract a common sentence pattern and an uncommon sentence pattern;
a clustering unit, configured to set k to 2, to heuristically select two sentence space vectors that respectively match the two different patterns as the initial centers of the k-means clustering method, to cluster all sentence space vectors, and to select the cluster that matches the common sentence pattern, thereby obtaining a higher-quality training corpus;
an extraction unit, configured to train a convolutional neural network model on the higher-quality training corpus and to extract non-categorical relations from the space vectors of the sentences by means of the one convolutional layer, one pooling layer and one fully connected softmax layer that make up the convolutional neural network model.
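As an illustration only, the sentence-vector averaging and the seeded two-cluster k-means described in claim 5 might be sketched as follows. The function names `sentence_vector` and `two_means`, the dictionary-based embedding lookup and the use of Euclidean distance are assumptions made for this sketch and are not stated in the claims; in practice the word vectors would come from a trained word2vec model.

```python
from math import dist  # Euclidean distance, Python 3.8+


def sentence_vector(words, embeddings):
    """Average the vectors of the words that have an entry in `embeddings`."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        raise ValueError("no known words in sentence")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def two_means(points, seed_common, seed_uncommon, max_iter=100):
    """k-means with k = 2, started from two hand-picked seed vectors.

    Returns one label per point: 0 for the cluster grown from the
    common-pattern seed, 1 for the cluster grown from the other seed.
    """
    centers = [list(seed_common), list(seed_uncommon)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to the nearer center.
        new_labels = [
            0 if dist(p, centers[0]) <= dist(p, centers[1]) else 1
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                dim = len(members[0])
                centers[k] = [
                    sum(m[i] for m in members) / len(members)
                    for i in range(dim)
                ]
    return labels
```

Under these assumptions, the sentences labelled 0 would be the ones retained as the higher-quality training corpus.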
6. The remote supervision non-categorical relation extraction system based on word2vec according to claim 5, characterized in that the acquiring unit comprises:
a crawling subunit, configured to crawl unstructured text data from online vegetable encyclopedias and large vegetable websites as a corpus using a purpose-written corpus collection shell script, and to preprocess the corpus, including low-frequency word filtering, word segmentation and part-of-speech tagging;
an alignment subunit, configured to align the corpus obtained by the crawling subunit with the relation instances in a preset knowledge base to obtain the preliminary training corpus.
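The alignment performed by the alignment subunit is the labelling step of what is usually called distant supervision. A minimal sketch follows, assuming that sentences are already segmented into tokens, that the knowledge base is a collection of (head, relation, tail) triples, and that alignment means exact token match; the function name `align` and all of these details are hypothetical and are not specified in the claims.

```python
def align(sentences, knowledge_base):
    """Label each sentence with every knowledge-base relation whose
    two entities both occur in it.

    sentences      -- list of token lists (already segmented)
    knowledge_base -- iterable of (head, relation, tail) triples
    Returns a list of (tokens, head, tail, relation) tuples.
    """
    labelled = []
    for tokens in sentences:
        present = set(tokens)
        for head, relation, tail in knowledge_base:
            # Distant-supervision assumption: a sentence that mentions
            # both entities is taken to express their relation.
            if head in present and tail in present:
                labelled.append((tokens, head, tail, relation))
    return labelled
```

Because the assumption in the comment does not always hold, the output is noisy, which is the reason the claims go on to filter it by clustering.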
7. The remote supervision non-categorical relation extraction system based on word2vec according to claim 6, characterized in that the polymerization unit is specifically configured to:
aggregate the preliminary training corpus according to the non-categorical relation type contained in each sentence and, for the aggregated data of each relation, discover sentence patterns with the DL-CoTrain algorithm and extract one common sentence pattern and one uncommon sentence pattern.
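The grouping step of claim 7 can be illustrated with a short sketch. The code below does not implement DL-CoTrain; as a plainly named stand-in it counts the token sequence found between the two entities and takes the most and least frequent sequences as the "common" and "uncommon" patterns. The function names and the input shape (tuples of tokens, head entity, tail entity, relation) are assumptions made for this sketch.

```python
from collections import Counter, defaultdict


def aggregate_by_relation(labelled):
    """Group (tokens, head, tail, relation) tuples by relation type."""
    groups = defaultdict(list)
    for tokens, head, tail, relation in labelled:
        groups[relation].append((tokens, head, tail))
    return dict(groups)


def between_pattern(tokens, head, tail):
    """Tokens lying strictly between the two entity mentions."""
    i, j = tokens.index(head), tokens.index(tail)
    lo, hi = min(i, j), max(i, j)
    return tuple(tokens[lo + 1:hi])


def common_and_uncommon(group):
    """Frequency-based stand-in for pattern discovery (not DL-CoTrain):
    return the most frequent and the least frequent between-entity pattern."""
    counts = Counter(between_pattern(*item) for item in group)
    ranked = counts.most_common()
    return ranked[0][0], ranked[-1][0]
```

Sentences matching the two returned patterns could then serve as candidates for the two initial k-means centers.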
8. The remote supervision non-categorical relation extraction system based on word2vec according to claim 7, characterized in that the extraction unit is specifically configured to:
train a convolutional neural network model on the higher-quality training corpus, and input the space vector of the sentence into the convolutional neural network, wherein the convolutional layer automatically extracts text features, the pooling layer performs down-sampling, and the fully connected layer outputs the predicted probability of each non-categorical relation; the convolutional neural network model comprises one convolutional layer, one pooling layer and one fully connected softmax layer.
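The forward pass of a network with one convolutional layer, one pooling layer and one fully connected softmax layer might look like the sketch below. It is written in plain Python for readability and covers inference only, not training. The one-dimensional convolution over the sentence vector, the ReLU activation, max pooling, and every function and parameter name are assumptions; the claims do not specify kernel sizes, activations or the pooling type.

```python
from math import exp


def conv1d(x, kernel, bias=0.0):
    """Valid 1-D convolution (cross-correlation) followed by ReLU."""
    k = len(kernel)
    return [
        max(0.0, sum(x[i + j] * kernel[j] for j in range(k)) + bias)
        for i in range(len(x) - k + 1)
    ]


def max_pool(x, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward(x, kernels, pool_size, weights, biases):
    """Convolution -> pooling -> fully connected softmax.

    x       -- sentence space vector (list of floats)
    kernels -- list of 1-D convolution kernels
    weights -- one row of weights per relation class
    biases  -- one bias per relation class
    Returns a probability for each relation class.
    """
    features = []
    for kernel in kernels:
        features.extend(max_pool(conv1d(x, kernel), pool_size))
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)
```

With trained kernels and weights, the index of the largest returned probability would be taken as the predicted non-categorical relation.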
CN201710166727.0A 2017-03-20 2017-03-20 Remote supervision non-categorical relation extracting method and system based on word2vec Pending CN107145503A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710166727.0A CN107145503A (en) 2017-03-20 2017-03-20 Remote supervision non-categorical relation extracting method and system based on word2vec

Publications (1)

Publication Number Publication Date
CN107145503A true CN107145503A (en) 2017-09-08

Family

ID=59783444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710166727.0A Pending CN107145503A (en) 2017-03-20 2017-03-20 Remote supervision non-categorical relation extracting method and system based on word2vec

Country Status (1)

Country Link
CN (1) CN107145503A (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704558A (en) * 2017-09-28 2018-02-16 北京车慧互动广告有限公司 A kind of consumers' opinions abstracting method and system
CN107908757A (en) * 2017-11-21 2018-04-13 恒安嘉新(北京)科技股份公司 Website classification method and system
CN108154234A (en) * 2017-12-04 2018-06-12 盈盛资讯科技有限公司 A kind of knowledge learning method and system based on template
CN108280055A (en) * 2017-12-04 2018-07-13 盈盛资讯科技有限公司 A kind of knowledge learning method and system based on binary crelation
CN108280058A (en) * 2018-01-02 2018-07-13 中国科学院自动化研究所 Relation extraction method and apparatus based on intensified learning
CN108427717A (en) * 2018-02-06 2018-08-21 北京航空航天大学 It is a kind of based on the alphabetic class family of languages medical treatment text Relation extraction method gradually extended
CN109145120A (en) * 2018-07-02 2019-01-04 北京妙医佳信息技术有限公司 The Relation extraction method and system of medical health domain knowledge map
CN109271632A (en) * 2018-09-14 2019-01-25 重庆邂智科技有限公司 A kind of term vector learning method of supervision
CN109446300A (en) * 2018-09-06 2019-03-08 厦门快商通信息技术有限公司 A kind of corpus preprocess method, the pre- mask method of corpus and electronic equipment
CN109885698A (en) * 2019-02-13 2019-06-14 北京航空航天大学 A kind of knowledge mapping construction method and device, electronic equipment
CN110209836A (en) * 2019-05-17 2019-09-06 北京邮电大学 Remote supervisory Relation extraction method and device
CN110458162A (en) * 2019-07-25 2019-11-15 上海兑观信息科技技术有限公司 A kind of method of intelligent extraction pictograph information
CN110647919A (en) * 2019-08-27 2020-01-03 华东师范大学 Text clustering method and system based on K-means clustering and capsule network
CN110674265A (en) * 2019-08-06 2020-01-10 上海孚典智能科技有限公司 Unstructured information oriented feature discrimination and information recommendation system
CN110825851A (en) * 2019-11-07 2020-02-21 中电福富信息科技有限公司 Sentence pair relation discrimination method based on median conversion model
CN111914555A (en) * 2019-05-09 2020-11-10 中国人民大学 Automatic relation extraction system based on Transformer structure
CN112016330A (en) * 2020-08-28 2020-12-01 平安国际智慧城市科技股份有限公司 Semantic parsing method, semantic parsing device and storage medium
CN112528045A (en) * 2020-12-23 2021-03-19 中译语通科技股份有限公司 Method and system for judging domain map relation based on open encyclopedia map
CN112906368A (en) * 2021-02-19 2021-06-04 北京百度网讯科技有限公司 Industry text increment method, related device and computer program product
CN113688238A (en) * 2021-08-19 2021-11-23 支付宝(杭州)信息技术有限公司 Method and device for recognizing upper and lower word relations
CN114442623A (en) * 2022-01-20 2022-05-06 中国农业大学 Agricultural machinery operation track field segmentation method based on space-time diagram neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150154193A1 (en) * 2013-12-02 2015-06-04 Qbase, LLC System and method for extracting facts from unstructured text
CN105389379A (en) * 2015-11-20 2016-03-09 重庆邮电大学 Rubbish article classification method based on distributed feature representation of text
CN105740349A (en) * 2016-01-25 2016-07-06 重庆邮电大学 Sentiment classification method capable of combining Doc2vce with convolutional neural network
CN106055675A (en) * 2016-06-06 2016-10-26 杭州量知数据科技有限公司 Relation extracting method based on convolution neural network and distance supervision

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
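Zhu Zhaolong (朱兆龙): "Distant Supervision relation extraction method combining clustering-based denoising and type constraints", China Master's Theses Full-text Database, Information Science and Technology *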

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704558A (en) * 2017-09-28 2018-02-16 北京车慧互动广告有限公司 A kind of consumers' opinions abstracting method and system
CN107908757A (en) * 2017-11-21 2018-04-13 恒安嘉新(北京)科技股份公司 Website classification method and system
CN107908757B (en) * 2017-11-21 2020-05-26 恒安嘉新(北京)科技股份公司 Website classification method and system
CN108154234A (en) * 2017-12-04 2018-06-12 盈盛资讯科技有限公司 A kind of knowledge learning method and system based on template
CN108280055A (en) * 2017-12-04 2018-07-13 盈盛资讯科技有限公司 A kind of knowledge learning method and system based on binary crelation
CN108280058A (en) * 2018-01-02 2018-07-13 中国科学院自动化研究所 Relation extraction method and apparatus based on intensified learning
CN108427717A (en) * 2018-02-06 2018-08-21 北京航空航天大学 It is a kind of based on the alphabetic class family of languages medical treatment text Relation extraction method gradually extended
CN108427717B (en) * 2018-02-06 2021-09-03 北京航空航天大学 Letter class language family medical text relation extraction method based on gradual expansion
CN109145120A (en) * 2018-07-02 2019-01-04 北京妙医佳信息技术有限公司 The Relation extraction method and system of medical health domain knowledge map
CN109446300A (en) * 2018-09-06 2019-03-08 厦门快商通信息技术有限公司 A kind of corpus preprocess method, the pre- mask method of corpus and electronic equipment
CN109446300B (en) * 2018-09-06 2021-04-20 厦门快商通信息技术有限公司 Corpus preprocessing method, corpus pre-labeling method and electronic equipment
CN109271632A (en) * 2018-09-14 2019-01-25 重庆邂智科技有限公司 A kind of term vector learning method of supervision
CN109885698A (en) * 2019-02-13 2019-06-14 北京航空航天大学 A kind of knowledge mapping construction method and device, electronic equipment
CN111914555A (en) * 2019-05-09 2020-11-10 中国人民大学 Automatic relation extraction system based on Transformer structure
CN110209836A (en) * 2019-05-17 2019-09-06 北京邮电大学 Remote supervisory Relation extraction method and device
CN110209836B (en) * 2019-05-17 2022-04-26 北京邮电大学 Remote supervision relation extraction method and device
CN110458162B (en) * 2019-07-25 2023-06-23 上海兑观信息科技技术有限公司 Method for intelligently extracting image text information
CN110458162A (en) * 2019-07-25 2019-11-15 上海兑观信息科技技术有限公司 A kind of method of intelligent extraction pictograph information
CN110674265A (en) * 2019-08-06 2020-01-10 上海孚典智能科技有限公司 Unstructured information oriented feature discrimination and information recommendation system
CN110674265B (en) * 2019-08-06 2021-03-02 上海孚典智能科技有限公司 Unstructured information oriented feature discrimination and information recommendation system
CN110647919A (en) * 2019-08-27 2020-01-03 华东师范大学 Text clustering method and system based on K-means clustering and capsule network
CN110825851A (en) * 2019-11-07 2020-02-21 中电福富信息科技有限公司 Sentence pair relation discrimination method based on median conversion model
CN112016330A (en) * 2020-08-28 2020-12-01 平安国际智慧城市科技股份有限公司 Semantic parsing method, semantic parsing device and storage medium
CN112528045A (en) * 2020-12-23 2021-03-19 中译语通科技股份有限公司 Method and system for judging domain map relation based on open encyclopedia map
CN112528045B (en) * 2020-12-23 2024-04-02 中译语通科技股份有限公司 Method and system for judging domain map relation based on open encyclopedia map
CN112906368A (en) * 2021-02-19 2021-06-04 北京百度网讯科技有限公司 Industry text increment method, related device and computer program product
CN113688238A (en) * 2021-08-19 2021-11-23 支付宝(杭州)信息技术有限公司 Method and device for recognizing upper and lower word relations
CN114442623A (en) * 2022-01-20 2022-05-06 中国农业大学 Agricultural machinery operation track field segmentation method based on space-time diagram neural network
CN114442623B (en) * 2022-01-20 2023-10-24 中国农业大学 Agricultural machinery operation track Tian Lu segmentation method based on space-time diagram neural network

Similar Documents

Publication Publication Date Title
CN107145503A (en) Remote supervision non-categorical relation extracting method and system based on word2vec
CN111126386B (en) Sequence domain adaptation method based on countermeasure learning in scene text recognition
CN103116766B (en) A kind of image classification method of encoding based on Increment Artificial Neural Network and subgraph
CN112395393B (en) Remote supervision relation extraction method based on multitask and multiple examples
CN109684476B (en) Text classification method, text classification device and terminal equipment
CN107704558A (en) A kind of consumers' opinions abstracting method and system
CN110059181A (en) Short text stamp methods, system, device towards extensive classification system
CN110188195B (en) Text intention recognition method, device and equipment based on deep learning
CN108197294A (en) A kind of text automatic generation method based on deep learning
CN109871885A (en) A kind of plants identification method based on deep learning and Plant Taxonomy
CN107679110A (en) The method and device of knowledge mapping is improved with reference to text classification and picture attribute extraction
CN107688576B (en) Construction and tendency classification method of CNN-SVM model
CN110223675A (en) The screening technique and system of training text data for speech recognition
CN111523324A (en) Training method and device for named entity recognition model
CN108846047A (en) A kind of picture retrieval method and system based on convolution feature
CN110442725A (en) Entity relation extraction method and device
Van Hieu et al. Automatic plant image identification of Vietnamese species using deep learning models
CN110751216A (en) Judgment document industry classification method based on improved convolutional neural network
CN103049490A (en) Attribute generation system and generation method among knowledge network nodes
CN110245226A (en) Enterprises' industry classification method and its device
CN113673246A (en) Semantic fusion and knowledge distillation agricultural entity identification method and device
CN110765285A (en) Multimedia information content control method and system based on visual characteristics
CN113268370A (en) Root cause alarm analysis method, system, equipment and storage medium
CN108595426A (en) Term vector optimization method based on Chinese character pattern structural information
CN115146062A (en) Intelligent event analysis method and system fusing expert recommendation and text clustering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170908