CN107145503A - Remote supervision non-categorical relation extracting method and system based on word2vec - Google Patents
Remote supervision non-categorical relation extracting method and system based on word2vec
- Publication number: CN107145503A
- Application number: CN201710166727.0A
- Authority: CN (China)
- Prior art keywords
- sentence
- training corpus
- categorical
- word2vec
- relation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Computational Linguistics (AREA)
- Machine Translation (AREA)
Abstract
The present invention discloses a remote supervision non-categorical relation extracting method and system based on word2vec that can extract non-categorical relations in the vegetable domain more accurately. The method includes: crawling unstructured vegetable-domain text data from online encyclopedias and large vegetable websites as the corpus, and preprocessing the corpus in turn to obtain a preliminary training corpus; training a word2vec model on the preliminary training corpus and using the model to obtain the space vector of each sentence; aggregating the preliminary training corpus by non-categorical relation type and, for the aggregated data of each relation, extracting a common sentence pattern and an uncommon sentence pattern; selecting two sentence space vectors that match the two different patterns respectively as the initial centers of the k-means clustering method, clustering all sentence space vectors, and selecting the cluster that matches the common sentence pattern to obtain a higher-quality training corpus; and training a convolutional neural network model on the higher-quality training corpus and extracting non-categorical relations through a fully connected softmax layer.
Description
Technical field
The present invention relates to the field of weakly supervised classification, and in particular to a remote supervision non-categorical relation extracting method and system based on word2vec.
Background
Research on domain ontology knowledge graphs for agriculture is still at an early stage, and few publications report on non-categorical relations (relations other than the hyponymy, that is, categorical, relation). Some works do address the learning of non-categorical relations for ancient agronomy and tea science, e.g. He Lin's 《Research on Semi-automatic Construction and Retrieval of Domain Ontology》 and Xu Jicheng's 《Research on Ontology Learning Modeling for the Vegetable Domain》, but all of them use only the most basic association-rule methods to find concept pairs between which a relation exists. The relation types extracted are not rich enough, and the corpora come mainly from books and documents rather than from the vast data resources of the Web. The accuracy of the extracted non-categorical relations is also far below that of ordinary categorical relation extraction.
Extracting non-categorical relations with remote supervision tends to produce label noise. Zeng, D. et al., in 《Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks》, remove noise with multi-instance learning; Takamatsu, S. et al., in 《Reducing wrong labels in distant supervision for relation extraction》, remove label noise with high-quality templates.
However, the clustering algorithms that most remote supervision relation recognition methods use to remove label noise do not take full account of the syntactic and semantic information among the word vectors in the vector space. In the entries that online encyclopedias and vegetable websites write about vegetable varieties, contextual information is important and strongly affects relation extraction. How to provide a more accurate non-categorical relation extracting method suited to the vegetable domain is therefore a technical problem in urgent need of a solution.
Summary of the invention
To address the defects of the prior art, the embodiments of the present invention provide a remote supervision non-categorical relation extracting method and system based on word2vec.
In one aspect, an embodiment of the present invention proposes a remote supervision non-categorical relation extracting method based on word2vec, including:
S1, crawl unstructured vegetable-domain text data from online encyclopedias and large vegetable websites as the corpus, then preprocess the corpus and perform data alignment in turn to obtain a preliminary training corpus;
In this embodiment, preprocessing and data alignment specifically mean applying word segmentation, part-of-speech tagging, and similar processing to the corpus in turn, and aligning the result with the data in a knowledge base.
S2, train a word2vec model on the preliminary training corpus and use the word2vec model to convert the words of each sentence in the preliminary training corpus into space vectors; for each sentence, add the space vectors of its words and average them to obtain the space vector of the sentence;
S3, aggregate the preliminary training corpus by non-categorical relation type and, from the aggregated data of each relation, extract a common sentence pattern and an uncommon sentence pattern;
S4, set k to 2, heuristically select two sentence space vectors that match the two different patterns respectively as the initial centers of the k-means clustering method, cluster all sentence space vectors, and select the cluster that matches the common sentence pattern to obtain a higher-quality training corpus;
S5, train a convolutional neural network model on the higher-quality training corpus and, through the one convolutional layer, one pooling layer, and one fully connected softmax layer that make up the model, extract non-categorical relations from the space vectors of the sentences.
In another aspect, an embodiment of the present invention proposes a remote supervision non-categorical relation extraction system based on word2vec, including:
an acquiring unit, for crawling unstructured vegetable-domain text data from online encyclopedias and large vegetable websites as the corpus, then preprocessing the corpus and performing data alignment in turn to obtain a preliminary training corpus;
a training unit, for training a word2vec model on the preliminary training corpus and using the word2vec model to convert the words of each sentence in the preliminary training corpus into space vectors, and, for each sentence, adding the space vectors of its words and averaging them to obtain the space vector of the sentence;
an aggregation unit, for aggregating the preliminary training corpus by non-categorical relation type and, from the aggregated data of each relation, extracting a common sentence pattern and an uncommon sentence pattern;
a clustering unit, for setting k to 2, heuristically selecting two sentence space vectors that match the two different patterns respectively as the initial centers of the k-means clustering method, clustering all sentence space vectors, and selecting the cluster that matches the common sentence pattern to obtain a higher-quality training corpus;
an extraction unit, for training a convolutional neural network model on the higher-quality training corpus and, through the one convolutional layer, one pooling layer, and one fully connected softmax layer that make up the model, extracting non-categorical relations from the space vectors of the sentences.
The remote supervision non-categorical relation extracting method and system based on word2vec proposed by the embodiments of the present invention take unstructured vegetable-domain web text as the corpus, train on the corpus with the word2vec tool, reduce label noise with a clustering algorithm, and finally extract non-categorical relations with a convolutional neural network model. The word2vec tool not only trains word vectors efficiently; the word vectors it produces also capture syntactic and semantic information, so the sentences obtained by clustering carry syntactic and semantic information as well, which effectively safeguards the label noise removal of remote supervision. In addition, extracting non-categorical relations with a convolutional neural network model effectively avoids the error accumulation of multi-stage natural language processing pipelines. Compared with prior art that does not take full account of the syntactic and semantic information among the word vectors in the vector space, the present invention is therefore better suited to the vegetable domain and extracts non-categorical relations more accurately.
Brief description of the drawings
Fig. 1 is a schematic flowchart of an embodiment of the remote supervision non-categorical relation extracting method based on word2vec of the present invention;
Fig. 2 is a schematic structural diagram of an embodiment of the remote supervision non-categorical relation extraction system based on word2vec of the present invention.
Detailed description
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative work fall within the scope of protection of the present invention.
Referring to Fig. 1, this embodiment discloses a remote supervision non-categorical relation extracting method based on word2vec, including:
S1, crawl unstructured vegetable-domain text data from online encyclopedias and large vegetable websites as the corpus, then preprocess the corpus and perform data alignment in turn to obtain a preliminary training corpus;
S1 may include:
S10, using a purpose-written corpus collection script, capture unstructured text data from online vegetable encyclopedias and large vegetable websites as the corpus, and preprocess the corpus with low-frequency word filtering, word segmentation, part-of-speech tagging, and the like;
S11, align the corpus obtained in step S10 with the relation instances in a preset knowledge base to obtain the preliminary training corpus.
This step rests on the following assumption: if a semantic relation holds between two concepts, then every sentence that contains both concepts also expresses that relation.
For example, for the non-categorical relation <health-care effect, tomato, stomach>, the assumption leads to finding in the text set all sentences that contain both "tomato" and "stomach", such as sentences I and II:
I. "Tomato has the effect of strengthening the stomach and promoting digestion";
II. "Eating tomato on an empty stomach often causes stomach upset and stomach distending pain".
The non-categorical relation instance and these sentences together form one piece of aligned data. Sentence II, however, does not express the "health-care effect" relation and is therefore noise data. The following steps remove this label noise and extract the vegetable-domain non-categorical relations.
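The alignment assumption above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the `Triple` layout, the `align` function, and the substring matching are assumptions made here, and a real system would match segmented words rather than raw substrings.

```python
from typing import List, Tuple

# Hypothetical knowledge-base entry: (relation, head concept, tail concept).
Triple = Tuple[str, str, str]

def align(sentences: List[str], triples: List[Triple]) -> List[Tuple[str, Triple]]:
    """Pair each sentence with every triple whose two concepts both occur in it."""
    aligned = []
    for sentence in sentences:
        text = sentence.lower()
        for relation, head, tail in triples:
            if head in text and tail in text:
                aligned.append((sentence, (relation, head, tail)))
    return aligned

kb = [("health-care effect", "tomato", "stomach")]
corpus = [
    "Tomato has the effect of strengthening the stomach and promoting digestion.",
    "Eating tomato on an empty stomach often causes stomach upset.",
    "Eggplant growth requires a higher temperature.",
]
aligned = align(corpus, kb)
for sentence, triple in aligned:
    print(triple[0], "<-", sentence)
```

Both tomato sentences receive the "health-care effect" label, including the second one, which is exactly the label noise that the later steps are meant to remove.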
S2, train a word2vec model on the preliminary training corpus and use the word2vec model to convert the words of each sentence in the preliminary training corpus into space vectors; for each sentence, add the space vectors of its words and average them to obtain the space vector of the sentence;
The word2vec tool used in step S2 is a software tool for training word vectors that was open-sourced by Google. Given a corpus, it uses an optimized training model to map each word of a sentence quickly and effectively to a real-valued vector in a k-dimensional space, and these vectors capture syntactic and semantic features. Its core architectures are CBOW and Skip-gram.
Put simply, the CBOW model uses the context to determine the probability that the current word occurs. The present invention uses the Skip-gram model, which uses the current word to predict the probability that the context occurs. When a corpus is processed, the limited size of the processing window often prevents the relation between the current word and words outside the window from being reflected correctly in the model, while simply enlarging the window increases the complexity of training. The Skip-gram model handles this problem well by "skipping some words". For example, the online encyclopedia entry "eggplant growth requires higher temperature" yields two 4-tuples, "eggplant growth requires higher" and "growth requires higher temperature", neither of which expresses the original meaning of the sentence. The Skip-gram model allows some words to be skipped: if two words may be skipped, the 4-tuples "eggplant requires higher temperature" and "eggplant growth higher temperature" are obtained, and these do express the original meaning. The word2vec tool is used in the following steps:
(1) Train the word2vec model on the preliminary training corpus;
(2) Obtain from the word2vec model the space vector of each word in the corpus sentences; these word vectors carry syntactic and semantic information. Add the space vectors of all words in each sentence and average them to obtain the vector of that sentence. For example, for the sentence "kidney beans are rich in protein and carotene and are of high nutritional value", the trained word2vec model gives the space vectors of "kidney beans", "rich in", "protein", "carotene", "nutrition", "value", and "high"; adding these word vectors and averaging them gives the space vector of the whole sentence.
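The averaging in step (2) can be sketched as follows. The three-dimensional `toy_vectors` dictionary is an assumption standing in for a trained word2vec model, and skipping words that have no vector is a choice made for this sketch that the patent does not discuss.

```python
from typing import Dict, List

def sentence_vector(words: List[str], word_vectors: Dict[str, List[float]]) -> List[float]:
    """Add the vectors of the known words of a sentence and divide by their count."""
    known = [word_vectors[w] for w in words if w in word_vectors]
    if not known:
        raise ValueError("no word in the sentence has a vector")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

# Toy vectors (hypothetical values), not the output of a real model.
toy_vectors = {
    "kidney_beans": [1.0, 0.0, 2.0],
    "rich_in": [0.0, 1.0, 0.0],
    "protein": [2.0, 2.0, 1.0],
}
vec = sentence_vector(["kidney_beans", "rich_in", "protein", "unseen_word"], toy_vectors)
print(vec)  # [1.0, 1.0, 1.0]
```

Averaging rather than summing keeps sentence vectors comparable across sentences of different lengths, which matters for the distance-based clustering in step S4.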
S3, aggregate the preliminary training corpus by non-categorical relation type and, from the aggregated data of each relation, extract a common sentence pattern and an uncommon sentence pattern;
S3 may include:
S30, aggregate the preliminary training corpus by the non-categorical relation type contained in each sentence. For the aggregated data of each relation, find sentence patterns with the DL-CoTrain algorithm and extract one common sentence pattern and one uncommon sentence pattern, that is, select the patterns with a high score h(x) = (count(x) + a) / (N + ka), where k is the number of classes (2), a is a smoothing parameter (usually 0.1), count(x) is the number of times feature x occurs, and N is the number of aligned data items of one non-categorical relation.
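The score h(x) can be computed directly from pattern counts. The sketch below covers only the scoring formula, not the DL-CoTrain algorithm itself; it assumes that each aligned sentence has already been reduced to one pattern string, and the pattern strings and the choice of the lowest-scoring pattern as the "uncommon" one are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List

def pattern_scores(patterns: List[str], k: int = 2, a: float = 0.1) -> Dict[str, float]:
    """Score each pattern x with h(x) = (count(x) + a) / (N + k * a)."""
    counts = Counter(patterns)
    n = len(patterns)  # N: number of aligned data items for this relation
    return {x: (c + a) / (n + k * a) for x, c in counts.items()}

# One hypothetical pattern per aligned sentence of a single relation.
observed = [
    "X has the effect of Y",
    "X has the effect of Y",
    "X has the effect of Y",
    "X causes Y upset",
]
scores = pattern_scores(observed)
common = max(scores, key=scores.get)
uncommon = min(scores, key=scores.get)
print(common, round(scores[common], 3))
print(uncommon, round(scores[uncommon], 3))
```

With N = 4, k = 2, and a = 0.1 the frequent pattern scores 3.1 / 4.2 and the rare one 1.1 / 4.2, so the smoothing term mostly matters when counts are small.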
S4, set k to 2, heuristically select two sentence space vectors that match the two different patterns respectively as the initial centers of the k-means clustering method, cluster all sentence space vectors, and select the cluster that matches the common sentence pattern to obtain a higher-quality training corpus;
S4 may include:
S40, select two sentences that match the two different patterns as the initial centers of the two clusters;
S41, set k to 2, cluster all sentences that match the two sentence patterns with the k-means clustering algorithm, and select the cluster that matches the common sentence pattern. Because this process operates on word vectors that carry syntactic and semantic information, the sentences finally obtained also carry syntactic and semantic information, so label noise is removed effectively and a higher-quality training corpus is obtained.
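Steps S40 and S41 amount to k-means with k = 2 and hand-picked initial centers. The sketch below is a plain Lloyd-style iteration written for illustration; the function names, the two-dimensional toy vectors, and the convention that cluster 0 is the one seeded by the common pattern are assumptions, not details given in the patent.

```python
from typing import List, Sequence

def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors: List[Sequence[float]]) -> List[float]:
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def seeded_two_means(vectors, seed_common, seed_uncommon, max_iter=100) -> List[int]:
    """k-means with k = 2; label 0 is the cluster seeded by the common pattern."""
    centers = [list(seed_common), list(seed_uncommon)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest center.
        new_labels = [
            0 if squared_distance(v, centers[0]) <= squared_distance(v, centers[1]) else 1
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in (0, 1):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centers[c] = mean(members)
    return labels

sentence_vectors = [[0.9, 0.1], [1.0, 0.0], [0.8, 0.2], [0.1, 0.9], [0.0, 1.0]]
labels = seeded_two_means(sentence_vectors, sentence_vectors[1], sentence_vectors[4])
clean = [v for v, label in zip(sentence_vectors, labels) if label == 0]
print(labels, len(clean))
```

Seeding the centers from the two patterns, instead of choosing them at random, makes it possible to say afterwards which cluster is the "common pattern" cluster to keep.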
S5, train a convolutional neural network model on the higher-quality training corpus and, through the fully connected softmax layer of the model, extract non-categorical relations from the space vectors of the sentences.
S5 may include:
S50, train the convolutional neural network model on the higher-quality training corpus and input the space vectors of the sentences into the convolutional neural network; in turn, the convolutional layer automatically extracts text features, the pooling layer performs down-sampling, and the fully connected layer outputs the predicted probabilities of the non-categorical relations, where the convolutional neural network model includes one convolutional layer, one pooling layer, and one fully connected softmax layer.
It can be understood that the convolutional neural network structure includes one convolutional layer, one pooling layer, and one fully connected softmax layer. The convolutional layer automatically extracts multiple sentence feature values, and a max pooling operation selects the most important sentence features with a fixed length. Finally, the sentence feature vectors produced by the convolution are concatenated into a new sentence feature vector; all feature vectors are thus combined into one feature vector, which is passed into a fully connected softmax layer that outputs the probability distribution over the non-categorical relations.
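The forward pass through the three layers can be sketched without any deep learning library. Everything specific here is an assumption: the patent does not say how the sentence space vector is laid out as network input, so the sketch treats it as a one-dimensional signal, uses ReLU as the activation, and plugs in untrained toy parameters. Training is omitted entirely.

```python
import math
from typing import List

def conv1d(signal: List[float], kernel: List[float], bias: float) -> List[float]:
    """Valid 1-D convolution (cross-correlation form) followed by ReLU."""
    width = len(kernel)
    return [
        max(0.0, sum(signal[i + j] * kernel[j] for j in range(width)) + bias)
        for i in range(len(signal) - width + 1)
    ]

def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(sentence_vec, kernels, biases, weights, out_bias) -> List[float]:
    # Convolutional layer: one feature map per kernel.
    feature_maps = [conv1d(sentence_vec, k, b) for k, b in zip(kernels, biases)]
    # Max pooling: keep the strongest response of each map (fixed-length output).
    pooled = [max(fm) for fm in feature_maps]
    # Fully connected softmax layer: one probability per relation type.
    scores = [
        sum(w * p for w, p in zip(row, pooled)) + b
        for row, b in zip(weights, out_bias)
    ]
    return softmax(scores)

# Toy, untrained parameters (hypothetical values): 2 kernels, 3 relation types.
sentence_vec = [0.2, -0.1, 0.4, 0.3, -0.5, 0.1]
kernels = [[1.0, -1.0, 0.5], [0.5, 0.5]]
biases = [0.0, 0.1]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
out_bias = [0.0, 0.0, 0.0]
probs = forward(sentence_vec, kernels, biases, weights, out_bias)
print([round(p, 3) for p in probs])
```

Max pooling is what makes the feature vector length depend only on the number of kernels, so the softmax layer can have a fixed input size whatever the length of the convolved signal.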
The remote supervision non-categorical relation extracting method based on word2vec proposed by this embodiment takes unstructured vegetable-domain web text as the corpus, trains on the corpus with the word2vec tool, reduces label noise with a clustering algorithm, and finally extracts non-categorical relations with a convolutional neural network model. The word2vec tool not only trains word vectors efficiently; the word vectors it produces also capture syntactic and semantic information, so the sentences obtained by clustering carry syntactic and semantic information as well, which effectively safeguards the label noise removal of remote supervision. In addition, extracting non-categorical relations with a convolutional neural network model effectively avoids the error accumulation of multi-stage natural language processing pipelines. Compared with prior art that does not take full account of the syntactic and semantic information among the word vectors in the vector space, the present invention is therefore better suited to the vegetable domain and extracts non-categorical relations more accurately.
Referring to Fig. 2, this embodiment discloses a remote supervision non-categorical relation extraction system based on word2vec, including:
an acquiring unit 1, for crawling unstructured vegetable-domain text data from online encyclopedias and large vegetable websites as the corpus, then preprocessing the corpus and performing data alignment in turn to obtain a preliminary training corpus;
In this embodiment, the acquiring unit may include:
a capture subunit, for using a purpose-written corpus collection script to capture unstructured text data from online vegetable encyclopedias and large vegetable websites as the corpus, and preprocessing the corpus with low-frequency word filtering, word segmentation, part-of-speech tagging, and the like;
an alignment subunit, for aligning the corpus obtained by the capture subunit with the relation instances in a preset knowledge base to obtain the preliminary training corpus.
a training unit 2, for training a word2vec model on the preliminary training corpus and using the word2vec model to convert the words of each sentence in the preliminary training corpus into space vectors, and, for each sentence, adding the space vectors of its words and averaging them to obtain the space vector of the sentence;
an aggregation unit 3, for aggregating the preliminary training corpus by non-categorical relation type and, from the aggregated data of each relation, extracting a common sentence pattern and an uncommon sentence pattern;
The aggregation unit may specifically be used for:
aggregating the preliminary training corpus by the non-categorical relation type contained in each sentence and, for the aggregated data of each relation, finding sentence patterns with the DL-CoTrain algorithm and extracting one common sentence pattern and one uncommon sentence pattern.
a clustering unit 4, for setting k to 2, heuristically selecting two sentence space vectors that match the two different patterns respectively as the initial centers of the k-means clustering method, clustering all sentence space vectors, and selecting the cluster that matches the common sentence pattern to obtain a higher-quality training corpus;
an extraction unit 5, for training a convolutional neural network model on the higher-quality training corpus and, through the one convolutional layer, one pooling layer, and one fully connected softmax layer that make up the model, extracting non-categorical relations from the space vectors of the sentences.
The extraction unit may specifically be used for:
training the convolutional neural network model on the higher-quality training corpus and inputting the space vectors of the sentences into the convolutional neural network, where, in turn, the convolutional layer automatically extracts text features, the pooling layer performs down-sampling, and the fully connected layer outputs the predicted probabilities of the non-categorical relations; the convolutional neural network model includes one convolutional layer, one pooling layer, and one fully connected softmax layer.
The remote supervision non-categorical relation extraction system based on word2vec proposed by this embodiment takes unstructured vegetable-domain web text as the corpus, trains on the corpus with the word2vec tool, reduces label noise with a clustering algorithm, and finally extracts non-categorical relations with a convolutional neural network model. The word2vec tool not only trains word vectors efficiently; the word vectors it produces also capture syntactic and semantic information, so the sentences obtained by clustering carry syntactic and semantic information as well, which effectively safeguards the label noise removal of remote supervision. In addition, extracting non-categorical relations with a convolutional neural network model effectively avoids the error accumulation of multi-stage natural language processing pipelines. Compared with prior art that does not take full account of the syntactic and semantic information among the word vectors in the vector space, the present invention is therefore better suited to the vegetable domain and extracts non-categorical relations more accurately.
The invention has the following advantages:
In terms of application, the invention is dedicated to extracting non-categorical relations in the vegetable domain. Non-categorical relations can greatly improve the precision and recall of information queries over the massive information of the vegetable domain and increase the completeness of knowledge representation. This makes possible intelligent semantic information services through which people can obtain the vegetable information they need quickly and accurately, and it raises the level of vegetable informatization services.
Although embodiments of the present invention have been described with reference to the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the invention, and all such modifications and variations fall within the scope defined by the appended claims.
Claims (8)
1. A remote supervision non-categorical relation extracting method based on word2vec, characterized by including:
S1, crawling unstructured vegetable-domain text data from online encyclopedias and large vegetable websites as the corpus, then preprocessing the corpus and performing data alignment in turn to obtain a preliminary training corpus;
S2, training a word2vec model on the preliminary training corpus and using the word2vec model to convert the words of each sentence in the preliminary training corpus into space vectors, and, for each sentence, adding the space vectors of its words and averaging them to obtain the space vector of the sentence;
S3, aggregating the preliminary training corpus by non-categorical relation type and, from the aggregated data of each relation, extracting a common sentence pattern and an uncommon sentence pattern;
S4, setting k to 2, heuristically selecting two sentence space vectors that match the two different patterns respectively as the initial centers of the k-means clustering method, clustering all sentence space vectors, and selecting the cluster that matches the common sentence pattern to obtain a higher-quality training corpus;
S5, training a convolutional neural network model on the higher-quality training corpus and, through the one convolutional layer, one pooling layer, and one fully connected softmax layer that make up the model, extracting non-categorical relations from the space vectors of the sentences.
2. The remote supervision non-categorical relation extracting method based on word2vec according to claim 1, characterized in that S1 includes:
S10, using a purpose-written corpus collection script, capturing unstructured text data from online vegetable encyclopedias and large vegetable websites as the corpus, and preprocessing the corpus with low-frequency word filtering, word segmentation, part-of-speech tagging, and the like;
S11, aligning the corpus obtained in step S10 with the relation instances in a preset knowledge base to obtain the preliminary training corpus.
3. The remote supervision non-categorical relation extracting method based on word2vec according to claim 2, characterized in that S3 includes:
S30, aggregating the preliminary training corpus by the non-categorical relation type contained in each sentence and, for the aggregated data of each relation, finding sentence patterns with the DL-CoTrain algorithm and extracting one common sentence pattern and one uncommon sentence pattern.
4. The remote supervision non-categorical relation extracting method based on word2vec according to claim 3, characterized in that S5 includes:
S50, training the convolutional neural network model on the higher-quality training corpus and inputting the space vectors of the sentences into the convolutional neural network, where, in turn, the convolutional layer automatically extracts text features, the pooling layer performs down-sampling, and the fully connected layer outputs the predicted probabilities of the non-categorical relations; the convolutional neural network model includes one convolutional layer, one pooling layer, and one fully connected softmax layer.
5. A remote supervision non-categorical relation extraction system based on word2vec, characterized by including:
an acquiring unit, for crawling unstructured vegetable-domain text data from online encyclopedias and large vegetable websites as the corpus, then preprocessing the corpus and performing data alignment in turn to obtain a preliminary training corpus;
a training unit, for training a word2vec model on the preliminary training corpus and using the word2vec model to convert the words of each sentence in the preliminary training corpus into space vectors, and, for each sentence, adding the space vectors of its words and averaging them to obtain the space vector of the sentence;
an aggregation unit, for aggregating the preliminary training corpus by non-categorical relation type and, from the aggregated data of each relation, extracting a common sentence pattern and an uncommon sentence pattern;
a clustering unit, for setting k to 2, heuristically selecting two sentence space vectors that match the two different patterns respectively as the initial centers of the k-means clustering method, clustering all sentence space vectors, and selecting the cluster that matches the common sentence pattern to obtain a higher-quality training corpus;
an extraction unit, for training a convolutional neural network model on the higher-quality training corpus and, through the one convolutional layer, one pooling layer, and one fully connected softmax layer that make up the model, extracting non-categorical relations from the space vectors of the sentences.
6. The remote supervision non-categorical relation extraction system based on word2vec according to claim 5, characterized in that the acquiring unit comprises:
A crawling subunit, configured to use a purpose-written corpus collection script to crawl unstructured text data from online vegetable encyclopedias and large vegetable websites as a corpus, and to preprocess the corpus by low-frequency word filtering, word segmentation, part-of-speech tagging and the like;
An alignment subunit, configured to align the corpus obtained by the crawling subunit with the relation instances in a preset knowledge base to obtain the preliminary training corpus.
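The preprocessing and alignment steps of claim 6 might look roughly like the following sketch. It assumes the sentences have already been segmented into words (word segmentation and part-of-speech tagging would need an external tool and are omitted), and it uses the usual distant-supervision heuristic that a sentence mentioning both entities of a knowledge-base triple is labelled with that triple's relation. The function names, the `min_count` threshold, the example corpus and the triple are hypothetical.

```python
from collections import Counter


def filter_low_frequency(tokenized_sentences, min_count=2):
    """Drop words that occur fewer than min_count times in the whole corpus."""
    counts = Counter(w for sent in tokenized_sentences for w in sent)
    return [[w for w in sent if counts[w] >= min_count]
            for sent in tokenized_sentences]


def align_with_knowledge_base(tokenized_sentences, triples):
    """Label a sentence with a relation when it mentions both entities."""
    labeled = []
    for sent in tokenized_sentences:
        words = set(sent)
        for head, relation, tail in triples:
            if head in words and tail in words:
                labeled.append((sent, head, tail, relation))
    return labeled


# Hypothetical pre-segmented corpus and knowledge-base triple.
corpus = [
    ["tomato", "is", "rich", "in", "lycopene"],
    ["lycopene", "gives", "tomato", "its", "color"],
    ["cabbage", "is", "a", "leafy", "vegetable"],
]
knowledge_base = [("tomato", "contains", "lycopene")]

filtered = filter_low_frequency(corpus, min_count=2)
aligned = align_with_knowledge_base(filtered, knowledge_base)
```

Because the matching rule only checks co-occurrence, some of the aligned sentences will not actually express the relation, which is the noise that the later clustering step is meant to reduce.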
7. The remote supervision non-categorical relation extraction system based on word2vec according to claim 6, characterized in that the aggregation unit is specifically configured to:
Aggregate the preliminarily obtained training corpus according to the non-categorical relation type contained in each sentence and, for the aggregated data of each relation, find sentence patterns using the DL-CoTrain algorithm and extract one common sentence pattern and one uncommon sentence pattern from them.
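The aggregation step can be sketched as a simple group-by on the relation label. The pattern discovery shown here is deliberately not DL-CoTrain: as a stand-in, it treats the words between the two entity mentions as a crude sentence pattern and ranks patterns by frequency. All function names and example sentences are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_by_relation(labeled_sentences):
    """Group (tokens, head, tail, relation) examples by relation type."""
    groups = defaultdict(list)
    for tokens, head, tail, relation in labeled_sentences:
        groups[relation].append((tokens, head, tail))
    return dict(groups)


def between_pattern(tokens, head, tail):
    """Crude sentence pattern: the tokens between the two entity mentions."""
    i, j = sorted((tokens.index(head), tokens.index(tail)))
    return tuple(tokens[i + 1:j])


def common_and_uncommon(examples):
    """Return the most frequent and the least frequent pattern of a relation."""
    counts = Counter(between_pattern(*ex) for ex in examples)
    ranked = counts.most_common()
    return ranked[0][0], ranked[-1][0]


# Hypothetical labelled sentences for a single relation type.
labeled = [
    (["tomato", "contains", "lycopene"], "tomato", "lycopene", "contains"),
    (["carrot", "contains", "carotene"], "carrot", "carotene", "contains"),
    (["spinach", "is", "full", "of", "iron"], "spinach", "iron", "contains"),
]
groups = aggregate_by_relation(labeled)
common, uncommon = common_and_uncommon(groups["contains"])
```

A sentence matching each of the two returned patterns could then supply the two initial centers for the k = 2 clustering step.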
8. The remote supervision non-categorical relation extraction system based on word2vec according to claim 7, characterized in that the extraction unit is specifically configured to:
Train the convolutional neural network model with the higher-quality training corpus and input the space vectors of the sentences into the convolutional neural network, in which, in sequence, the convolutional layer automatically extracts text features, the pooling layer performs down-sampling, and the fully connected layer outputs the predicted probability of each non-categorical relation, wherein the convolutional neural network model comprises one convolutional layer, one pooling layer and one fully connected softmax layer.
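The three-layer network of claim 8 can be illustrated by its forward pass alone. The sketch below assumes the convolution runs along the dimensions of the sentence vector, which is one possible reading of the claim. The filter width, number of filters, pooling size, ReLU activation and the random untrained weights are all assumptions chosen for illustration, and training is omitted.

```python
import math
import random


def conv1d_relu(x, filters, bias):
    """Valid 1-D convolution of vector x with each filter, followed by ReLU."""
    width = len(filters[0])
    maps = []
    for f, b in zip(filters, bias):
        row = [max(0.0, sum(f[k] * x[i + k] for k in range(width)) + b)
               for i in range(len(x) - width + 1)]
        maps.append(row)
    return maps


def max_pool(maps, size):
    """Down-sample each feature map by taking the maximum of each window."""
    return [[max(row[i:i + size]) for i in range(0, len(row), size)]
            for row in maps]


def dense_softmax(features, weights, bias):
    """Fully connected layer followed by a numerically stable softmax."""
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(sentence_vec, params):
    """Convolution, pooling, then fully connected softmax over relation types."""
    maps = conv1d_relu(sentence_vec, params["filters"], params["conv_bias"])
    pooled = max_pool(maps, params["pool_size"])
    flat = [v for row in pooled for v in row]
    return dense_softmax(flat, params["dense_w"], params["dense_b"])


def random_params(dim, n_filters, width, pool_size, n_relations, seed=0):
    """Untrained weights, used only to exercise the forward pass."""
    rng = random.Random(seed)
    flat_len = n_filters * math.ceil((dim - width + 1) / pool_size)
    return {
        "filters": [[rng.uniform(-1, 1) for _ in range(width)]
                    for _ in range(n_filters)],
        "conv_bias": [0.0] * n_filters,
        "pool_size": pool_size,
        "dense_w": [[rng.uniform(-1, 1) for _ in range(flat_len)]
                    for _ in range(n_relations)],
        "dense_b": [0.0] * n_relations,
    }


params = random_params(dim=8, n_filters=2, width=3, pool_size=2, n_relations=3)
probs = predict([0.1, -0.2, 0.3, 0.0, 0.5, -0.1, 0.2, 0.4], params)
```

`probs` holds one probability per relation type and sums to 1; with trained weights, the largest entry would be taken as the predicted relation.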
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710166727.0A CN107145503A (en) | 2017-03-20 | 2017-03-20 | Remote supervision non-categorical relation extracting method and system based on word2vec |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107145503A (en) | 2017-09-08 |
Family
ID=59783444
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710166727.0A Pending CN107145503A (en) | 2017-03-20 | 2017-03-20 | Remote supervision non-categorical relation extracting method and system based on word2vec |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107145503A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150154193A1 (en) * | 2013-12-02 | 2015-06-04 | Qbase, LLC | System and method for extracting facts from unstructured text |
CN105389379A (en) * | 2015-11-20 | 2016-03-09 | 重庆邮电大学 | Rubbish article classification method based on distributed feature representation of text |
CN105740349A (en) * | 2016-01-25 | 2016-07-06 | 重庆邮电大学 | Sentiment classification method capable of combining Doc2vce with convolutional neural network |
CN106055675A (en) * | 2016-06-06 | 2016-10-26 | 杭州量知数据科技有限公司 | Relation extracting method based on convolution neural network and distance supervision |
Non-Patent Citations (1)
Title |
---|
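Zhu Zhaolong (朱兆龙): "Distant supervision relation extraction method combining clustering-based denoising and type constraints" (《结合聚类去噪和类型约束的 Distant Supervision 关系抽取方法》), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库信息科技辑》) *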
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107704558A (en) * | 2017-09-28 | 2018-02-16 | 北京车慧互动广告有限公司 | A kind of consumers' opinions abstracting method and system |
CN107908757A (en) * | 2017-11-21 | 2018-04-13 | 恒安嘉新(北京)科技股份公司 | Website classification method and system |
CN107908757B (en) * | 2017-11-21 | 2020-05-26 | 恒安嘉新(北京)科技股份公司 | Website classification method and system |
CN108154234A (en) * | 2017-12-04 | 2018-06-12 | 盈盛资讯科技有限公司 | A kind of knowledge learning method and system based on template |
CN108280055A (en) * | 2017-12-04 | 2018-07-13 | 盈盛资讯科技有限公司 | A kind of knowledge learning method and system based on binary crelation |
CN108280058A (en) * | 2018-01-02 | 2018-07-13 | 中国科学院自动化研究所 | Relation extraction method and apparatus based on intensified learning |
CN108427717A (en) * | 2018-02-06 | 2018-08-21 | 北京航空航天大学 | It is a kind of based on the alphabetic class family of languages medical treatment text Relation extraction method gradually extended |
CN108427717B (en) * | 2018-02-06 | 2021-09-03 | 北京航空航天大学 | Letter class language family medical text relation extraction method based on gradual expansion |
CN109145120A (en) * | 2018-07-02 | 2019-01-04 | 北京妙医佳信息技术有限公司 | The Relation extraction method and system of medical health domain knowledge map |
CN109446300A (en) * | 2018-09-06 | 2019-03-08 | 厦门快商通信息技术有限公司 | A kind of corpus preprocess method, the pre- mask method of corpus and electronic equipment |
CN109446300B (en) * | 2018-09-06 | 2021-04-20 | 厦门快商通信息技术有限公司 | Corpus preprocessing method, corpus pre-labeling method and electronic equipment |
CN109271632A (en) * | 2018-09-14 | 2019-01-25 | 重庆邂智科技有限公司 | A kind of term vector learning method of supervision |
CN109885698A (en) * | 2019-02-13 | 2019-06-14 | 北京航空航天大学 | A kind of knowledge mapping construction method and device, electronic equipment |
CN111914555A (en) * | 2019-05-09 | 2020-11-10 | 中国人民大学 | Automatic relation extraction system based on Transformer structure |
CN110209836A (en) * | 2019-05-17 | 2019-09-06 | 北京邮电大学 | Remote supervisory Relation extraction method and device |
CN110209836B (en) * | 2019-05-17 | 2022-04-26 | 北京邮电大学 | Remote supervision relation extraction method and device |
CN110458162B (en) * | 2019-07-25 | 2023-06-23 | 上海兑观信息科技技术有限公司 | Method for intelligently extracting image text information |
CN110458162A (en) * | 2019-07-25 | 2019-11-15 | 上海兑观信息科技技术有限公司 | A kind of method of intelligent extraction pictograph information |
CN110674265A (en) * | 2019-08-06 | 2020-01-10 | 上海孚典智能科技有限公司 | Unstructured information oriented feature discrimination and information recommendation system |
CN110674265B (en) * | 2019-08-06 | 2021-03-02 | 上海孚典智能科技有限公司 | Unstructured information oriented feature discrimination and information recommendation system |
CN110647919A (en) * | 2019-08-27 | 2020-01-03 | 华东师范大学 | Text clustering method and system based on K-means clustering and capsule network |
CN110825851A (en) * | 2019-11-07 | 2020-02-21 | 中电福富信息科技有限公司 | Sentence pair relation discrimination method based on median conversion model |
CN112016330A (en) * | 2020-08-28 | 2020-12-01 | 平安国际智慧城市科技股份有限公司 | Semantic parsing method, semantic parsing device and storage medium |
CN112528045A (en) * | 2020-12-23 | 2021-03-19 | 中译语通科技股份有限公司 | Method and system for judging domain map relation based on open encyclopedia map |
CN112528045B (en) * | 2020-12-23 | 2024-04-02 | 中译语通科技股份有限公司 | Method and system for judging domain map relation based on open encyclopedia map |
CN112906368A (en) * | 2021-02-19 | 2021-06-04 | 北京百度网讯科技有限公司 | Industry text increment method, related device and computer program product |
CN113688238A (en) * | 2021-08-19 | 2021-11-23 | 支付宝(杭州)信息技术有限公司 | Method and device for recognizing upper and lower word relations |
CN114442623A (en) * | 2022-01-20 | 2022-05-06 | 中国农业大学 | Agricultural machinery operation track field segmentation method based on space-time diagram neural network |
CN114442623B (en) * | 2022-01-20 | 2023-10-24 | 中国农业大学 | Agricultural machinery operation track field-road segmentation method based on space-time diagram neural network |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN107145503A (en) | Remote supervision non-categorical relation extracting method and system based on word2vec | |
CN111126386B (en) | Sequence domain adaptation method based on countermeasure learning in scene text recognition | |
CN103116766B (en) | A kind of image classification method of encoding based on Increment Artificial Neural Network and subgraph | |
CN112395393B (en) | Remote supervision relation extraction method based on multitask and multiple examples | |
CN109684476B (en) | Text classification method, text classification device and terminal equipment | |
CN107704558A (en) | A kind of consumers' opinions abstracting method and system | |
CN110059181A (en) | Short text stamp methods, system, device towards extensive classification system | |
CN110188195B (en) | Text intention recognition method, device and equipment based on deep learning | |
CN108197294A (en) | A kind of text automatic generation method based on deep learning | |
CN109871885A (en) | A kind of plants identification method based on deep learning and Plant Taxonomy | |
CN107679110A (en) | The method and device of knowledge mapping is improved with reference to text classification and picture attribute extraction | |
CN107688576B (en) | Construction and tendency classification method of CNN-SVM model | |
CN110223675A (en) | The screening technique and system of training text data for speech recognition | |
CN111523324A (en) | Training method and device for named entity recognition model | |
CN108846047A (en) | A kind of picture retrieval method and system based on convolution feature | |
CN110442725A (en) | Entity relation extraction method and device | |
Van Hieu et al. | Automatic plant image identification of Vietnamese species using deep learning models | |
CN110751216A (en) | Judgment document industry classification method based on improved convolutional neural network | |
CN103049490A (en) | Attribute generation system and generation method among knowledge network nodes | |
CN110245226A (en) | Enterprises ' industry classification method and its device | |
CN113673246A (en) | Semantic fusion and knowledge distillation agricultural entity identification method and device | |
CN110765285A (en) | Multimedia information content control method and system based on visual characteristics | |
CN113268370A (en) | Root cause alarm analysis method, system, equipment and storage medium | |
CN108595426A (en) | Term vector optimization method based on Chinese character pattern structural information | |
CN115146062A (en) | Intelligent event analysis method and system fusing expert recommendation and text clustering |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170908 |