CN107102989B - Entity disambiguation method based on word vector and convolutional neural network - Google Patents
Info
- Publication number
- CN107102989B CN107102989B CN201710373502.2A CN201710373502A CN107102989B CN 107102989 B CN107102989 B CN 107102989B CN 201710373502 A CN201710373502 A CN 201710373502A CN 107102989 B CN107102989 B CN 107102989B
- Authority
- CN
- China
- Prior art keywords
- entity
- disambiguated
- word
- neural network
- word vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Machine Translation (AREA)
Abstract
The invention provides an entity disambiguation method based on word vectors and a convolutional neural network. Relying on word vectors trained by word2vec and on a convolutional neural network, the method constructs semantic feature vectors for the context of the entity to be disambiguated and for the summary information of the candidate entities in a knowledge base. In the entity classification stage, the cosine similarity between the feature vectors is calculated, and the candidate entity with the highest similarity is taken as the final target entity of the entity to be disambiguated. The method greatly improves the semantic representation of entities and thereby improves the accuracy of the subsequent disambiguation.
Description
Technical Field
The invention belongs to the technical field of internet information, and particularly relates to an entity disambiguation method, in particular to an entity disambiguation method based on word vectors and a convolutional neural network.
Background
With the spread of the mobile internet, microblogs, blogs, posts, forums, news websites, government websites and the like have greatly facilitated people's lives. Most of the data on these platforms exists in unstructured or semi-structured form, and therefore contains a large number of ambiguous entity mentions. If these ambiguous entities can be disambiguated accurately, great convenience is gained for later use of the data.
Most mainstream entity disambiguation algorithms are built on bag-of-words models, whose inherent limitations prevent the algorithms from fully exploiting the semantic information of the context, leaving considerable room to improve disambiguation quality. Word embedding has been a hot topic in machine learning in recent years; its core idea is to construct a distributed representation for each word, avoiding the lexical gap between words. The convolutional neural network, a branch of neural network models, can effectively capture local features and then model them globally. If a convolutional neural network is used to model word embeddings, semantic features more effective than the bag-of-words model can be obtained. Moreover, thanks to local receptive fields and weight sharing, a convolutional neural network has far fewer parameters and trains quickly; the core of Google's AlphaGo, for example, consists of two convolutional neural networks.
The invention combines word vectors with a convolutional neural network, constructs semantic representations for the context of the entity to be disambiguated and for the summary information of knowledge-base entities, trains the convolutional neural network, and performs prediction. This greatly improves the semantic description of the entity context.
Disclosure of Invention
Purpose of the invention: in view of the difficulty existing entity disambiguation methods have in exploiting contextual semantic information, the invention provides an entity disambiguation method based on word vectors and a convolutional neural network, which captures contextual semantic information to aid entity disambiguation.
The technical scheme is as follows:
An entity disambiguation method based on word vectors and a convolutional neural network comprises the following steps:
Step 1: preprocess a text set, collected for the application scenario, that contains entities to be disambiguated, and determine each entity to be disambiguated in the text set together with its context features;
Step 2: construct a knowledge base of the entities to be disambiguated from domain knowledge, search the knowledge base, and determine a candidate entity set for each entity to be disambiguated together with the description features of each candidate entity in the set;
Step 3: take the word vectors of the nouns within a fixed-size window centered on the entity to be disambiguated to form a word vector matrix, used as the context semantic feature of the entity to be disambiguated; compute TF·IDF over the summary information of each entity in the knowledge base and take the word vectors of the top 20 nouns by weight to form a word vector matrix, used as the semantic feature of the knowledge-base entity;
Step 4: combine known unambiguous entities in the text with their target entities and candidate entities in the knowledge base to form a training set, input the training set into a convolutional neural network model for training, and tune the parameters of the model;
Step 5: input a sample consisting of each entity to be disambiguated and its knowledge-base candidate entity set into the convolutional neural network model obtained in step 4, to obtain the semantic feature vectors of the entity to be disambiguated and of each entity in the candidate set;
Step 6: based on the semantic feature vectors, calculate the cosine similarity between the entity to be disambiguated and each entity in the knowledge-base candidate entity set, and take the candidate entity with the highest similarity as the final target entity of the entity to be disambiguated.
The preprocessing in step 1 uses the ICTCLAS Chinese word segmentation program of the Chinese Academy of Sciences to perform part-of-speech tagging and word segmentation on the text set, then filters out stop words according to a stop word list, and builds a noun dictionary for proper nouns and entity names that are difficult to recognize.
In step 2, the ICTCLAS Chinese word segmentation program is called to perform part-of-speech tagging and word segmentation on the entity descriptions in the knowledge base, and stop words are filtered out according to the stop word list.
In step 3, forming a word vector matrix from the word vectors of the nouns within a fixed-size window centered on the entity to be disambiguated specifically comprises the following steps:
1) call the Google deep learning program word2vec to train on a Wikipedia corpus, obtaining a word vector table L in which each word vector has 200 dimensions and each dimension is a real number;
2) for the context {w_1, w_2, …, w_i, …, w_K} of the entity e to be disambiguated, query the word vector table L for each noun w_i to obtain its word vector v_i;
3) construct the context word vector matrix [v_1, v_2, v_3, …, v_i, …, v_K] of the entity e to be disambiguated from the word vectors of its context words;
4) end.
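As an illustrative sketch only (not part of the claimed method), steps 1)-3) above might be realized in Python as follows, assuming gensim for loading the word2vec table; the file name `wiki_word2vec_200d.kv`, the window size, and the function name are hypothetical, since the patent fixes only the 200-dimensional vector length:

```python
import numpy as np
from gensim.models import KeyedVectors

# Hypothetical file name: a 200-dimensional word2vec table L trained on Wikipedia.
wv = KeyedVectors.load("wiki_word2vec_200d.kv")

def context_matrix(nouns, mention_index, window=5):
    """Stack the word vectors of the nouns inside a fixed-size window
    centered on the entity mention (steps 2)-3) above). `nouns` is the
    POS-filtered, stop-word-free token list from step 1; the window
    size is an assumption, since the patent leaves it unspecified."""
    lo = max(0, mention_index - window)
    hi = min(len(nouns), mention_index + window + 1)
    vecs = [wv[w] for i, w in enumerate(nouns[lo:hi])
            if w in wv and lo + i != mention_index]
    return np.vstack(vecs)  # the K x 200 matrix [v_1, ..., v_K]
```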
In step 3, computing TF·IDF over the summary information of each entity in the knowledge base and forming a word vector matrix from the word vectors of the top 20 nouns by weight specifically comprises:
1) for each candidate entity e_i in the candidate entity set E = {e_1, e_2, …, e_n}, query the word vector table L for each noun w_i in the description of e_i to obtain its word vector v_i;
2) construct the word vector matrix of the entity description from the word vector of each noun in the description features;
3) end.
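A minimal sketch of the knowledge-base side, assuming scikit-learn's TfidfVectorizer for the TF·IDF weighting of step 3 and the word vector table `wv` from the previous sketch; function and variable names are illustrative:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def description_matrix(summaries, index, wv, top_k=20):
    """Pick the top-20 nouns of candidate e_i's summary by TF-IDF weight
    and stack their word vectors. `summaries` is assumed to hold the
    pre-segmented, noun-only summary of every knowledge-base entity as
    a space-joined string; `index` selects the candidate of interest."""
    tfidf = TfidfVectorizer()
    weights = tfidf.fit_transform(summaries)          # n_entities x vocab
    vocab = np.array(tfidf.get_feature_names_out())
    row = weights[index].toarray().ravel()
    top_words = vocab[np.argsort(row)[::-1][:top_k]]  # top 20 nouns by weight
    return np.vstack([wv[w] for w in top_words if w in wv])  # up to 20 x 200
```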
The convolutional neural network training of step 4 specifically comprises the following steps:
1) each semantic feature to be disambiguated, together with the semantic features of its candidate entity set, is input into the neural network model as one training sample;
2) the semantic feature to be disambiguated is convolved; the number of convolution kernel feature maps is set to 200 and the kernel size to [2, 200], i.e. a matrix of height 2 and width 200;
3) the convolution result of each convolution kernel is pooled with 1-max pooling to obtain the feature of that kernel;
4) the 200 convolution kernel features form an intermediate result, which is input into a fully connected layer of size 50, finally yielding a 50-dimensional semantic feature vector;
5) the semantic features of the candidate entity set are summed and averaged and then input into a fully connected layer, also of size 50, finally yielding a 50-dimensional semantic feature vector;
6) the loss function Loss_e of each training sample in the neural network is defined as:
Loss_e = max(0, 1 - sim(e, e*) + sim(e, e'))
where e* denotes the target entity of the entity e to be disambiguated and e' denotes any other candidate entity in the candidate entity set; the loss thus maximizes the gap between the similarity of e to its target entity and the similarity of e to any other candidate;
the global loss function is defined as Loss = Σ_e Loss_e;
7) the parameters of the neural network are initialized from the uniform distribution U(-0.01, 0.01);
8) all activation functions in the neural network are the tanh (hyperbolic tangent) activation function;
9) the parameters of the neural network are updated with stochastic gradient descent;
10) end.
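For concreteness, a minimal PyTorch sketch of the network and loss of steps 1)-9) follows; the class and function names are illustrative, and the batching of (context, target, other-candidate) triples is an assumption, since the patent does not specify how samples are paired:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisambiguationCNN(nn.Module):
    """Sketch of the patent's scorer: 200 feature maps with [2, 200]
    kernels, 1-max pooling, fully connected layers of size 50, tanh
    activations, and U(-0.01, 0.01) initialization."""
    def __init__(self, emb_dim=200, n_kernels=200, out_dim=50):
        super().__init__()
        # Each kernel spans 2 consecutive words over the full embedding width.
        self.conv = nn.Conv2d(1, n_kernels, kernel_size=(2, emb_dim))
        self.fc_context = nn.Linear(n_kernels, out_dim)
        self.fc_entity = nn.Linear(emb_dim, out_dim)
        for p in self.parameters():          # step 7): uniform initialization
            nn.init.uniform_(p, -0.01, 0.01)

    def encode_context(self, ctx):
        # ctx: [batch, K, 200] context word vector matrix of the mention.
        x = torch.tanh(self.conv(ctx.unsqueeze(1)))  # -> [batch, 200, K-1, 1]
        x = x.squeeze(3).max(dim=2).values           # 1-max pooling -> [batch, 200]
        return torch.tanh(self.fc_context(x))        # -> [batch, 50]

    def encode_entity(self, desc):
        # desc: [batch, 20, 200] top-20 TF-IDF noun vectors; the patent
        # sums and averages them before the fully connected layer (step 5).
        return torch.tanh(self.fc_entity(desc.mean(dim=1)))  # -> [batch, 50]

def hinge_loss(ctx_vec, target_vec, other_vec):
    # Loss_e = max(0, 1 - sim(e, e*) + sim(e, e')), per step 6).
    sim_pos = F.cosine_similarity(ctx_vec, target_vec)
    sim_neg = F.cosine_similarity(ctx_vec, other_vec)
    return torch.clamp(1.0 - sim_pos + sim_neg, min=0.0).sum()
```

Training then amounts to evaluating `hinge_loss` over such triples and stepping a `torch.optim.SGD` optimizer, matching the stochastic gradient descent update of step 9).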
The entity classification of step 6 specifically comprises the following steps:
1) read the semantic feature vector a of the entity e to be disambiguated from the file system;
2) read the set of semantic feature vectors B = {b_1, b_2, …, b_n} of the candidate entity set E = {e_1, e_2, …, e_n} from the file system;
3) traverse the candidate entity set and calculate the cosine similarity l_i between a and each feature vector b_i in B;
4) take the candidate entity with the maximum similarity as the final target entity of the entity e to be disambiguated;
5) end.
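The classification stage then reduces to a nearest-neighbour lookup under cosine similarity; a sketch with illustrative names, where `a` and `B` are the vectors read in steps 1)-2):

```python
import numpy as np

def disambiguate(a, B):
    """Return the index of the candidate whose 50-dimensional semantic
    feature vector b_i is most cosine-similar to the mention vector a,
    together with all similarities l_i (steps 3)-4) above)."""
    sims = [float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
            for b in B]
    return int(np.argmax(sims)), sims
```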
Beneficial effects: the entity disambiguation method based on word vectors and a convolutional neural network constructs semantic representations for the entity to be disambiguated and for the candidate entities of the knowledge base. The neural network model is trained on the training set; at disambiguation time, the entity to be disambiguated is input into the trained model, which outputs the most similar candidate entity as the final target entity.
Drawings
For a clearer description of the invention, reference is now made to the accompanying drawings, which form a part hereof:
FIG. 1 is a block diagram of the entity disambiguation method based on word vectors and a convolutional neural network of the present invention.
Fig. 2 is a block diagram of a convolutional neural network model.
FIG. 3 is a flow chart of an entity classification phase.
Detailed Description
The present invention will be further described with reference to the accompanying drawings.
The flow chart of the entity disambiguation method based on the word vector and the convolutional neural network is shown in FIG. 1.
In the entity identification phase (steps 1-6):
In the entity semantic representation phase (steps 7-10):
1) call the Google deep learning program word2vec to train on a Wikipedia corpus, obtaining a word vector table L in which each word vector has 200 dimensions and each dimension is a real number;
2) for the context {w_1, w_2, …, w_i, …, w_K} of the entity e to be disambiguated, query the word vector table L for each noun w_i to obtain its word vector v_i;
3) construct the context word vector matrix [v_1, v_2, v_3, …, v_i, …, v_K] of the entity e to be disambiguated from the word vectors of its context words;
4) end.
1) for each candidate entity e_i in the candidate entity set E = {e_1, e_2, …, e_n}, query the word vector table L for each noun w_i in the description of e_i to obtain its word vector v_i;
2) construct the word vector matrix of the entity description from the word vector of each noun in the description features;
3) end.
In the neural network training phase (steps 11-12):
1) each semantic representation to be disambiguated, together with the semantic features of its candidate entity set, is input into the neural network model as one training sample;
2) the semantic feature to be disambiguated is convolved; the number of convolution kernel feature maps is set to 200 and the kernel size to [2, 200], i.e. a matrix of height 2 and width 200;
3) the convolution result of each convolution kernel is pooled with 1-max pooling to obtain the feature of that kernel;
4) the 200 convolution kernel features form an intermediate result, which is input into a fully connected layer of size 50, finally yielding a 50-dimensional semantic feature vector;
5) the semantic features of the candidate entity set are summed and averaged and then input into a fully connected layer, also of size 50, finally yielding a 50-dimensional semantic feature vector;
6) the loss function Loss_e of each training sample in the neural network is defined as:
Loss_e = max(0, 1 - sim(e, e*) + sim(e, e'))
where e* denotes the target entity of the entity e to be disambiguated and e' denotes any other candidate entity in the candidate entity set; the loss thus maximizes the gap between the similarity of e to its target entity and the similarity of e to any other candidate;
the global loss function is defined as Loss = Σ_e Loss_e;
7) the parameters of the neural network are initialized from the uniform distribution U(-0.01, 0.01);
8) all activation functions in the neural network are the tanh (hyperbolic tangent) activation function;
9) the parameters of the neural network are updated with stochastic gradient descent;
10) end.
In the entity classification phase (steps 13-14):
FIG. 2 is a detailed overview of the neural network structure used in step 12 of the neural network training phase of FIG. 1, comprising the following components:
Word vector matrices: the word vector matrix of the context of the entity to be disambiguated and the word vector matrix of the description features of the knowledge-base entity serve as the inputs of the convolutional neural network;
Convolutional layer: the context word vector matrix of the entity to be disambiguated is convolved with 200 different convolution kernels to obtain the feature of each kernel;
1-max pooling layer: the output features of the convolutional layer undergo 1-max pooling to give a 200-dimensional intermediate result;
Fully connected layers: a fully connected layer of size 50 is attached to the intermediate result, and another fully connected layer of size 50 is attached to the summed and averaged word vectors of the knowledge-base candidate entity, yielding two 50-dimensional semantic feature vectors;
Similarity calculation: the cosine similarity of the two semantic feature vectors is calculated.
FIG. 3 is a detailed flow description of step 14 in the entity classification phase of FIG. 1:
Specifically, the method comprises the following steps: 1) read the semantic feature vector a of the entity e to be disambiguated from the file system;
2) read the set of semantic feature vectors B = {b_1, b_2, …, b_n} of the candidate entity set E = {e_1, e_2, …, e_n} from the file system;
3) traverse the candidate entity set and calculate the cosine similarity l_i between a and each feature vector b_i in B;
4) take the candidate entity with the maximum similarity as the final target entity of the entity e to be disambiguated;
5) end.
In summary, the invention comprehensively uses word vectors and a convolutional neural network: word vector matrices are constructed for the context of the entity to be disambiguated and for the summary information of the candidate entities in the knowledge base, and are input into the convolutional neural network model. The convolutional neural network model is trained and its parameters are tuned. In the prediction phase, the most similar entity is output as the target entity. This overcomes the insufficient semantic representation caused by the lexical gap of the traditional bag-of-words model and thereby improves the accuracy of entity disambiguation.
The above description covers only the preferred embodiments of the present invention. It should be noted that various modifications and adaptations can be made by those skilled in the art without departing from the principles of the invention, and these are intended to fall within the scope of the invention.
Claims (3)
1. An entity disambiguation method based on word vectors and a convolutional neural network, characterized in that the method comprises the following steps:
Step 1: preprocess a text set, collected for the application scenario, that contains entities to be disambiguated, and determine each entity to be disambiguated in the text set together with its context features;
Step 2: construct a knowledge base of the entities to be disambiguated from domain knowledge, search the knowledge base, and determine a candidate entity set for each entity to be disambiguated together with the description features of each candidate entity in the set;
Step 3: take the word vectors of the nouns within a fixed-size window centered on the entity to be disambiguated to form a word vector matrix, used as the context semantic feature of the entity to be disambiguated; compute TF·IDF over the summary information of each entity in the knowledge base and take the word vectors of the top 20 nouns by weight to form a word vector matrix, used as the semantic feature of the knowledge-base entity;
Forming a word vector matrix from the word vectors of the nouns within a fixed-size window centered on the entity to be disambiguated specifically comprises the following steps:
1) call the Google deep learning program word2vec to train on a Wikipedia corpus, obtaining a word vector table L in which each word vector has 200 dimensions and each dimension is a real number;
2) for the context {w_1, w_2, …, w_i, …, w_K} of the entity e to be disambiguated, query the word vector table L for each noun w_i to obtain its word vector v_i;
3) construct the context word vector matrix [v_1, v_2, v_3, …, v_i, …, v_K] of the entity e to be disambiguated from the word vectors of its context words;
4) end;
The specific method of computing TF·IDF over the summary information of each entity in the knowledge base and forming a word vector matrix from the word vectors of the top 20 nouns by weight is as follows:
1) for each candidate entity e_i in the candidate entity set E = {e_1, e_2, …, e_n}, query the word vector table L for each noun w_i in the description of e_i to obtain its word vector v_i;
2) construct the word vector matrix of the entity description from the word vector of each noun in the description features;
3) end;
Step 4: combine known unambiguous entities in the text with their target entities and candidate entities in the knowledge base to form a training set, input the training set into a convolutional neural network model for training, and tune the parameters of the model; this specifically comprises the following steps:
1) each semantic feature to be disambiguated, together with the semantic features of its candidate entity set, is input into the neural network model as one training sample;
2) the semantic feature to be disambiguated is convolved; the number of convolution kernel feature maps is set to 200 and the kernel size to [2, 200], i.e. a matrix of height 2 and width 200;
3) the convolution result of each convolution kernel is pooled with 1-max pooling to obtain the feature of that kernel;
4) the 200 convolution kernel features form an intermediate result, which is input into a fully connected layer of size 50, finally yielding a 50-dimensional semantic feature vector;
5) the semantic features of the candidate entity set are summed and averaged and then input into a fully connected layer, also of size 50, finally yielding a 50-dimensional semantic feature vector;
6) the loss function Loss_e of each training sample in the neural network is defined as:
Loss_e = max(0, 1 - sim(e, e*) + sim(e, e'))
where e* denotes the target entity of the entity e to be disambiguated and e' denotes any other candidate entity in the candidate entity set; the loss thus maximizes the gap between the similarity of e to its target entity and the similarity of e to any other candidate;
the global loss function is defined as Loss = Σ_e Loss_e;
7) the parameters of the neural network are initialized from the uniform distribution U(-0.01, 0.01);
8) all activation functions in the neural network are the tanh (hyperbolic tangent) activation function;
9) the parameters of the neural network are updated with stochastic gradient descent;
10) end;
Step 5: input a sample consisting of each entity to be disambiguated and its knowledge-base candidate entity set into the convolutional neural network model obtained in step 4, to obtain the semantic feature vectors of the entity to be disambiguated and of each entity in the candidate set;
Step 6: based on the semantic feature vectors, calculate the cosine similarity between the entity to be disambiguated and each entity in the knowledge-base candidate entity set, and take the candidate entity with the highest similarity as the final target entity of the entity to be disambiguated; this specifically comprises the following steps:
1) read the semantic feature vector a of the entity e to be disambiguated from the file system;
2) read the set of semantic feature vectors B = {b_1, b_2, …, b_n} of the candidate entity set E = {e_1, e_2, …, e_n} from the file system;
3) traverse the candidate entity set and calculate the cosine similarity l_i between a and each feature vector b_i in B;
4) take the candidate entity with the maximum similarity as the final target entity of the entity e to be disambiguated;
5) end.
2. The entity disambiguation method of claim 1, characterized in that: the preprocessing in step 1 uses the ICTCLAS Chinese word segmentation program of the Chinese Academy of Sciences to perform part-of-speech tagging and word segmentation on the text set, then filters out stop words according to a stop word list, and builds a noun dictionary for proper nouns and entity names that are difficult to recognize.
3. The entity disambiguation method of claim 1, characterized in that: in step 2, the ICTCLAS Chinese word segmentation program is called to perform part-of-speech tagging and word segmentation on the entity descriptions in the knowledge base, and stop words are filtered out according to the stop word list.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710373502.2A CN107102989B (en) | 2017-05-24 | 2017-05-24 | Entity disambiguation method based on word vector and convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710373502.2A CN107102989B (en) | 2017-05-24 | 2017-05-24 | Entity disambiguation method based on word vector and convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107102989A CN107102989A (en) | 2017-08-29 |
CN107102989B true CN107102989B (en) | 2020-09-29 |
Family
ID=59670296
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710373502.2A Active CN107102989B (en) | 2017-05-24 | 2017-05-24 | Entity disambiguation method based on word vector and convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107102989B (en) |
Families Citing this family (55)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107562729B (en) * | 2017-09-14 | 2020-12-08 | 云南大学 | Party building text representation method based on neural network and theme enhancement |
CN107730002B (en) * | 2017-10-13 | 2020-06-02 | 国网湖南省电力公司 | Intelligent fuzzy comparison method for remote control parameters of communication gateway machine |
CN107729509B (en) * | 2017-10-23 | 2020-07-07 | 中国电子科技集团公司第二十八研究所 | Discourse similarity determination method based on recessive high-dimensional distributed feature representation |
CN110019792A (en) * | 2017-10-30 | 2019-07-16 | 阿里巴巴集团控股有限公司 | File classification method and device and sorter model training method |
CN108280061B (en) * | 2018-01-17 | 2021-10-26 | 北京百度网讯科技有限公司 | Text processing method and device based on ambiguous entity words |
CN108304552B (en) * | 2018-02-01 | 2021-01-08 | 浙江大学 | Named entity linking method based on knowledge base feature extraction |
CN108335731A (en) * | 2018-02-09 | 2018-07-27 | 辽宁工程技术大学 | A kind of invalid diet's recommendation method based on computer vision |
CN108399230A (en) * | 2018-02-13 | 2018-08-14 | 上海大学 | A kind of Chinese financial and economic news file classification method based on convolutional neural networks |
CN108446269B (en) * | 2018-03-05 | 2021-11-23 | 昆明理工大学 | Word sense disambiguation method and device based on word vector |
CN108573047A (en) * | 2018-04-18 | 2018-09-25 | 广东工业大学 | A kind of training method and device of Module of Automatic Chinese Documents Classification |
CN108563766A (en) * | 2018-04-19 | 2018-09-21 | 天津科技大学 | The method and device of food retrieval |
CN108959242B (en) * | 2018-05-08 | 2021-07-27 | 中国科学院信息工程研究所 | Target entity identification method and device based on part-of-speech characteristics of Chinese characters |
CN108647785A (en) * | 2018-05-17 | 2018-10-12 | 普强信息技术(北京)有限公司 | A kind of neural network method for automatic modeling, device and storage medium |
CN108647191B (en) * | 2018-05-17 | 2021-06-25 | 南京大学 | Sentiment dictionary construction method based on supervised sentiment text and word vector |
CN108804595B (en) * | 2018-05-28 | 2021-07-27 | 中山大学 | Short text representation method based on word2vec |
CN110555208B (en) * | 2018-06-04 | 2021-11-19 | 北京三快在线科技有限公司 | Ambiguity elimination method and device in information query and electronic equipment |
CN108805290B (en) * | 2018-06-28 | 2021-03-12 | 国信优易数据股份有限公司 | Entity category determination method and device |
CN108921213B (en) * | 2018-06-28 | 2021-06-22 | 国信优易数据股份有限公司 | Entity classification model training method and device |
CN109101579B (en) * | 2018-07-19 | 2021-11-23 | 深圳追一科技有限公司 | Customer service robot knowledge base ambiguity detection method |
CN108920467B (en) * | 2018-08-01 | 2021-04-27 | 北京三快在线科技有限公司 | Method and device for learning word meaning of polysemous word and search result display method |
CN109325108B (en) * | 2018-08-13 | 2022-05-27 | 北京百度网讯科技有限公司 | Query processing method, device, server and storage medium |
CN109241294A (en) * | 2018-08-29 | 2019-01-18 | 国信优易数据有限公司 | A kind of entity link method and device |
CN109214007A (en) * | 2018-09-19 | 2019-01-15 | 哈尔滨理工大学 | A kind of Chinese sentence meaning of a word based on convolutional neural networks disappears qi method |
CN109299462B (en) * | 2018-09-20 | 2022-11-29 | 武汉理工大学 | Short text similarity calculation method based on multi-dimensional convolution characteristics |
CN109614615B (en) * | 2018-12-04 | 2022-04-22 | 联想(北京)有限公司 | Entity matching method and device and electronic equipment |
CN109740728B (en) * | 2018-12-10 | 2019-11-01 | 杭州世平信息科技有限公司 | A kind of measurement of penalty calculation method based on a variety of neural network ensembles |
CN109635114A (en) * | 2018-12-17 | 2019-04-16 | 北京百度网讯科技有限公司 | Method and apparatus for handling information |
CN109933788B (en) * | 2019-02-14 | 2023-05-23 | 北京百度网讯科技有限公司 | Type determining method, device, equipment and medium |
CN110263324B (en) * | 2019-05-16 | 2021-02-12 | 华为技术有限公司 | Text processing method, model training method and device |
CN110598846B (en) * | 2019-08-15 | 2022-05-03 | 北京航空航天大学 | Hierarchical recurrent neural network decoder and decoding method |
CN110705292B (en) * | 2019-08-22 | 2022-11-29 | 成都信息工程大学 | Entity name extraction method based on knowledge base and deep learning |
CN110569506A (en) * | 2019-09-05 | 2019-12-13 | 清华大学 | Medical named entity recognition method based on medical dictionary |
CN110705295B (en) * | 2019-09-11 | 2021-08-24 | 北京航空航天大学 | Entity name disambiguation method based on keyword extraction |
CN110674304A (en) * | 2019-10-09 | 2020-01-10 | 北京明略软件系统有限公司 | Entity disambiguation method and device, readable storage medium and electronic equipment |
CN110826331B (en) * | 2019-10-28 | 2023-04-18 | 南京师范大学 | Intelligent construction method of place name labeling corpus based on interactive and iterative learning |
CN110852106B (en) * | 2019-11-06 | 2024-05-03 | 腾讯科技(深圳)有限公司 | Named entity processing method and device based on artificial intelligence and electronic equipment |
CN110852108B (en) * | 2019-11-11 | 2022-03-29 | 中山大学 | Joint training method, apparatus and medium for entity recognition and entity disambiguation |
CN113010633B (en) * | 2019-12-20 | 2023-01-31 | 海信视像科技股份有限公司 | Information interaction method and equipment |
CN111241298B (en) * | 2020-01-08 | 2023-10-10 | 腾讯科技(深圳)有限公司 | Information processing method, apparatus, and computer-readable storage medium |
CN111241824B (en) * | 2020-01-09 | 2020-11-24 | 中国搜索信息科技股份有限公司 | Method for identifying Chinese metaphor information |
CN111310481B (en) * | 2020-01-19 | 2021-05-18 | 百度在线网络技术(北京)有限公司 | Speech translation method, device, computer equipment and storage medium |
CN111597804B (en) * | 2020-05-15 | 2023-03-10 | 腾讯科技(深圳)有限公司 | Method and related device for training entity recognition model |
CN111709243B (en) * | 2020-06-19 | 2023-07-07 | 南京优慧信安科技有限公司 | Knowledge extraction method and device based on deep learning |
CN112069826B (en) * | 2020-07-15 | 2021-12-07 | 浙江工业大学 | Vertical domain entity disambiguation method fusing topic model and convolutional neural network |
CN112100356A (en) * | 2020-09-17 | 2020-12-18 | 武汉纺织大学 | Knowledge base question-answer entity linking method and system based on similarity |
CN112257443B (en) * | 2020-09-30 | 2024-04-02 | 华泰证券股份有限公司 | MRC-based company entity disambiguation method combined with knowledge base |
CN112464669B (en) * | 2020-12-07 | 2024-02-09 | 宁波深擎信息科技有限公司 | Stock entity word disambiguation method, computer device, and storage medium |
CN112966117A (en) * | 2020-12-28 | 2021-06-15 | 成都数之联科技有限公司 | Entity linking method |
CN112580351B (en) * | 2020-12-31 | 2022-04-19 | 成都信息工程大学 | Machine-generated text detection method based on self-information loss compensation |
CN113761218B (en) * | 2021-04-27 | 2024-05-10 | 腾讯科技(深圳)有限公司 | Method, device, equipment and storage medium for entity linking |
CN113283236B (en) * | 2021-05-31 | 2022-07-19 | 北京邮电大学 | Entity disambiguation method in complex Chinese text |
CN113361283B (en) * | 2021-06-28 | 2024-09-24 | 东南大学 | Paired entity joint disambiguation method for Web form |
CN113704416B (en) * | 2021-10-26 | 2022-03-04 | 深圳市北科瑞声科技股份有限公司 | Word sense disambiguation method and device, electronic equipment and computer-readable storage medium |
CN114298028B (en) * | 2021-12-13 | 2024-09-03 | 盈嘉互联(北京)科技有限公司 | BIM semantic disambiguation method and system |
CN116976324A (en) * | 2022-04-21 | 2023-10-31 | 北京沃东天骏信息技术有限公司 | Disambiguation method and disambiguation device for product words |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104572892A (en) * | 2014-12-24 | 2015-04-29 | 中国科学院自动化研究所 | Text classification method based on cyclic convolution network |
CN106295796A (en) * | 2016-07-22 | 2017-01-04 | 浙江大学 | Entity link method based on degree of depth study |
CN106547735A (en) * | 2016-10-25 | 2017-03-29 | 复旦大学 | The structure and using method of the dynamic word or word vector based on the context-aware of deep learning |
CN106570170A (en) * | 2016-11-09 | 2017-04-19 | 武汉泰迪智慧科技有限公司 | Text classification and naming entity recognition integrated method and system based on depth cyclic neural network |
- 2017-05-24 CN CN201710373502.2A patent/CN107102989B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104572892A (en) * | 2014-12-24 | 2015-04-29 | 中国科学院自动化研究所 | Text classification method based on cyclic convolution network |
CN106295796A (en) * | 2016-07-22 | 2017-01-04 | 浙江大学 | Entity link method based on degree of depth study |
CN106547735A (en) * | 2016-10-25 | 2017-03-29 | 复旦大学 | The structure and using method of the dynamic word or word vector based on the context-aware of deep learning |
CN106570170A (en) * | 2016-11-09 | 2017-04-19 | 武汉泰迪智慧科技有限公司 | Text classification and naming entity recognition integrated method and system based on depth cyclic neural network |
Non-Patent Citations (1)
Title |
---|
Named Entity Disambiguation Based on Chinese Wikipedia; Du Jingjun et al.; Journal of Hangzhou Dianzi University; Dec. 31, 2012; Vol. 32, No. 6; pp. 57-60 *
Also Published As
Publication number | Publication date |
---|---|
CN107102989A (en) | 2017-08-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107102989B (en) | Entity disambiguation method based on word vector and convolutional neural network | |
WO2021114745A1 (en) | Named entity recognition method employing affix perception for use in social media | |
Kumar et al. | Identifying clickbait: A multi-strategy approach using neural networks | |
CN104615767B (en) | Training method, search processing method and the device of searching order model | |
CN106570141B (en) | Approximate repeated image detection method | |
CN104391942B (en) | Short essay eigen extended method based on semantic collection of illustrative plates | |
CN112084331A (en) | Text processing method, text processing device, model training method, model training device, computer equipment and storage medium | |
CN107085581A (en) | Short text classification method and device | |
CN107480143A (en) | Dialogue topic dividing method and system based on context dependence | |
CN106095749A (en) | A kind of text key word extracting method based on degree of depth study | |
Tuan et al. | Multimodal fusion with BERT and attention mechanism for fake news detection | |
CN110347790B (en) | Text duplicate checking method, device and equipment based on attention mechanism and storage medium | |
CN112434533B (en) | Entity disambiguation method, entity disambiguation device, electronic device, and computer-readable storage medium | |
CN110222328B (en) | Method, device and equipment for labeling participles and parts of speech based on neural network and storage medium | |
Zhang et al. | Relation classification via BiLSTM-CNN | |
CN106933787A (en) | Adjudicate the computational methods of document similarity, search device and computer equipment | |
CN108304377B (en) | Extraction method of long-tail words and related device | |
CN110569405A (en) | method for extracting government affair official document ontology concept based on BERT | |
CN105760363B (en) | Word sense disambiguation method and device for text file | |
CN105740448B (en) | More microblogging timing abstract methods towards topic | |
CN117251551B (en) | Natural language processing system and method based on large language model | |
JP7181999B2 (en) | SEARCH METHOD AND SEARCH DEVICE, STORAGE MEDIUM | |
CN110390104B (en) | Irregular text transcription method and system for voice dialogue platform | |
Le Huy et al. | Keyphrase extraction model: a new design and application on tourism information | |
CN113076744A (en) | Cultural relic knowledge relation extraction method based on convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |