CN112257443A - MRC-based company entity disambiguation method combined with knowledge base - Google Patents
- Publication number
- CN112257443A CN112257443A CN202011070276.9A CN202011070276A CN112257443A CN 112257443 A CN112257443 A CN 112257443A CN 202011070276 A CN202011070276 A CN 202011070276A CN 112257443 A CN112257443 A CN 112257443A
- Authority
- CN
- China
- Prior art keywords
- entity
- mrc
- task
- loss
- knowledge base
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F40/295—Named entity recognition
- G06F16/3344—Query execution using natural language analysis
- G06F16/35—Clustering; Classification
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
Abstract
The invention discloses an MRC-based company entity disambiguation method combined with a knowledge base, comprising the following steps: acquiring a sentence to be disambiguated; concatenating the sentence to be disambiguated with a question sentence to obtain an MRC structure; acquiring, from an entity knowledge base, the description sentences of the different entities that the ambiguous abbreviation in the sentence may refer to; concatenating these entity description sentences at the end of the MRC structure; inputting the MRC structure with the concatenated entity description sentences into a Bert model; and the Bert model outputting the real entity corresponding to the ambiguous abbreviation, thereby disambiguating the sentence. The method effectively improves prediction accuracy while retaining the generalization ability of a supervised model, so that no re-labeling or re-training is needed when a new company entity is added.
Description
Technical Field
The invention relates to the field of artificial intelligence, in particular to a company entity disambiguation method based on MRC combined with a knowledge base.
Background
Text is the main medium through which information about company entities spreads, and accurately identifying which company a piece of news concerns (company linking) directly determines how downstream financial work is carried out. In financial news, many company entities (out of tens of millions) appear as abbreviated short names, which easily causes ambiguity. For example, the short name rendered here as "the common people" may denote a listed company or simply the general public; "Wuliangye" may denote a listed company or the liquor itself. The essence of entity disambiguation is that one word may carry multiple meanings, and the intended meaning must be determined from the context together with knowledge-base knowledge. Resolving the ambiguity of company entities is therefore important for understanding financial news content and for accurately linking the relevant company entity information.
At present, the common methods for company entity disambiguation are: (1) regular expression matching: maintain positive/negative-sample rules (unambiguous occurrences are positive samples, ambiguous ones negative) for every company that may be ambiguous, and judge ambiguity by regular matching; (2) unsupervised sample clustering: mine positive and negative sample clusters by semantically clustering texts containing the company short name, and disambiguate accordingly; (3) supervised sample classification: label positive and negative samples for companies that may be ambiguous and train a binary classifier for disambiguation.
Among the above methods, regular expression matching has high precision but low recall, poor extensibility, and low efficiency, because the rule base must be continuously maintained by hand. The unsupervised clustering method, on the one hand, has relatively low accuracy for lack of supervision; on the other hand, each newly added company entity to be disambiguated requires collecting new unsupervised corpora and re-clustering. The supervised classification method, on the one hand, cannot determine the specific ambiguity class of a negative (ambiguous) sample, since it only performs binary classification on positive and negative samples; on the other hand, it does not introduce entity knowledge-base information and therefore cannot exploit the knowledge base's descriptions of entities.
Disclosure of Invention
To address the defects of the prior art, the invention provides an MRC-based company entity disambiguation method combined with a knowledge base, so as to solve the problem of relatively low accuracy in the prior art.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
An MRC-based company entity disambiguation method combined with a knowledge base, comprising the following steps:
acquiring a sentence to be disambiguated;
concatenating the sentence to be disambiguated with a question sentence to obtain an MRC structure;
acquiring, from an entity knowledge base, the description sentences of the different entities corresponding to the ambiguous abbreviation in the sentence to be disambiguated;
concatenating the different entity description sentences at the end of the MRC structure;
inputting the MRC structure with the concatenated entity description sentences into a Bert model;
and the Bert model outputting the real entity corresponding to the ambiguous abbreviation, thereby disambiguating the sentence.
Furthermore, two loss functions are arranged at the output end of the Bert model; the loss functions comprise a first task loss function and a second task loss function.
Further, the first task loss function is a binary classification loss; the second task loss function is a multi-classification loss.
Further, the first task loss function is expressed by the following formulas:

output_1 = Sigmoid(W_1 × H_[CLS])

loss_1 = binary_crossentropy(output_1, label_1)

where output_1 denotes the model output of task one; Sigmoid() denotes the logistic function; W_1 denotes the weight matrix used to compute the task-one output; H_[CLS] denotes the semantic vector at the sentence-start ([CLS]) position; loss_1 denotes the loss of task one; binary_crossentropy() denotes the binary cross-entropy loss function; and label_1 denotes the true label of task one.
Further, the second task loss function is expressed by the following formulas:

entity_output_i = Sigmoid(W_entity_i × H_entity_i)

loss_entity_i = binary_crossentropy(entity_output_i, label_entity_i)

loss_2 = Σ_{i=1}^{n} loss_entity_i

where entity_output_i denotes the model output at the i-th entity; W_entity_i denotes the weight matrix used to compute the output at the i-th entity; H_entity_i denotes the semantic vector at the position of the i-th entity; loss_entity_i denotes the loss of the i-th entity; label_entity_i denotes the true label of the i-th entity; loss_2 denotes the loss of task two; and n denotes the number of entities that the ambiguous abbreviation may correspond to.
Further, the Bert model disambiguates the MRC structure with the concatenated entity description sentences through the first task loss function and the second task loss function; the specific disambiguation process is as follows:
judging, through the first task, whether an ambiguous abbreviation exists in the sentence to be disambiguated;
and if so, obtaining, through the second task, the real entity corresponding to the ambiguous abbreviation from the different entity description sentences.
Further, the Bert model is stacked from 12 layers of a basic neural network structure.
Further, the Bert model is trained, and the MRC structure is input into the trained Bert model to disambiguate the sentence; the training method of the Bert model comprises:
setting the parameters of the basic neural network structures in the Bert model;
randomly re-initializing, with equal probability, the parameters of the last 3 layers of the basic neural network structure;
and training the Bert model after the random re-initialization, stopping once the loss function of the Bert model converges, to obtain the trained and optimized Bert model.
Compared with the prior art, the invention has the following beneficial effects:
the invention inputs more effective information into the model through the introduction of entity description in the entity knowledge base, improves the prediction capability of the model, simultaneously utilizes the input construction mode of MRC, conforms to the input characteristics of a pretraining stage of a Bert model, further improves the accuracy rate of entity disambiguation, and further finely distinguishes different types of ambiguities through the specific classification of entity reference content, and accelerates the convergence of the model and enhances the training stability through the use of multi-task learning and weight reinitialization.
Drawings
FIG. 1 is a sample diagram of the reading-comprehension-style input constructed by concatenation with a question sentence;
FIG. 2 is a sample diagram showing how the sentence to be disambiguated is effectively associated with the entity description sentences.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
A MRC-based corporate entity disambiguation method incorporating a knowledge base, comprising the steps of:
(1) Acquiring a sentence to be disambiguated;
(2) concatenating the sentence to be disambiguated with a question sentence to obtain an MRC structure;
First, the input is constructed in a reading-comprehension-like way: the sentence to be disambiguated serves as the reading passage and is concatenated with a question sentence to obtain the MRC structure used as input.
(3) Acquiring, from an entity knowledge base, the description sentences of the different entities corresponding to the ambiguous abbreviation in the sentence to be disambiguated;
(4) concatenating the different entity description sentences at the end of the MRC structure;
Entity description sentences that share the same abbreviation but refer to different entities are concatenated, via the entity knowledge base, at the end of the MRC-structure input, to provide more detailed disambiguation information.
(5) Inputting the MRC structure with the concatenated entity description sentences into the Bert model;
(6) the Bert model outputting the real entity corresponding to the ambiguous abbreviation, thereby disambiguating the sentence.
At the output end of the Bert model, two tasks are designed and their losses accumulated, exploiting the fact that multi-task learning lets tasks reinforce each other: one is a binary classification loss that judges whether ambiguity exists, and the other is a multi-classification loss that determines the specific ambiguity class.
At the start of Bert training, only the model weights close to the input layer are retained, while the weights close to the output layer are randomly re-initialized; this accelerates model convergence, reduces loss jitter during training, and improves training stability.
The method comprises the following specific steps:
step 1-construct input of MRC mode
The model input is constructed in MRC style: the sentence requiring disambiguation is used as the reading passage, and a reading-comprehension-like input is built by concatenating it with a question sentence. An example is shown in FIG. 1, where the sentence to be disambiguated is "Apple futures remain under pressure, but volatility is gradually decreasing" and the reading-comprehension question asks what the ambiguous subject refers to. Through the Self-attention mechanism, the question and the sentence to be disambiguated attend to each other, so the model learns to extract from the sentence, guided by the question, the information that is useful for disambiguation.
Step 2 - Combining the input with the entity knowledge base
To make full use of the effective information in the structured knowledge base, a structure that fuses the model with knowledge-base knowledge must be designed. The most effective information in the entity knowledge base comes from the entity description sentences: all possible entity descriptions for the ambiguous abbreviation are concatenated in turn at the end of the MRC-structure input, and the attention mechanism effectively associates the sentence to be disambiguated with the entity description sentences. As shown in FIG. 2, the ambiguous abbreviation in the sentence to be disambiguated is "apple"; in the structured entity knowledge base this keyword corresponds to several entities, such as "a plant of the genus Malus" (the fruit we normally eat) and "a high-tech company" (Apple Inc. of the United States). Through this fusion, the structured information in the entity knowledge base can be exploited effectively, helping the model understand the semantics of the sentence to be disambiguated and improving disambiguation accuracy.
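As a concrete illustration of Steps 1 and 2, the concatenated input can be sketched as follows. This is a minimal sketch: the [CLS]/[SEP] layout follows standard Bert input conventions, and the helper name `build_mrc_input` is hypothetical, since the patent shows the construction only through the examples of FIG. 1 and FIG. 2.

```python
def build_mrc_input(question, sentence, entity_descriptions):
    """Concatenate the question, the sentence to be disambiguated, and the
    candidate entity description sentences into one Bert-style input string.

    The [CLS]/[SEP] placement is an assumption based on standard Bert usage;
    the patent only illustrates the idea in FIG. 1 and FIG. 2.
    """
    parts = ["[CLS]", question, "[SEP]", sentence]
    for desc in entity_descriptions:      # Step 2: append each knowledge-base
        parts += ["[SEP]", desc]          # description at the end of the input
    parts.append("[SEP]")
    return " ".join(parts)

text = build_mrc_input(
    "What does the ambiguous subject refer to?",
    "Apple futures remain under pressure, but volatility is gradually decreasing",
    ["a plant of the genus Malus", "a high-tech company"],
)
print(text.startswith("[CLS]"))  # True
```

The resulting string would then be tokenized and fed to the Bert encoder; the per-position hidden states at [CLS] and at each entity description supply the H_[CLS] and H_entity_i vectors used by the two loss functions below.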
Step 3 - Multi-task learning
At the output end, the design of the loss function is also crucial. Two tasks are designed, exploiting the mutual reinforcement of multi-task learning, and their losses are accumulated.
1) The first task is to distinguish whether the sentence to be disambiguated contains an ambiguous entity abbreviation;
output_1 = Sigmoid(W_1 × H_[CLS])

loss_1 = binary_crossentropy(output_1, label_1)

where output_1 denotes the model output of task one; Sigmoid() denotes the logistic function; W_1 denotes the weight matrix used to compute the task-one output; H_[CLS] denotes the semantic vector at the sentence-start ([CLS]) position; loss_1 denotes the loss of task one; binary_crossentropy() denotes the binary cross-entropy loss function; and label_1 denotes the true label of task one.
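The task-one loss above can be sketched in plain Python. As a simplification (an assumption not in the patent), W_1 is treated as a weight vector applied to the [CLS] vector rather than a full matrix, and the numbers are made up:

```python
import math

def sigmoid(x):
    # Logistic function, as Sigmoid() in the formula above.
    return 1.0 / (1.0 + math.exp(-x))

def binary_crossentropy(p, label):
    # Binary cross-entropy between predicted probability p and label (0 or 1).
    eps = 1e-7
    p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def task1_loss(w1, h_cls, label1):
    """loss_1 = binary_crossentropy(Sigmoid(W_1 x H_[CLS]), label_1).
    w1 is reduced to a weight vector here for simplicity."""
    logit = sum(w * h for w, h in zip(w1, h_cls))
    output1 = sigmoid(logit)
    return binary_crossentropy(output1, label1)

print(task1_loss([0.5, -0.3], [1.0, 2.0], 1) > 0)  # True
```

A confident correct prediction yields a smaller loss: for label 1, a large positive logit drives the loss toward zero, which is what gradient descent exploits during fine-tuning.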
2) The second task is to determine which entity the ambiguous abbreviation corresponds to;
entity_output_i = Sigmoid(W_entity_i × H_entity_i)

loss_entity_i = binary_crossentropy(entity_output_i, label_entity_i)

loss_2 = Σ_{i=1}^{n} loss_entity_i

where entity_output_i denotes the model output at the i-th entity; W_entity_i denotes the weight matrix used to compute the output at the i-th entity; H_entity_i denotes the semantic vector at the position of the i-th entity; loss_entity_i denotes the loss of the i-th entity; label_entity_i denotes the true label of the i-th entity; loss_2 denotes the loss of task two; and n denotes the number of entities that the ambiguous abbreviation may correspond to.
Step 4 - Weight re-initialization
Generally, a Bert-based model keeps the parameters of all 12 layers of the pre-trained Bert model's basic neural network and is fine-tuned directly on the downstream task. However, training is then unstable and convergence slow. The reason is that not all 12 layers of pre-trained Bert weights benefit the downstream task: layers close to the input learn general semantic information such as part of speech and syntax, while layers close to the output learn knowledge strongly tied to the pre-training tasks. Since the tasks used during Bert pre-training differ from the disambiguation task of this scheme, the network weights close to the output layer of pre-trained Bert can negatively affect downstream training. A weight re-initialization method is therefore proposed to solve this instability:
1) Copy all 12 layers of basic neural network parameters of pre-trained Bert into the model of this scheme;
2) replace the parameters of the model's last 3 network layers with values drawn uniformly at random from [0, 1];
3) train the model after the last 3 layers have been re-initialized, and stop once the model's loss function converges, obtaining the trained and optimized model;
4) use the trained model to receive input and produce the disambiguation output.
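Steps 1)-3) above can be sketched as follows. The per-layer list-of-weights representation is a toy simplification (real Bert layers hold several tensors each), and the helper name is hypothetical:

```python
import random

def reinit_last_layers(layer_weights, n_reinit=3, seed=0):
    """Weight re-initialization sketch: keep the pre-trained weights of the
    lower layers and replace the last `n_reinit` layers with uniform random
    values in [0, 1], per steps 1)-3).

    `layer_weights` is a list of per-layer weight lists (a simplification;
    real Bert layers contain multiple weight tensors)."""
    rng = random.Random(seed)
    kept = len(layer_weights) - n_reinit
    out = [list(w) for w in layer_weights[:kept]]     # copied as-is
    for w in layer_weights[kept:]:
        out.append([rng.random() for _ in w])         # uniformly re-initialized
    return out

pretrained = [[0.1] * 4 for _ in range(12)]  # 12 layers of dummy weights
model = reinit_last_layers(pretrained)
print(model[0] == [0.1] * 4, model[11] != [0.1] * 4)  # True True
```

With a real Bert implementation, the same effect would be achieved by re-initializing the parameter tensors of the top encoder blocks before fine-tuning, while leaving the embedding and lower encoder layers untouched.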
Step 1 - Construct a knowledge base containing company entity information; it must contain the various entities that each potentially ambiguous abbreviation may correspond to, together with their description information.
Step 2 - Label a certain amount of supervised corpora, including unambiguous and ambiguous corpora; the ambiguous corpora must additionally be labeled with the specific entity class they refer to.
Step 3 - Construct the input in MRC style: concatenate the question sentence with the sentence to be disambiguated, and append, one by one at the end, the description sentences of all entities the abbreviation may correspond to.
Step 4 - Feed the input into the Bert model, compute two losses at the output and add them: a binary classification loss judging whether the sentence is ambiguous with respect to a company entity description, and a multi-classification loss determining the specific ambiguity class.
Step 5 - Begin training, keeping only the Bert weights close to the input layer and re-initializing those close to the output layer.
Step 6 - Finish training to obtain the complete company entity disambiguation model. At prediction time the input is the same as in Step 3, and there are two outputs: one judges whether the sentence contains an ambiguous company entity, the other determines the specific entity class the abbreviation corresponds to.
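At prediction time, the two outputs of Step 6 can be combined by a simple decision rule, sketched here. The 0.5 threshold and the argmax over per-entity scores are assumptions: the patent specifies the two outputs but not the exact decision rule.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def disambiguate(task1_logit, entity_logits, threshold=0.5):
    """Two-stage decision: task one decides whether the sentence contains an
    ambiguous abbreviation at all; task two picks the most probable entity
    among the candidate descriptions appended to the input."""
    if sigmoid(task1_logit) < threshold:
        return None  # no ambiguous company abbreviation detected
    # Score each candidate entity description independently and return the
    # index of the best-scoring one.
    probs = [sigmoid(z) for z in entity_logits]
    return max(range(len(probs)), key=probs.__getitem__)

# A sentence judged ambiguous (logit 2.0) with three candidate entities:
print(disambiguate(2.0, [-1.2, 3.1, 0.4]))  # 1
```

Returning an index rather than a string keeps the rule independent of the knowledge base; the caller maps the index back to the matching entity description.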
The invention discloses a company Entity Disambiguation method based on MRC (Machine Reading Comprehension) combined with a knowledge base. Addressing the ambiguity problem in company entity linking, the technique is based on the Bert model: first a disambiguation question is constructed in MRC style; then the model input is built by combining entity description information from the entity knowledge base; at the output end, a loss function is constructed through multi-task learning by accumulating the binary classification loss (is the entity present?) and the entity multi-classification loss; finally, Weight Re-initialization accelerates the convergence of model training and improves training stability. The invention effectively resolves the ambiguity of company entities, which mostly appear as short names in the large volume of financial news, avoids the semantic misunderstandings caused by polysemous company names, improves the accuracy of company linking, and provides important basic technical support for downstream financial analysis algorithms.
Compared with the regular-expression-matching method, the invention relies on an initial labeled corpus and the generalization ability of the model, effectively avoiding a large amount of subsequent manual rule maintenance. Compared with the unsupervised-clustering method, the introduction of labeled data effectively improves prediction accuracy, and the generalization ability of the supervised model also avoids re-labeling and re-training when new company entities are added.
The present invention is not limited to the above-mentioned embodiments, and based on the technical solutions disclosed in the present invention, those skilled in the art can make some substitutions and modifications to some technical features without creative efforts according to the disclosed technical contents, and these substitutions and modifications are all within the protection scope of the present invention.
Claims (8)
1. An MRC-based company entity disambiguation method combined with a knowledge base, comprising the following steps:
acquiring a sentence to be disambiguated;
concatenating the sentence to be disambiguated with a question sentence to obtain an MRC structure;
acquiring, from an entity knowledge base, the description sentences of the different entities corresponding to the ambiguous abbreviation in the sentence to be disambiguated;
concatenating the different entity description sentences at the end of the MRC structure;
inputting the MRC structure with the concatenated entity description sentences into a Bert model;
and the Bert model outputting the real entity corresponding to the ambiguous abbreviation, thereby disambiguating the sentence.
2. The MRC-based company entity disambiguation method combined with a knowledge base according to claim 1, characterised in that the output end of the Bert model is provided with two loss functions; the loss functions comprise a first task loss function and a second task loss function.
3. The MRC-based company entity disambiguation method combined with a knowledge base according to claim 2, wherein the first task loss function is a binary classification loss and the second task loss function is a multi-classification loss.
4. The MRC-based company entity disambiguation method combined with a knowledge base according to claim 2, wherein the first task loss function is expressed by the following formulas:

output_1 = Sigmoid(W_1 × H_[CLS])

loss_1 = binary_crossentropy(output_1, label_1)

where output_1 denotes the model output of task one; Sigmoid() denotes the logistic function; W_1 denotes the weight matrix used to compute the task-one output; H_[CLS] denotes the semantic vector at the sentence-start ([CLS]) position; loss_1 denotes the loss of task one; binary_crossentropy() denotes the binary cross-entropy loss function; and label_1 denotes the true label of task one.
5. The MRC-based company entity disambiguation method combined with a knowledge base according to claim 2, wherein the second task loss function is expressed by the following formulas:

entity_output_i = Sigmoid(W_entity_i × H_entity_i)

loss_entity_i = binary_crossentropy(entity_output_i, label_entity_i)

loss_2 = Σ_{i=1}^{n} loss_entity_i

where entity_output_i denotes the model output at the i-th entity; W_entity_i denotes the weight matrix used to compute the output at the i-th entity; H_entity_i denotes the semantic vector at the position of the i-th entity; loss_entity_i denotes the loss of the i-th entity; label_entity_i denotes the true label of the i-th entity; loss_2 denotes the loss of task two; and n denotes the number of entities that the ambiguous abbreviation may correspond to.
6. The MRC-based company entity disambiguation method combined with a knowledge base according to claim 2, wherein the Bert model disambiguates the MRC structure with the concatenated entity description sentences through the first task loss function and the second task loss function, the specific disambiguation process being:
judging, through the first task, whether an ambiguous abbreviation exists in the sentence to be disambiguated;
and if so, obtaining, through the second task, the real entity corresponding to the ambiguous abbreviation from the different entity description sentences.
7. The MRC-based company entity disambiguation method combined with a knowledge base according to claim 1, wherein the Bert model is stacked from 12 layers of a basic neural network structure.
8. The MRC-based company entity disambiguation method combined with a knowledge base according to claim 7, wherein the Bert model is trained and the MRC structure is input into the trained Bert model to disambiguate the sentence; the training method of the Bert model comprising:
setting the parameters of the basic neural network structures in the Bert model;
randomly re-initializing, with equal probability, the parameters of the last 3 layers of the basic neural network structure;
and training the Bert model after the random re-initialization, stopping once the loss function of the Bert model converges, to obtain the trained and optimized Bert model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011070276.9A CN112257443B (en) | 2020-09-30 | 2020-09-30 | MRC-based company entity disambiguation method combined with knowledge base |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011070276.9A CN112257443B (en) | 2020-09-30 | 2020-09-30 | MRC-based company entity disambiguation method combined with knowledge base |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112257443A true CN112257443A (en) | 2021-01-22 |
CN112257443B CN112257443B (en) | 2024-04-02 |
Family
ID=74234991
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011070276.9A Active CN112257443B (en) | 2020-09-30 | 2020-09-30 | MRC-based company entity disambiguation method combined with knowledge base |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112257443B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113051892A (en) * | 2021-03-22 | 2021-06-29 | 哈尔滨理工大学 | Chinese word sense disambiguation method based on transformer model |
CN113065353A (en) * | 2021-03-16 | 2021-07-02 | 北京金堤征信服务有限公司 | Entity identification method and device |
CN113128238A (en) * | 2021-04-28 | 2021-07-16 | 安徽智侒信信息技术有限公司 | Financial information semantic analysis method and system based on natural language processing technology |
CN113158687A (en) * | 2021-04-29 | 2021-07-23 | 新声科技(深圳)有限公司 | Semantic disambiguation method and device, storage medium and electronic device |
CN113220900A (en) * | 2021-05-10 | 2021-08-06 | 深圳价值在线信息科技股份有限公司 | Modeling method of entity disambiguation model and entity disambiguation prediction method |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107102989A (en) * | 2017-05-24 | 2017-08-29 | 南京大学 | A kind of entity disambiguation method based on term vector, convolutional neural networks |
CN109101579A (en) * | 2018-07-19 | 2018-12-28 | 深圳追科技有限公司 | customer service robot knowledge base ambiguity detection method |
CN110781670A (en) * | 2019-10-28 | 2020-02-11 | 合肥工业大学 | Chinese place name semantic disambiguation method based on encyclopedic knowledge base and word vector |
CN111339778A (en) * | 2020-03-13 | 2020-06-26 | 苏州跃盟信息科技有限公司 | Text processing method, device, storage medium and processor |
CN111401049A (en) * | 2020-03-12 | 2020-07-10 | 京东方科技集团股份有限公司 | Entity linking method and device |
CN111523326A (en) * | 2020-04-23 | 2020-08-11 | 北京百度网讯科技有限公司 | Entity chain finger method, device, equipment and storage medium |
CN111709243A (en) * | 2020-06-19 | 2020-09-25 | 南京优慧信安科技有限公司 | Knowledge extraction method and device based on deep learning |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107102989A (en) * | 2017-05-24 | 2017-08-29 | 南京大学 | Entity disambiguation method based on word vectors and convolutional neural networks |
CN109101579A (en) * | 2018-07-19 | 2018-12-28 | 深圳追科技有限公司 | Customer service robot knowledge base ambiguity detection method |
CN110781670A (en) * | 2019-10-28 | 2020-02-11 | 合肥工业大学 | Chinese place name semantic disambiguation method based on encyclopedic knowledge base and word vector |
CN111401049A (en) * | 2020-03-12 | 2020-07-10 | 京东方科技集团股份有限公司 | Entity linking method and device |
CN111339778A (en) * | 2020-03-13 | 2020-06-26 | 苏州跃盟信息科技有限公司 | Text processing method, device, storage medium and processor |
CN111523326A (en) * | 2020-04-23 | 2020-08-11 | 北京百度网讯科技有限公司 | Entity linking method, apparatus, device and storage medium |
CN111709243A (en) * | 2020-06-19 | 2020-09-25 | 南京优慧信安科技有限公司 | Knowledge extraction method and device based on deep learning |
Non-Patent Citations (2)
Title |
---|
XIAOYA LI et al.: "A Unified MRC Framework for Named Entity Recognition", arXiv, pages 1 - 11 *
MU Lingling; CHENG Xiaoyu; ZAN Hongying; HAN Yingjie: "A Neural Network Chinese Word Sense Disambiguation Model Incorporating Linguistic Knowledge", Journal of Zhengzhou University (Natural Science Edition), no. 03, pages 15 - 20 *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113065353A (en) * | 2021-03-16 | 2021-07-02 | 北京金堤征信服务有限公司 | Entity identification method and device |
CN113065353B (en) * | 2021-03-16 | 2024-04-02 | 北京金堤征信服务有限公司 | Entity identification method and device |
CN113051892A (en) * | 2021-03-22 | 2021-06-29 | 哈尔滨理工大学 | Chinese word sense disambiguation method based on transformer model |
CN113128238A (en) * | 2021-04-28 | 2021-07-16 | 安徽智侒信信息技术有限公司 | Financial information semantic analysis method and system based on natural language processing technology |
CN113128238B (en) * | 2021-04-28 | 2023-06-20 | 安徽智侒信信息技术有限公司 | Financial information semantic analysis method and system based on natural language processing technology |
CN113158687A (en) * | 2021-04-29 | 2021-07-23 | 新声科技(深圳)有限公司 | Semantic disambiguation method and device, storage medium and electronic device |
CN113158687B (en) * | 2021-04-29 | 2021-12-28 | 新声科技(深圳)有限公司 | Semantic disambiguation method and device, storage medium and electronic device |
CN113220900A (en) * | 2021-05-10 | 2021-08-06 | 深圳价值在线信息科技股份有限公司 | Modeling method of entity disambiguation model and entity disambiguation prediction method |
CN113220900B (en) * | 2021-05-10 | 2023-08-25 | 深圳价值在线信息科技股份有限公司 | Modeling Method of Entity Disambiguation Model and Entity Disambiguation Prediction Method |
Also Published As
Publication number | Publication date |
---|---|
CN112257443B (en) | 2024-04-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Liu et al. | Attention-based BiGRU-CNN for Chinese question classification | |
Zhang et al. | A text sentiment classification modeling method based on coordinated CNN‐LSTM‐attention model | |
CN112257443A (en) | MRC-based company entity disambiguation method combined with knowledge base | |
CN109325231B (en) | Method for generating word vector by multitasking model | |
CN111325029B (en) | Text similarity calculation method based on deep learning integrated model | |
CN108170848B (en) | Chinese mobile intelligent customer service-oriented conversation scene classification method | |
CN111368542A (en) | Text language association extraction method and system based on recurrent neural network | |
Wang et al. | Text categorization with improved deep learning methods | |
CN111191464A (en) | Semantic similarity calculation method based on combined distance | |
CN113515632A (en) | Text classification method based on graph path knowledge extraction | |
CN112766359A (en) | Word double-dimensional microblog rumor recognition method for food safety public sentiment | |
CN115392248A (en) | Event extraction method based on context and drawing attention | |
CN114239828A (en) | Supply chain affair map construction method based on causal relationship | |
Sarikaya et al. | Shrinkage based features for slot tagging with conditional random fields. | |
CN113869040A (en) | Voice recognition method for power grid dispatching | |
CN113486143A (en) | User portrait generation method based on multi-level text representation and model fusion | |
CN116522165B (en) | Public opinion text matching system and method based on twin structure | |
CN113239694A (en) | Argument role identification method based on argument phrase | |
Neill et al. | Meta-embedding as auxiliary task regularization | |
Cai et al. | Multi-view and attention-based bi-lstm for weibo emotion recognition | |
CN110705277A (en) | Chinese word sense disambiguation method based on cyclic neural network | |
Liao et al. | The SG-CIM entity linking method based on BERT and entity name embeddings | |
CN115329075A (en) | Text classification method based on distributed machine learning | |
Shi | Using domain knowledge for low resource named entity recognition | |
Wang et al. | BiLSTM-ATT Chinese sentiment classification model based on pre-training word vectors |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||