CN112257443A - MRC-based company entity disambiguation method combined with knowledge base - Google Patents

MRC-based company entity disambiguation method combined with knowledge base

Info

Publication number
CN112257443A
CN112257443A
Authority
CN
China
Prior art keywords
entity
mrc
task
loss
knowledge base
Prior art date
Legal status
Granted
Application number
CN202011070276.9A
Other languages
Chinese (zh)
Other versions
CN112257443B (en)
Inventor
张汝宸
朱德伟
朱峰
Current Assignee
Huatai Securities Co ltd
Original Assignee
Huatai Securities Co ltd
Priority date
Filing date
Publication date
Application filed by Huatai Securities Co ltd filed Critical Huatai Securities Co ltd
Priority to CN202011070276.9A
Publication of CN112257443A
Application granted
Publication of CN112257443B
Legal status: Active
Anticipated expiration

Classifications

    • G06F 40/295 Named entity recognition
    • G06F 16/3344 Query execution using natural language analysis
    • G06F 16/35 Clustering; Classification (information retrieval of unstructured textual data)
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06N 3/045 Combinations of networks (neural network architectures)
    • G06N 3/08 Learning methods (neural networks)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an MRC-based company entity disambiguation method combined with a knowledge base, which comprises the following steps: acquiring a sentence to be disambiguated; splicing the sentence to be disambiguated with a question sentence to obtain an MRC structure; acquiring, from an entity knowledge base, the different entity description sentences corresponding to the ambiguous abbreviation in the sentence to be disambiguated; splicing the different entity description sentences onto the end of the MRC structure; inputting the MRC structure spliced with the different entity description sentences into a Bert model; and the Bert model outputting the real entity corresponding to the ambiguous abbreviation, thereby disambiguating the sentence. The method effectively improves the accuracy of model prediction, while the generalization capability of the supervised model avoids the need for re-labeling and re-training when a new company entity is added.

Description

MRC-based company entity disambiguation method combined with knowledge base
Technical Field
The invention relates to the field of artificial intelligence, and in particular to an MRC-based company entity disambiguation method combined with a knowledge base.
Background
Text is the main medium through which information about company entities spreads, and accurately locating the company entities involved in a news item (company association) directly determines how downstream financial work is carried out. In financial news, many company entities (out of tens of millions of company entities) appear as domain abbreviations, which very easily causes ambiguity. For example, 'Laobaixing' (literally 'the common people') may refer to a listed company or to the general public; 'Wuliangye' may refer to a listed company or to the liquor itself. The essence of entity disambiguation is that one word may have multiple meanings, and the exact meaning it expresses must be determined from the context combined with knowledge from a knowledge base. Resolving the ambiguity of company entities is important for subsequently understanding the content of financial news and accurately associating company entity information.
At present, the common methods for company entity disambiguation are: (1) regular expression matching: maintain positive- and negative-sample rules (unambiguous mentions are positive samples, ambiguous mentions are negative samples) for every company whose name may be ambiguous, and judge whether ambiguity exists by regular matching; (2) unsupervised sample clustering: mine positive- and negative-sample clusters by semantically clustering texts containing company abbreviations, and disambiguate accordingly; (3) supervised sample classification: label positive and negative samples for companies whose names may be ambiguous and train a binary classification model to disambiguate.
Among the above methods, the regular-expression-matching method has high precision but low recall, poor extensibility, and low efficiency, because the rule base must be continuously maintained by hand. The unsupervised-clustering method, on the one hand, has relatively low accuracy due to the lack of supervision information, and on the other hand requires new unsupervised corpora and re-clustering for each newly added company entity to be disambiguated. The supervised-classification method, on the one hand, cannot determine the specific ambiguity class of a negative (ambiguous) sample because it only performs binary classification of positive and negative samples, and on the other hand cannot effectively exploit the knowledge base's descriptions of entities because no entity knowledge base information is introduced.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an MRC-based company entity disambiguation method combined with a knowledge base, so as to solve the problem of relatively low accuracy in the prior art.
In order to solve the above technical problem, the invention adopts the following technical scheme:
An MRC-based company entity disambiguation method combined with a knowledge base, comprising the following steps:
acquiring a sentence to be disambiguated;
splicing the sentence to be disambiguated with a question sentence to obtain an MRC structure;
acquiring, from an entity knowledge base, the different entity description sentences corresponding to the ambiguous abbreviation in the sentence to be disambiguated;
splicing the different entity description sentences onto the end of the MRC structure;
inputting the MRC structure spliced with the different entity description sentences into a Bert model;
and the Bert model outputting the real entity corresponding to the ambiguous abbreviation, thereby disambiguating the sentence.
Furthermore, two loss functions are arranged at the output end of the Bert model; the loss functions comprise a first task loss function and a second task loss function.
Further, the first task loss function is a binary classification loss; the second task loss function is a multi-classification loss.
Further, the first task loss function is expressed by the following formulas:
output_1 = Sigmoid(W_1 × H_[CLS])
loss_1 = binary_crossentropy(output_1, label_1)
where output_1 denotes the model output of task one; Sigmoid() denotes the logistic function; W_1 denotes the weight matrix for computing the task-one output; H_[CLS] denotes the semantic vector at the sentence-start ([CLS]) position; loss_1 denotes the loss of task one; binary_crossentropy() denotes the binary cross-entropy loss function; and label_1 denotes the true label of task one.
Further, the second task loss function is expressed by the following formulas:
entity_output_i = Sigmoid(W_entity_i × H_entity_i)
loss_entity_i = binary_crossentropy(entity_output_i, label_entity_i)
loss_2 = Σ_{i=1}^{N} loss_entity_i
where entity_output_i denotes the model output at the i-th entity; W_entity_i denotes the weight matrix for computing the output at the i-th entity; H_entity_i denotes the semantic vector at the i-th entity position; loss_entity_i denotes the loss of the i-th entity; label_entity_i denotes the true label of the i-th entity; loss_2 denotes the loss of task two; and N denotes the number of entities that the ambiguous abbreviation may correspond to.
Further, the Bert model disambiguates the MRC structure spliced with the different entity description sentences through the first task loss function and the second task loss function; the specific disambiguation process is as follows:
judging, through the first task loss function, whether an ambiguous abbreviation exists in the sentence to be disambiguated;
and if so, determining, through the second task loss function, the real entity corresponding to the ambiguous abbreviation from among the different entity description sentences.
Further, the Bert model is formed by stacking 12 layers of the basic neural network structure.
Further, the Bert model is trained, and the MRC structure is input into the trained Bert model to disambiguate the sentence; the training method of the Bert model is as follows:
setting the parameters of the basic neural network structure in the Bert model;
randomly re-initializing, with equal probability, the parameters of the last 3 layers of the basic neural network structure;
and training the Bert model whose parameters have been randomly re-initialized, stopping once the loss function of the Bert model converges, to obtain the trained and optimized Bert model.
Compared with the prior art, the invention has the following beneficial effects:
the invention inputs more effective information into the model through the introduction of entity description in the entity knowledge base, improves the prediction capability of the model, simultaneously utilizes the input construction mode of MRC, conforms to the input characteristics of a pretraining stage of a Bert model, further improves the accuracy rate of entity disambiguation, and further finely distinguishes different types of ambiguities through the specific classification of entity reference content, and accelerates the convergence of the model and enhances the training stability through the use of multi-task learning and weight reinitialization.
Drawings
FIG. 1 is an example of a reading-comprehension-style input constructed by concatenation with a question sentence;
FIG. 2 is an example of effectively associating the sentence to be disambiguated with entity description sentences.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
An MRC-based company entity disambiguation method combined with a knowledge base, comprising the following steps:
(1) acquiring a sentence to be disambiguated;
(2) splicing the sentence to be disambiguated with a question sentence to obtain an MRC structure;
Firstly, the input is constructed in a reading-comprehension-like manner: the sentence to be disambiguated is spliced, as the reading text, with a question sentence to obtain the MRC structure, which serves as the input.
(3) acquiring, from an entity knowledge base, the different entity description sentences corresponding to the ambiguous abbreviation in the sentence to be disambiguated;
(4) splicing the different entity description sentences onto the end of the MRC structure;
Entity description sentences that share the same abbreviation but refer to different entities are spliced onto the end of the MRC-structure input by means of the entity knowledge base, providing more detailed disambiguation information.
(5) inputting the MRC structure spliced with the different entity description sentences into a Bert model;
(6) the Bert model outputting the real entity corresponding to the ambiguous abbreviation, thereby disambiguating the sentence.
At the output end of the Bert model, two tasks are designed and their losses are accumulated, exploiting the property that tasks in multi-task learning can promote one another: one is a binary classification loss that judges whether ambiguity exists, and the other is a multi-classification loss that determines the specific ambiguity category.
When training of the Bert model begins, only the model weights close to the input layer are retained, and the weights close to the output layer are randomly re-initialized; this accelerates model convergence, reduces loss jitter during training, and improves training stability.
The method comprises the following specific steps:
step 1-construct input of MRC mode
The method comprises the steps of constructing model input in an MRC mode, using a sentence needing disambiguation as a reading text, and constructing input similar to reading understanding through a mode of splicing with a question sentence, wherein an example is shown in FIG. 1, the sentence to be disambiguated in FIG. 1 is 'apple futures continuous pressure bearing, but fluctuation is gradually reduced', a reading understanding problem is 'indication of an ambiguous main body', and a question and the sentence to be disambiguated 'pay attention to each other' through a Self-attention (Self-attention) mechanism, so that the model learns to extract information favorable for disambiguation from the sentence according to the question.
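As a purely illustrative sketch (not code from the patent), the input construction of Step 1 can be reproduced with the HuggingFace transformers tokenizer; the question wording, checkpoint name, and variable names below are assumptions:

    # A minimal sketch of Step 1, assuming the HuggingFace transformers library
    # and the bert-base-chinese checkpoint; the question wording is a
    # hypothetical rendering of the question in FIG. 1.
    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

    question = "歧义主体指代的是什么？"  # "what does the ambiguous mention refer to?"
    sentence = "苹果期货持续承压，但波动逐渐减小"  # sentence to be disambiguated (FIG. 1)

    # Bert's sentence-pair input: [CLS] question [SEP] sentence [SEP]
    encoded = tokenizer(question, sentence, return_tensors="pt")
    print(tokenizer.decode(encoded["input_ids"][0]))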
Step 2-Combine the input with the entity knowledge base
To make full use of the effective information in the structured knowledge base, a structure that fuses the model with knowledge base knowledge must be designed. The most effective information in the entity knowledge base comes from entity description sentences: all the entity descriptions that the ambiguous abbreviation may correspond to are spliced, in sequence, onto the end of the MRC-structure input, and the attention mechanism effectively associates the sentence to be disambiguated with the entity description sentences. As shown in FIG. 2, the ambiguous abbreviation in the sentence to be disambiguated is 'apple', and this keyword corresponds to multiple entities in the structured entity knowledge base, such as 'a plant of the genus Malus', i.e. the fruit we usually eat, and 'a high-tech company', i.e. Apple Inc. of the United States. This fusion effectively exploits the structured information in the entity knowledge base, helps the model understand the semantics of the sentence to be disambiguated, and improves disambiguation accuracy.
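Continuing the sketch above (and reusing its tokenizer, question, and sentence), Step 2 can be approximated by joining the candidate entity descriptions onto the tail of the pair input; the description texts and the [SEP]-joining convention are assumptions, since the text only specifies splicing the descriptions in sequence at the end:

    # A sketch of Step 2: append the candidate entity descriptions from the
    # knowledge base to the end of the MRC input. The descriptions below are
    # illustrative stand-ins for knowledge base entries.
    descriptions = [
        "苹果，蔷薇科苹果属植物的果实",  # the Malus-plant (fruit) sense
        "苹果公司，美国一家高科技公司",  # the Apple Inc. sense
    ]

    # [CLS] question [SEP] sentence [SEP] desc_1 [SEP] desc_2 [SEP]
    # BertTokenizer recognizes the literal "[SEP]" string as the special token.
    text_b = sentence + "[SEP]" + "[SEP]".join(descriptions)
    encoded = tokenizer(question, text_b, return_tensors="pt")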
Step 3-Multi-task learning
At the output end, the design of the loss function is also crucial. Exploiting the property that tasks in multi-task learning can promote one another, two tasks are designed and their losses are accumulated.
1) The first task is to distinguish whether the sentence to be disambiguated contains an ambiguous entity abbreviation;
output_1 = Sigmoid(W_1 × H_[CLS])
loss_1 = binary_crossentropy(output_1, label_1)
where output_1 denotes the model output of task one; Sigmoid() denotes the logistic function; W_1 denotes the weight matrix for computing the task-one output; H_[CLS] denotes the semantic vector at the sentence-start ([CLS]) position; loss_1 denotes the loss of task one; binary_crossentropy() denotes the binary cross-entropy loss function; and label_1 denotes the true label of task one.
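In PyTorch terms, the task-one head can be sketched as follows; only the formula structure comes from the text above, while the shapes, batch, and labels are illustrative assumptions:

    # A PyTorch sketch of the task-one (binary classification) head; tensor
    # names mirror the formulas above, all shapes are illustrative.
    import torch
    import torch.nn as nn

    hidden_size = 768                       # Bert-base hidden width
    W1 = nn.Linear(hidden_size, 1)          # weight matrix W_1
    bce = nn.BCELoss()                      # binary_crossentropy

    h_cls = torch.randn(2, hidden_size)     # H_[CLS]: stand-in for the Bert output at [CLS]
    label1 = torch.tensor([[1.0], [0.0]])   # 1 = the sentence contains an ambiguous abbreviation

    output1 = torch.sigmoid(W1(h_cls))      # output_1 = Sigmoid(W_1 × H_[CLS])
    loss1 = bce(output1, label1)            # loss_1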
2) The second task is to determine which entity the ambiguous entity abbreviation corresponds to:
entity_output_i = Sigmoid(W_entity_i × H_entity_i)
loss_entity_i = binary_crossentropy(entity_output_i, label_entity_i)
loss_2 = Σ_{i=1}^{N} loss_entity_i
where entity_output_i denotes the model output at the i-th entity; W_entity_i denotes the weight matrix for computing the output at the i-th entity; H_entity_i denotes the semantic vector at the i-th entity position; loss_entity_i denotes the loss of the i-th entity; label_entity_i denotes the true label of the i-th entity; loss_2 denotes the loss of task two; and N denotes the number of entities that the ambiguous abbreviation may correspond to.
Step 4-Weight re-initialization
Generally, a Bert-based model retains all 12 layers of pre-trained basic neural network structure parameters and is fine-tuned directly on the downstream task. However, that training process is unstable and converges slowly. The reason is that not all the weights of the 12 pre-trained layers benefit the downstream task: the layers close to the input learn general semantic information such as part of speech and syntax, while the layers close to the output learn knowledge strongly tied to the pre-training tasks. Since the tasks used during Bert pre-training differ from the disambiguation task in this scheme, the network weights close to the output layer of the pre-trained Bert negatively affect downstream training. To address this instability, a weight re-initialization method is proposed (a sketch follows the list below):
1) copy all 12 layers of basic neural network structure parameters of the pre-trained Bert into the model of this scheme;
2) replace the parameters of the model's last 3 network layers by equal-probability random initialization between 0 and 1;
3) train the model after the last 3 layers' parameters have been re-initialized, and stop training once the model's loss function converges, obtaining the trained and optimized model;
4) use the trained model to receive input and produce the disambiguation output.
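A hedged sketch of steps 1) and 2) with the HuggingFace BertModel layout; note that the uniform initialization between 0 and 1 follows step 2) literally, which would be unusual in practice:

    # A sketch of the weight re-initialization, assuming the HuggingFace
    # BertModel layer layout. The uniform [0, 1) init mirrors step 2) above;
    # practical re-initialization schemes usually differ.
    import torch
    from transformers import BertModel

    model = BertModel.from_pretrained("bert-base-chinese")  # all 12 pre-trained layers (step 1)

    for layer in model.encoder.layer[-3:]:      # the 3 layers closest to the output (step 2)
        for param in layer.parameters():
            with torch.no_grad():
                param.uniform_(0.0, 1.0)        # equal-probability random init between 0 and 1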
Step 1-Construct a knowledge base of company entity information, which must contain the various entities that each potentially ambiguous abbreviation may correspond to, together with their description information.
Step 2-Label a certain amount of supervised corpora, including unambiguous corpora and ambiguous corpora; the ambiguous corpora must be labeled with the specific entity class they correspond to.
Step 3-Construct the input in MRC mode: splice the question sentence with the sentence to be disambiguated, and splice the description sentences of all entities that the abbreviation may correspond to, one by one, at the end of the sentence.
Step 4-Input the result into the Bert model, compute two losses at the output end and superimpose them: a binary classification loss judging whether the sentence is ambiguous with respect to a company entity description, and a multi-classification loss determining the specific ambiguity class.
Step 5-Start training, keeping only the Bert weights close to the input layer and re-initializing the weights close to the output layer.
Step 6-Finish training to obtain the complete company entity disambiguation model. At prediction time the input is consistent with Step 3, and there are two outputs: one judging whether the sentence contains an ambiguous company entity, the other determining the specific ambiguity category the entity corresponds to. A training-step sketch follows.
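Tying the steps together, one possible training step looks like the sketch below; the DisambiguationModel class, the shared task-two projection, and all placeholder tensors are assumptions for illustration, not the patent's implementation:

    # An end-to-end sketch of one training step (Steps 4-5): run the MRC input
    # through Bert, compute both losses, and optimize their sum.
    import torch
    import torch.nn as nn
    from transformers import BertModel

    class DisambiguationModel(nn.Module):
        def __init__(self, hidden_size: int = 768):
            super().__init__()
            self.bert = BertModel.from_pretrained("bert-base-chinese")
            self.task1_head = nn.Linear(hidden_size, 1)  # binary: is the mention ambiguous?
            self.task2_head = nn.Linear(hidden_size, 1)  # per entity: which referent?

        def forward(self, input_ids, attention_mask, entity_positions):
            h = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
            output1 = torch.sigmoid(self.task1_head(h[:, 0]))   # [CLS] vector -> task one
            batch_idx = torch.arange(h.size(0)).unsqueeze(1)
            h_entity = h[batch_idx, entity_positions]           # vectors at the entity positions
            output2 = torch.sigmoid(self.task2_head(h_entity)).squeeze(-1)
            return output1, output2

    model = DisambiguationModel()
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    bce = nn.BCELoss(reduction="sum")

    # Placeholder batch: token ids, mask, and the positions of 2 entity descriptions.
    input_ids = torch.randint(0, 21128, (1, 64))
    attention_mask = torch.ones_like(input_ids)
    entity_positions = torch.tensor([[30, 45]])
    label1 = torch.tensor([[1.0]])
    label_entity = torch.tensor([[0.0, 1.0]])

    optimizer.zero_grad()
    output1, output2 = model(input_ids, attention_mask, entity_positions)
    loss = bce(output1, label1) + bce(output2, label_entity)    # loss_1 + loss_2
    loss.backward()
    optimizer.step()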
The invention discloses a company Entity Disambiguation method based on MRC (Machine Reading Comprehension) combined with a knowledge base. Aiming at the ambiguity problem in company entity association, the technique is based on a Bert model: first a disambiguation question is constructed in MRC mode; then the model input is constructed by combining entity description information from the entity knowledge base; at the output end, a binary classification loss (whether an ambiguous entity is present) and a multi-classification loss (which specific entity it is) are accumulated through multi-task learning to construct the loss function; finally, Weight Re-initialization accelerates the convergence of model training and improves training stability. The invention effectively solves the ambiguity problem of company entities, which mostly appear as abbreviations in large volumes of financial news, avoids the semantic misunderstanding caused by company names with multiple meanings, improves the accuracy of company association, and provides important basic technical support for downstream financial analysis algorithms.
Compared with the regular-expression-matching method, the invention relies on an initially labeled corpus and the generalization capability of the model, effectively avoiding a large amount of subsequent manual rule maintenance. Compared with the unsupervised-sample-clustering method, the introduction of labeled data effectively improves the accuracy of model prediction, and the generalization capability of the supervised model also avoids re-labeling and re-training when new company entities are added.
The present invention is not limited to the above embodiments. Based on the technical solutions disclosed herein, those skilled in the art can make substitutions and modifications to some technical features without creative effort, and such substitutions and modifications all fall within the protection scope of the present invention.

Claims (8)

1. An MRC-based company entity disambiguation method combined with a knowledge base, comprising the following steps:
acquiring a sentence to be disambiguated;
splicing the sentence to be disambiguated with a question sentence to obtain an MRC structure;
acquiring, from an entity knowledge base, the different entity description sentences corresponding to the ambiguous abbreviation in the sentence to be disambiguated;
splicing the different entity description sentences onto the end of the MRC structure;
inputting the MRC structure spliced with the different entity description sentences into a Bert model;
and the Bert model outputting the real entity corresponding to the ambiguous abbreviation, thereby disambiguating the sentence.
2. The MRC-based company entity disambiguation method combined with a knowledge base according to claim 1, characterised in that the output end of the Bert model is provided with two loss functions; the loss functions comprise a first task loss function and a second task loss function.
3. The MRC-based company entity disambiguation method combined with a knowledge base according to claim 2, wherein said first task loss function is a binary classification loss and said second task loss function is a multi-classification loss.
4. The MRC-based company entity disambiguation method combined with a knowledge base according to claim 2, wherein said first task loss function is represented by the following formulas:
output_1 = Sigmoid(W_1 × H_[CLS])
loss_1 = binary_crossentropy(output_1, label_1)
where output_1 denotes the model output of task one; Sigmoid() denotes the logistic function; W_1 denotes the weight matrix for computing the task-one output; H_[CLS] denotes the semantic vector at the sentence-start ([CLS]) position; loss_1 denotes the loss of task one; binary_crossentropy() denotes the binary cross-entropy loss function; and label_1 denotes the true label of task one.
5. The MRC-based company entity disambiguation method combined with a knowledge base according to claim 2, wherein said second task loss function is represented by the following formulas:
entity_output_i = Sigmoid(W_entity_i × H_entity_i)
loss_entity_i = binary_crossentropy(entity_output_i, label_entity_i)
loss_2 = Σ_{i=1}^{N} loss_entity_i
where entity_output_i denotes the model output at the i-th entity; W_entity_i denotes the weight matrix for computing the output at the i-th entity; H_entity_i denotes the semantic vector at the i-th entity position; loss_entity_i denotes the loss of the i-th entity; label_entity_i denotes the true label of the i-th entity; loss_2 denotes the loss of task two; and N denotes the number of entities that the ambiguous abbreviation may correspond to.
6. The MRC-based company entity disambiguation method combined with a knowledge base according to claim 2, wherein the Bert model disambiguates the MRC structure spliced with the different entity description sentences through the first task loss function and the second task loss function; the specific disambiguation process is as follows:
judging, through the first task loss function, whether an ambiguous abbreviation exists in the sentence to be disambiguated;
and if so, determining, through the second task loss function, the real entity corresponding to the ambiguous abbreviation from among the different entity description sentences.
7. The MRC-based company entity disambiguation method combined with a knowledge base according to claim 1, wherein said Bert model is formed by stacking 12 layers of the basic neural network structure.
8. The MRC-based company entity disambiguation method combined with a knowledge base according to claim 7, wherein the Bert model is trained and the MRC structure is input into the trained Bert model to disambiguate the sentence; the training method of the Bert model comprises:
setting the parameters of the basic neural network structure in the Bert model;
randomly re-initializing, with equal probability, the parameters of the last 3 layers of the basic neural network structure;
and training the Bert model whose parameters have been randomly re-initialized, stopping once the loss function of the Bert model converges, to obtain the trained and optimized Bert model.
CN202011070276.9A 2020-09-30 2020-09-30 MRC-based company entity disambiguation method combined with knowledge base Active CN112257443B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011070276.9A CN112257443B (en) 2020-09-30 2020-09-30 MRC-based company entity disambiguation method combined with knowledge base


Publications (2)

Publication Number Publication Date
CN112257443A (en) 2021-01-22
CN112257443B CN112257443B (en) 2024-04-02

Family

ID=74234991

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011070276.9A Active CN112257443B (en) 2020-09-30 2020-09-30 MRC-based company entity disambiguation method combined with knowledge base

Country Status (1)

Country Link
CN (1) CN112257443B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107102989A (en) * 2017-05-24 2017-08-29 南京大学 A kind of entity disambiguation method based on term vector, convolutional neural networks
CN109101579A (en) * 2018-07-19 2018-12-28 深圳追科技有限公司 customer service robot knowledge base ambiguity detection method
CN110781670A (en) * 2019-10-28 2020-02-11 合肥工业大学 Chinese place name semantic disambiguation method based on encyclopedic knowledge base and word vector
CN111401049A (en) * 2020-03-12 2020-07-10 京东方科技集团股份有限公司 Entity linking method and device
CN111339778A (en) * 2020-03-13 2020-06-26 苏州跃盟信息科技有限公司 Text processing method, device, storage medium and processor
CN111523326A (en) * 2020-04-23 2020-08-11 北京百度网讯科技有限公司 Entity chain finger method, device, equipment and storage medium
CN111709243A (en) * 2020-06-19 2020-09-25 南京优慧信安科技有限公司 Knowledge extraction method and device based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIAOYA LI et al.: "A Unified MRC Framework for Named Entity Recognition", arXiv, pages 1-11 *
MU Lingling; CHENG Xiaoyu; ZAN Hongying; HAN Yingjie: "A neural network Chinese word sense disambiguation model incorporating linguistic knowledge", Journal of Zhengzhou University (Natural Science Edition), no. 03, pages 15-20 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113065353A (en) * 2021-03-16 2021-07-02 北京金堤征信服务有限公司 Entity identification method and device
CN113065353B (en) * 2021-03-16 2024-04-02 北京金堤征信服务有限公司 Entity identification method and device
CN113051892A (en) * 2021-03-22 2021-06-29 哈尔滨理工大学 Chinese word sense disambiguation method based on transformer model
CN113128238A (en) * 2021-04-28 2021-07-16 安徽智侒信信息技术有限公司 Financial information semantic analysis method and system based on natural language processing technology
CN113128238B (en) * 2021-04-28 2023-06-20 安徽智侒信信息技术有限公司 Financial information semantic analysis method and system based on natural language processing technology
CN113158687A (en) * 2021-04-29 2021-07-23 新声科技(深圳)有限公司 Semantic disambiguation method and device, storage medium and electronic device
CN113158687B (en) * 2021-04-29 2021-12-28 新声科技(深圳)有限公司 Semantic disambiguation method and device, storage medium and electronic device
CN113220900A (en) * 2021-05-10 2021-08-06 深圳价值在线信息科技股份有限公司 Modeling method of entity disambiguation model and entity disambiguation prediction method
CN113220900B (en) * 2021-05-10 2023-08-25 深圳价值在线信息科技股份有限公司 Modeling Method of Entity Disambiguation Model and Entity Disambiguation Prediction Method

Also Published As

Publication number Publication date
CN112257443B (en) 2024-04-02

Similar Documents

Publication Publication Date Title
Liu et al. Attention-based BiGRU-CNN for Chinese question classification
Zhang et al. A text sentiment classification modeling method based on coordinated CNN‐LSTM‐attention model
CN112257443A (en) MRC-based company entity disambiguation method combined with knowledge base
CN109325231B (en) Method for generating word vector by multitasking model
CN111325029B (en) Text similarity calculation method based on deep learning integrated model
CN108170848B (en) Chinese mobile intelligent customer service-oriented conversation scene classification method
CN111368542A (en) Text language association extraction method and system based on recurrent neural network
Wang et al. Text categorization with improved deep learning methods
CN111191464A (en) Semantic similarity calculation method based on combined distance
CN113515632A (en) Text classification method based on graph path knowledge extraction
CN112766359A (en) Word double-dimensional microblog rumor recognition method for food safety public sentiment
CN115392248A (en) Event extraction method based on context and drawing attention
CN114239828A (en) Supply chain affair map construction method based on causal relationship
Sarikaya et al. Shrinkage based features for slot tagging with conditional random fields.
CN113869040A (en) Voice recognition method for power grid dispatching
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
CN116522165B (en) Public opinion text matching system and method based on twin structure
CN113239694A (en) Argument role identification method based on argument phrase
Neill et al. Meta-embedding as auxiliary task regularization
Cai et al. Multi-view and attention-based bi-lstm for weibo emotion recognition
CN110705277A (en) Chinese word sense disambiguation method based on cyclic neural network
Liao et al. The sg-cim entity linking method based on bert and entity name embeddings
CN115329075A (en) Text classification method based on distributed machine learning
Shi Using domain knowledge for low resource named entity recognition
Wang et al. BiLSTM-ATT Chinese sentiment classification model based on pre-training word vectors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant