CN112906397B - Short text entity disambiguation method - Google Patents

Short text entity disambiguation method

Info

Publication number
CN112906397B
Authority
CN
China
Prior art keywords
entity
sentence
model
training
word
Prior art date
Legal status
Active
Application number
CN202110366911.6A
Other languages
Chinese (zh)
Other versions
CN112906397A (en)
Inventor
文万志
姜文轩
李喜凯
葛威
朱恺
吴雪斐
袁佳祺
Current Assignee
Nantong University
Original Assignee
Nantong University
Priority date
Filing date
Publication date
Application filed by Nantong University
Priority to CN202110366911.6A
Publication of CN112906397A
Application granted
Publication of CN112906397B
Active (legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks

Abstract

The invention provides a short text entity disambiguation method based on deep learning, mainly used to solve the problem that an entity has different meanings and refers to different things in different short texts. The method comprises the following steps: step 1, segmenting sentences with the jieba word segmentation technique, finding the entities to be disambiguated, and using listed-company names and their abbreviations as the dictionary; step 2, cutting a 32-character window centered on the entity to be disambiguated; step 3, converting the sentence containing the entity to be disambiguated into word vectors with a Bidirectional Encoder Representations from Transformers (BERT) model; and step 4, feeding the word vectors in batches into a Long Short-Term Memory (LSTM) model, computing the loss with cross entropy, and continuously optimizing the parameters to obtain the final model. The invention obtains good results not only in special fields such as company entities but also in general fields.

Description

Short text entity disambiguation method
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to a short text entity disambiguation method. It is an effective entity disambiguation technique based on the deep-learning Long Short-Term Memory (LSTM) and Bidirectional Encoder Representations from Transformers (BERT) models, and is mainly used to solve the problem that company entities refer to different things in different short texts.
Background
In the Internet era, information explodes and news is abundant; people hope that advanced AI technology can associate texts with the information of massive entities (companies, person names, etc.), improve users' reading fluency, and enable accurate content recommendation. Intelligent processing of such information not only provides intelligent services for the financial industry, but also opens more room for innovation in financial business.
Text is the main medium through which information about company entities spreads, and accurately locating the company entity that a news item concerns directly determines how downstream financial work is carried out. In financial news, company entities (of which there are tens of millions) often appear in abbreviated or informal forms, which causes ambiguity. For example, Apple is a listed company in the United States and is also a fruit. The goal of entity disambiguation is to eliminate entity ambiguity during information processing and to clean the text information. Disambiguation is generally achieved by incorporating knowledge about the entities. In recent years, the rapid development of artificial intelligence has made it possible to solve many such problems, and the hope is to apply leading-edge AI methods to the problem of entity ambiguity in intelligent information processing.
The traditional entity disambiguation task is mainly based on long texts and a complete knowledge base; long texts carry richer context information to assist entity disambiguation, so building an entity disambiguation system on vertical-domain (company entity) disambiguation data is more challenging.
The BERT model can be parallelized, extracts features, and models text bidirectionally, so better results can be obtained with less data and in less time; the long short-term memory network can retain the more important information and forget redundant information. Combining these two techniques with binary classification to disambiguate entities yields a novel entity disambiguation technique based on deep learning.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a short text entity disambiguation method that can effectively help natural language processing developers and related readers judge, according to their own needs, whether a word to be disambiguated is a company name, with high accuracy and efficiency.
In order to solve the above technical problem, an embodiment of the present invention provides a short text entity disambiguation method, including the following steps:
s1, performing word segmentation on the training sample and the test sample;
s2, segmenting the sample by taking the entity to be disambiguated as the center;
s3, converting the sample containing the entity to be disambiguated into a word vector pre-trained by a BERT model;
s4, constructing a neural network model;
s5, calculating the loss between the one-dimensional vector output by the neural network and the sample label vector, using cross entropy as the loss function, and optimizing the neural network parameter model;
s6, using Microsoft Neural Network Intelligence (NNI) to search for parameters with higher training accuracy.
The specific steps of step S1 are:
s1.1, creating dictionaries for all entity names (including company full names and short names), and finding out all entities to be disambiguated by using a jieba word segmentation technology for training samples and testing samples;
s1.2, generating a prefix tree, and constructing a directed acyclic graph (DAG) of all candidate word segmentations of the text to be segmented by using regular matching;
s1.3, finding the word segmentation scheme of the maximum-probability path through dynamic programming, and, to make the segmentation adapt to the text and mine new words, solving an HMM (hidden Markov model) with the Viterbi algorithm.
The specific steps of step S2 are:
s2.1, cutting the sentence, and selecting only 32 characters when the sentence is encoded;
s2.2, cutting the sentence with the entity name at the center: the position of the entity name in the text is found, and the 13 characters before it and the 14 characters after it are put into one sentence, with the entity name itself occupying a fixed 5 characters.
The specific steps of step S3 are:
s3.1, finding the id corresponding to the BERT pre-training model for each word in each sentence of the cut training and verification sample;
s3.2, identifying the length of each sentence and using 0 and 1 as the mask, where 0 means no word at that position and 1 means a word is present, so that each sentence is converted into a tuple [ I, T, L, M ], where I is the BERT model id corresponding to each word; T indicates whether the sample is a company name, 1 for a company name and 0 for not a company name; L is the length of the sentence; M is the mask of the sentence;
s3.3, performing batch processing on all the training sets, wherein every 32 samples serve as one batch, and optimizing parameters;
the specific steps of step S4 are: the neural network model is divided into three sub-modules:
s4.1, a BERT conversion module, which converts the ids from step 3.1 into the actual pre-trained BERT model vectors;
s4.2, an LSTM module, used as the first training layer, for learning information across the sentence sequence;
s4.3, a linear output module, which produces the final output vector.
Further, in step S4.1, for the BERT model, corresponding gradient information is retained in the calculation, and the formula is:
[Equation image not reproduced: gradient of the loss with respect to the weight w.]
where loss is the loss function, w is the weight, and y_i is the true value;
in step S4.2, the LSTM module uses a dropout algorithm, for each layer of neurons, the neurons are temporarily discarded from the network according to a certain probability, and different neurons are randomly selected during each iterative training, which is equivalent to performing training on different neural networks each time;
in step S4.3, the linear output module uses an Attention mechanism, which gives higher weight to the tokens in the sequence that have an important influence on the sentence; the attention score of the tokens is calculated as follows:
α_t = exp(f_T(h_t)·c_T) / Σ_k exp(f_T(h_k)·c_T)
where f_T is a linear layer, h_t is the hidden-layer state of the t-th token, and c_T is the context vector for the tokens.
The specific steps of step S5 are:
s5.1, calculating a neural network loss function by using cross entropy, and optimizing a neural network parameter model;
s5.2, for the entity name itself, the name is only a referring expression with no actual grammatical meaning, so the problem is simplified into a binary classification problem: an entity name is labeled 1 and a non-entity name is labeled 0; cross entropy is well suited to binary classification and can measure slight differences, and the optimal solution is found by gradient descent; the cross-entropy loss function is defined as follows:
loss = −(1/N)·Σ_i [ y_i·log(ŷ_i) + (1 − y_i)·log(1 − ŷ_i) ]
where y_i is the label of sample i, 1 for the positive class and 0 for the negative class, and ŷ_i is the predicted probability that sample i is positive;
s5.3, optimizing parameters by using Adam as a gradient descent algorithm, wherein the Adam algorithm not only performs exponential weighted average processing on the gradient during each training, but also updates the weight W and the constant term b by using the obtained gradient value, and reduces the updating speed of the direction if the direction has large oscillation, so as to reduce the oscillation; the exponentially weighted average formula is as follows:
v_t = β·v_(t−1) + (1 − β)·θ_t
where β is a hyperparameter, v_t is the average at the t-th step, and θ_t is the value at the t-th step.
The specific steps of step S6 are:
Microsoft Neural Network Intelligence (NNI) is a lightweight but powerful toolkit that tunes the hyperparameters: batch size, learning rate, processed sentence length, number of training cycles, and number of convolution kernels. The F1 value is used as the judgment criterion, and the F1 formula is as follows:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × Precision × Recall / (Precision + Recall)
where TP represents the number of positive samples determined to be positive, FP represents the number of negative samples determined to be positive, and FN represents the number of positive samples determined to be negative.
The technical scheme of the invention has the following beneficial effects:
the invention provides an entity disambiguation method based on the combination of a Bidirectional Encoder reproduction from transformations (BERT) model and a Long-Short Term Memory RNN (LSTM) model, which can effectively help natural language processing developers and related readers to judge whether a word to be disambiguated is a company name according to the requirements of the developers and the related readers, and has higher accuracy and efficiency.
Drawings
FIG. 1 is an overall framework of the present invention;
FIG. 2 is a flow chart of the jieba word segmentation work in the present invention;
FIG. 3 is a graph of a sentence segmentation algorithm in the present invention;
FIG. 4 is a general framework diagram of a neural network in the present invention;
FIG. 5 is a graph of the value of F1 obtained using the three word vectors in the present invention;
FIG. 6 is a graph of F1 values obtained using three neural networks in the present invention;
fig. 7 shows the values of F1 obtained in the present invention using three text lengths.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
The invention provides a short text entity disambiguation technique based on deep learning, mainly used to help natural language processing developers and related readers judge, according to their own needs, whether a word to be disambiguated is a company name. The technique first finds the entity to be disambiguated through jieba word segmentation and cuts the long text into a short text, which reduces the scale of the neural network; second, it uses a BERT model as the word-vector pre-training model, converts each word in each sentence into the corresponding BERT model id, and records the sentence length, the mask, and whether the sentence contains a company name; finally, a deep neural network is built and trained with long short-term memory networks, an Attention mechanism, cross entropy, and other techniques to obtain good parameters.
The invention provides a short text entity disambiguation method, which comprises the following steps:
s1, performing word segmentation on the training sample and the test sample; the method comprises the following specific steps:
s1.1, creating dictionaries for all entity names (including company full names and short names), and finding out all entities to be disambiguated by using a jieba word segmentation technology for training samples and testing samples; FIG. 2 is a jieba word segmentation workflow diagram, in which the loaded dictionary is the entity name, so as to find out the word to be disambiguated conveniently and quickly.
S1.2, generating a prefix tree, and constructing a directed acyclic graph (DAG) of all candidate word segmentations of the text to be segmented by using regular matching;
s1.3, finding the word segmentation scheme of the maximum-probability path through dynamic programming, and, to make the segmentation adapt to the text and mine new words, solving an HMM (hidden Markov model) with the Viterbi algorithm, as sketched below.
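A minimal Python sketch of this word-segmentation step is given below; it assumes the open-source jieba package, and the dictionary entries and the sample sentence are illustrative placeholders rather than data from the patent.

    import jieba

    # Hypothetical dictionary of listed-company full names and abbreviations (S1.1).
    company_dict = {"苹果", "苹果公司", "南通大学"}
    for name in company_dict:
        jieba.add_word(name, freq=1_000_000)  # keep whole company names as single tokens

    def find_mentions(sentence):
        """Return (entity, start_position) pairs for dictionary hits after segmentation."""
        mentions, pos = [], 0
        for token in jieba.cut(sentence):      # DAG + max-probability path + HMM (S1.2, S1.3)
            if token in company_dict:
                mentions.append((token, pos))
            pos += len(token)
        return mentions

    print(find_mentions("苹果今日发布了新款手机"))  # expected: [('苹果', 0)]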
S2, segmenting the sample by taking the entity to be disambiguated as the center; the method comprises the following specific steps:
s2.1, cutting the sentences and selecting only 32 characters when encoding each sentence, so that the training time of the neural network is reduced as much as possible while accuracy is preserved;
s2.2, cutting the sentence with the entity name at the center: the position of the entity name in the text is found, and the 13 characters before it and the 14 characters after it are put into one sentence, with the entity name itself occupying a fixed 5 characters, as shown in FIG. 3.
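A hedged sketch of this 32-character window (13 characters before the mention, a 5-character slot for the entity name, 14 characters after it) is shown below; the padding behaviour at text boundaries is an assumption, not specified by the patent.

    def cut_window(text, start, name, before=13, slot=5, after=14):
        """Cut a fixed 32-character window centered on the entity mention."""
        left = text[max(0, start - before):start].rjust(before, "　")    # pad with full-width spaces
        right = text[start + len(name):start + len(name) + after].ljust(after, "　")
        entity = name[:slot].ljust(slot, "　")                            # entity slot fixed at 5 characters
        return left + entity + right                                      # 13 + 5 + 14 = 32 characters

    window = cut_window("据报道，苹果今日发布了新款手机，引发市场关注", 4, "苹果")
    print(len(window))  # 32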
S3, converting the sample containing the entity to be disambiguated into a word vector pre-trained by a BERT model; the method comprises the following specific steps:
s3.1, finding the id corresponding to the BERT pre-training model for each word in each sentence of the cut training and verification sample;
and S3.2, because step 2 can only ensure that long sentences have equal length, the length of shorter sentences cannot be guaranteed. Therefore, the length of each sentence must be identified, and 0 and 1 are used as the mask, where 0 means no word at that position and 1 means a word is present; each sentence is then converted into a tuple [ I, T, L, M ], where I is the BERT model id corresponding to each word; T indicates whether the sample is a company name, 1 for a company name and 0 for not a company name; L is the length of the sentence; M is the mask of the sentence;
and S3.3, carrying out batch processing on all the training sets, taking 32 samples as one batch, and optimizing parameters.
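The encoding in step S3 could look roughly as follows, assuming the HuggingFace transformers tokenizer and the "bert-base-chinese" checkpoint (the patent only specifies a pre-trained BERT model, so the checkpoint name is an assumption); each sample becomes the tuple [I, T, L, M].

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    MAX_LEN = 32

    def encode(sentence, is_company_name):
        """Convert one cut sentence into the tuple [I, T, L, M] of step S3.2."""
        ids = tokenizer.encode(sentence, add_special_tokens=False)[:MAX_LEN]  # I: BERT id of each character
        length = len(ids)                                                     # L: real sentence length
        mask = [1] * length + [0] * (MAX_LEN - length)                        # M: 1 = word present, 0 = padding
        ids = ids + [0] * (MAX_LEN - length)
        return ids, int(is_company_name), length, mask                        # T: 1 company name, 0 not

Batches of 32 such tuples (step S3.3) can then be fed to the network, for example through a PyTorch DataLoader.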
S4, constructing a neural network model, wherein the overall framework of the neural network is shown in FIG. 4, and the neural network model is divided into three sub-modules:
s4.1, a BERT conversion module, which converts the ids from step 3.1 into the actual pre-trained BERT model vectors;
s4.2, an LSTM module, used as the first training layer, for learning information across the sentence sequence;
s4.3, a linear output module, which produces the final output vector.
For the BERT model, corresponding gradient information is retained in the calculation, and the formula is as follows:
[Equation image not reproduced: gradient of the loss with respect to the weight w.]
where loss is the loss function, w is the weight, and y_i is the true value.
Using a dropout algorithm for an LSTM module, temporarily discarding the neurons of each layer from the network according to a certain probability, randomly selecting different neurons during each iterative training, and equivalently, training on different neural networks each time;
Since the important part of a sentence usually lies in its key words, the linear output module uses an Attention mechanism, which gives higher weight to the tokens in the sequence that have an important influence on the sentence; the attention score of the tokens is calculated as follows:
α_t = exp(f_T(h_t)·c_T) / Σ_k exp(f_T(h_k)·c_T)
where f_T is a linear layer, h_t is the hidden-layer state of the t-th token, and c_T is the context vector for the tokens.
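A hedged PyTorch sketch of the network in step S4 is given below; the hidden size, dropout rate, bidirectional LSTM, and the exact attention form are assumptions consistent with the description, not the patented implementation itself.

    import torch
    import torch.nn as nn
    from transformers import BertModel

    class DisambiguationNet(nn.Module):
        def __init__(self, bert_name="bert-base-chinese", hidden=128):
            super().__init__()
            self.bert = BertModel.from_pretrained(bert_name)   # S4.1: ids -> pre-trained BERT vectors
            self.lstm = nn.LSTM(self.bert.config.hidden_size, hidden,
                                batch_first=True, bidirectional=True)
            self.dropout = nn.Dropout(0.5)                     # S4.2: dropout on the LSTM inputs
            self.score = nn.Linear(2 * hidden, 1)              # f_T: linear layer giving attention scores
            self.out = nn.Linear(2 * hidden, 1)                # S4.3: linear output module

        def forward(self, ids, mask):
            # ids and mask are LongTensors of shape (batch, 32), as produced in step S3
            states = self.bert(input_ids=ids, attention_mask=mask).last_hidden_state
            h, _ = self.lstm(self.dropout(states))                         # h_t: token hidden states
            scores = self.score(h).squeeze(-1).masked_fill(mask == 0, -1e9)
            alpha = torch.softmax(scores, dim=-1)                          # attention weights over tokens
            c = torch.sum(alpha.unsqueeze(-1) * h, dim=1)                  # context vector over the tokens
            return self.out(c).squeeze(-1)                                 # logit: is the mention a company name

The masking step simply prevents padding positions from receiving attention weight, which matches the role of the mask M in step S3.2.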
S5, calculating the loss between the one-dimensional vector output by the neural network and the sample label vector, using cross entropy as the loss function, and optimizing the neural network parameter model; the method comprises the following specific steps:
s5.1, calculating a neural network loss function by using cross entropy, and optimizing a neural network parameter model;
s5.2, for the entity name itself, the name is only a referring expression with no actual grammatical meaning, so the problem is simplified into a binary classification problem: an entity name is labeled 1 and a non-entity name is labeled 0; cross entropy is well suited to binary classification and can measure slight differences, and the optimal solution is found by gradient descent; the cross-entropy loss function is defined as follows:
loss = −(1/N)·Σ_i [ y_i·log(ŷ_i) + (1 − y_i)·log(1 − ŷ_i) ]
where y_i is the label of sample i, 1 for the positive class and 0 for the negative class, and ŷ_i is the predicted probability that sample i is positive;
s5.3, optimizing parameters by using Adam as a gradient descent algorithm, wherein the Adam algorithm not only performs exponential weighted average processing on the gradient during each training, but also updates the weight W and the constant term b by using the obtained gradient value, and reduces the updating speed of the direction if the direction has large oscillation, so as to reduce the oscillation; the exponentially weighted average formula is as follows:
v_t = β·v_(t−1) + (1 − β)·θ_t
where β is a hyperparameter, v_t is the average at the t-th step, and θ_t is the value at the t-th step.
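A sketch of the training step in S5 (binary cross entropy optimized with Adam) is given below; the learning rate is an assumed placeholder, and model is any network with a single-logit output such as the sketch given for step S4.

    import torch
    import torch.nn as nn

    def make_train_step(model, lr=2e-5):
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)   # Adam keeps exponentially weighted
        criterion = nn.BCEWithLogitsLoss()                         # averages of the gradients (S5.3)

        def train_step(ids, mask, labels):
            # ids, mask: (batch, 32) tensors; labels: (batch,) tensor of 0/1 (the T field)
            optimizer.zero_grad()
            logits = model(ids, mask)
            loss = criterion(logits, labels.float())               # cross entropy against the 0/1 label (S5.2)
            loss.backward()                                        # gradients flow through BERT as well (S4.1)
            optimizer.step()
            return loss.item()

        return train_step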
S6, searching parameters with higher training accuracy by using Microsoft Neural Network Intelligence (NNI); the method comprises the following specific steps:
Microsoft Neural Network Intelligence (NNI) is a lightweight but powerful toolkit that tunes the hyperparameters: batch size, learning rate, processed sentence length, number of training cycles, and number of convolution kernels. Taking the F1 value as the judgment criterion, the F1 formula is as follows:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × Precision × Recall / (Precision + Recall)
where TP represents the number of positive samples determined to be positive, FP represents the number of negative samples determined to be positive, and FN represents the number of positive samples determined to be negative.
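A hedged sketch of step S6 follows: a trial script asks NNI for a hyperparameter assignment and reports the F1 value back to the tuner. The search-space keys and the example counts are illustrative assumptions, not values from the patent.

    import nni

    params = nni.get_next_parameter()   # e.g. {"batch_size": 32, "lr": 2e-5, "max_len": 32, "epochs": 3}

    def f1_score(tp, fp, fn):
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # ... train with `params` and count TP / FP / FN on the validation set ...
    nni.report_final_result(f1_score(tp=90, fp=10, fn=15))  # illustrative counts only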
The overall framework of the method provided by the invention is shown in FIG. 1, where the BERT model and the LSTM model are combined: the BERT model can draw on vector parameters pre-trained by predecessors on massive data to obtain the information relations between sentences, and the LSTM model obtains the information relations between sentences through its input gate, output gate, and forget gate.
Model comparisons are performed below, with analysis being performed for the word vector model, the neural network, and the text length, respectively.
Comparison 1: the test-set F1 values obtained with the Word2vec, BERT, and ERNIE models are shown in FIG. 5; BERT and ERNIE give the best results, and the BERT curve is smoother.
Comparison 2: comparing three neural network models, a plain neural network, a Convolutional Neural Network (CNN), and a long short-term memory network (LSTM), as shown in FIG. 6, the LSTM converges more smoothly.
Comparison 3: for different text lengths, as shown in FIG. 7, the effect of the length is not large for the same amount of training.
The experimental results and analysis show that the BERT model effectively captures the relations between words while avoiding the introduction of redundant information; for the neural network, LSTM solves the problem of preserving information in long text; in addition, cutting the text to a reasonable length retains enough information and improves the training speed.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (5)

1. A short text entity disambiguation method, comprising the steps of:
s1, performing word segmentation on the training sample and the test sample;
s2, segmenting the sample by taking the entity to be disambiguated as the center;
s3, converting the sample containing the entity to be disambiguated into a word vector pre-trained by a BERT model;
s4, constructing a neural network model;
s5, calculating the loss between the one-dimensional vector output by the neural network and the sample label vector, using cross entropy as the loss function, and optimizing the neural network parameter model;
s6, searching parameters with higher training accuracy by using Microsoft Neural Network Intelligence (NNI);
the specific steps of step S3 are:
s3.1, finding the id corresponding to the BERT pre-training model for each word in each sentence of the cut training and verification sample;
s3.2, identifying the length of each sentence and using 0 and 1 as the mask, where 0 means no word at that position and 1 means a word is present, and then converting each sentence into a tuple [ I, T, L, M ], where I is the BERT model id corresponding to each word; T indicates whether the sample is a company name, 1 for a company name and 0 for not a company name; L is the length of the sentence; M is the mask of the sentence;
s3.3, performing batch processing on all the training sets, wherein every 32 samples serve as one batch, and optimizing parameters;
the specific steps of step S4 are: the neural network model is divided into three sub-modules:
s4.1, a BERT conversion module, which converts the ids from step 3.1 into the actual pre-trained BERT model vectors;
s4.2, an LSTM module, used as the first training layer, for learning information across the sentence sequence;
s4.3, a linear output module, which produces the final output vector;
in step S4.1, for the BERT model, corresponding gradient information is retained in the calculation, and the formula is:
[Equation image not reproduced: gradient of the loss with respect to the weight w.]
where loss is the loss function, w is the weight, and y_i is the true value;
in step S4.2, the LSTM module uses a dropout algorithm, for each layer of neurons, the neurons are temporarily discarded from the network according to a certain probability, and different neurons are randomly selected during each iterative training, which is equivalent to performing training on different neural networks each time;
in step S4.3, the linear output module uses an Attention mechanism, which gives higher weight to the tokens in the sequence that have an important influence on the sentence; the attention score of the tokens is calculated as follows:
α_t = exp(f_T(h_t)·c_T) / Σ_k exp(f_T(h_k)·c_T)
where f_T is a linear layer, h_t is the hidden-layer state of the t-th token, and c_T is the context vector for the tokens.
2. The method for disambiguating an entity of short text as claimed in claim 1, wherein the specific steps of step S1 are:
s1.1, creating dictionaries for all entity names, and finding out all entities to be disambiguated by using a jieba word segmentation technology for training samples and testing samples;
s1.2, generating a prefix tree, and constructing a directed acyclic graph (DAG) of all candidate word segmentations of the text to be segmented by using regular matching;
s1.3, finding the word segmentation scheme of the maximum-probability path through dynamic programming, and, to make the segmentation adapt to the text and mine new words, solving an HMM (hidden Markov model) with the Viterbi algorithm.
3. The method for disambiguating an entity of short text as claimed in claim 1, wherein the specific steps of step S2 are:
s2.1, cutting the sentence, and selecting only 32 characters when the sentence is encoded;
s2.2, cutting the sentence with the entity name at the center: the position of the entity name in the text is found, and the 13 characters before it and the 14 characters after it are put into one sentence, with the entity name itself occupying a fixed 5 characters.
4. The method for disambiguating an entity of short text as claimed in claim 1, wherein the specific steps of step S5 are:
s5.1, calculating a neural network loss function by using cross entropy, and optimizing a neural network parameter model;
s5.2, for the entity name itself, the name is only a referring expression with no actual grammatical meaning, so the problem is simplified into a binary classification problem: an entity name is labeled 1 and a non-entity name is labeled 0; cross entropy is well suited to binary classification and can measure slight differences, and the optimal solution is found by gradient descent; the cross-entropy loss function is defined as follows:
loss = −(1/N)·Σ_i [ y_i·log(ŷ_i) + (1 − y_i)·log(1 − ŷ_i) ]
where y_i is the label of sample i, 1 for the positive class and 0 for the negative class, and ŷ_i is the predicted probability that sample i is positive;
s5.3, optimizing parameters by using Adam as a gradient descent algorithm, wherein the Adam algorithm not only performs exponential weighted average processing on the gradient during each training, but also updates the weight W and the constant term b by using the obtained gradient value, and reduces the updating speed of the direction if the direction has large oscillation, so as to reduce the oscillation; the exponentially weighted average formula is as follows:
v_t = β·v_(t−1) + (1 − β)·θ_t
where β is a hyperparameter, v_t is the average at the t-th step, and θ_t is the value at the t-th step.
5. The method for disambiguating an entity of short text as claimed in claim 1, wherein the specific steps of step S6 are:
the Microsoft Neural Network Intelligence toolkit tunes the hyperparameters: batch size, learning rate, processed sentence length, number of training cycles, and number of convolution kernels; taking the F1 value as the judgment criterion, the F1 formula is as follows:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × Precision × Recall / (Precision + Recall)
where TP represents the number of positive samples determined to be positive, FP represents the number of negative samples determined to be positive, and FN represents the number of positive samples determined to be negative.
CN202110366911.6A 2021-04-06 2021-04-06 Short text entity disambiguation method Active CN112906397B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110366911.6A CN112906397B (en) 2021-04-06 2021-04-06 Short text entity disambiguation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110366911.6A CN112906397B (en) 2021-04-06 2021-04-06 Short text entity disambiguation method

Publications (2)

Publication Number Publication Date
CN112906397A CN112906397A (en) 2021-06-04
CN112906397B true CN112906397B (en) 2021-11-19

Family

ID=76109966

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110366911.6A Active CN112906397B (en) 2021-04-06 2021-04-06 Short text entity disambiguation method

Country Status (1)

Country Link
CN (1) CN112906397B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113449516A (en) * 2021-06-07 2021-09-28 深延科技(北京)有限公司 Disambiguation method, system, electronic device and storage medium for acronyms
CN113779959B (en) * 2021-08-31 2023-06-06 西南电子技术研究所(中国电子科技集团公司第十研究所) Small sample text data mixing enhancement method
CN113704416B (en) * 2021-10-26 2022-03-04 深圳市北科瑞声科技股份有限公司 Word sense disambiguation method and device, electronic equipment and computer-readable storage medium
CN114818736B (en) * 2022-05-31 2023-06-09 北京百度网讯科技有限公司 Text processing method, chain finger method and device for short text and storage medium
CN115238701B (en) * 2022-09-21 2023-01-10 北京融信数联科技有限公司 Multi-field named entity recognition method and system based on subword level adapter

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108566627A (en) * 2017-11-27 2018-09-21 浙江鹏信信息科技股份有限公司 A kind of method and system identifying fraud text message using deep learning
CN111581973A (en) * 2020-04-24 2020-08-25 中国科学院空天信息创新研究院 Entity disambiguation method and system
CN112069826A (en) * 2020-07-15 2020-12-11 浙江工业大学 Vertical domain entity disambiguation method fusing topic model and convolutional neural network
CN112464669A (en) * 2020-12-07 2021-03-09 宁波深擎信息科技有限公司 Stock entity word disambiguation method, computer device and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Jacob Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", arXiv:1810.04805v1, 2018-10-11, full text *
Huang L. et al., "GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge", arXiv preprint arXiv:1908.07245, 2019-12-31, full text *
Du J. et al., "Using BERT for Word Sense Disambiguation", arXiv preprint arXiv:1909.08358, 2019-12-31, full text *

Also Published As

Publication number Publication date
CN112906397A (en) 2021-06-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant