CN110110335A - A named entity recognition method based on a stacked model - Google Patents
- Publication number: CN110110335A (application CN201910384659.4A)
- Authority: CN (China)
- Prior art keywords: model, named entity, result, entity, label
- Legal status: Granted
Classifications
- G06F40/295 — Named entity recognition (G06F40 Handling natural language data; G06F40/20 Natural language analysis; G06F40/279 Recognition of textual entities; G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking)
- G06N3/045 — Combinations of networks (G06N3 Computing arrangements based on biological models; G06N3/02 Neural networks; G06N3/04 Architecture, e.g. interconnection topology)
- G06N3/08 — Learning methods (G06N3/02 Neural networks)
Abstract
A complex Chinese named entity recognition method based on a stacked model. 1) Model training stage: (a) a low-level BiLSTM-CRF model is trained on an annotated named entity corpus under the computation of an improved loss function and saved; (b) a high-level BiLSTM-CRF model is trained on an annotated named entity recognition corpus and saved. 2) Model prediction stage: the corpus to be predicted is fed into the low-level model, which identifies coarse-grained named entities and passes them to the high-level model as preliminary results. The high-level model continues to recognize the preliminary results; if a recognition result is not a single named entity, it is fed back into the high-level model until all results are single named entities. 3) Result output: all named entities obtained by passing the corpus through the stacked model, i.e., all named entities output by the upper-level network, are collected as the final result of the whole recognition process.
Description
Technical field
The present invention relates to a named entity recognition method based on a stacked model, which solves the problem of recognizing complex Chinese named entities in Internet text environments.
Background art
Natural language processing (NLP) is a subfield of computer and information engineering whose goal is to manage and analyze massive text data, so that computer programs can use lexical, syntactic, and semantic information to recognize, understand, and generate natural language text: tasks such as word segmentation, named entity recognition, relation extraction, machine translation, natural language generation, question answering, and sentiment analysis. NLP techniques have matured steadily through exploration and research in rule-based and statistical learning methods. In recent years, representation learning and deep neural network methods have brought new directions and development to NLP, achieving good and stable results on some NLP problems. NLP has applications across many industries: comment text in social media can help monitor trends in public opinion; financial news contains much economic data and information on company operations, and these text data can assist the execution of quantitative trading; massive text data in news media can be used to model user interest topics, efficiently filtering information and making recommendations for readers; machine translation provides automatic translation between texts in different languages, promoting cross-cultural communication and exchange; knowledge graph technology can link different people and organizations, building knowledge bases that serve many business applications.
Named entity recognition (NER), also known as entity extraction or entity chunking, is a subfield of NLP. It aims to extract the named entities mentioned in unstructured text, including person names, organization names, place names, medical terms, legal terms, times, quantities, monetary values, and so on. For example, financial articles require accurate extraction of entities such as company names, key person names, and monetary values; political news requires accurate extraction of politicians' names, geographic names, organization names, and event names; in legal judgment texts, the names of the parties, penalty clauses, sentencing circumstances, and associated organizations must be extracted. NER is thus one of the most fundamental tasks in natural language processing, and its precision and recall directly affect downstream NLP tasks such as information extraction, text classification, text summarization, and question answering.
In practical engineering, Chinese named entity recognition still has many open problems. In engineering projects, an NER system encounters many problems that are rarely or never seen in experiments on standard datasets: (1) nested place names, person names, and organization names appear frequently in practice, and model accuracy drops on such entities; (2) the structure of text on the Internet is messy and its form is variable, which degrades the performance of Chinese NER systems applied to it directly; (3) when the input text is very long, the ability of the NER model declines noticeably, and reasonable text segmentation methods are needed to improve recognition. We analyze these cases one by one:
1) Nested named entities. For example, the target entity is "Shanghai agriculture firm", but this entity also contains the sub-entity place name "Shanghai". After BiLSTM computes the probability of each label and the conditional random field compares the scores of candidate label sequences, the final system does not recognize "Shanghai agriculture firm" as a whole. Analysis of the result shows that the "B-LOC" and "I-LOC" label scores of the two characters "Shang" and "Hai" are so high that, even after adding the non-entity label scores of the following context, they still exceed the score of the label sequence "B-ORG" "I-ORG" "I-ORG" "I-ORG" "I-ORG" "I-ORG" for "Shanghai agriculture firm", lowering the rate at which the whole entity is recognized as an organization name.
2) Erroneous association of entity context. A named entity that is semantically associated with its preceding or following context is often given a wrong boundary by the NER system, a common class of error. As shown in Table ref{mix}, the boundary of the named entity "Nanjing banking operation center" is mislabeled because of its preceding context. When the analyzed text is long, factors such as contextual semantic association and insufficient annotated training data cause NER systems to grasp entity boundaries inaccurately.
3) Long sentences with complex components. Overly long text is more difficult for an NER model, especially in practical engineering applications, where the regularity of Internet text and its use of punctuation are much poorer than in standard datasets. When the text is very long, the conditional random field algorithm outputs the maximum-score path computed by Viterbi dynamic programming as the final result, and in practice accuracy often declines. As shown in Table ref{long}, NER errors appear when a long text is input, whereas after cutting the text into shorter segments the system successfully identifies the correct named entities. This shows that the effect of overly long text on NER systems is a significant problem to be solved in practical applications.
Summary of the invention
For the above reasons, the object of the present invention is a named entity recognition method based on a stacked model: a Chinese NER model built from stacked components to handle Chinese NER under complex conditions. The method solves the problem of recognizing complex Chinese named entities in Internet text environments. The stacked model consists of two BiLSTM-CRF named entity models, with different improvements made to the low-level and high-level models according to their different purposes.
The technical scheme of the invention is a complex Chinese named entity recognition method based on a stacked model, characterized by the following steps. 1) Model training stage: the low-level BiLSTM-CRF model and the high-level BiLSTM-CRF model are trained separately on a Chinese named entity dataset and saved, and the two models are stacked for named entity recognition; (a) the low-level BiLSTM-CRF model is trained on an annotated named entity corpus under the computation of an improved loss function and saved; (b) the high-level BiLSTM-CRF model is trained on an annotated named entity recognition corpus and saved. 2) Model prediction stage: the corpus to be predicted is fed into the low-level model, which identifies coarse-grained named entities and passes them to the high-level model as preliminary results; the high-level model continues to recognize the preliminary results, and if a recognition result is not a single named entity it is fed back into the high-level model until all results are single named entities; (a) the corpus to be predicted is fed into the low-level model, the decoding method is optimized, and the coarse-grained results are sent to the high-level model; (b) the coarse-grained named entities are fed into the upper-level network for recognition; (c) the high-level output is judged: if it is further divisible, return to 2)(b); if not, output the result. 3) Result output: all named entities obtained by passing the corpus through the stacked model, i.e., all named entities output by the upper-level network, are collected as the final result of the whole recognition process. Fig. 4 shows the basic flow of the invention.
In step 1), the low-level named entity recognition model and the high-level named entity recognition model are trained separately. Both are BiLSTM-CRF models, but their training methods and purposes differ.
Advantageous effects: the present invention improves the rate at which a whole entity is recognized as an organization name. When the analyzed text is long and annotated training data are limited, the named entity recognition system grasps entity boundaries accurately, and recognition remains accurate even when the text is very long.
Brief description of the drawings
Fig. 1 is the flow chart of the BiLSTM named entity recognition model.
Fig. 2 is the structure chart of the high-level model of the stacked model.
Fig. 3 is the flow chart of the whole stacked model.
Fig. 4 is the basic flow block diagram of the invention.
Detailed description of the embodiments
As shown in Fig. 1, the BiLSTM-CRF named entity model consists of a distributed embedding layer, a deep neural network layer, and a conditional random field layer. The distributed embedding module uses word2vec to train word vectors, so that the distributed representation of text captures the semantic relatedness between words and eliminates the gap between discrete word symbols. Using pre-trained word vectors as the input of deep learning models for natural language problems has become a classic, mature method. Much prior work shows that, compared with random embeddings, pre-trained word vectors make the whole neural network converge faster and yield larger gains in precision and recall; the advantage of word2vec is especially pronounced when the amount of data is small.
Information sequences carry complex temporal dependencies between their elements and, importantly for the named entity recognition task, vary in length; recurrent neural networks (RNNs) are a good fit for this. The LSTM model is a variant of the RNN that, while good at modeling sequence problems, is also easier to train and can preserve important information over long spans. The bidirectional long short-term memory network (BiLSTM) is a modified version of the LSTM: a traditional RNN takes the preceding context as input and predicts the following context from it, whereas a bidirectional RNN also uses information in the reverse direction, letting the model learn from both directions, a notion that matches how Chinese characters form words and words form sentences. BiLSTM is the bidirectional version of the LSTM.
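As an illustrative sketch (not the patent's implementation, and using a plain tanh RNN cell rather than a full LSTM), the bidirectional idea can be shown in NumPy: the same cell is run over the sequence forward and in reverse, and the two hidden states are concatenated per position.

```python
import numpy as np

def rnn_pass(xs, Wx, Wh, h0):
    """Run a simple tanh RNN cell over a sequence, returning all hidden states."""
    h, out = h0, []
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ h)
        out.append(h)
    return out

def bidirectional_encode(xs, Wx, Wh, hidden):
    """Bidirectional encoding: concatenate forward and backward hidden states."""
    h0 = np.zeros(hidden)
    fwd = rnn_pass(xs, Wx, Wh, h0)
    bwd = rnn_pass(xs[::-1], Wx, Wh, h0)[::-1]  # reverse pass, re-aligned to positions
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(0)
emb, hidden, seq_len = 4, 3, 5            # illustrative toy dimensions
xs = [rng.standard_normal(emb) for _ in range(seq_len)]
Wx = rng.standard_normal((hidden, emb))
Wh = rng.standard_normal((hidden, hidden))
states = bidirectional_encode(xs, Wx, Wh, hidden)
print(len(states), states[0].shape)        # one 2*hidden vector per position
```

Each position thus sees both its left and right context, which is what lets the tagger use the whole sentence when scoring labels.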
The conditional random field (CRF) layer decouples the dependencies of the output layer and can fully consider contextual relations when predicting labels. More importantly, the CRF is solved with the Viterbi algorithm, which finds the maximum-probability path by dynamic programming; this fits the named entity recognition task well and avoids illegal sequences in the result, such as a "B-LOC" label followed by an "I-ORG" label. The sequence labeling module herein therefore selects the CRF model.
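The constraint that, e.g., "B-LOC" must not be followed by "I-ORG" can be illustrated by placing a large negative score on illegal transitions in the transition matrix (a sketch; the label inventory and the -1e9 constant are illustrative choices, not from the patent):

```python
import numpy as np

labels = ["O", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]
idx = {l: i for i, l in enumerate(labels)}

def legal(prev, cur):
    """An I- tag may only follow a B- or I- tag of the same entity type."""
    if cur.startswith("I-"):
        return prev in ("B-" + cur[2:], "I-" + cur[2:])
    return True

A = np.zeros((len(labels), len(labels)))
for p in labels:
    for c in labels:
        if not legal(p, c):
            A[idx[p], idx[c]] = -1e9   # effectively forbids this transition

print(A[idx["B-LOC"], idx["I-ORG"]])   # -1000000000.0 (forbidden)
print(A[idx["B-LOC"], idx["I-LOC"]])   # 0.0 (allowed)
```

With such a matrix, Viterbi decoding can never select a path containing an illegal label pair, since any path through a forbidden transition scores far below all legal paths.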
During training, the loss function of the low-level BiLSTM-CRF model is improved. The purpose of this layer is to perform coarse-grained recognition of the corpus without losing potential named entity information as far as possible, so it improves on the traditional BiLSTM-CRF. The loss function used for model training is improved as follows:
For an input sequence X = (x1, x2, …, xn), let P be the matrix output by the BiLSTM network after the sentence passes through the distributed embedding; P has dimension n × k, where k is the number of distinct labels, and P_{i,j} is the score of the i-th character being tagged with the j-th label, called the emission probability. For a candidate prediction sequence y = (y1, y2, …, yn), the score of this sequence is defined as:

s(X, y) = Σ_{i=0..n} A_{y_i, y_{i+1}} + Σ_{i=1..n} P_{i, y_i}
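A minimal sketch of the path score computation, with symbols as defined above (the toy matrices and the explicit start/stop boundary vectors are illustrative assumptions, not values from the patent):

```python
import numpy as np

def path_score(P, A, y, start, stop):
    """Score of label sequence y: emission terms P[i, y_i] plus transition
    terms A[y_i, y_{i+1}].  start/stop are boundary transition vectors
    (an assumption; the patent does not detail sequence boundaries)."""
    n = len(y)
    s = start[y[0]] + stop[y[-1]]
    for i in range(n):
        s += P[i, y[i]]                  # emission score of position i
        if i + 1 < n:
            s += A[y[i], y[i + 1]]       # transition score to next label
    return s

k = 3                                    # number of labels (toy)
P = np.array([[1.0, 0.2, 0.1],
              [0.1, 2.0, 0.3]])          # n = 2 characters
A = np.zeros((k, k))                     # zero transitions for clarity
start = np.zeros(k)
stop = np.zeros(k)
print(path_score(P, A, [0, 1], start, stop))  # 1.0 + 2.0 = 3.0
```

The CRF's training objective then compares the gold path's score against the scores of all other paths; the snippet only shows how a single path is scored.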
where A is the transition probability matrix of size k × k, and A_{i,j} denotes the transition probability from label i to label j. To preliminarily identify the text without omitting entity information as far as possible, the score formula is optimized as:

s_λ(X, y) = Σ_{i=0..n} A_{y_i, y_{i+1}} + Σ_{i=1..n} w_i · P_{i, y_i},  where w_i = λ if the true label at position i is "O" and w_i = 1 otherwise
Here λ is a penalty factor with a value between 0 and 1. The meaning of this adjustment is that, when the label sequence path score is computed, the contribution of a position whose true label is "O" (not a named entity) is multiplied by a penalty coefficient. In real corpora, the named entities we care about make up only a small fraction of the whole dataset, which biases the model toward predicting the non-entity label so as to keep its loss small. This preference conflicts with the goal of finding all named entities. The penalty factor therefore reduces the weight of training positions whose true label is "O" and, relatively, raises the weight of positions whose label belongs to any named entity class. When the loss is computed this way, predictions for characters whose true label is "B-PER", "I-PER", "B-ORG", etc. have greater influence on network training. Furthermore, so that the lower-level network is more inclined to output named entity labels rather than the non-entity label when decoding sequences, the probability of every character belonging to label "O", i.e., not being a named entity, is multiplied by a penalty factor μ during the lower-level network's decoding computation, with μ between 0 and 1, so that sequences containing more entity labels more easily obtain high scores and are output as the result.
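One way to realize the λ penalty described above can be sketched as follows, under the assumption that λ scales the emission contributions of positions whose true label is "O" (the label index and toy matrices are illustrative):

```python
import numpy as np

O_LABEL = 0  # index of the "O" (non-entity) tag; an illustrative choice

def weighted_gold_score(P, A, gold, lam):
    """Path score of the gold label sequence, with emission terms at
    true-"O" positions down-weighted by the penalty factor lam (0 < lam < 1),
    so non-entity positions contribute less to the training objective."""
    s = 0.0
    for i, y in enumerate(gold):
        w = lam if y == O_LABEL else 1.0
        s += w * P[i, y]
        if i + 1 < len(gold):
            s += A[y, gold[i + 1]]
    return s

P = np.array([[3.0, 1.0],
              [3.0, 1.0]])               # two characters, labels {O, B-X}
A = np.zeros((2, 2))
print(weighted_gold_score(P, A, [O_LABEL, 1], 0.5))  # 0.5*3.0 + 1.0 = 2.5
```

With lam = 1.0 the score reduces to the ordinary path score, so λ smoothly interpolates between the standard loss and one that emphasizes entity-bearing positions.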
When computing the conditional random field path score, the λ penalty factor reduces the weighted prediction score of non-entity labels, thereby improving recall.
The decoding method is improved as follows:
S01: obtain the emission probability matrix computed by the model from the word vector matrix of the text to be predicted;
S02: multiply the probabilities for the non-entity label "O" in the current emission probability matrix by the penalty factor μ;
S03: create a zero matrix S of size sequence length × number of labels to record each sub-path score of the dynamic program;
S04: create a matrix B of size sequence length × number of labels to record the path clues of the S matrix, recording for each node its predecessor node;
S05: traverse from the first node to the last node: using the emission probability matrix and the transition probability matrix, compute in the S matrix the maximum-probability path from the start to each label of each node, while recording the path in B;
S06: find the score of the maximum-probability path in the last column of S, backtrack through the B matrix, and take the label sequence of this maximum-probability path as the final output.
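Steps S01–S06 can be sketched as follows (toy matrices, additive scoring, and a two-label inventory are illustrative assumptions; per S02, μ is applied to the "O" column of the emissions):

```python
import numpy as np

def viterbi_with_penalty(P, A, o_idx, mu):
    """Viterbi decoding in which emission scores of the non-entity label "O"
    are first multiplied by mu (S02), favoring entity-bearing paths."""
    P = P.copy()
    P[:, o_idx] *= mu                      # S02: penalize "O" emissions
    n, k = P.shape
    S = np.zeros((n, k))                   # S03: sub-path scores
    B = np.zeros((n, k), dtype=int)        # S04: backpointers (predecessors)
    S[0] = P[0]
    for i in range(1, n):                  # S05: forward sweep over nodes
        for j in range(k):
            cand = S[i - 1] + A[:, j] + P[i, j]
            B[i, j] = int(np.argmax(cand))
            S[i, j] = cand[B[i, j]]
    path = [int(np.argmax(S[-1]))]         # S06: best end label, then backtrack
    for i in range(n - 1, 0, -1):
        path.append(int(B[i, path[-1]]))
    return path[::-1]

# Labels: 0 = "O", 1 = "B-ORG".  Raw emissions slightly favor "O" everywhere,
# so the penalty flips the decoded path toward the entity label.
P = np.array([[1.1, 1.0], [1.1, 1.0], [1.1, 1.0]])
A = np.zeros((2, 2))
print(viterbi_with_penalty(P, A, o_idx=0, mu=0.8))  # [1, 1, 1]
print(viterbi_with_penalty(P, A, o_idx=0, mu=1.0))  # [0, 0, 0] (no penalty)
```

The contrast between μ = 0.8 and μ = 1.0 shows the intended effect: borderline positions that would otherwise decode as "O" are recovered as entity labels, trading precision for recall at the coarse-grained low level.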
As shown in Fig. 2, the upper-level network model receives the output of the lower-level network model and processes the received text further; the key is to find the boundary of each named entity accurately. When training the high-level BiLSTM-CRF model, a convolutional neural network is added after the character distributed embedding to improve the high-level model's ability to judge entity boundaries.
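A minimal sketch of placing a 1-D convolution over the character embeddings before the high-level BiLSTM (filter count, window width, and dimensions are illustrative assumptions, not values from the patent):

```python
import numpy as np

def conv1d_chars(E, W, b):
    """Slide a width-w filter bank over character embeddings E of shape (n, d),
    producing a per-character local-feature vector with same-length zero padding."""
    n, d = E.shape
    f, w, d2 = W.shape                       # filters, window width, embedding dim
    assert d == d2
    pad = w // 2
    Ep = np.vstack([np.zeros((pad, d)), E, np.zeros((pad, d))])
    out = np.zeros((n, f))
    for i in range(n):
        window = Ep[i:i + w]                 # (w, d) local context around char i
        out[i] = np.tensordot(W, window, axes=([1, 2], [0, 1])) + b
    return np.maximum(out, 0.0)              # ReLU activation

rng = np.random.default_rng(1)
E = rng.standard_normal((6, 8))              # 6 characters, dim-8 embeddings
W = rng.standard_normal((4, 3, 8)) * 0.1     # 4 filters, window 3
b = np.zeros(4)
feats = conv1d_chars(E, W, b)
print(feats.shape)                           # (6, 4): local features per character
```

Each character's feature vector now summarizes a small window of neighbors, which is the local information the boundary judgment relies on; in the model these features would be concatenated with (or fed into) the BiLSTM input.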
The whole stacked model is constructed as shown in Fig. 3. During training, the low-level BiLSTM-CRF model is trained with the loss function optimized by the penalty factor λ; the high-level BiLSTM-CRF model is trained with a convolutional layer added to attend to local information; the two models are saved separately. During prediction, the test corpus is fed into the saved low-level model, the decoding process is optimized with the penalty factor μ, coarse-grained named entities are extracted, and the results are sent to the high-level model. The high-level model carefully identifies the named entities in the corpus, and its output is judged; when the high-level output contains only single named entities, prediction ends. The output single named entities and their boundary information are output as the final recognition result.
The model prediction stage is characterized as follows: in step 2), the corpus to be predicted is fed into the low-level model, the decoding method is optimized, and coarse-grained named entities are identified. The preliminary results are fed into the high-level entity recognition model, which performs accurate recognition and obtains results. Each result is judged: if it is a single named entity it is output; if not, it is passed into the upper-level network again until the output is a single named entity.
In the model prediction stage, the lower-level network decoding method is characterized as follows: so that the lower-level network is more inclined to output named entity labels rather than the non-entity label when decoding sequences, the application multiplies the probability of every character belonging to label "O", i.e., not being a named entity, by a penalty factor μ during the lower-level network's decoding computation, with μ between 0 and 1, so that sequences containing more entity labels more easily obtain high scores and are output as the result.
In the model prediction stage, the upper-level network prediction method is characterized as follows: the legal named entity sequences identified by the lower-level network are passed to the upper-level network as coarse-grained recognition results. For the upper-level network, the Viterbi decoding scheme is kept unchanged to guarantee the accuracy of the results and the strictness of the boundaries. The upper-level network predicts on the text input from the lower-level network, with the following cases: 1) the upper-level network identifies a single entity, and the entity accurately identified by the high level is taken as the final output; 2) the upper-level network identifies multiple entities, and each of them is passed into the upper-level network again as input, repeating the above steps.
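The prediction loop described above can be sketched with stub models (the `low_model` and `high_model` functions here are placeholders standing in for the two trained networks, and the `"|"`-splitting stub is purely illustrative):

```python
def stacked_predict(text, low_model, high_model, max_rounds=10):
    """Feed text to the low-level model, then refine each coarse-grained
    result with the high-level model until every result is a single entity.
    max_rounds is a safety bound (an assumption; the patent states no limit)."""
    pending = list(low_model(text))      # coarse-grained candidates
    final = []                           # collected single entities
    rounds = 0
    while pending and rounds < max_rounds:
        nxt = []
        for span in pending:
            results = high_model(span)
            if len(results) == 1:        # single entity: final output
                final.extend(results)
            else:                        # still divisible: feed back in
                nxt.extend(results)
        pending, rounds = nxt, rounds + 1
    return final

# Stub models: the low model passes the text through; the high model
# splits spans glued together with "|" into their component entities.
low = lambda t: [t]
high = lambda s: s.split("|") if "|" in s else [s]
print(stacked_predict("Shanghai|agriculture firm", low, high))
# ['Shanghai', 'agriculture firm']
```

The loop terminates exactly when every pending span has been judged a single entity, which mirrors case 1) above; case 2) corresponds to the `nxt.extend(results)` branch.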
In the result output stage, the output of the final upper-level network is taken as the output of the stacked model as a whole. The legal named entities received from all stacked networks, together with their boundaries, are collected with a stack data structure, and the named entity set is taken as the prediction result for the corpus.
In conclusion a kind of Chinese name entity recognition method based on Overlay model of the invention is known using low layer model
The entity information of other coarseness rationally cuts text under the premise of not omitting name entity information, is high-level model
Offer is accurately identified effectively to help.Convolution pond process is added in high-level model, improves the judgement to name entity boundary.
The above is only a preferred embodiment of the present invention. It should be pointed out that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the invention, and such improvements and modifications should also be regarded as falling within the protection scope of the invention.
Claims (10)
1. A complex Chinese named entity recognition method based on a stacked model, comprising the steps of: 1) a model training stage: training a low-level BiLSTM-CRF model and a high-level BiLSTM-CRF model separately on a Chinese named entity dataset, saving both, and stacking the two models for named entity recognition; (a) training the low-level BiLSTM-CRF model on an annotated named entity corpus under the computation of an improved loss function and saving it; (b) training the high-level BiLSTM-CRF model on an annotated named entity recognition corpus and saving it; 2) a model prediction stage: feeding the corpus to be predicted into the low-level model, identifying coarse-grained named entities, and passing them to the high-level model as preliminary results; the high-level model continues to recognize the preliminary results, and if a recognition result is not a single named entity it is fed back into the high-level model until all results are single named entities; (a) feeding the corpus to be predicted into the low-level model, optimizing the decoding method, and sending the coarse-grained results to the high-level model; (b) feeding the coarse-grained named entities into the upper-level network for recognition; (c) judging the high-level output: if it is further divisible, returning to 2)(b); if not, outputting the result; 3) result output: collecting all named entities obtained by passing the corpus through the stacked model, i.e., all named entities output by the upper-level network, as the final result of the whole recognition process.
2. The complex Chinese named entity recognition method based on a stacked model according to claim 1, characterized in that: in step 1), the low-level named entity recognition model and the high-level named entity recognition model are trained separately, and both are BiLSTM-CRF models.
3. The complex Chinese named entity recognition method based on a stacked model according to claim 2, characterized in that: the low-level named entity recognition model performs coarse-grained recognition of the corpus without losing information about potential named entities; it improves on the traditional BiLSTM-CRF by improving the loss function used for model training, as follows:
for an input sequence X = (x1, x2, …, xn), let P be the matrix output by the BiLSTM network after the sentence passes through the distributed embedding; P has dimension n × k, where k is the number of distinct labels, and P_{i,j} is the score of the i-th character being tagged with the j-th label, called the emission probability; for a candidate prediction sequence y = (y1, y2, …, yn), the score of this sequence is defined as:
s(X, y) = Σ_{i=0..n} A_{y_i, y_{i+1}} + Σ_{i=1..n} P_{i, y_i}
where A is the transition probability matrix of size k × k, and A_{i,j} denotes the transition probability from label i to label j;
on the premise of not omitting entity information, the text is preliminarily identified by optimizing the score formula as:
s_λ(X, y) = Σ_{i=0..n} A_{y_i, y_{i+1}} + Σ_{i=1..n} w_i · P_{i, y_i}, where w_i = λ if the true label at position i is "O" and w_i = 1 otherwise;
λ is a penalty factor with a value between 0 and 1; the adjustment means that, when the label sequence path score is computed, the contribution of a position whose true label is "O" (not a named entity) is multiplied by a penalty coefficient; the penalty factor reduces the weight of training positions whose true label is "O" and, relatively, raises the weight of positions whose label belongs to any named entity class; when the loss is computed this way, predictions for characters whose true label is "B-PER", "I-PER", "B-ORG", etc. have greater influence on network training; so that the lower-level network is more inclined to output named entity labels rather than the non-entity label when decoding sequences, the probability of every character belonging to label "O", i.e., not being a named entity, is multiplied by a penalty factor μ during the lower-level network's decoding computation, with μ between 0 and 1, so that sequences containing more entity labels more easily obtain high scores and are output as the result.
4. The complex Chinese named entity recognition method based on a stacked model according to claim 2, characterized in that: the high-level named entity recognition model receives the output of the lower-level network model and processes the received text further, the key being to find named entity boundaries accurately; when training the high-level BiLSTM-CRF model, a convolutional neural network (CNN) model is added after the character distributed embedding to improve the high-level model's ability to judge entity boundaries; the purpose of adding the CNN is to perform finer feature extraction on the character distributed representations, so that local information forms more effective connections and entity boundaries are identified more accurately.
5. The complex Chinese named entity recognition method based on a stacked model according to claim 1, characterized in that: in the model prediction stage, step 2) feeds the corpus to be predicted into the low-level model, optimizes the decoding method, and identifies coarse-grained named entities; the preliminary results are fed into the high-level entity recognition model, which recognizes them accurately and obtains results; each result is judged: if it is a single named entity it is output, otherwise it is passed into the upper-level network again until the output is a single named entity.
6. The complex Chinese named entity recognition method based on a stacked model according to claim 5, characterized in that: the upper-level network prediction method and steps in the model prediction stage are: the legal named entity sequences identified by the lower-level network are passed to the upper-level network as coarse-grained recognition results; for the upper-level network, the Viterbi decoding scheme is kept unchanged to guarantee the accuracy of the results and the strictness of the boundaries; the upper-level network predicts on the text input from the lower-level network, with the following cases: 1) the upper-level network identifies a single entity, and the entity accurately identified by the high level is taken as the final output; 2) the upper-level network identifies multiple entities, and each of them is passed into the upper-level network again as input, repeating the above steps.
7. The complex Chinese named entity recognition method based on a stacked model according to claim 1, characterized in that: in the result output stage, the output of the final upper-level network is taken as the output of the stacked model as a whole; the legal named entities received from all stacked networks, together with their boundaries, are collected with a stack data structure, and the named entity set is taken as the prediction result for the corpus.
8. The complex Chinese named entity recognition method based on a stacked model according to claim 1, characterized in that: the BiLSTM-CRF named entity model consists of a distributed embedding layer, a deep neural network layer, and a conditional random field layer; the distributed embedding module uses word2vec to train word vectors, so that the distributed representation of text captures the semantic relatedness between words and eliminates the gap between discrete word symbols; the pre-trained word vectors are used as the input of the deep neural network layer for processing natural language problems.
9. The complex Chinese named entity recognition method based on a stacked model according to claim 1, characterized in that: the bidirectional long short-term memory network (BiLSTM) is a modified version of the LSTM model; a traditional RNN takes the preceding context as input and predicts the following context from it, whereas a bidirectional RNN also uses information in the reverse direction, letting the model learn from both directions, a notion that matches how Chinese characters form words and words form sentences; BiLSTM is the bidirectional version of the LSTM.
10. the complexity Chinese name entity recognition method based on Overlay model according to claim 1, it is characterised in that: item
Part random field layer (CRF) separates the relevance for the level that exports, and context relation can be fully considered in prediction label,
Avoid the problem that occurring " B-LOC " label in result is followed by this illegal sequence of " I-ORG " label;
In design conditions random field path score, is reduced, reached using prediction weighted score of the λ penalty factor to non-name entity
To the purpose for improving recall rate;
The decoding method is improved as follows:
S01: pass the word-vector matrix of the text to be predicted through the model to obtain the emission probability matrix;
S02: multiply the entries of the emission probability matrix whose label is a non-entity label by the penalty factor μ;
S03: create a zero matrix S of size sequence length × number of labels to record each sub-path score of the dynamic programming;
S04: create a matrix B of size sequence length × number of labels to record the path clues of matrix S, storing for each node the predecessor node on its best path;
S05: traverse from the first node to the last node: using the emission probability matrix and the transition probability matrix, compute in S the maximum-probability path from the start to each label of each node, and record the path in B;
S06: find the score of the maximum-probability path in the last column of S, then trace back through matrix B to recover the label sequence of that path as the final output.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910384659.4A CN110110335B (en) | 2019-05-09 | 2019-05-09 | Named entity identification method based on stack model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110110335A true CN110110335A (en) | 2019-08-09 |
CN110110335B CN110110335B (en) | 2023-01-06 |
Family
ID=67489125
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910384659.4A Active CN110110335B (en) | 2019-05-09 | 2019-05-09 | Named entity identification method based on stack model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110110335B (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110688854A (en) * | 2019-09-02 | 2020-01-14 | 平安科技(深圳)有限公司 | Named entity recognition method, device and computer readable storage medium |
CN110866402A (en) * | 2019-11-18 | 2020-03-06 | 北京香侬慧语科技有限责任公司 | Named entity identification method and device, storage medium and electronic equipment |
CN110929521A (en) * | 2019-12-06 | 2020-03-27 | 北京知道智慧信息技术有限公司 | Model generation method, entity identification method, device and storage medium |
CN111090981A (en) * | 2019-12-06 | 2020-05-01 | 中国人民解放军战略支援部队信息工程大学 | Method and system for building Chinese text automatic sentence-breaking and punctuation generation model based on bidirectional long-time and short-time memory network |
CN111209362A (en) * | 2020-01-07 | 2020-05-29 | 苏州城方信息技术有限公司 | Address data analysis method based on deep learning |
CN111460821A (en) * | 2020-03-13 | 2020-07-28 | 云知声智能科技股份有限公司 | Entity identification and linking method and device |
CN111597804A (en) * | 2020-05-15 | 2020-08-28 | 腾讯科技(深圳)有限公司 | Entity recognition model training method and related device |
CN111753840A (en) * | 2020-06-18 | 2020-10-09 | 北京同城必应科技有限公司 | Ordering technology for business cards in same city logistics distribution |
CN112035635A (en) * | 2020-08-28 | 2020-12-04 | 康键信息技术(深圳)有限公司 | Medical field intention recognition method, device, equipment and storage medium |
CN112699682A (en) * | 2020-12-11 | 2021-04-23 | 山东大学 | Named entity identification method and device based on combinable weak authenticator |
CN112784602A (en) * | 2020-12-03 | 2021-05-11 | 南京理工大学 | News emotion entity extraction method based on remote supervision |
CN112800768A (en) * | 2021-02-03 | 2021-05-14 | 北京金山数字娱乐科技有限公司 | Training method and device for nested named entity recognition model |
CN113051918A (en) * | 2019-12-26 | 2021-06-29 | 北京中科闻歌科技股份有限公司 | Named entity identification method, device, equipment and medium based on ensemble learning |
CN113779992A (en) * | 2021-07-19 | 2021-12-10 | 西安理工大学 | Method for realizing BcBERT-SW-BilSTM-CRF model based on vocabulary enhancement and pre-training |
CN114118093A (en) * | 2022-01-27 | 2022-03-01 | 华东交通大学 | Method and system for identifying flat mark enhanced nested named entity |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170060835A1 (en) * | 2015-08-27 | 2017-03-02 | Xerox Corporation | Document-specific gazetteers for named entity recognition |
CN106569998A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
CN107844474A (en) * | 2017-09-29 | 2018-03-27 | 华南师范大学 | Disease data name entity recognition method and system based on stacking condition random field |
CN109359293A (en) * | 2018-09-13 | 2019-02-19 | 内蒙古大学 | Mongolian name entity recognition method neural network based and its identifying system |
Non-Patent Citations (2)
Title |
---|
买买提阿依甫 et al.: "Uyghur Named Entity Recognition Based on the BiLSTM-CNN-CRF Model", 《计算机工程》 (Computer Engineering) * |
高强 et al.: "Research on Named Entity Recognition in the Defense Field Based on a Cascaded Model", 《现代图书情报技术》 (New Technology of Library and Information Service) * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||