CN107797992A - Named entity recognition method and device - Google Patents
Named entity recognition method and device
- Publication number
- CN107797992A (application CN201711102742.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
Abstract
An embodiment of the present application provides a named entity recognition method and device. The method includes: obtaining an input sequence; vectorizing the characters in the input sequence to obtain a character vector sequence corresponding to the input sequence; processing the character vector sequence with a neural network algorithm to obtain a text feature sequence for the input sequence; and processing the text feature sequence with a conditional random field to obtain the named entity recognition result corresponding to the input sequence. Because characters capture finer-grained features than words and the number of distinct characters is far smaller than the number of distinct words, because the neural network algorithm can take the context of each character in the input sequence into account, and because the conditional random field avoids the label bias problem, the technical scheme combines character vectors, a neural network algorithm, and a conditional random field to achieve a good named entity recognition effect.
Description
Technical field
The present application relates to the field of computer technology, and in particular to a named entity recognition method and device.
Background art
Natural language processing (NLP) is a field at the intersection of computer science, artificial intelligence, and linguistics concerned with the interaction between computers and human language, and it is an important direction within computer science and artificial intelligence. NLP research covers the theories and methods that enable effective communication between people and computers in natural language; the fields involved include natural language understanding, retrieval, information extraction, machine translation, automatic question answering systems, and so on.
As a basic task in NLP, named entity recognition (NER) refers to the technology of identifying entities of particular categories, such as person names, place names, organization names, and proper nouns, from text. NER is an important foundational tool for application fields such as information retrieval, query classification, question answering systems, syntactic analysis, and machine translation, and its recognition quality directly affects downstream processing in those fields. Providing a named entity recognition technology with a better recognition effect has therefore become a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
The purpose of the embodiments of the present application is to provide a named entity recognition method and device, so as to achieve a better named entity recognition effect.
To reach the above technical purpose, the embodiments of the present application are realized as follows.
According to a first aspect of the embodiments of the present application, a named entity recognition method is provided. The method includes:
obtaining an input sequence;
vectorizing the characters in the input sequence to obtain a character vector sequence corresponding to the input sequence;
processing the character vector sequence with a neural network algorithm to obtain a text feature sequence for the input sequence; and
processing the text feature sequence with a conditional random field to obtain the named entity recognition result corresponding to the input sequence.
In one embodiment of the present application, processing the character vector sequence with a neural network algorithm to obtain the text feature sequence of the input sequence includes:
processing the character vector sequence with a bidirectional long short-term memory (Bi-LSTM) network to obtain the text feature sequence of the input sequence.
In one embodiment of the present application, vectorizing the characters in the input sequence to obtain the character vector sequence corresponding to the input sequence includes:
obtaining a character-to-vector mapping dictionary, the dictionary recording correspondences between characters and vectors;
looking up, in the character-to-vector mapping dictionary, the vector corresponding to each character in the input sequence;
processing the looked-up vectors with an attention mechanism to obtain a weight value corresponding to each vector; and
multiplying each vector by its corresponding weight value to obtain the character vector sequence corresponding to the input sequence.
In one embodiment of the present application, generating the character-to-vector mapping dictionary includes:
obtaining a training corpus;
splitting the training corpus character by character to obtain a split result;
applying at least one of the following pre-processing steps to the split result: filtering garbage characters, filtering stop characters, filtering low-frequency characters, and filtering meaningless symbols, to obtain a pre-processed result; and
training on the pre-processed result with the word2vec algorithm to obtain the character-to-vector mapping dictionary.
In one embodiment of the present application, training on the pre-processed result with the word2vec algorithm to obtain the character-to-vector mapping dictionary includes:
training on the pre-processed result with the skip-gram model to obtain the character-to-vector mapping dictionary.
According to a second aspect of the embodiments of the present application, a named entity recognition device is provided. The device includes:
an acquiring unit for obtaining an input sequence;
a first processing unit for vectorizing the characters in the input sequence to obtain the character vector sequence corresponding to the input sequence;
a second processing unit for processing the character vector sequence with a neural network algorithm to obtain the text feature sequence of the input sequence; and
a third processing unit for processing the text feature sequence with a conditional random field to obtain the named entity recognition result corresponding to the input sequence.
In one embodiment of the present application, the second processing unit includes:
a character vector sequence processing subunit for processing the character vector sequence with a bidirectional long short-term memory network to obtain the text feature sequence of the input sequence.
In one embodiment of the present application, the first processing unit includes:
a mapping dictionary acquiring subunit for obtaining a character-to-vector mapping dictionary that records correspondences between characters and vectors;
a lookup subunit for looking up, in the character-to-vector mapping dictionary, the vector corresponding to each character in the input sequence;
an attention mechanism processing subunit for processing the looked-up vectors with an attention mechanism to obtain a weight value corresponding to each vector; and
a character vector sequence obtaining subunit for multiplying each vector by its corresponding weight value to obtain the character vector sequence corresponding to the input sequence.
In one embodiment of the present application, the device further includes a mapping dictionary generation unit, which includes:
a training corpus acquiring subunit for obtaining a training corpus;
a character splitting subunit for splitting the training corpus character by character to obtain a split result;
a pre-processing subunit for applying at least one of the following pre-processing steps to the split result: filtering garbage characters, filtering stop characters, filtering low-frequency characters, and filtering meaningless symbols, to obtain a pre-processed result; and
a mapping dictionary training subunit for training on the pre-processed result with the word2vec algorithm to obtain the character-to-vector mapping dictionary.
In one embodiment of the present application, the mapping dictionary training subunit is specifically configured to train on the pre-processed result with the skip-gram model to obtain the character-to-vector mapping dictionary.
According to a third aspect of the embodiments of the present application, an electronic device is provided, including a processor and a memory arranged to store computer-executable instructions which, when executed, cause the processor to:
obtain an input sequence;
vectorize the characters in the input sequence to obtain a character vector sequence corresponding to the input sequence;
process the character vector sequence with a neural network algorithm to obtain the text feature sequence of the input sequence; and
process the text feature sequence with a conditional random field to obtain the named entity recognition result corresponding to the input sequence.
According to a fourth aspect of the embodiments of the present application, a computer storage medium is provided, storing one or more programs which, when executed by an electronic device including multiple application programs, cause the electronic device to:
obtain an input sequence;
vectorize the characters in the input sequence to obtain a character vector sequence corresponding to the input sequence;
process the character vector sequence with a neural network algorithm to obtain the text feature sequence of the input sequence; and
process the text feature sequence with a conditional random field to obtain the named entity recognition result corresponding to the input sequence.
As can be seen from the technical scheme above, the embodiments of the present application convert each character in the input sequence to be recognized into a corresponding vector, process the vectors with a neural network algorithm to extract the text feature sequence of the input sequence, and finally process the text feature sequence with a conditional random field to obtain the named entity recognition result corresponding to the input sequence. Because characters capture finer-grained features than words and the number of distinct characters is far smaller than the number of distinct words, because the neural network algorithm can take the context of each character in the input sequence into account, and because the conditional random field avoids the label bias problem, the embodiments of the present application combine character vectors, a neural network algorithm, and a conditional random field to achieve a good named entity recognition effect.
Brief description of the drawings
To describe the technical schemes of the embodiments of the present application or of the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some of the embodiments described in this specification; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative labor.
Fig. 1 is a schematic diagram of the CBOW model provided by the present application;
Fig. 2 is a schematic diagram of the skip-gram model provided by the present application;
Fig. 3 is a flow chart of a named entity recognition method according to an embodiment of the present application;
Fig. 4 is a flow chart of a character-to-vector mapping dictionary generation method according to an embodiment of the present application;
Fig. 5 is a structural schematic diagram of a named entity recognition device according to an embodiment of the present application;
Fig. 6 is a structural schematic diagram of an electronic device according to an embodiment of the present application.
Embodiment
In order that those skilled in the art more fully understand the technical scheme in this specification, below in conjunction with the application
Accompanying drawing in embodiment, the technical scheme in the embodiment of the present application is clearly and completely described, it is clear that described reality
It is only this specification part of the embodiment to apply example, rather than whole embodiments.Based on the embodiment in this specification, ability
The every other embodiment that domain those of ordinary skill is obtained under the premise of creative work is not made, should all belong to this theory
The scope of bright book protection.
The embodiments of the present application provide a named entity recognition method and device.
To facilitate understanding, some of the technical terms and concepts involved in the embodiments of the present application are introduced first.
Named entity recognition (NER) refers to the technology of identifying entities of particular categories, such as person names, place names, organization names, and proper nouns, from text. NER is in essence a sequence labeling problem: given an input text sequence, a tag is assigned to each character (or word).
The tags can be defined as follows. Taking person names (PER), place names (LOC), and organization names (ORG) as the categories to be recognized, the input text "张三来自西安，毕业于北京大学" ("Zhang San comes from Xi'an and graduated from Peking University") is labeled character by character as:
张/B-PER 三/I-PER 来/O 自/O 西/B-LOC 安/I-LOC ，/O 毕/O 业/O 于/O 北/B-ORG 京/I-ORG 大/I-ORG 学/I-ORG
After parsing, the NER result is:
张三/PER (Zhang San) comes from 西安/LOC (Xi'an) and graduated from 北京大学/ORG (Peking University). The meanings of the tags are given in Table 1 below:
Tag | Meaning
B-PER | Beginning character of a person name
I-PER | Middle or ending character of a person name
B-LOC | Beginning character of a place name
I-LOC | Middle or ending character of a place name
B-ORG | Beginning character of an organization name
I-ORG | Middle or ending character of an organization name
O | Other characters
Table 1
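To make the tagging scheme above concrete, here is a minimal sketch of how a per-character tag sequence like the one in Table 1 can be parsed back into entities. The function name and grouping logic are illustrative, not taken from the patent:

```python
def parse_bio(chars, tags):
    """Group per-character B-/I-/O tags into (entity_text, type) pairs."""
    entities, current, etype = [], [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):            # a new entity begins here
            if current:
                entities.append(("".join(current), etype))
            current, etype = [ch], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == etype:
            current.append(ch)              # continue the current entity
        else:                               # "O" or an inconsistent tag ends it
            if current:
                entities.append(("".join(current), etype))
            current, etype = [], None
    if current:
        entities.append(("".join(current), etype))
    return entities

chars = list("张三来自西安，毕业于北京大学")
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "I-LOC", "O",
        "O", "O", "O", "B-ORG", "I-ORG", "I-ORG", "I-ORG"]
print(parse_bio(chars, tags))
# → [('张三', 'PER'), ('西安', 'LOC'), ('北京大学', 'ORG')]
```

This is the "further parsing" step: the tag sequence produced by the model is deterministic to decode, so recognition quality depends entirely on how well the tags are predicted.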
The word2vec algorithm, developed by Google, turns each word into a vector of several hundred dimensions through unsupervised training. These vectors capture the semantic correlations between words (or characters) and are also known as word vectors or word embeddings.
The skip-gram model is one variant of the word2vec algorithm. It predicts the surrounding words from the current word and is particularly suited to prediction on large data sets. As shown in Fig. 2, the word w(t) is used to predict the surrounding words w(t-2), w(t-1), w(t+1), and w(t+2).
The CBOW model is another variant of the word2vec algorithm. It predicts the center word from its context. As shown in Fig. 1, the words w(t-2), w(t-1), w(t+1), and w(t+2) around the word w(t) are used to predict w(t); the vectors of these context words are concatenated, which fully preserves the contextual information.
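The two model variants differ only in which side of each (input, target) training pair is the center word. A minimal sketch of generating such pairs from a character sequence with a window of 2 (a simplification of what word2vec does internally; the function names are illustrative):

```python
def skipgram_pairs(seq, window=2):
    """Skip-gram: predict each context character from the center character."""
    pairs = []
    for t, center in enumerate(seq):
        for k in range(-window, window + 1):
            if k != 0 and 0 <= t + k < len(seq):
                pairs.append((center, seq[t + k]))   # (input, target)
    return pairs

def cbow_pairs(seq, window=2):
    """CBOW: predict the center character from its surrounding context."""
    pairs = []
    for t, center in enumerate(seq):
        ctx = tuple(seq[t + k] for k in range(-window, window + 1)
                    if k != 0 and 0 <= t + k < len(seq))
        pairs.append((ctx, center))                  # (input context, target)
    return pairs

seq = list("北京大学")
print(skipgram_pairs(seq)[:3])  # → [('北', '京'), ('北', '大'), ('京', '北')]
```

A real word2vec trainer then fits the embedding vectors so that each input is predictive of its target; only the pair-generation step is shown here.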
The attention mechanism simulates the attention model of the human brain. When we read an article, our eyes in fact focus only on the words currently being looked at, and the brain mainly attends to those words; that is, the brain's attention over the whole article is not uniform but differently weighted. The attention mechanism brings a large improvement in sequence learning tasks: in an encoder-decoder framework, adding an attention model in the encoding stage applies a weighted transformation to the source data sequence, which can effectively improve system performance in the natural sequence-to-sequence setting.
A long short-term memory (LSTM) network is a kind of recurrent neural network suited to processing and predicting important events separated by relatively long intervals and delays in a time series. It controls the retention of historical information through a "memory gate" and a "forget gate", which effectively solves the long-range dependency problem of conventional recurrent neural networks.
The conditional random field (CRF) has in recent years been one of the most commonly used algorithms in natural language processing, applied to tasks such as syntactic analysis, named entity recognition, and part-of-speech tagging. A CRF is a probabilistic transition model that takes a Markov chain as the hidden variable and discriminates the hidden variables from observable conditions; it belongs to the class of discriminative models.
Fig. 3 is a flow chart of a named entity recognition method according to an embodiment of the present application. The method can be performed by a server side, which may include a server or a server cluster, or by a terminal device, which may include a smartphone, tablet computer, notebook or desktop computer, and the like. As shown in Fig. 3, the method may include the following steps:
In step 301, an input sequence is obtained.
In the embodiments of the present application, the input sequence can be a text sequence or a speech segment.
In step 302, the characters in the input sequence are vectorized to obtain the character vector sequence corresponding to the input sequence.
In the embodiments of the present application, when the input sequence is a text sequence, the input sequence is directly segmented into characters to obtain the character sequence (x1, x2, ..., xn) of the input sequence, where xi is the i-th character in the input sequence and n is the number of characters in the input sequence.
When the input sequence is a speech segment, the speech segment is first converted into the corresponding text sequence, and the text sequence is then segmented into characters to obtain the character sequence (x1, x2, ..., xn) of the text sequence, which is also the character sequence of the input sequence, where xi is the i-th character in the text sequence, n is the number of characters in the text sequence, and 1 ≤ i ≤ n.
In an optional embodiment, step 302 can include steps S31 and S32:
In S31, a character-to-vector mapping dictionary is obtained, the dictionary recording correspondences between characters and vectors.
In S32, the vector corresponding to each character in the input sequence is looked up in the character-to-vector mapping dictionary, and the sequence formed by the looked-up vectors is taken as the character vector sequence corresponding to the input sequence.
In this embodiment, after the character sequence (x1, x2, ..., xn) of the input sequence is obtained, the character-to-vector mapping dictionary is obtained, and the vector vi corresponding to each character xi is looked up in it; the sequence (v1, v2, ..., vn) formed by the looked-up vectors is taken as the character vector sequence (v'1, v'2, ..., v'n) corresponding to the input sequence, where v'i = vi.
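Under the scheme just described, steps S31 and S32 amount to a table lookup. A hedged sketch follows; the dictionary here is a toy stand-in for a trained character-to-vector mapping, and the dimensions, values, and unknown-character fallback are illustrative assumptions:

```python
char2vec = {          # toy character-to-vector mapping dictionary
    "北": [0.1, 0.9], "京": [0.2, 0.8],
    "大": [0.7, 0.3], "学": [0.6, 0.4],
}
UNK = [0.0, 0.0]      # assumed fallback for characters missing from the dictionary

def to_vector_sequence(text):
    """Look up the vector v_i for each character x_i of the input sequence."""
    return [char2vec.get(ch, UNK) for ch in text]

print(to_vector_sequence("北京"))  # → [[0.1, 0.9], [0.2, 0.8]]
```

A dictionary trained on a large corpus would map each character to a vector of several hundred dimensions; the lookup itself is unchanged.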
In a preferred embodiment, step 302 can include steps S33 through S36:
In S33, a character-to-vector mapping dictionary is obtained, the dictionary recording correspondences between characters and vectors.
In S34, the vector corresponding to each character in the input sequence is looked up in the character-to-vector mapping dictionary.
In S35, the looked-up vectors are processed with an attention mechanism to obtain the weight value corresponding to each vector.
It should be noted that the weight value of a vector reflects that vector's importance: the larger the weight value, the more important the vector.
In this embodiment, the attention mechanism can be realized with a Bi-LSTM (bidirectional long short-term memory recurrent neural network).
Concretely, after the character sequence (x1, x2, ..., xn) of the input sequence is obtained, the character-to-vector mapping dictionary is obtained, and the vector vi corresponding to each character xi is looked up in it; the sequence (v1, v2, ..., vn) formed by the looked-up vectors is then input into the attention model, which outputs (at1, at2, ..., atn), where ati is the weight value corresponding to vi.
In S36, each vector is multiplied by its corresponding weight value to obtain the character vector sequence corresponding to the input sequence.
After the vector vi corresponding to each character xi and its weight value ati are obtained, vi * ati is computed, and (v1*at1, v2*at2, ..., vn*atn) is taken as the character vector sequence (v'1, v'2, ..., v'n) corresponding to the input sequence, where v'i = vi * ati.
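The weighting step S36 is then a per-vector scaling. A minimal sketch, where the weights are fixed stand-ins for the output of an attention model (in the text above they would come from a Bi-LSTM scorer):

```python
def apply_attention(vectors, weights):
    """Scale each character vector v_i element-wise by its weight at_i."""
    assert len(vectors) == len(weights)
    return [[x * w for x in vec] for vec, w in zip(vectors, weights)]

vecs = [[0.1, 0.9], [0.2, 0.8]]   # looked-up character vectors
ats = [0.5, 2.0]                  # illustrative attention weights at_1, at_2
print(apply_attention(vecs, ats))  # → [[0.05, 0.45], [0.4, 1.6]]
```

The second character's vector is amplified and the first's attenuated, so downstream layers see the more important character more strongly.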
In the embodiments of the present application, the character-to-vector mapping dictionary can be generated in advance. Since the word2vec algorithm can turn each character into a vector in a low-dimensional space (typically a few hundred dimensions), so that the semantic correlation between characters can be approximately described by the distance between their vectors, the training corpus can be trained with the word2vec algorithm to generate the character-to-vector mapping dictionary. Fig. 4 shows a flow chart of a character-to-vector mapping dictionary generation method based on the skip-gram model, which may include the following steps:
In S401, a training corpus is obtained.
In the embodiments of the present application, the training corpus includes multiple sentences.
Since word2vec is an unsupervised learning algorithm, when collecting the training corpus, the more data the better; in addition, the corpus should mainly target the intended application scenario and cover most of the data types of that scenario as far as possible. In practical applications, the training corpus can be labeled or unlabeled; the embodiments of the present application do not limit this.
In S402, the training corpus is split character by character to obtain a split result.
In the embodiments of the present application, every sentence in the training corpus is divided into individual characters.
In S403, at least one of the following pre-processing steps is applied to the split result: filtering garbage characters, filtering stop characters, filtering low-frequency characters, and filtering meaningless symbols, to obtain a pre-processed result.
In the embodiments of the present application, to improve processing efficiency and quality, garbage characters, stop characters, low-frequency characters, and meaningless symbols can be filtered out of the split result, and the result is organized into the input format required by the word2vec algorithm, that is, its inputs and outputs are specified, in preparation for establishing the training objective.
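A hedged sketch of steps S402 and S403 follows. The stop-character set, junk set, and frequency threshold are illustrative choices, not values specified by the patent:

```python
from collections import Counter

STOP = {"的", "了"}           # illustrative stop characters
JUNK = {"\u200b", "\ufeff"}   # illustrative garbage characters (zero-width, BOM)
MIN_FREQ = 2                  # illustrative low-frequency threshold

def preprocess(corpus_sentences):
    """Split each sentence into characters, then filter the split result."""
    split = [list(s) for s in corpus_sentences]
    freq = Counter(ch for sent in split for ch in sent)
    return [[ch for ch in sent
             if ch not in STOP and ch not in JUNK
             and not ch.isspace()          # drop meaningless whitespace symbols
             and freq[ch] >= MIN_FREQ]     # drop low-frequency characters
            for sent in split]

corpus = ["北京大学 的", "北京 大学"]
print(preprocess(corpus))  # → [['北', '京', '大', '学'], ['北', '京', '大', '学']]
```

The filtered character lists are exactly the sentence-shaped input that a word2vec trainer expects.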
In S404, the pre-processed result is trained with the word2vec algorithm to obtain the character-to-vector mapping dictionary.
In the embodiments of the present application, the character-to-vector mapping dictionary can be obtained by training on the pre-processed result with either the CBOW model or the skip-gram model of the word2vec algorithm.
Considering that the larger the training corpus, the more complete and accurate the content of the resulting character-to-vector mapping dictionary, and that the skip-gram model is particularly suited to big data, training on the pre-processed result with the skip-gram model is the preferred way to obtain the character-to-vector mapping dictionary.
Compared with the word vectors of the related art, the character-based vectorization of the embodiments of the present application brings the following advantages: finer-grained character features can be represented; because the number of distinct characters is far smaller than the number of distinct words, the resulting model occupies very little space, which greatly improves model loading speed; and while new words keep emerging over time, so that previously trained word vector models suffer an increasingly severe drop in feature hit rate, character-based vectors effectively avoid this problem, because relatively few new characters are created each year.
In step 303, the character vector sequence is processed with a neural network algorithm to obtain the text feature sequence of the input sequence.
It can be understood that assigning a tag to each character in the input sequence can be abstracted as a sequence labeling problem, which is in essence a classification task: the class of each character needs to be determined.
The core idea of the neural network algorithm used in the embodiments of the present application is that, when classifying each current character, the preceding historical information is taken into account as input, which solves the problem of treating characters independently.
In the embodiments of the present application, the character vector sequence (v'1, v'2, ..., v'n) corresponding to the input sequence is input into the neural network algorithm for processing, and the text feature sequence (h1, h2, ..., hn) of the input sequence is obtained, where hi is the character feature vector of xi and contains the feature information of xi.
As an example, a recurrent neural network (RNN) can be used to process the character vector sequence corresponding to the input sequence.
However, a standard RNN suffers from the long-range dependency problem: historical information that lies far back along the path has little influence on the classification of the current character, even when that information is directly relevant to the current problem. To illustrate, consider the input "I studied abroad in the United States ..., and can speak fluent *English*", where the ellipsis stands for other, longer context and the italicized word is the one to be predicted. Having seen the word before "English", we can predict that the next word is probably the name of a language, but which language it is must be determined from the context. Because of its structure, a standard RNN may be unable to remember the highly useful earlier information "I studied abroad in the United States", and therefore cannot predict that the next word is probably "English"; this phenomenon is called the long-range dependency problem. The LSTM was introduced to solve it. Its main idea is to use "gates", on top of the standard RNN, to control the flow of contextual information: the input and output of historical information are controlled by several gates, each of which applies a sigmoid nonlinearity normalized between 0 and 1. The closer a gate's value is to 0, the less historical information passes through the gate; conversely, the closer it is to 1, the more information passes through. These gates can both "remember" useful information and "forget" useless information. In this way, relevant information from relatively far away is selectively retained for the classification of the current character, which improves the prediction effect.
Further, to improve the processing effect, a bidirectional long short-term memory network can be used to process the character vector sequence and obtain the text feature sequence of the input sequence.
Concretely, the character vector sequence (v'1, v'2, ..., v'n) corresponding to the input sequence is input into a bidirectional LSTM. The forward LSTM outputs (hf1, hf2, ..., hfn) and the backward LSTM outputs (hb1, hb2, ..., hbn); the two are concatenated per position to obtain (h1, h2, ..., hn). Here hfi, the forward LSTM output corresponding to v'i, characterizes the historical context of xi, while hbi, the backward LSTM output corresponding to v'i, characterizes the future context of xi.
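The bidirectional concatenation just described can be sketched with a toy recurrent cell standing in for each LSTM direction. This is only an assumption-laden illustration of how the forward outputs hf_i and backward outputs hb_i are aligned and joined; a real implementation would use an LSTM cell with memory and forget gates and vector-valued states:

```python
def run_recurrent(vectors, cell, reverse=False):
    """Run a toy recurrent cell over the sequence, optionally right-to-left."""
    seq = list(reversed(vectors)) if reverse else vectors
    h, outputs = 0.0, []
    for v in seq:
        h = cell(h, v)              # new state from previous state and input
        outputs.append(h)
    # re-align backward outputs so outputs[i] corresponds to position i
    return list(reversed(outputs)) if reverse else outputs

def toy_cell(h_prev, v):
    return 0.5 * h_prev + sum(v)    # stand-in for one LSTM step

def bidirectional(vectors):
    """Concatenate per position: h_i = (hf_i, hb_i)."""
    hf = run_recurrent(vectors, toy_cell)                # sees the past of x_i
    hb = run_recurrent(vectors, toy_cell, reverse=True)  # sees the future of x_i
    return list(zip(hf, hb))

vs = [[1.0], [2.0], [3.0]]
print(bidirectional(vs))  # → [(1.0, 2.75), (2.5, 3.5), (4.25, 3.0)]
```

Note that each position's pair combines a state computed from everything to its left with one computed from everything to its right, which is exactly what lets h_i capture both historical and future context.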
In step 304, use condition random field processing text feature sequence, name entity corresponding to list entries is obtained
Recognition result.
Conditional random field models are a kind of undirected graph models, it be marked in given needs observation sequence (word, sentence,
Numerical value etc.) under conditions of, calculate the joint probability distribution of whole flag sequence.
In the embodiment of the present application, the sequence learning algorithm of condition random field can be improved iteration method of scales, condition random
The prediction algorithm of field can be viterbi algorithm.
In the embodiment of the present application, the text feature sequence (h_1, h_2, ..., h_n) corresponding to the input sequence may be fed into a linear-chain conditional random field. Specifically, during learning, the text feature sequence (h_1, h_2, ..., h_n) is used, via the learning algorithm of the conditional random field (such as improved iterative scaling), to obtain an output sequence (s_1, s_2, ..., s_n) and a state transition matrix, where s_i is the output corresponding to h_i; s_i is a 1×K vector in which each element represents the confidence score of x_i with respect to a different label, and the state transition matrix holds the transition probabilities between labels. During prediction, the task is converted into a maximum-probability path problem: the Viterbi algorithm processes the output sequence (s_1, s_2, ..., s_n) and the state transition matrix to obtain the label sequence (y_1, y_2, ..., y_n), where y_i corresponds to x_i. Further parsing as required then yields the final named entity recognition result.
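The Viterbi decoding step described above can be sketched as follows. The label set, score matrix, and transition matrix in the example are toy values chosen for illustration, not taken from the application:

```python
import numpy as np

def viterbi(scores, transition):
    """
    scores:     n x K matrix; scores[i][k] is the confidence of character x_i
                for label k (the s_i vectors produced by the CRF layer).
    transition: K x K matrix; transition[j][k] is the score of moving from
                label j to label k.
    Returns the highest-scoring label sequence (y_1, ..., y_n) as label indices.
    """
    n, K = scores.shape
    dp = np.zeros((n, K))              # best cumulative score ending in each label
    back = np.zeros((n, K), dtype=int)  # backpointers for path recovery
    dp[0] = scores[0]
    for i in range(1, n):
        for k in range(K):
            cand = dp[i - 1] + transition[:, k] + scores[i, k]
            back[i, k] = int(np.argmax(cand))
            dp[i, k] = cand[back[i, k]]
    # Trace the best path backwards from the highest-scoring final label.
    path = [int(np.argmax(dp[-1]))]
    for i in range(n - 1, 0, -1):
        path.append(int(back[i, path[-1]]))
    return path[::-1]

# Toy example with K=2 labels (0 = outside, 1 = entity) and n=3 characters.
s = np.array([[2.0, 0.0], [0.0, 2.0], [0.0, 2.0]])
T = np.array([[0.5, 0.0], [0.0, 0.5]])  # staying in the same label is rewarded
print(viterbi(s, T))  # → [0, 1, 1]
```

Real CRF implementations work with log-potentials learned during training; the additive scoring above mirrors that log-space formulation.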
As can be seen from the above embodiment, this embodiment can convert each character in the input sequence to be recognized into a corresponding vector, process the vectors corresponding to the characters using a neural network algorithm to extract the text feature sequence of the input sequence to be recognized, and finally process the text feature sequence using a conditional random field to obtain the named entity recognition result corresponding to the input sequence to be recognized. Because characters can characterize finer-grained features and the number of distinct characters is much smaller than the number of distinct words, because the neural network algorithm can take into account the context of each character in the input sequence, and because the conditional random field can avoid the label bias problem, the embodiments of the present application achieve a good recognition effect by combining character vectors, a neural network algorithm, and a conditional random field to realize named entity recognition.
Fig. 5 is a schematic structural diagram of a named entity recognition device according to an embodiment of the present application. Referring to Fig. 5, in one software implementation, the named entity recognition device 500 may include:
an acquiring unit 501, configured to obtain an input sequence;
a first processing unit 502, configured to vectorize the characters in the input sequence to obtain a character vector sequence corresponding to the input sequence;
a second processing unit 503, configured to process the character vector sequence using a neural network algorithm to obtain a text feature sequence of the input sequence;
a third processing unit 504, configured to process the text feature sequence using a conditional random field to obtain a named entity recognition result corresponding to the input sequence.
As can be seen from the above embodiment, this embodiment can convert each character in the input sequence to be recognized into a corresponding vector, process the vectors corresponding to the characters using a neural network algorithm to extract the text feature sequence of the input sequence to be recognized, and finally process the text feature sequence using a conditional random field to obtain the named entity recognition result corresponding to the input sequence to be recognized. Because characters can characterize finer-grained features and the number of distinct characters is much smaller than the number of distinct words, because the neural network algorithm can take into account the context of each character in the input sequence, and because the conditional random field can avoid the label bias problem, the embodiments of the present application achieve a good recognition effect by combining character vectors, a neural network algorithm, and a conditional random field to realize named entity recognition.
Optionally, as an embodiment, the second processing unit 503 may include:
a character vector sequence processing subunit, configured to process the character vector sequence using a bidirectional long short-term memory neural network to obtain the text feature sequence of the input sequence.
Optionally, as an embodiment, the first processing unit 502 may include:
a mapping dictionary acquiring subunit, configured to obtain a character-to-vector mapping dictionary, wherein the character-to-vector mapping dictionary records correspondences between characters and vectors;
a lookup subunit, configured to look up, in the character-to-vector mapping dictionary, the vector corresponding to each character in the input sequence;
an attention mechanism processing subunit, configured to process the vectors corresponding to the characters in the input sequence using an attention mechanism to obtain a weight value corresponding to each vector;
a character vector sequence obtaining subunit, configured to perform a dot multiplication of the vector corresponding to each character in the input sequence with the weight value corresponding to that vector, to obtain the character vector sequence corresponding to the input sequence.
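One possible instantiation of the attention-weighting step is sketched below. Scoring each character vector against the mean of the sequence is an illustrative assumption, since the application does not fix a particular attention scoring function:

```python
import numpy as np

def attention_weights(vectors):
    """Score each character vector against the mean of the sequence and
    normalize with softmax -- one simple instantiation of the attention step."""
    V = np.stack(vectors)              # n x d matrix of looked-up vectors
    query = V.mean(axis=0)             # a global query vector (assumption)
    scores = V @ query                 # one relevance score per character
    e = np.exp(scores - scores.max())  # numerically stable softmax
    return e / e.sum()

def weighted_char_vectors(vectors):
    """Multiply each looked-up vector by its attention weight (the dot
    multiplication step) to form the final character vector sequence."""
    w = attention_weights(vectors)
    return [wi * v for wi, v in zip(w, vectors)]

# Toy lookup result for a 3-character input sequence, d = 4.
looked_up = [np.array([1.0, 0, 0, 0]),
             np.array([0, 2.0, 0, 0]),
             np.array([0, 0, 1.0, 0])]
out = weighted_char_vectors(looked_up)
print(len(out), out[0].shape)  # 3 weighted vectors of dimension 4
```

The weights sum to one, so characters the scoring function deems more relevant contribute proportionally larger vectors to the downstream neural network.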
Optionally, as an embodiment, the named entity recognition device 500 may further include a mapping dictionary generating unit.
The mapping dictionary generating unit may include:
a training corpus acquiring subunit, configured to obtain a training corpus;
a character splitting subunit, configured to split the training corpus in units of characters to obtain a splitting result;
a pretreatment subunit, configured to perform at least one of the following pretreatments on the splitting result to obtain a pretreatment result: filtering junk characters, filtering stop characters, filtering low-frequency characters, and filtering meaningless symbols;
a mapping dictionary training subunit, configured to train on the pretreatment result using the word2vec algorithm to obtain the character-to-vector mapping dictionary.
Optionally, as an embodiment, the mapping dictionary training subunit is specifically configured to:
train on the pretreatment result using a skip-gram model to obtain the character-to-vector mapping dictionary.
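The character splitting, filtering, and skip-gram training-pair generation can be sketched as follows. The filter sets and window size here are illustrative assumptions; a real implementation would hand the generated (center, context) pairs to a word2vec trainer to produce the character-to-vector mapping dictionary:

```python
from collections import Counter

def split_and_filter(corpus, stop_chars=frozenset(" \n\t"), min_count=1):
    """Split the training corpus into characters and apply the pretreatment
    filters (stop characters and low-frequency characters)."""
    chars = [c for c in corpus if c not in stop_chars]
    counts = Counter(chars)
    return [c for c in chars if counts[c] >= min_count]

def skipgram_pairs(chars, window=1):
    """Generate (center, context) training pairs as used by the skip-gram
    model: each character predicts its neighbours within the window."""
    pairs = []
    for i, center in enumerate(chars):
        for j in range(max(0, i - window), min(len(chars), i + window + 1)):
            if j != i:
                pairs.append((center, chars[j]))
    return pairs

chars = split_and_filter("ab ba")
print(chars)                  # ['a', 'b', 'b', 'a']
print(skipgram_pairs(chars))  # [('a','b'), ('b','a'), ('b','b'), ('b','b'), ('b','a'), ('a','b')]
```

After training, each character maps to the learned embedding for its center-word role, and those (character, vector) entries form the mapping dictionary.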
The named entity recognition device 500 can also perform the method of the embodiment shown in Fig. 3 and implement the functions of the named entity recognition device in the embodiment shown in Fig. 5; details are not repeated here in the embodiments of the present application.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to Fig. 6, at the hardware level the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. The memory may include internal memory, such as high-speed random access memory (Random-Access Memory, RAM), and may also include non-volatile memory, for example at least one disk memory. Of course, the electronic device may also include hardware required by other services.
The processor, the network interface, and the memory may be interconnected via the internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one double-headed arrow is shown in Fig. 6, but this does not mean that there is only one bus or one type of bus.
The memory is used to store a program. Specifically, the program may include program code, and the program code includes computer operation instructions. The memory may include internal memory and non-volatile memory, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it, forming the named entity recognition device at the logical level. The processor executes the program stored in the memory and is specifically configured to perform the following operations:
obtaining an input sequence;
vectorizing the characters in the input sequence to obtain a character vector sequence corresponding to the input sequence;
processing the character vector sequence using a neural network algorithm to obtain a text feature sequence of the input sequence;
processing the text feature sequence using a conditional random field to obtain a named entity recognition result corresponding to the input sequence.
The method performed by the named entity recognition device disclosed in the embodiment shown in Fig. 6 of the present application may be applied to a processor, or implemented by a processor. The processor may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The above processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), and the like; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like. The steps of the methods disclosed in the embodiments of the present application may be embodied as being executed directly by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, or a register. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device may also perform the method of Fig. 3 and implement the functions of the named entity recognition device in the embodiment shown in Fig. 3; details are not repeated here in the embodiments of the present application.
Of course, in addition to software implementations, the electronic device of this specification does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to individual logic units, and may also be hardware or logic devices.
The embodiments of the present application further provide a computer-readable storage medium storing one or more programs. The one or more programs include instructions which, when executed by a portable electronic device including a plurality of application programs, cause the portable electronic device to perform the method of the embodiment shown in Fig. 3, and specifically to perform the following method:
obtaining an input sequence;
vectorizing the characters in the input sequence to obtain a character vector sequence corresponding to the input sequence;
processing the character vector sequence using a neural network algorithm to obtain a text feature sequence of the input sequence;
processing the text feature sequence using a conditional random field to obtain a named entity recognition result corresponding to the input sequence.
In short, the above descriptions are merely preferred embodiments of this specification and are not intended to limit its protection scope. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this specification shall fall within its protection scope.
The systems, devices, modules, or units illustrated in the above embodiments may be implemented by computer chips or entities, or by products having certain functions. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium usable for storing information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements, but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a progressive manner; for identical or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively simple, and reference may be made to the description of the method embodiments for the relevant parts.
Claims (10)
1. A named entity recognition method, wherein the method comprises:
obtaining an input sequence;
vectorizing the characters in the input sequence to obtain a character vector sequence corresponding to the input sequence;
processing the character vector sequence using a neural network algorithm to obtain a text feature sequence of the input sequence;
processing the text feature sequence using a conditional random field to obtain a named entity recognition result corresponding to the input sequence.
2. The method according to claim 1, wherein the processing the character vector sequence using a neural network algorithm to obtain a text feature sequence of the input sequence comprises:
processing the character vector sequence using a bidirectional long short-term memory neural network to obtain the text feature sequence of the input sequence.
3. The method according to claim 1, wherein the vectorizing the characters in the input sequence to obtain a character vector sequence corresponding to the input sequence comprises:
obtaining a character-to-vector mapping dictionary, wherein the character-to-vector mapping dictionary records correspondences between characters and vectors;
looking up, in the character-to-vector mapping dictionary, the vector corresponding to each character in the input sequence;
processing the vectors corresponding to the characters in the input sequence using an attention mechanism to obtain a weight value corresponding to each vector;
performing a dot multiplication of the vector corresponding to each character in the input sequence with the weight value corresponding to that vector, to obtain the character vector sequence corresponding to the input sequence.
4. The method according to claim 3, wherein the generating process of the character-to-vector mapping dictionary comprises:
obtaining a training corpus;
splitting the training corpus in units of characters to obtain a splitting result;
performing at least one of the following pretreatments on the splitting result to obtain a pretreatment result: filtering junk characters, filtering stop characters, filtering low-frequency characters, and filtering meaningless symbols;
training on the pretreatment result using the word2vec algorithm to obtain the character-to-vector mapping dictionary.
5. The method according to claim 4, wherein the training on the pretreatment result using the word2vec algorithm to obtain the character-to-vector mapping dictionary comprises:
training on the pretreatment result using a skip-gram model to obtain the character-to-vector mapping dictionary.
6. A named entity recognition device, wherein the device comprises:
an acquiring unit, configured to obtain an input sequence;
a first processing unit, configured to vectorize the characters in the input sequence to obtain a character vector sequence corresponding to the input sequence;
a second processing unit, configured to process the character vector sequence using a neural network algorithm to obtain a text feature sequence of the input sequence;
a third processing unit, configured to process the text feature sequence using a conditional random field to obtain a named entity recognition result corresponding to the input sequence.
7. The device according to claim 6, wherein the second processing unit comprises:
a character vector sequence processing subunit, configured to process the character vector sequence using a bidirectional long short-term memory neural network to obtain the text feature sequence of the input sequence.
8. The device according to claim 6, wherein the first processing unit comprises:
a mapping dictionary acquiring subunit, configured to obtain a character-to-vector mapping dictionary, wherein the character-to-vector mapping dictionary records correspondences between characters and vectors;
a lookup subunit, configured to look up, in the character-to-vector mapping dictionary, the vector corresponding to each character in the input sequence;
an attention mechanism processing subunit, configured to process the vectors corresponding to the characters in the input sequence using an attention mechanism to obtain a weight value corresponding to each vector;
a character vector sequence obtaining subunit, configured to perform a dot multiplication of the vector corresponding to each character in the input sequence with the weight value corresponding to that vector, to obtain the character vector sequence corresponding to the input sequence.
9. An electronic device, comprising:
a processor; and
a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the steps of the method according to any one of claims 1 to 5.
10. A computer-readable storage medium storing one or more programs which, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the steps of the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711102742.5A CN107797992A (en) | 2017-11-10 | 2017-11-10 | Name entity recognition method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711102742.5A CN107797992A (en) | 2017-11-10 | 2017-11-10 | Name entity recognition method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107797992A true CN107797992A (en) | 2018-03-13 |
Family
ID=61534832
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711102742.5A Pending CN107797992A (en) | 2017-11-10 | 2017-11-10 | Name entity recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107797992A (en) |
Cited By (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108536679A (en) * | 2018-04-13 | 2018-09-14 | 腾讯科技(成都)有限公司 | Name entity recognition method, device, equipment and computer readable storage medium |
CN108628823A (en) * | 2018-03-14 | 2018-10-09 | 中山大学 | In conjunction with the name entity recognition method of attention mechanism and multitask coordinated training |
CN108874997A (en) * | 2018-06-13 | 2018-11-23 | 广东外语外贸大学 | A kind of name name entity recognition method towards film comment |
CN108920445A (en) * | 2018-04-23 | 2018-11-30 | 华中科技大学鄂州工业技术研究院 | A kind of name entity recognition method and device based on Bi-LSTM-CRF model |
CN109002436A (en) * | 2018-07-12 | 2018-12-14 | 上海金仕达卫宁软件科技有限公司 | Medical text terms automatic identifying method and system based on shot and long term memory network |
CN109241330A (en) * | 2018-08-20 | 2019-01-18 | 北京百度网讯科技有限公司 | The method, apparatus, equipment and medium of key phrase in audio for identification |
CN109241275A (en) * | 2018-07-05 | 2019-01-18 | 广东工业大学 | A kind of text subject clustering algorithm based on natural language processing |
CN109389091A (en) * | 2018-10-22 | 2019-02-26 | 重庆邮电大学 | The character identification system and method combined based on neural network and attention mechanism |
CN109446514A (en) * | 2018-09-18 | 2019-03-08 | 平安科技(深圳)有限公司 | Construction method, device and the computer equipment of news property identification model |
CN109614614A (en) * | 2018-12-03 | 2019-04-12 | 焦点科技股份有限公司 | A kind of BILSTM-CRF name of product recognition methods based on from attention |
CN109657239A (en) * | 2018-12-12 | 2019-04-19 | 电子科技大学 | The Chinese name entity recognition method learnt based on attention mechanism and language model |
CN109858041A (en) * | 2019-03-07 | 2019-06-07 | 北京百分点信息科技有限公司 | A kind of name entity recognition method of semi-supervised learning combination Custom Dictionaries |
CN109871535A (en) * | 2019-01-16 | 2019-06-11 | 四川大学 | A kind of French name entity recognition method based on deep neural network |
CN109871538A (en) * | 2019-02-18 | 2019-06-11 | 华南理工大学 | A kind of Chinese electronic health record name entity recognition method |
CN109918680A (en) * | 2019-03-28 | 2019-06-21 | 腾讯科技(上海)有限公司 | Entity recognition method, device and computer equipment |
CN110321547A (en) * | 2018-03-30 | 2019-10-11 | 北京四维图新科技股份有限公司 | A kind of name entity determines method and device |
CN110348016A (en) * | 2019-07-15 | 2019-10-18 | 昆明理工大学 | Text snippet generation method based on sentence association attention mechanism |
CN110348021A (en) * | 2019-07-17 | 2019-10-18 | 湖北亿咖通科技有限公司 | Character string identification method, electronic equipment, storage medium based on name physical model |
CN110362597A (en) * | 2019-06-28 | 2019-10-22 | 华为技术有限公司 | A kind of structured query language SQL injection detection method and device |
CN110516228A (en) * | 2019-07-04 | 2019-11-29 | 湖南星汉数智科技有限公司 | Name entity recognition method, device, computer installation and computer readable storage medium |
CN110543638A (en) * | 2019-09-10 | 2019-12-06 | 杭州橙鹰数据技术有限公司 | Named entity identification method and device |
WO2020048292A1 (en) * | 2018-09-04 | 2020-03-12 | 腾讯科技(深圳)有限公司 | Method and apparatus for generating network representation of neural network, storage medium, and device |
CN111079437A (en) * | 2019-12-20 | 2020-04-28 | 深圳前海达闼云端智能科技有限公司 | Entity identification method, electronic equipment and storage medium |
CN111222335A (en) * | 2019-11-27 | 2020-06-02 | 上海眼控科技股份有限公司 | Corpus correction method and device, computer equipment and computer-readable storage medium |
CN111222334A (en) * | 2019-11-15 | 2020-06-02 | 广州洪荒智能科技有限公司 | Named entity identification method, device, equipment and medium |
CN111291566A (en) * | 2020-01-21 | 2020-06-16 | 北京明略软件系统有限公司 | Event subject identification method and device and storage medium |
CN111782768A (en) * | 2020-06-30 | 2020-10-16 | 首都师范大学 | Fine-grained entity identification method based on hyperbolic space representation and label text interaction |
CN111885000A (en) * | 2020-06-22 | 2020-11-03 | 网宿科技股份有限公司 | Network attack detection method, system and device based on graph neural network |
CN112115258A (en) * | 2019-06-20 | 2020-12-22 | 腾讯科技(深圳)有限公司 | User credit evaluation method, device, server and storage medium |
CN112154509A (en) * | 2018-04-19 | 2020-12-29 | 皇家飞利浦有限公司 | Machine learning model with evolving domain-specific dictionary features for text annotation |
CN112215005A (en) * | 2020-10-12 | 2021-01-12 | 小红书科技有限公司 | Entity identification method and device |
WO2021146831A1 (en) * | 2020-01-20 | 2021-07-29 | 京东方科技集团股份有限公司 | Entity recognition method and apparatus, dictionary creation method, device, and medium |
CN113221884A (en) * | 2021-05-13 | 2021-08-06 | 中国科学技术大学 | Text recognition method and system based on low-frequency word storage memory |
CN113221885A (en) * | 2021-05-13 | 2021-08-06 | 中国科学技术大学 | Hierarchical modeling method and system based on whole words and radicals |
CN113362540A (en) * | 2021-06-11 | 2021-09-07 | 江苏苏云信息科技有限公司 | Traffic ticket business processing device, system and method based on multimode interaction |
CN113570480A (en) * | 2021-07-19 | 2021-10-29 | 北京华宇元典信息服务有限公司 | Judging document address information identification method and device and electronic equipment |
CN117034942A (en) * | 2023-10-07 | 2023-11-10 | 之江实验室 | Named entity recognition method, device, equipment and readable storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106569998A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
-
2017
- 2017-11-10 CN CN201711102742.5A patent/CN107797992A/en active Pending
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106569998A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
Non-Patent Citations (4)
Title |
---|
AKASH BHARADWAJ等: ""Phonologically Aware Neural Model for Named Entity Recognition in Low Resource Transfer Settings"", 《PROCEEDINGS OF THE 2016 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING》 * |
CHUANHAI DONG等: ""Character-Based LSTM-CRF with Radical-Level Features for Chinese Named Entity Recognition"", 《NATURAL LANGUAGE UNDERSTANDING AND INTELLIGENT APPLICATIONS(NLPCC 2016)》 * |
ROBERT_AI: "" 神经网络结构在命名实体识别(NER)中的应用"", 《HTTPS://WWW.CNBLOGS.COM/ROBERT-DLUT/P/6847401.HTML》 * |
机器之心: ""如何用深度学习做自然语言处理?这里有份最佳实践清单"", 《HTTPS://WWW.JIQIZHIXIN.COM/ARTICLES/2017-07-26-5》 * |
Cited By (57)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108628823A (en) * | 2018-03-14 | 2018-10-09 | 中山大学 | In conjunction with the name entity recognition method of attention mechanism and multitask coordinated training |
CN108628823B (en) * | 2018-03-14 | 2022-07-01 | 中山大学 | Named entity recognition method combining attention mechanism and multi-task collaborative training |
CN110321547B (en) * | 2018-03-30 | 2024-06-11 | 北京四维图新科技股份有限公司 | Named entity determination method and device |
CN110321547A (en) * | 2018-03-30 | 2019-10-11 | 北京四维图新科技股份有限公司 | A kind of name entity determines method and device |
CN108536679B (en) * | 2018-04-13 | 2022-05-20 | 腾讯科技(成都)有限公司 | Named entity recognition method, device, equipment and computer readable storage medium |
CN108536679A (en) * | 2018-04-13 | 2018-09-14 | 腾讯科技(成都)有限公司 | Name entity recognition method, device, equipment and computer readable storage medium |
CN112154509A (en) * | 2018-04-19 | 2020-12-29 | 皇家飞利浦有限公司 | Machine learning model with evolving domain-specific dictionary features for text annotation |
CN108920445A (en) * | 2018-04-23 | 2018-11-30 | 华中科技大学鄂州工业技术研究院 | A kind of name entity recognition method and device based on Bi-LSTM-CRF model |
CN108920445B (en) * | 2018-04-23 | 2022-06-17 | 华中科技大学鄂州工业技术研究院 | Named entity identification method and device based on Bi-LSTM-CRF model |
CN108874997A (en) * | 2018-06-13 | 2018-11-23 | 广东外语外贸大学 | A kind of name name entity recognition method towards film comment |
CN109241275A (en) * | 2018-07-05 | 2019-01-18 | 广东工业大学 | A kind of text subject clustering algorithm based on natural language processing |
CN109241275B (en) * | 2018-07-05 | 2022-02-11 | 广东工业大学 | Text topic clustering algorithm based on natural language processing |
CN109002436A (en) * | 2018-07-12 | 2018-12-14 | 上海金仕达卫宁软件科技有限公司 | Medical text terms automatic identifying method and system based on shot and long term memory network |
US11308937B2 (en) * | 2018-08-20 | 2022-04-19 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for identifying key phrase in audio, device and medium |
CN109241330A (en) * | 2018-08-20 | 2019-01-18 | 北京百度网讯科技有限公司 | The method, apparatus, equipment and medium of key phrase in audio for identification |
US11875220B2 (en) | 2018-09-04 | 2024-01-16 | Tencent Technology (Shenzhen) Company Limited | Method, apparatus, and storage medium for generating network representation for neural network |
WO2020048292A1 (en) * | 2018-09-04 | 2020-03-12 | 腾讯科技(深圳)有限公司 | Method and apparatus for generating network representation of neural network, storage medium, and device |
CN109446514A (en) * | 2018-09-18 | 2019-03-08 | 平安科技(深圳)有限公司 | Construction method, device and the computer equipment of news property identification model |
CN109389091A (en) * | 2018-10-22 | 2019-02-26 | 重庆邮电大学 | The character identification system and method combined based on neural network and attention mechanism |
CN109389091B (en) * | 2018-10-22 | 2022-05-03 | 重庆邮电大学 | Character recognition system and method based on combination of neural network and attention mechanism |
CN109614614A (en) * | 2018-12-03 | 2019-04-12 | 焦点科技股份有限公司 | A kind of BILSTM-CRF name of product recognition methods based on from attention |
CN109614614B (en) * | 2018-12-03 | 2021-04-02 | 焦点科技股份有限公司 | BILSTM-CRF product name identification method based on self-attention |
CN109657239A (en) * | 2018-12-12 | 2019-04-19 | 电子科技大学 | The Chinese name entity recognition method learnt based on attention mechanism and language model |
CN109657239B (en) * | 2018-12-12 | 2020-04-21 | 电子科技大学 | Chinese named entity recognition method based on attention mechanism and language model learning |
CN109871535B (en) * | 2019-01-16 | 2020-01-10 | 四川大学 | French named entity recognition method based on deep neural network |
CN109871535A (en) * | 2019-01-16 | 2019-06-11 | 四川大学 | A kind of French name entity recognition method based on deep neural network |
CN109871538A (en) * | 2019-02-18 | 2019-06-11 | 华南理工大学 | A kind of Chinese electronic health record name entity recognition method |
CN109858041B (en) * | 2019-03-07 | 2023-02-17 | 北京百分点科技集团股份有限公司 | Named entity recognition method combining semi-supervised learning with user-defined dictionary |
CN109858041A (en) * | 2019-03-07 | 2019-06-07 | 北京百分点信息科技有限公司 | A kind of name entity recognition method of semi-supervised learning combination Custom Dictionaries |
CN109918680A (en) * | 2019-03-28 | 2019-06-21 | 腾讯科技(上海)有限公司 | Entity recognition method, device and computer equipment |
CN112115258A (en) * | 2019-06-20 | 2020-12-22 | 腾讯科技(深圳)有限公司 | User credit evaluation method, device, server and storage medium |
CN112115258B (en) * | 2019-06-20 | 2023-09-26 | 腾讯科技(深圳)有限公司 | Credit evaluation method and device for user, server and storage medium |
CN110362597A (en) * | 2019-06-28 | 华为技术有限公司 | Structured query language (SQL) injection detection method and device
CN110516228A (en) * | 2019-07-04 | 湖南星汉数智科技有限公司 | Named entity recognition method and device, computer apparatus, and computer-readable storage medium
CN110348016A (en) * | 2019-07-15 | 2019-10-18 | 昆明理工大学 | Text snippet generation method based on sentence association attention mechanism |
CN110348016B (en) * | 2019-07-15 | 2022-06-14 | 昆明理工大学 | Text abstract generation method based on sentence correlation attention mechanism |
CN110348021A (en) * | 2019-07-17 | 湖北亿咖通科技有限公司 | Character string recognition method based on named entity model, electronic device, and storage medium
CN110543638B (en) * | 2019-09-10 | 2022-12-27 | 杭州橙鹰数据技术有限公司 | Named entity identification method and device |
CN110543638A (en) * | 2019-09-10 | 2019-12-06 | 杭州橙鹰数据技术有限公司 | Named entity identification method and device |
CN111222334A (en) * | 2019-11-15 | 2020-06-02 | 广州洪荒智能科技有限公司 | Named entity identification method, device, equipment and medium |
CN111222335A (en) * | 2019-11-27 | 2020-06-02 | 上海眼控科技股份有限公司 | Corpus correction method and device, computer equipment and computer-readable storage medium |
CN111079437A (en) * | 2019-12-20 | 2020-04-28 | 深圳前海达闼云端智能科技有限公司 | Entity identification method, electronic equipment and storage medium |
WO2021146831A1 (en) * | 2020-01-20 | 2021-07-29 | 京东方科技集团股份有限公司 | Entity recognition method and apparatus, dictionary creation method, device, and medium |
CN111291566A (en) * | 2020-01-21 | 2020-06-16 | 北京明略软件系统有限公司 | Event subject identification method and device and storage medium |
CN111291566B (en) * | 2020-01-21 | 2023-04-28 | 北京明略软件系统有限公司 | Event main body recognition method, device and storage medium |
CN111885000A (en) * | 2020-06-22 | 2020-11-03 | 网宿科技股份有限公司 | Network attack detection method, system and device based on graph neural network |
CN111782768A (en) * | 2020-06-30 | 2020-10-16 | 首都师范大学 | Fine-grained entity identification method based on hyperbolic space representation and label text interaction |
WO2022001333A1 (en) * | 2020-06-30 | 2022-01-06 | 首都师范大学 | Hyperbolic space representation and label text interaction-based fine-grained entity recognition method |
CN112215005A (en) * | 2020-10-12 | 2021-01-12 | 小红书科技有限公司 | Entity identification method and device |
CN113221885A (en) * | 2021-05-13 | 2021-08-06 | 中国科学技术大学 | Hierarchical modeling method and system based on whole words and radicals |
CN113221884A (en) * | 2021-05-13 | 2021-08-06 | 中国科学技术大学 | Text recognition method and system based on low-frequency word storage memory |
CN113221885B (en) * | 2021-05-13 | 2022-09-06 | 中国科学技术大学 | Hierarchical modeling method and system based on whole words and radicals |
CN113221884B (en) * | 2021-05-13 | 2022-09-06 | 中国科学技术大学 | Text recognition method and system based on low-frequency word storage memory |
CN113362540A (en) * | 2021-06-11 | 2021-09-07 | 江苏苏云信息科技有限公司 | Traffic ticket business processing device, system and method based on multimode interaction |
CN113570480A (en) * | 2021-07-19 | 北京华宇元典信息服务有限公司 | Method and device for recognizing address information in judgment documents, and electronic device
CN117034942A (en) * | 2023-10-07 | 2023-11-10 | 之江实验室 | Named entity recognition method, device, equipment and readable storage medium |
CN117034942B (en) * | 2023-10-07 | 2024-01-09 | 之江实验室 | Named entity recognition method, device, equipment and readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107797992A (en) | Named entity recognition method and device | |
CN107679039B (en) | Method and device for determining statement intention | |
CN108629687B (en) | Anti-money laundering method, device and equipment | |
CN110276023B (en) | POI transition event discovery method, device, computing equipment and medium | |
CN110619044B (en) | Emotion analysis method, system, storage medium and equipment | |
Wang et al. | Dialogue intent classification with character-CNN-BGRU networks | |
Li et al. | A method of emotional analysis of movie based on convolution neural network and bi-directional LSTM RNN | |
Alexandridis et al. | A knowledge-based deep learning architecture for aspect-based sentiment analysis | |
CN109271624A (en) | Target word determination method, apparatus and storage medium | |
CN108694183A (en) | Search method and device | |
CN112132238A (en) | Method, device, equipment and readable medium for identifying private data | |
CN113947086A (en) | Sample data generation method, training method, corpus generation method and apparatus | |
CN116975271A (en) | Text relevance determining method, device, computer equipment and storage medium | |
Park et al. | Sensitive data identification in structured data through genner model based on text generation and ner | |
US20200110834A1 (en) | Dynamic Linguistic Assessment and Measurement | |
CN114328841A (en) | Question-answer model training method and device, question-answer method and device | |
CN116702784B (en) | Entity linking method, entity linking device, computer equipment and storage medium | |
CN103514194B (en) | Method and apparatus for determining relevance between corpus and entity, and classifier training method | |
CN116955579A (en) | Chat reply generation method and device based on keyword knowledge retrieval | |
CN116719915A (en) | Intelligent question-answering method, device, equipment and storage medium | |
CN117216617A (en) | Text classification model training method, device, computer equipment and storage medium | |
Arbaatun et al. | Hate speech detection on Twitter through Natural Language Processing using LSTM model | |
Zhu et al. | A named entity recognition model based on ensemble learning | |
Porjazovski et al. | Attention-based end-to-end named entity recognition from speech | |
CN115129885A (en) | Entity chain pointing method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 
 | CB02 | Change of applicant information | Address after: 100081 No. 101, 1st Floor, Building 14, 27 Jiancai Chengzhong Road, Haidian District, Beijing. Applicant after: Beijing PERCENT Technology Group Co., Ltd. Address before: 100081 16/F, Block A, Beichen Century Center, Building 2, Courtyard 8, Beichen West Road, Chaoyang District, Beijing. Applicant before: BEIJING BAIFENDIAN INFORMATION SCIENCE & TECHNOLOGY Co., Ltd.
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20180313