CN107797992A - Named entity recognition method and device - Google Patents

Named entity recognition method and device

Info

Publication number
CN107797992A
CN107797992A
Authority
CN
China
Prior art keywords
character
vector
input sequence
sequence
obtaining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711102742.5A
Other languages
Chinese (zh)
Inventor
苏海波
刘钰
刘译璟
杨哲铭
张康利
宋青原
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baifendian Information Science & Technology Co Ltd
Original Assignee
Beijing Baifendian Information Science & Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baifendian Information Science & Technology Co Ltd
Priority to CN201711102742.5A
Publication of CN107797992A

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 — Handling natural language data
    • G06F40/20 — Natural language analysis
    • G06F40/279 — Recognition of textual entities
    • G06F40/289 — Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 — Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Character Discrimination (AREA)

Abstract

An embodiment of the present application provides a named entity recognition method and device. The method includes: obtaining an input sequence; performing vectorization processing on the characters in the input sequence to obtain a character vector sequence corresponding to the input sequence; processing the character vector sequence using a neural network algorithm to obtain a text feature sequence of the input sequence; and processing the text feature sequence using a conditional random field to obtain a named entity recognition result corresponding to the input sequence. Because characters capture finer-grained features than words and the number of distinct characters is far smaller than the number of distinct words, because a neural network algorithm can take into account the context of each character in the input sequence, and because a conditional random field can avoid the label bias problem, the technical scheme combines character vectors, a neural network algorithm and a conditional random field to perform named entity recognition and can achieve a good recognition effect.

Description

Named entity recognition method and device
Technical field
The present application relates to the field of computer technology, and in particular to a named entity recognition method and device.
Background
Natural language processing (NLP) is a field at the intersection of computer science, artificial intelligence and linguistics concerned with the interaction between computers and human language, and is an important direction within computer science and artificial intelligence. NLP research covers the various theories and methods that enable efficient communication in natural language between humans and computers; the fields involved include natural language understanding, retrieval, information extraction, machine translation, automatic question answering systems and so on.
As a basic task in NLP, named entity recognition (NER) refers to techniques for identifying entities of particular categories, such as person names, place names, organization names and proper nouns, from text. NER is an important foundation for application fields such as information retrieval, query classification, question answering systems, syntactic analysis and machine translation, and its recognition effect directly affects subsequent processing in those fields. Providing a named entity recognition technique with a better recognition effect has therefore become a technical problem to be urgently solved by those skilled in the art.
Summary of the invention
The purpose of the embodiments of the present application is to provide a named entity recognition method and device, so as to achieve a better named entity recognition effect.
To reach the above technical purpose, the embodiments of the present application are realized as follows.
According to a first aspect of the embodiments of the present application, a named entity recognition method is provided, the method including:
obtaining an input sequence;
performing vectorization processing on the characters in the input sequence to obtain a character vector sequence corresponding to the input sequence;
processing the character vector sequence using a neural network algorithm to obtain a text feature sequence of the input sequence; and
processing the text feature sequence using a conditional random field to obtain a named entity recognition result corresponding to the input sequence.
In one embodiment of the present application, processing the character vector sequence using a neural network algorithm to obtain the text feature sequence of the input sequence includes:
processing the character vector sequence using a bidirectional long short-term memory neural network to obtain the text feature sequence of the input sequence.
In one embodiment of the present application, performing vectorization processing on the characters in the input sequence to obtain the character vector sequence corresponding to the input sequence includes:
obtaining a character-to-vector mapping dictionary, the character-to-vector mapping dictionary recording correspondences between characters and vectors;
looking up, from the character-to-vector mapping dictionary, the vector corresponding to each character in the input sequence;
processing the vectors corresponding to the characters in the input sequence using an attention mechanism to obtain a weight value corresponding to each vector; and
multiplying each vector corresponding to a character in the input sequence by the weight value corresponding to that vector, to obtain the character vector sequence corresponding to the input sequence.
In one embodiment of the present application, the generation process of the character-to-vector mapping dictionary includes:
obtaining a training corpus;
splitting the training corpus in units of characters to obtain a split result;
performing at least one of the following preprocessing steps on the split result: filtering garbage characters, filtering stop characters, filtering low-frequency characters and filtering meaningless symbols, to obtain a preprocessed result; and
training on the preprocessed result using the word2vec algorithm to obtain the character-to-vector mapping dictionary.
In one embodiment of the present application, training on the preprocessed result using the word2vec algorithm to obtain the character-to-vector mapping dictionary includes:
training on the preprocessed result using the skip-gram model to obtain the character-to-vector mapping dictionary.
According to a second aspect of the embodiments of the present application, a named entity recognition device is provided, the device including:
an acquiring unit, configured to obtain an input sequence;
a first processing unit, configured to perform vectorization processing on the characters in the input sequence to obtain a character vector sequence corresponding to the input sequence;
a second processing unit, configured to process the character vector sequence using a neural network algorithm to obtain a text feature sequence of the input sequence; and
a third processing unit, configured to process the text feature sequence using a conditional random field to obtain a named entity recognition result corresponding to the input sequence.
In one embodiment of the present application, the second processing unit includes:
a character vector sequence processing subunit, configured to process the character vector sequence using a bidirectional long short-term memory neural network to obtain the text feature sequence of the input sequence.
In one embodiment of the present application, the first processing unit includes:
a mapping dictionary obtaining subunit, configured to obtain a character-to-vector mapping dictionary, the character-to-vector mapping dictionary recording correspondences between characters and vectors;
a lookup subunit, configured to look up, from the character-to-vector mapping dictionary, the vector corresponding to each character in the input sequence;
an attention mechanism processing subunit, configured to process the vectors corresponding to the characters in the input sequence using an attention mechanism to obtain a weight value corresponding to each vector; and
a character vector sequence obtaining subunit, configured to multiply each vector corresponding to a character in the input sequence by the weight value corresponding to that vector, to obtain the character vector sequence corresponding to the input sequence.
In one embodiment of the present application, the device further includes a mapping dictionary generation unit;
the mapping dictionary generation unit includes:
a training corpus obtaining subunit, configured to obtain a training corpus;
a character splitting subunit, configured to split the training corpus in units of characters to obtain a split result;
a preprocessing subunit, configured to perform at least one of the following preprocessing steps on the split result: filtering garbage characters, filtering stop characters, filtering low-frequency characters and filtering meaningless symbols, to obtain a preprocessed result; and
a mapping dictionary training subunit, configured to train on the preprocessed result using the word2vec algorithm to obtain the character-to-vector mapping dictionary.
In one embodiment of the present application, the mapping dictionary training subunit is specifically configured to:
train on the preprocessed result using the skip-gram model to obtain the character-to-vector mapping dictionary.
According to a third aspect of the embodiments of the present application, an electronic device is provided, including: a processor; and
a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the following operations:
obtaining an input sequence;
performing vectorization processing on the characters in the input sequence to obtain a character vector sequence corresponding to the input sequence;
processing the character vector sequence using a neural network algorithm to obtain a text feature sequence of the input sequence; and
processing the text feature sequence using a conditional random field to obtain a named entity recognition result corresponding to the input sequence.
According to a fourth aspect of the embodiments of the present application, a computer storage medium is provided, the computer storage medium storing one or more programs which, when executed by an electronic device including multiple application programs, cause the electronic device to perform the following operations:
obtaining an input sequence;
performing vectorization processing on the characters in the input sequence to obtain a character vector sequence corresponding to the input sequence;
processing the character vector sequence using a neural network algorithm to obtain a text feature sequence of the input sequence; and
processing the text feature sequence using a conditional random field to obtain a named entity recognition result corresponding to the input sequence.
As can be seen from the technical scheme provided by the above embodiments, the embodiments of the present application can convert each character in an input sequence to be recognized into a corresponding vector, process the vectors corresponding to the characters using a neural network algorithm to extract the text feature sequence of the input sequence to be recognized, and finally process the text feature sequence using a conditional random field to obtain the named entity recognition result corresponding to the input sequence to be recognized. Because characters capture finer-grained features than words and the number of distinct characters is far smaller than the number of distinct words, because a neural network algorithm can take into account the context of each character in the input sequence, and because a conditional random field can avoid the label bias problem, the embodiments of the present application combine character vectors, a neural network algorithm and a conditional random field to perform named entity recognition and can achieve a good recognition effect.
Brief description of the drawings
In order to explain the embodiments of the present application or the technical schemes in the prior art more clearly, the accompanying drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments described in this specification; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative labor.
Fig. 1 is a schematic diagram of the CBOW model provided by the present application;
Fig. 2 is a schematic diagram of the Skip-gram model provided by the present application;
Fig. 3 is a flow chart of the named entity recognition method of one embodiment of the present application;
Fig. 4 is a flow chart of the character-to-vector mapping dictionary generation method of one embodiment of the present application;
Fig. 5 is a structural schematic diagram of the named entity recognition device of one embodiment of the present application;
Fig. 6 is a structural schematic diagram of the electronic device of one embodiment of the present application.
Detailed description of the embodiments
In order that those skilled in the art may better understand the technical schemes in this specification, the technical schemes in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of this specification. Based on the embodiments in this specification, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the scope of protection of this specification.
The embodiments of the present application provide a named entity recognition method and device.
For ease of understanding, some technical terms and concepts involved in the embodiments of the present application are first introduced below.
Named entity recognition (NER) refers to techniques for identifying entities of particular categories, such as person names, place names, organization names and proper nouns, from text. The NER process is in essence a sequence labeling problem: for a given input text sequence, a tag is assigned to each character (or word).
The tags can be defined as follows. Taking person names (PER), place names (LOC) and organization names (ORG) as the recognized categories, for the input text "张三来自西安，毕业于北京大学" ("Zhang San comes from Xi'an and graduated from Peking University"), the sequence labeling result is:
张/B-PER 三/I-PER 来/O 自/O 西/B-LOC 安/I-LOC ，/O 毕/O 业/O 于/O 北/B-ORG 京/I-ORG 大/I-ORG 学/I-ORG;
after parsing, the NER result is:
张三/PER (Zhang San) comes from 西安/LOC (Xi'an) and graduated from 北京大学/ORG (Peking University); the meanings of the above tags are given in Table 1 below:
Tag      Meaning
B-PER    beginning character of a person name
I-PER    middle or ending character of a person name
B-LOC    beginning character of a place name
I-LOC    middle or ending character of a place name
B-ORG    beginning character of an organization name
I-ORG    middle or ending character of an organization name
O        any other character
Table 1
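The parsing step that turns a character/tag sequence like the one above into entity spans is mechanical. As an illustrative sketch (not part of the patented embodiment; the function name is my own), decoding the B-/I-/O scheme of Table 1 might look like:

```python
def decode_bio(chars, tags):
    """Merge a character-level B-/I-/O tag sequence (Table 1 scheme)
    into (entity_text, entity_type) spans."""
    entities, cur, cur_type = [], [], None
    for c, t in zip(chars, tags):
        if t.startswith("B-"):
            if cur:                      # flush a previous entity
                entities.append(("".join(cur), cur_type))
            cur, cur_type = [c], t[2:]
        elif t.startswith("I-") and cur_type == t[2:]:
            cur.append(c)                # continue the current entity
        else:
            if cur:
                entities.append(("".join(cur), cur_type))
            cur, cur_type = [], None
    if cur:                              # entity ending at the last character
        entities.append(("".join(cur), cur_type))
    return entities

chars = list("张三来自西安")
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "I-LOC"]
print(decode_bio(chars, tags))  # [('张三', 'PER'), ('西安', 'LOC')]
```

The final flush after the loop handles an entity that ends at the last character of the sequence.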
The word2vec algorithm, developed by Google, uses unsupervised training to turn each word into a vector of a few hundred dimensions; this vector can capture the semantic relatedness between words (or characters) and is also known as a word vector or word embedding.
The skip-gram model is one variant of the word2vec algorithm: it predicts the surrounding words from the current word and is especially suited to prediction on large data. As shown in Fig. 2, the surrounding words w(t-2), w(t-1), w(t+1) and w(t+2) are predicted from the word w(t).
The CBOW model is another variant of the word2vec algorithm: it predicts the center word from its context. As shown in Fig. 1, the word w(t) is predicted from its surrounding words w(t-2), w(t-1), w(t+1) and w(t+2); the vectors of these surrounding words are combined, which fully preserves the contextual information.
The attention mechanism simulates the attention model of the human brain. When we read an article, the eyes in fact focus only on the words currently being looked at, and the brain mainly attends to this part of the text; in other words, the brain's attention over the whole article is not uniform but weighted. The attention mechanism brings a large improvement in sequence learning tasks: in an encoder-decoder framework, adding an attention model in the encoding stage performs a weighted transformation of the source data sequence, which can effectively improve the performance of sequence-to-sequence systems in a natural way.
A long short-term memory network (LSTM) is a kind of recurrent neural network over time, suited to processing and predicting important events separated by relatively long intervals and delays in a time series. It controls the retention and discarding of historical information through a "memory gate" and a "forget gate", which effectively solves the long-range dependency problem of conventional recurrent neural networks.
A conditional random field (CRF) is one of the algorithms commonly used in natural language processing in recent years, often applied to syntactic analysis, named entity recognition, part-of-speech tagging and so on. A CRF is a probabilistic transition model that takes a Markov chain as the hidden variable and discriminates the hidden variables from the observable conditions; it is a discriminative model.
Fig. 3 is a flow chart of the named entity recognition method of one embodiment of the present application. The method can be performed by a server side, which may include a server or a server cluster, or by a terminal device, which may include a smartphone, a tablet computer, a notebook or desktop computer, and the like. As shown in Fig. 3, the method may include the following steps.
In step 301, an input sequence is obtained.
In the embodiments of the present application, the input sequence can be a text sequence or a speech fragment.
In step 302, vectorization processing is performed on the characters in the input sequence to obtain the character vector sequence corresponding to the input sequence.
In the embodiments of the present application, when the input sequence is a text sequence, character segmentation is applied to it directly, yielding the character sequence (x1, x2, ..., xn) of the input sequence, where xi is the i-th character in the input sequence and n is the number of characters in the input sequence.
When the input sequence is a speech fragment, the speech fragment is first converted into the corresponding text sequence, and character segmentation is then applied to that text sequence, yielding its character sequence (x1, x2, ..., xn), which is also the character sequence of the input sequence, where xi is the i-th character in the text sequence, n is the number of characters in the text sequence, and 1 ≤ i ≤ n.
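As a minimal illustration of the character segmentation described above (the helper name is my own; the patent does not prescribe an implementation), a text sequence can be split into its character sequence (x1, ..., xn) in one line:

```python
def split_into_characters(text: str) -> list:
    """Split an input text sequence into its character sequence (x1, ..., xn)."""
    return list(text)

chars = split_into_characters("张三来自西安")
print(chars)  # ['张', '三', '来', '自', '西', '安']
```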
In an optional embodiment, the above step 302 can include steps S31 and S32, wherein:
in S31, a character-to-vector mapping dictionary is obtained, the character-to-vector mapping dictionary recording correspondences between characters and vectors;
in S32, the vector corresponding to each character in the input sequence is looked up from the character-to-vector mapping dictionary, and the sequence formed by the found vectors is taken as the character vector sequence corresponding to the input sequence.
In this embodiment, after the character sequence (x1, x2, ..., xn) of the input sequence has been obtained, the character-to-vector mapping dictionary is obtained, the vector vi corresponding to each character xi in the input sequence is looked up from the character-to-vector mapping dictionary, and the sequence (v1, v2, ..., vn) formed by the found vectors vi is taken as the character vector sequence (v'1, v'2, ..., v'n) corresponding to the input sequence, where v'i = vi.
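Under the assumption that the mapping dictionary is stored as a plain Python dict from character to vector (the patent does not specify a storage format, and the zero-vector fallback for characters missing from the dictionary is my own addition), S31-S32 might be sketched as:

```python
char2vec = {              # toy character-to-vector mapping dictionary
    "西": [0.1, 0.4],
    "安": [0.3, 0.2],
}
UNK = [0.0, 0.0]          # assumed fallback for out-of-dictionary characters

def lookup_vectors(chars, mapping, unk=UNK):
    """S32: map the character sequence (x1, ..., xn) to (v1, ..., vn)."""
    return [mapping.get(c, unk) for c in chars]

vecs = lookup_vectors(["西", "安", "！"], char2vec)
```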
In a preferred embodiment, the above step 302 can include steps S33, S34, S35 and S36, wherein:
in S33, a character-to-vector mapping dictionary is obtained, the character-to-vector mapping dictionary recording correspondences between characters and vectors;
in S34, the vector corresponding to each character in the input sequence is looked up from the character-to-vector mapping dictionary;
in S35, the vectors corresponding to the characters in the input sequence are processed using an attention mechanism to obtain the weight value corresponding to each vector.
It should be noted that the weight value of a vector reflects that vector's importance: the larger the weight value, the more important the vector.
In this embodiment, a Bi-LSTM (bidirectional long short-term memory recurrent neural network) can be used to realize the attention mechanism.
In this embodiment, after the character sequence (x1, x2, ..., xn) of the input sequence has been obtained, the character-to-vector mapping dictionary is obtained and the vector vi corresponding to each character xi in the input sequence is looked up from it; the sequence (v1, v2, ..., vn) formed by the found vectors vi is then fed into the model realizing the attention mechanism, which outputs (at1, at2, ..., atn), where ati is the weight value corresponding to vi.
In S36, each vector corresponding to a character in the input sequence is multiplied by its corresponding weight value, giving the character vector sequence corresponding to the input sequence.
That is, after the vector vi corresponding to each character xi in the input sequence and its weight value ati have been obtained, vi*ati is calculated, and (v1*at1, v2*at2, ..., vn*atn) is taken as the character vector sequence (v'1, v'2, ..., v'n) corresponding to the input sequence, where v'i = vi*ati.
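As an illustrative sketch of S35-S36 (the patent only states that an attention mechanism produces the weights; the dot-product scoring against a query vector and the softmax normalization below are my own assumptions standing in for the Bi-LSTM-based attention model):

```python
import numpy as np

def attention_weights(vectors: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Toy stand-in for S35: score each character vector against a query
    and softmax-normalize the scores into weights (at1, ..., atn)."""
    scores = vectors @ query
    e = np.exp(scores - scores.max())
    return e / e.sum()

def reweight(vectors: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """S36: v'_i = v_i * at_i (each vector scaled by its own weight)."""
    return vectors * weights[:, None]

V = np.array([[0.1, 0.4], [0.3, 0.2], [0.5, 0.1]])
q = np.array([1.0, 1.0])
at = attention_weights(V, q)   # weights sum to 1
V_prime = reweight(V, at)      # the character vector sequence (v'1, ..., v'n)
```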
In the embodiments of the present application, the character-to-vector mapping dictionary can be generated in advance. Considering that the word2vec algorithm can turn each character into a vector in a low-dimensional space (usually a few hundred dimensions), so that the semantic relatedness between characters can be approximately described by the distance between their vectors, the training corpus can be trained with the word2vec algorithm to generate the character-to-vector mapping dictionary. Fig. 4 shows the flow chart of the character-to-vector mapping dictionary generation method based on the skip-gram model, which may include the following steps.
In S401, a training corpus is obtained.
In the embodiments of the present application, the training corpus includes multiple sentences.
Considering that word2vec is an unsupervised learning algorithm, when collecting the training corpus, the more training data the better; in addition, the corpus should mainly target the intended application scenario and cover most of that scenario's data types as far as possible. In practical applications, the training corpus can be labeled or unlabeled; the embodiments of the present application place no limitation on this.
In S402, the training corpus is split in units of characters to obtain a split result.
In the embodiments of the present application, each sentence in the training corpus is split into individual characters.
In S403, at least one of the following preprocessing steps is performed on the split result: filtering garbage characters, filtering stop characters, filtering low-frequency characters and filtering meaningless symbols, to obtain a preprocessed result.
In the embodiments of the present application, in order to improve processing efficiency and effect, garbage characters, stop characters, low-frequency characters and meaningless symbols in the split result can be filtered out, and the result organized into the input format required by the word2vec algorithm, that is, its expected inputs and outputs, in preparation for establishing the training objective.
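A minimal sketch of S402-S403 follows; the concrete stop-character and garbage-symbol sets and the low-frequency cutoff are my own assumptions, since the patent leaves them unspecified:

```python
from collections import Counter

STOP_CHARS = {"的", "了"}             # assumed stop-character list
MEANINGLESS = {"□", "�", "\u200b"}    # assumed garbage/meaningless symbols
MIN_FREQ = 2                          # assumed low-frequency cutoff

def preprocess(sentences):
    """S402-S403: split sentences into characters, then drop stop,
    garbage and low-frequency characters."""
    split = [list(s) for s in sentences]
    freq = Counter(c for sent in split for c in sent)
    def keep(c):
        return (c not in STOP_CHARS and c not in MEANINGLESS
                and freq[c] >= MIN_FREQ)
    return [[c for c in sent if keep(c)] for sent in split]

corpus = ["张三来自西安", "李四来自西安"]
print(preprocess(corpus))  # [['来', '自', '西', '安'], ['来', '自', '西', '安']]
```

The rare characters 张, 三, 李 and 四 occur only once each in this toy corpus, so the frequency filter removes them.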
In S404, the preprocessed result is trained with the word2vec algorithm to obtain the character-to-vector mapping dictionary.
In the embodiments of the present application, either the CBOW model or the skip-gram model of the word2vec algorithm can be used to process the preprocessed result and obtain the character-to-vector mapping dictionary.
Considering that the larger the training corpus, the more complete and accurate the content of the trained character-to-vector mapping dictionary, and that the skip-gram model is particularly well suited to big data, the skip-gram model is preferentially selected to process the preprocessed result and obtain the character-to-vector mapping dictionary.
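The skip-gram objective predicts context characters from a center character. As an illustration of the training pairs that S404 would feed to such a model (the window size and the pair format are my assumptions; a real implementation would typically delegate training to a library such as gensim), generating (center, context) pairs might look like:

```python
def skipgram_pairs(chars, window=2):
    """Generate (center, context) training pairs for a skip-gram model:
    each character predicts its neighbors within `window` positions."""
    pairs = []
    for i, center in enumerate(chars):
        lo, hi = max(0, i - window), min(len(chars), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, chars[j]))
    return pairs

pairs = skipgram_pairs(list("西安人"), window=1)
# [('西', '安'), ('安', '西'), ('安', '人'), ('人', '安')]
```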
Compared with the word vectors in related techniques, the character-based vectorization of the embodiments of the present application brings the following advantages: finer-grained character features can be captured; because the number of distinct characters is far smaller than the number of distinct words, the resulting model takes up very little space, which greatly improves model loading speed; and over time new words keep emerging, so a previously trained word vector model suffers increasingly severe drops in feature hit rate, whereas character-based vectors effectively avoid this problem, because relatively few new characters are created each year.
In step 303, the character vector sequence is processed using a neural network algorithm to obtain the text feature sequence of the input sequence.
It can be understood that the act of assigning a tag to each character in the input sequence can be abstracted as a sequence labeling problem, which is in essence a classification task: the class of each character needs to be determined.
The core idea of the neural network algorithm used in the embodiments of the present application is that, when classifying each current character, the preceding historical information is taken into account as input, which avoids treating characters as independent of one another.
In the embodiments of the present application, the character vector sequence (v'1, v'2, ..., v'n) corresponding to the input sequence is fed into the neural network algorithm to obtain the text feature sequence (h1, h2, ..., hn) of the input sequence, where hi is the character feature vector of xi and contains the feature information of xi.
As an example, a recurrent neural network (RNN) can be used to process the character vector sequence corresponding to the input sequence.
However, a standard RNN suffers from the long-range dependency problem: historical information that lies far back along the path has little influence on the classification of the current character, even when that information is directly relevant to the current decision. To illustrate:
consider the input "I studied abroad in the United States ..., and can speak fluent English", where the ellipsis stands for other, longer context and the italicized word is the one currently to be predicted. Having seen the words before "English", we may predict that the next word is probably the name of a language, but which language it is must be determined from the context. A standard RNN, because of its structure, may be unable to remember the highly useful information "I studied abroad in the United States" mentioned earlier, and so cannot predict that the next word is probably "English"; this phenomenon is called the long-range dependency problem. The LSTM was introduced to solve it: its main idea is, on the basis of the standard RNN, to control the input of contextual information with "gates". Specifically, the input and output of historical information are controlled by several gates, each of which is normalized to between 0 and 1 by a sigmoid nonlinearity; a value closer to 0 indicates that less historical information passes through the gate, while a value closer to 1 indicates that more information passes through. These gates can both "remember" useful information and "forget" useless information. In this way, relevant information from relatively far away is selectively retained for the classification of the current character, improving the prediction effect.
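The gate arithmetic just described can be made concrete. Below is a single LSTM step in numpy, a sketch only (the patent does not give equations; the standard forget/input/output gate formulation and the toy, randomly initialized parameter shapes are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step: sigmoid gates squashed to (0, 1) decide how much
    history in the cell state c survives and how much new input enters.
    W, U, b each hold one transform per gate key 'f', 'i', 'o', 'g'."""
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])   # forget gate
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])   # input gate
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])   # output gate
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])   # candidate state
    c = f * c_prev + i * g                               # gated cell update
    h = o * np.tanh(c)                                   # hidden output
    return h, c

rng = np.random.default_rng(0)
d = 3
W = {k: rng.normal(size=(d, d)) for k in "fiog"}
U = {k: rng.normal(size=(d, d)) for k in "fiog"}
b = {k: np.zeros(d) for k in "fiog"}
h, c = lstm_step(rng.normal(size=d), np.zeros(d), np.zeros(d), W, U, b)
```

Since h = o * tanh(c) with o in (0, 1), every component of h stays strictly inside (-1, 1).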
Further, in order to improve the processing effect, a bidirectional long short-term memory neural network can be used to process the character vector sequence and obtain the text feature sequence of the input sequence.
Specifically, the character vector sequence (v'1, v'2, ..., v'n) corresponding to the input sequence is fed into a bidirectional LSTM; the forward LSTM outputs (hf1, hf2, ..., hfn) and the backward LSTM outputs (hb1, hb2, ..., hbn), and the two are concatenated vector-wise to obtain (h1, h2, ..., hn), where the forward LSTM output corresponding to v'i is hfi and the backward LSTM output corresponding to v'i is hbi; hfi characterizes the historical context of xi, while the backward output hbi characterizes the future context of xi.
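The bidirectional concatenation h_i = [hf_i ; hb_i] can be sketched as follows; a trivial tanh recurrence stands in for the real LSTM cell (an assumption, to keep the example self-contained):

```python
import numpy as np

def run_direction(vectors):
    """Toy recurrence standing in for one LSTM pass: each hidden state
    mixes the current vector with the previous hidden state."""
    h, out = np.zeros(vectors.shape[1]), []
    for v in vectors:
        h = np.tanh(v + 0.5 * h)
        out.append(h)
    return np.stack(out)

def bilstm_features(vectors):
    """h_i = [hf_i ; hb_i]: concatenate the forward pass with a
    backward pass run over the reversed sequence."""
    hf = run_direction(vectors)
    hb = run_direction(vectors[::-1])[::-1]  # re-reverse so hb_i aligns with position i
    return np.concatenate([hf, hb], axis=1)

V = np.array([[0.1, 0.4], [0.3, 0.2], [0.5, 0.1]])
H = bilstm_features(V)   # shape (3, 4): 2 forward dims + 2 backward dims
```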
In step 304, a conditional random field is used to process the text feature sequence to obtain the named entity recognition result corresponding to the input sequence.
A conditional random field model is an undirected graphical model that, given the observation sequence to be labeled (words, sentences, numerical values, etc.), computes the joint probability distribution of the entire label sequence.
In the embodiments of the present application, the learning algorithm of the conditional random field may be the improved iterative scaling algorithm, and the prediction algorithm of the conditional random field may be the Viterbi algorithm.
In the embodiments of the present application, the text feature sequence (h1, h2, ..., hn) corresponding to the input sequence can be fed into a linear-chain conditional random field. Specifically, during learning, the learning algorithm of the conditional random field (e.g., improved iterative scaling) is applied to the text feature sequence (h1, h2, ..., hn) to obtain the output sequence (s1, s2, ..., sn) and a state transition matrix, where si is the output corresponding to hi; si is a 1×K vector whose elements represent the confidence scores of xi with respect to the different labels, and the state transition matrix holds the transition probabilities between the labels. During prediction, the task is converted into a maximum-probability path problem: the Viterbi algorithm is applied to the output sequence (s1, s2, ..., sn) and the state transition matrix to obtain the label sequence (y1, y2, ..., yn), where yi corresponds to xi. Further parsing as required then yields the final named entity recognition result.
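The prediction step described above, decoding the maximum-probability path from the output sequence (s1, ..., sn) and the state transition matrix, can be sketched with a standard Viterbi routine. The additive (log-space) combination of per-position scores and transition scores below is an assumption; the embodiment does not spell out the exact score combination:

```python
import numpy as np

def viterbi(scores, trans):
    """scores: n x K per-position label confidences (the s_i vectors);
    trans: K x K transition scores between labels.
    Returns the maximum-scoring label path (y_1, ..., y_n)."""
    n, K = scores.shape
    best = scores[0].copy()                 # best score of a path ending in each label
    back = np.zeros((n, K), dtype=int)      # backpointers for path recovery
    for t in range(1, n):
        # cand[p, c] = best path ending in p, then transition p -> c, then emit at t
        cand = best[:, None] + trans + scores[t][None, :]
        back[t] = cand.argmax(axis=0)
        best = cand.max(axis=0)
    path = [int(best.argmax())]
    for t in range(n - 1, 0, -1):           # follow backpointers from the end
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

With zero transition scores the decoder reduces to a per-position argmax; a transition matrix that rewards self-transitions instead favours label runs, which is the label-consistency effect the CRF layer contributes.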
As can be seen from the above embodiment, each character in the input sequence to be recognized can be converted into a corresponding vector; a neural network algorithm processes the vectors corresponding to the characters to extract the text feature sequence of the input sequence to be recognized; finally, a conditional random field processes the text feature sequence to obtain the named entity recognition result corresponding to the input sequence to be recognized. Because characters can characterize finer-grained features and the number of distinct characters is much smaller than the number of distinct words, because the neural network algorithm can take the contextual information of each character in the input sequence into account, and because the conditional random field can avoid the label bias problem, the embodiments of the present application can achieve good recognition performance by combining character vectors, a neural network algorithm, and a conditional random field to realize named entity recognition.
Fig. 5 is a schematic structural diagram of a named entity recognition device according to an embodiment of the present application. Referring to Fig. 5, in a software implementation, the named entity recognition device 500 may include:
an acquiring unit 501, configured to acquire an input sequence;
a first processing unit 502, configured to perform vectorization processing on the characters in the input sequence to obtain a character vector sequence corresponding to the input sequence;
a second processing unit 503, configured to process the character vector sequence using a neural network algorithm to obtain a text feature sequence of the input sequence;
a third processing unit 504, configured to process the text feature sequence using a conditional random field to obtain a named entity recognition result corresponding to the input sequence.
As can be seen from the above embodiment, each character in the input sequence to be recognized can be converted into a corresponding vector; a neural network algorithm processes the vectors corresponding to the characters to extract the text feature sequence of the input sequence to be recognized; finally, a conditional random field processes the text feature sequence to obtain the named entity recognition result corresponding to the input sequence to be recognized. Because characters can characterize finer-grained features and the number of distinct characters is much smaller than the number of distinct words, because the neural network algorithm can take the contextual information of each character in the input sequence into account, and because the conditional random field can avoid the label bias problem, the embodiments of the present application can achieve good recognition performance by combining character vectors, a neural network algorithm, and a conditional random field to realize named entity recognition.
Optionally, as an embodiment, the second processing unit 503 may include:
a character vector sequence processing subunit, configured to process the character vector sequence using a bidirectional long short-term memory neural network to obtain the text feature sequence of the input sequence.
Optionally, as an embodiment, the first processing unit 502 may include:
a mapping dictionary acquiring subunit, configured to acquire a character-to-vector mapping dictionary, wherein the character-to-vector mapping dictionary records correspondences between characters and vectors;
a lookup subunit, configured to look up, in the character-to-vector mapping dictionary, the vectors corresponding to the characters in the input sequence;
an attention mechanism processing subunit, configured to process the vectors corresponding to the characters in the input sequence using an attention mechanism to obtain a weight value corresponding to each vector;
a character vector sequence obtaining subunit, configured to multiply each vector corresponding to a character in the input sequence by the weight value corresponding to that vector, to obtain the character vector sequence corresponding to the input sequence.
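The attention weighting and point multiplication performed by these subunits can be sketched as follows. The embodiment does not specify the form of the attention mechanism, so this sketch assumes a simple dot-product attention against a hypothetical query vector, followed by per-vector scaling:

```python
import numpy as np

def attention_weights(vectors, query):
    """Score each character vector against a (hypothetical) query vector
    and normalize with softmax to obtain one weight value per vector."""
    scores = np.array([v @ query for v in vectors])
    scores -= scores.max()                  # shift for numerical stability
    w = np.exp(scores)
    return w / w.sum()

def weight_vectors(vectors, weights):
    # "point multiplication": each character vector is scaled by its own weight
    return [w * v for v, w in zip(vectors, weights)]
```

The scaled vectors form the character vector sequence that is then passed to the neural network stage.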
Optionally, as an embodiment, the named entity recognition device 500 may further include a mapping dictionary generating unit.
The mapping dictionary generating unit may include:
a training corpus acquiring subunit, configured to acquire a training corpus;
a character splitting subunit, configured to split the training corpus into individual characters to obtain a splitting result;
a preprocessing subunit, configured to perform at least one of the following preprocessing operations on the splitting result: filtering junk characters, filtering stop characters, filtering low-frequency characters, and filtering meaningless symbols, to obtain a preprocessing result;
a mapping dictionary training subunit, configured to train on the preprocessing result using the word2vec algorithm to obtain the character-to-vector mapping dictionary.
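The character splitting and filtering steps performed by these subunits can be sketched as follows; the stop-character set and the low-frequency threshold are hypothetical parameters, and whitespace stands in for "meaningless symbols":

```python
from collections import Counter

def preprocess(corpus_lines, stop_chars=frozenset(), min_count=2):
    """Split the training corpus into individual characters, then filter
    stop characters, meaningless symbols (here: whitespace), and
    low-frequency characters."""
    chars = [ch for line in corpus_lines for ch in line]   # character-level split
    counts = Counter(chars)
    return [ch for ch in chars
            if ch not in stop_chars
            and not ch.isspace()
            and counts[ch] >= min_count]
```

The filtered character stream is what would then be handed to the word2vec training step.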
Optionally, as an embodiment, the mapping dictionary training subunit is specifically configured to:
train on the preprocessing result using a skip-gram model to obtain the character-to-vector mapping dictionary.
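In the skip-gram model, each character is trained to predict its neighbours within a context window. A minimal full-softmax sketch is given below; real word2vec implementations use hierarchical softmax or negative sampling rather than the full softmax shown here, and all hyperparameters are hypothetical:

```python
import numpy as np

def train_skipgram(chars, dim=8, window=2, lr=0.1, epochs=5, seed=0):
    """Minimal full-softmax skip-gram over a character sequence.
    Returns a character-to-vector mapping dictionary."""
    vocab = sorted(set(chars))
    idx = {ch: i for i, ch in enumerate(vocab)}
    rng = np.random.default_rng(seed)
    W_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # center-character vectors
    W_out = rng.normal(scale=0.1, size=(len(vocab), dim))  # context-character vectors
    # (center, context) pairs within the window
    pairs = [(idx[chars[i]], idx[chars[j]])
             for i in range(len(chars))
             for j in range(max(0, i - window), min(len(chars), i + window + 1))
             if i != j]
    for _ in range(epochs):
        for c, o in pairs:
            v = W_in[c].copy()
            logits = W_out @ v
            p = np.exp(logits - logits.max())
            p /= p.sum()
            p[o] -= 1.0                      # gradient of softmax cross-entropy
            W_in[c] -= lr * (W_out.T @ p)    # SGD step on both embedding tables
            W_out -= lr * np.outer(p, v)
    return {ch: W_in[idx[ch]] for ch in vocab}
```

The returned dictionary plays the role of the character-to-vector mapping dictionary consulted by the lookup subunit.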
The named entity recognition device 500 can also perform the method of the embodiment shown in Fig. 3 and realize the functions of the named entity recognition device in the embodiment shown in Fig. 5; the embodiments of the present application will not repeat the description here.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to Fig. 6, at the hardware level the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. The memory may include volatile memory, such as high-speed random-access memory (RAM), and may also include non-volatile memory, such as at least one magnetic disk storage. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory can be interconnected by the internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one double-headed arrow is shown in Fig. 6, but this does not mean that there is only one bus or one type of bus.
The memory is configured to store a program. Specifically, the program may include program code, and the program code includes computer operation instructions. The memory may include volatile memory and non-volatile memory, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the volatile memory and then runs it, forming the named entity recognition device at the logical level. The processor executes the program stored in the memory and is specifically configured to perform the following operations:
acquiring an input sequence;
performing vectorization processing on the characters in the input sequence to obtain a character vector sequence corresponding to the input sequence;
processing the character vector sequence using a neural network algorithm to obtain a text feature sequence of the input sequence;
processing the text feature sequence using a conditional random field to obtain a named entity recognition result corresponding to the input sequence.
The method performed by the named entity recognition device disclosed in the embodiment shown in Fig. 6 of the present application can be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method can be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like. The steps of the methods disclosed in the embodiments of the present application can be embodied as being directly executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor. The software module can be located in a storage medium mature in the art, such as random-access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device can also perform the method of Fig. 3 and realize the functions of the named entity recognition device in the embodiment shown in Fig. 3; the embodiments of the present application will not repeat the description here.
Of course, in addition to the software implementation, the electronic device of this specification does not exclude other implementations, such as a logic device or a combination of software and hardware; that is to say, the execution subject of the following processing flow is not limited to individual logic units, and may also be hardware or a logic device.
The embodiments of the present application further provide a computer-readable storage medium storing one or more programs. The one or more programs include instructions which, when executed by a portable electronic device including a plurality of application programs, enable the portable electronic device to perform the method of the embodiment shown in Fig. 3, and specifically to perform the following method:
acquiring an input sequence;
performing vectorization processing on the characters in the input sequence to obtain a character vector sequence corresponding to the input sequence;
processing the character vector sequence using a neural network algorithm to obtain a text feature sequence of the input sequence;
processing the text feature sequence using a conditional random field to obtain a named entity recognition result corresponding to the input sequence.
In summary, the foregoing is merely preferred embodiments of this specification and is not intended to limit the protection scope of this specification. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this specification shall be included within the protection scope of this specification.
The system, device, module, or unit illustrated in the above embodiments can be specifically implemented by a computer chip or entity, or by a product with a certain function. A typical implementation device is a computer. Specifically, the computer can be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can realize information storage by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
Each embodiment in this specification is described in a progressive manner; for identical or similar parts between the embodiments, reference may be made to each other, and each embodiment focuses on its differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for relevant parts, reference may be made to the description of the method embodiment.

Claims (10)

1. A named entity recognition method, characterized in that the method comprises:
acquiring an input sequence;
performing vectorization processing on the characters in the input sequence to obtain a character vector sequence corresponding to the input sequence;
processing the character vector sequence using a neural network algorithm to obtain a text feature sequence of the input sequence; and
processing the text feature sequence using a conditional random field to obtain a named entity recognition result corresponding to the input sequence.
2. The method according to claim 1, characterized in that the processing the character vector sequence using a neural network algorithm to obtain a text feature sequence of the input sequence comprises:
processing the character vector sequence using a bidirectional long short-term memory neural network to obtain the text feature sequence of the input sequence.
3. The method according to claim 1, characterized in that the performing vectorization processing on the characters in the input sequence to obtain a character vector sequence corresponding to the input sequence comprises:
acquiring a character-to-vector mapping dictionary, wherein the character-to-vector mapping dictionary records correspondences between characters and vectors;
looking up, in the character-to-vector mapping dictionary, the vectors corresponding to the characters in the input sequence;
processing the vectors corresponding to the characters in the input sequence using an attention mechanism to obtain a weight value corresponding to each vector; and
multiplying each vector corresponding to a character in the input sequence by the weight value corresponding to that vector to obtain the character vector sequence corresponding to the input sequence.
4. The method according to claim 3, characterized in that the generating process of the character-to-vector mapping dictionary comprises:
acquiring a training corpus;
splitting the training corpus into individual characters to obtain a splitting result;
performing at least one of the following preprocessing operations on the splitting result: filtering junk characters, filtering stop characters, filtering low-frequency characters, and filtering meaningless symbols, to obtain a preprocessing result; and
training on the preprocessing result using the word2vec algorithm to obtain the character-to-vector mapping dictionary.
5. The method according to claim 4, characterized in that the training on the preprocessing result using the word2vec algorithm to obtain the character-to-vector mapping dictionary comprises:
training on the preprocessing result using a skip-gram model to obtain the character-to-vector mapping dictionary.
6. A named entity recognition device, characterized in that the device comprises:
an acquiring unit, configured to acquire an input sequence;
a first processing unit, configured to perform vectorization processing on the characters in the input sequence to obtain a character vector sequence corresponding to the input sequence;
a second processing unit, configured to process the character vector sequence using a neural network algorithm to obtain a text feature sequence of the input sequence; and
a third processing unit, configured to process the text feature sequence using a conditional random field to obtain a named entity recognition result corresponding to the input sequence.
7. The device according to claim 6, characterized in that the second processing unit comprises:
a character vector sequence processing subunit, configured to process the character vector sequence using a bidirectional long short-term memory neural network to obtain the text feature sequence of the input sequence.
8. The device according to claim 6, characterized in that the first processing unit comprises:
a mapping dictionary acquiring subunit, configured to acquire a character-to-vector mapping dictionary, wherein the character-to-vector mapping dictionary records correspondences between characters and vectors;
a lookup subunit, configured to look up, in the character-to-vector mapping dictionary, the vectors corresponding to the characters in the input sequence;
an attention mechanism processing subunit, configured to process the vectors corresponding to the characters in the input sequence using an attention mechanism to obtain a weight value corresponding to each vector; and
a character vector sequence obtaining subunit, configured to multiply each vector corresponding to a character in the input sequence by the weight value corresponding to that vector to obtain the character vector sequence corresponding to the input sequence.
9. An electronic device, characterized in that it comprises:
a processor; and
a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the steps of the method according to any one of claims 1 to 5.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores one or more programs which, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the steps of the method according to any one of claims 1 to 5.
CN201711102742.5A 2017-11-10 2017-11-10 Name entity recognition method and device Pending CN107797992A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711102742.5A CN107797992A (en) 2017-11-10 2017-11-10 Name entity recognition method and device


Publications (1)

Publication Number Publication Date
CN107797992A true CN107797992A (en) 2018-03-13

Family

ID=61534832

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711102742.5A Pending CN107797992A (en) 2017-11-10 2017-11-10 Name entity recognition method and device

Country Status (1)

Country Link
CN (1) CN107797992A (en)

Cited By (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108536679A (en) * 2018-04-13 2018-09-14 腾讯科技(成都)有限公司 Name entity recognition method, device, equipment and computer readable storage medium
CN108628823A (en) * 2018-03-14 2018-10-09 中山大学 In conjunction with the name entity recognition method of attention mechanism and multitask coordinated training
CN108874997A (en) * 2018-06-13 2018-11-23 广东外语外贸大学 A kind of name name entity recognition method towards film comment
CN108920445A (en) * 2018-04-23 2018-11-30 华中科技大学鄂州工业技术研究院 A kind of name entity recognition method and device based on Bi-LSTM-CRF model
CN109002436A (en) * 2018-07-12 2018-12-14 上海金仕达卫宁软件科技有限公司 Medical text terms automatic identifying method and system based on shot and long term memory network
CN109241330A (en) * 2018-08-20 2019-01-18 北京百度网讯科技有限公司 The method, apparatus, equipment and medium of key phrase in audio for identification
CN109241275A (en) * 2018-07-05 2019-01-18 广东工业大学 A kind of text subject clustering algorithm based on natural language processing
CN109389091A (en) * 2018-10-22 2019-02-26 重庆邮电大学 The character identification system and method combined based on neural network and attention mechanism
CN109446514A (en) * 2018-09-18 2019-03-08 平安科技(深圳)有限公司 Construction method, device and the computer equipment of news property identification model
CN109614614A (en) * 2018-12-03 2019-04-12 焦点科技股份有限公司 A kind of BILSTM-CRF name of product recognition methods based on from attention
CN109657239A (en) * 2018-12-12 2019-04-19 电子科技大学 The Chinese name entity recognition method learnt based on attention mechanism and language model
CN109858041A (en) * 2019-03-07 2019-06-07 北京百分点信息科技有限公司 A kind of name entity recognition method of semi-supervised learning combination Custom Dictionaries
CN109871535A (en) * 2019-01-16 2019-06-11 四川大学 A kind of French name entity recognition method based on deep neural network
CN109871538A (en) * 2019-02-18 2019-06-11 华南理工大学 A kind of Chinese electronic health record name entity recognition method
CN109918680A (en) * 2019-03-28 2019-06-21 腾讯科技(上海)有限公司 Entity recognition method, device and computer equipment
CN110321547A (en) * 2018-03-30 2019-10-11 北京四维图新科技股份有限公司 A kind of name entity determines method and device
CN110348016A (en) * 2019-07-15 2019-10-18 昆明理工大学 Text snippet generation method based on sentence association attention mechanism
CN110348021A (en) * 2019-07-17 2019-10-18 湖北亿咖通科技有限公司 Character string identification method, electronic equipment, storage medium based on name physical model
CN110362597A (en) * 2019-06-28 2019-10-22 华为技术有限公司 A kind of structured query language SQL injection detection method and device
CN110516228A (en) * 2019-07-04 2019-11-29 湖南星汉数智科技有限公司 Name entity recognition method, device, computer installation and computer readable storage medium
CN110543638A (en) * 2019-09-10 2019-12-06 杭州橙鹰数据技术有限公司 Named entity identification method and device
WO2020048292A1 (en) * 2018-09-04 2020-03-12 腾讯科技(深圳)有限公司 Method and apparatus for generating network representation of neural network, storage medium, and device
CN111079437A (en) * 2019-12-20 2020-04-28 深圳前海达闼云端智能科技有限公司 Entity identification method, electronic equipment and storage medium
CN111222335A (en) * 2019-11-27 2020-06-02 上海眼控科技股份有限公司 Corpus correction method and device, computer equipment and computer-readable storage medium
CN111222334A (en) * 2019-11-15 2020-06-02 广州洪荒智能科技有限公司 Named entity identification method, device, equipment and medium
CN111291566A (en) * 2020-01-21 2020-06-16 北京明略软件系统有限公司 Event subject identification method and device and storage medium
CN111782768A (en) * 2020-06-30 2020-10-16 首都师范大学 Fine-grained entity identification method based on hyperbolic space representation and label text interaction
CN111885000A (en) * 2020-06-22 2020-11-03 网宿科技股份有限公司 Network attack detection method, system and device based on graph neural network
CN112115258A (en) * 2019-06-20 2020-12-22 腾讯科技(深圳)有限公司 User credit evaluation method, device, server and storage medium
CN112154509A (en) * 2018-04-19 2020-12-29 皇家飞利浦有限公司 Machine learning model with evolving domain-specific dictionary features for text annotation
CN112215005A (en) * 2020-10-12 2021-01-12 小红书科技有限公司 Entity identification method and device
WO2021146831A1 (en) * 2020-01-20 2021-07-29 京东方科技集团股份有限公司 Entity recognition method and apparatus, dictionary creation method, device, and medium
CN113221884A (en) * 2021-05-13 2021-08-06 中国科学技术大学 Text recognition method and system based on low-frequency word storage memory
CN113221885A (en) * 2021-05-13 2021-08-06 中国科学技术大学 Hierarchical modeling method and system based on whole words and radicals
CN113362540A (en) * 2021-06-11 2021-09-07 江苏苏云信息科技有限公司 Traffic ticket business processing device, system and method based on multimode interaction
CN113570480A (en) * 2021-07-19 2021-10-29 北京华宇元典信息服务有限公司 Judging document address information identification method and device and electronic equipment
CN117034942A (en) * 2023-10-07 2023-11-10 之江实验室 Named entity recognition method, device, equipment and readable storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
AKASH BHARADWAJ et al.: "Phonologically Aware Neural Model for Named Entity Recognition in Low Resource Transfer Settings", Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing *
CHUANHAI DONG et al.: "Character-Based LSTM-CRF with Radical-Level Features for Chinese Named Entity Recognition", Natural Language Understanding and Intelligent Applications (NLPCC 2016) *
ROBERT_AI: "Applications of neural network architectures in named entity recognition (NER)", https://www.cnblogs.com/robert-dlut/p/6847401.html *
Jiqizhixin (机器之心): "How to do natural language processing with deep learning? Here is a checklist of best practices", https://www.jiqizhixin.com/articles/2017-07-26-5 *

Cited By (57)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108628823A (en) * 2018-03-14 2018-10-09 中山大学 In conjunction with the name entity recognition method of attention mechanism and multitask coordinated training
CN108628823B (en) * 2018-03-14 2022-07-01 中山大学 Named entity recognition method combining attention mechanism and multi-task collaborative training
CN110321547B (en) * 2018-03-30 2024-06-11 北京四维图新科技股份有限公司 Named entity determination method and device
CN110321547A (en) * 2018-03-30 2019-10-11 北京四维图新科技股份有限公司 A kind of name entity determines method and device
CN108536679B (en) * 2018-04-13 2022-05-20 腾讯科技(成都)有限公司 Named entity recognition method, device, equipment and computer readable storage medium
CN108536679A (en) * 2018-04-13 2018-09-14 腾讯科技(成都)有限公司 Name entity recognition method, device, equipment and computer readable storage medium
CN112154509A (en) * 2018-04-19 2020-12-29 皇家飞利浦有限公司 Machine learning model with evolving domain-specific dictionary features for text annotation
CN108920445A (en) * 2018-04-23 2018-11-30 华中科技大学鄂州工业技术研究院 A kind of name entity recognition method and device based on Bi-LSTM-CRF model
CN108920445B (en) * 2018-04-23 2022-06-17 华中科技大学鄂州工业技术研究院 Named entity identification method and device based on Bi-LSTM-CRF model
CN108874997A (en) * 2018-06-13 2018-11-23 广东外语外贸大学 A kind of name name entity recognition method towards film comment
CN109241275A (en) * 2018-07-05 2019-01-18 广东工业大学 A kind of text subject clustering algorithm based on natural language processing
CN109241275B (en) * 2018-07-05 2022-02-11 广东工业大学 Text topic clustering algorithm based on natural language processing
CN109002436A (en) * 2018-07-12 2018-12-14 上海金仕达卫宁软件科技有限公司 Medical text terms automatic identifying method and system based on shot and long term memory network
US11308937B2 (en) * 2018-08-20 2022-04-19 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for identifying key phrase in audio, device and medium
CN109241330A (en) * 2018-08-20 2019-01-18 北京百度网讯科技有限公司 The method, apparatus, equipment and medium of key phrase in audio for identification
US11875220B2 (en) 2018-09-04 2024-01-16 Tencent Technology (Shenzhen) Company Limited Method, apparatus, and storage medium for generating network representation for neural network
WO2020048292A1 (en) * 2018-09-04 2020-03-12 腾讯科技(深圳)有限公司 Method and apparatus for generating network representation of neural network, storage medium, and device
CN109446514A (en) * 2018-09-18 2019-03-08 平安科技(深圳)有限公司 Construction method, device and the computer equipment of news property identification model
CN109389091A (en) * 2018-10-22 2019-02-26 重庆邮电大学 The character identification system and method combined based on neural network and attention mechanism
CN109389091B (en) * 2018-10-22 2022-05-03 重庆邮电大学 Character recognition system and method based on combination of neural network and attention mechanism
CN109614614A (en) * 2018-12-03 2019-04-12 焦点科技股份有限公司 A kind of BILSTM-CRF name of product recognition methods based on from attention
CN109614614B (en) * 2018-12-03 2021-04-02 焦点科技股份有限公司 BILSTM-CRF product name identification method based on self-attention
CN109657239A (en) * 2018-12-12 2019-04-19 电子科技大学 The Chinese name entity recognition method learnt based on attention mechanism and language model
CN109657239B (en) * 2018-12-12 2020-04-21 电子科技大学 Chinese named entity recognition method based on attention mechanism and language model learning
CN109871535B (en) * 2019-01-16 2020-01-10 四川大学 French named entity recognition method based on deep neural network
CN109871535A (en) * 2019-01-16 2019-06-11 四川大学 A kind of French name entity recognition method based on deep neural network
CN109871538A (en) * 2019-02-18 2019-06-11 华南理工大学 Chinese electronic health record named entity recognition method
CN109858041B (en) * 2019-03-07 2023-02-17 北京百分点科技集团股份有限公司 Named entity recognition method combining semi-supervised learning with user-defined dictionary
CN109858041A (en) * 2019-03-07 2019-06-07 北京百分点信息科技有限公司 Named entity recognition method combining semi-supervised learning with custom dictionaries
CN109918680A (en) * 2019-03-28 2019-06-21 腾讯科技(上海)有限公司 Entity recognition method, device and computer equipment
CN112115258A (en) * 2019-06-20 2020-12-22 腾讯科技(深圳)有限公司 User credit evaluation method, device, server and storage medium
CN112115258B (en) * 2019-06-20 2023-09-26 腾讯科技(深圳)有限公司 Credit evaluation method and device for user, server and storage medium
CN110362597A (en) * 2019-06-28 2019-10-22 华为技术有限公司 Structured query language (SQL) injection detection method and device
CN110516228A (en) * 2019-07-04 2019-11-29 湖南星汉数智科技有限公司 Named entity recognition method and device, computer device, and computer-readable storage medium
CN110348016A (en) * 2019-07-15 2019-10-18 昆明理工大学 Text abstract generation method based on sentence correlation attention mechanism
CN110348016B (en) * 2019-07-15 2022-06-14 昆明理工大学 Text abstract generation method based on sentence correlation attention mechanism
CN110348021A (en) * 2019-07-17 2019-10-18 湖北亿咖通科技有限公司 Character string recognition method based on named entity model, electronic device, and storage medium
CN110543638B (en) * 2019-09-10 2022-12-27 杭州橙鹰数据技术有限公司 Named entity identification method and device
CN110543638A (en) * 2019-09-10 2019-12-06 杭州橙鹰数据技术有限公司 Named entity identification method and device
CN111222334A (en) * 2019-11-15 2020-06-02 广州洪荒智能科技有限公司 Named entity identification method, device, equipment and medium
CN111222335A (en) * 2019-11-27 2020-06-02 上海眼控科技股份有限公司 Corpus correction method and device, computer equipment and computer-readable storage medium
CN111079437A (en) * 2019-12-20 2020-04-28 深圳前海达闼云端智能科技有限公司 Entity identification method, electronic equipment and storage medium
WO2021146831A1 (en) * 2020-01-20 2021-07-29 京东方科技集团股份有限公司 Entity recognition method and apparatus, dictionary creation method, device, and medium
CN111291566A (en) * 2020-01-21 2020-06-16 北京明略软件系统有限公司 Event subject identification method and device and storage medium
CN111291566B (en) * 2020-01-21 2023-04-28 北京明略软件系统有限公司 Event main body recognition method, device and storage medium
CN111885000A (en) * 2020-06-22 2020-11-03 网宿科技股份有限公司 Network attack detection method, system and device based on graph neural network
CN111782768A (en) * 2020-06-30 2020-10-16 首都师范大学 Fine-grained entity identification method based on hyperbolic space representation and label text interaction
WO2022001333A1 (en) * 2020-06-30 2022-01-06 首都师范大学 Hyperbolic space representation and label text interaction-based fine-grained entity recognition method
CN112215005A (en) * 2020-10-12 2021-01-12 小红书科技有限公司 Entity identification method and device
CN113221885A (en) * 2021-05-13 2021-08-06 中国科学技术大学 Hierarchical modeling method and system based on whole words and radicals
CN113221884A (en) * 2021-05-13 2021-08-06 中国科学技术大学 Text recognition method and system based on low-frequency word storage memory
CN113221885B (en) * 2021-05-13 2022-09-06 中国科学技术大学 Hierarchical modeling method and system based on whole words and radicals
CN113221884B (en) * 2021-05-13 2022-09-06 中国科学技术大学 Text recognition method and system based on low-frequency word storage memory
CN113362540A (en) * 2021-06-11 2021-09-07 江苏苏云信息科技有限公司 Traffic ticket business processing device, system and method based on multimodal interaction
CN113570480A (en) * 2021-07-19 2021-10-29 北京华宇元典信息服务有限公司 Judgment document address information recognition method and device, and electronic device
CN117034942A (en) * 2023-10-07 2023-11-10 之江实验室 Named entity recognition method, device, equipment and readable storage medium
CN117034942B (en) * 2023-10-07 2024-01-09 之江实验室 Named entity recognition method, device, equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN107797992A (en) Named entity recognition method and device
CN107679039B (en) Method and device for determining statement intention
CN108629687B (en) Anti-money laundering method, device and equipment
CN110276023B (en) POI transition event discovery method, device, computing equipment and medium
CN110619044B (en) Emotion analysis method, system, storage medium and equipment
Wang et al. Dialogue intent classification with character-CNN-BGRU networks
Li et al. A method of emotional analysis of movie based on convolution neural network and bi-directional LSTM RNN
Alexandridis et al. A knowledge-based deep learning architecture for aspect-based sentiment analysis
CN109271624A (en) Target word determination method, apparatus and storage medium
CN108694183A (en) Search method and device
CN112132238A (en) Method, device, equipment and readable medium for identifying private data
CN113947086A (en) Sample data generation method, training method, corpus generation method and apparatus
CN116975271A (en) Text relevance determining method, device, computer equipment and storage medium
Park et al. Sensitive data identification in structured data through genner model based on text generation and ner
US20200110834A1 (en) Dynamic Linguistic Assessment and Measurement
CN114328841A (en) Question-answer model training method and device, question-answer method and device
CN116702784B (en) Entity linking method, entity linking device, computer equipment and storage medium
CN103514194B (en) Method and apparatus for determining relevance between corpus and entity, and classifier training method
CN116955579A (en) Chat reply generation method and device based on keyword knowledge retrieval
CN116719915A (en) Intelligent question-answering method, device, equipment and storage medium
CN117216617A (en) Text classification model training method, device, computer equipment and storage medium
Arbaatun et al. Hate speech detection on Twitter through Natural Language Processing using LSTM model
Zhu et al. A named entity recognition model based on ensemble learning
Porjazovski et al. Attention-based end-to-end named entity recognition from speech
CN115129885A (en) Entity chain pointing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100081 No.101, 1st floor, building 14, 27 Jiancai Chengzhong Road, Haidian District, Beijing
Applicant after: Beijing PERCENT Technology Group Co.,Ltd.
Address before: 100081 16/F, block a, Beichen Century Center, building 2, courtyard 8, Beichen West Road, Chaoyang District, Beijing
Applicant before: BEIJING BAIFENDIAN INFORMATION SCIENCE & TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20180313